* [PATCH v7 00/50] powerpc/powernv: PCI hotplug support
@ 2015-11-04 13:12 Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 01/50] PCI: Add pcibios_setup_bridge() Gavin Shan
                   ` (50 more replies)
  0 siblings, 51 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This series is rebased on the powerpc/next branch, plus the additional
patches below:
   https://patchwork.ozlabs.org/patch/534804/   (PATCH[1/1] Andrew's EEH fix)
   https://patchwork.ozlabs.org/patch/534154/   (PATCH[7/7] Richard's SRIOV Rework)
   commit 3b0e21e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
This series adds PCI slot (hotplug) support for the PowerPC PowerNV platform,
which runs on top of skiboot firmware. The patchset requires corresponding
changes in skiboot, which have been sent to skiboot@lists.ozlabs.org
for review. skiboot exposes the PCI slots through device node properties,
and the kernel uses those properties to populate the PCI slots accordingly.
The original PCI infrastructure on the PowerNV platform can't support hotplug
because the PE is assigned at PHB fixup time, which happens only once
during system boot. To address this, the PCI infrastructure on the PowerNV
platform has been reworked substantially. The PE and its corresponding resources
(IODT, M32DT, M64 segments, DMA32 and bypass window) are now assigned when the
PCI bridge's resources are updated, and those resources might determine the PE#
assigned to the PE (e.g. M64 resources, strictly speaking on P8). Each PE
maintains a reference count, which is (number of child PCI devices + 1). When
the last child PCI device leaves the PE, the PE and its resources are released
and put back into the free pool. With this design, the PE is released when the
EEH PE is released. PATCH[1 - 27] are related to this part.
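
The reference-counting scheme described above can be illustrated with a small user-space sketch. This is not code from the series: `demo_pe`, `demo_pool` and the `demo_pe_*()` helpers are hypothetical names, the pool size is arbitrary, and who drops the final "+1" reference is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PE 8	/* hypothetical pool size, not a real PHB limit */

/* Hypothetical stand-in for a PE; not the kernel's struct pnv_ioda_pe. */
struct demo_pe {
	int  refcount;	/* number of child devices + 1 while the PE is live */
	bool in_use;
};

static struct demo_pe demo_pool[DEMO_MAX_PE];

/* Take a free PE from the pool; the initial reference is the "+1". */
static struct demo_pe *demo_pe_alloc(void)
{
	for (size_t i = 0; i < DEMO_MAX_PE; i++) {
		if (!demo_pool[i].in_use) {
			demo_pool[i].in_use = true;
			demo_pool[i].refcount = 1;
			return &demo_pool[i];
		}
	}
	return NULL;
}

/* A child device joins the PE. */
static void demo_pe_get(struct demo_pe *pe)
{
	assert(pe->in_use);
	pe->refcount++;
}

/*
 * Drop one reference. Returns true when the count reaches zero and the
 * PE has been put back into the free pool.
 */
static bool demo_pe_put(struct demo_pe *pe)
{
	assert(pe->in_use && pe->refcount > 0);
	if (--pe->refcount > 0)
		return false;
	pe->in_use = false;
	return true;
}
```

In this sketch a PE with two child devices holds three references, and the slot becomes allocatable again only after both devices and the base reference have been dropped.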
From skiboot's perspective, a PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various resets on
a particular PCI slot through the device-tree node. If it does, EEH uses the
functionality provided by skiboot. Besides, the device-tree nodes have to change
in order to support PCI hotplug. For example, when a PCI adapter is inserted into
a slot, its device-tree node should be added to the system dynamically. Conversely,
the device-tree node should be removed from the system when the PCI adapter is going
offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes,
they should be added/removed accordingly during PCI hotplug. PATCH[28 - 43]
do the related work.
The OF driver is changed to support unflattening an FDT blob into a sub-tree,
which is covered by PATCH[44 - 49].
The last one, PATCH[50], is the standalone PCI hotplug driver for the PowerPC
PowerNV platform.
Changelog
=========
v7:
   * Reworked revision to some extent.
   * Rebased to powerpc/next repository.
   * Reorder/split/merge/drop patches accordingly - Alexey.
   * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
   * Merged 3 files to one for the hotplug driver - Alexey.
   * As part of OPAL API, defined macros for PCI slot power state, hotplug
     message type. Defined macros for PCI slot power confirmed state in
     hotplug driver.
   * Misc comments from Alexey.
   * Reworked unflatten_dt_node() to avoid recursive function calls.
   * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
v6:
   * Patch reorder, split, squash - Alexey.
   * Minor coding style - Alexey.
   * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
   * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
   * Current depth passed as a parameter to __unflatten_dt_node() - Grant / Alexey
   * Replace overlay with of_changeset - Grant
v5:
   * Rebased to 4.1.rc6 and some unmerged patches as below:
     Alexey's DDW patchset (v11);
     Gavin's EEH error injection support (in mpe's next branch);
     Richard's EEH cleanup patches (in mpe's next branch);
     Richard's EEH support for VF (v7);
     Gavin's misc EEH fixes for 4.2;
   * The revision bases on skiboot corresponding patches (v7):
     https://patchwork.ozlabs.org/patch/480437/
   * Utilize OF overlay to update device-tree with help of newly introduced
     OPAL API opal_get_overlay_dt().
   * Split patches for easy review according to aik's comments.
   * Fix coding style issues reported by checkpatch.pl, as pointed out by aik.
   * Code cleanup and misc fixup according to aik's input.
v4:
   * Rebased to 4.1.RC1
   * Added API to unflatten an FDT blob into a device node sub-tree, which is
     attached to the indicated parent device node. The original mechanism based
     on a formatted string stream has been dropped.
   * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
     was picked up and sent to linux-ppc@ separately for review, as Richard's
     "VF EEH Support" depends on it.
v3:
   * Rebased to 4.1.RC0
   * PowerNV PCI infrastructure is totally refactored in order to support PCI
     hotplug. The PowerNV hotplug driver is also reworked a lot because of
     the changes in skiboot in order to support PCI hotplug.
Gavin Shan (50):
  PCI: Add pcibios_setup_bridge()
  powerpc/pci: Override pcibios_setup_bridge()
  powerpc/pci: Cleanup on struct pci_controller_ops
  powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops
  powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  powerpc/powernv: Drop phb->bdfn_to_pe()
  powerpc/powernv: Reorder fields in struct pnv_phb
  powerpc/powernv: Rename PE# fields in struct pnv_phb
  powerpc/powernv: Fix initial IO and M32 segmap
  powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  powerpc/powernv: IO and M32 mapping based on PCI device resources
  powerpc/powernv: Track M64 segment consumption
  powerpc/powernv: Rename M64 related functions
  powerpc/powernv: M64 support on P7IOC
  powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe()
  powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE
  powerpc/powernv: Avoid calculating DMA32 segments on PHB3
  powerpc/powernv: Remove DMA32 PE list
  powerpc/powernv: Track DMA32 segment consumption
  powerpc/powernv: Improve DMA32 segment calculation
  powerpc/powernv: Increase PE# capacity
  powerpc/powernv: Introduce pnv_ioda_init_pe()
  powerpc/powernv: Use PE instead of number during setup and release
  powerpc/powernv: Allocate PE# in reverse order
  powerpc/powernv: Reserve PE for root bus
  powerpc/powernv: Create PEs at PCI hot plugging time
  powerpc/powernv: Dynamically release PEs
  powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  powerpc/pci: Rename pcibios_find_pci_bus()
  powerpc/pci: Move pci_find_bus_by_node() around
  powerpc/pci: Export pci_add_device_node_info()
  powerpc/pci: Introduce pci_remove_device_node_info()
  powerpc/pci: Export pci_traverse_device_nodes()
  powerpc/pci: Delay populating pdn
  powerpc/pci: Don't scan empty slot
  powerpc/pci: Update bridge windows on PCI plug
  powerpc/powernv: Simplify pnv_eeh_reset()
  powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  powerpc/powernv: Support PCI slot ID
  powerpc/powernv: Use firmware PCI slot reset infrastructure
  powerpc/powernv: Functions to get/set PCI slot status
  powerpc/powernv: Select OF_DYNAMIC
  drivers/of: Split unflatten_dt_node()
  drivers/of: Avoid recursively calling unflatten_dt_node()
  drivers/of: Rename unflatten_dt_node()
  drivers/of: Specify parent node in of_fdt_unflatten_tree()
  drivers/of: Return allocated memory from of_fdt_unflatten_tree()
  drivers/of: Export OF changeset functions
  PCI/hotplug: PowerPC PowerNV PCI hotplug driver
 MAINTAINERS                                    |    6 +
 arch/powerpc/include/asm/eeh.h                 |    2 +-
 arch/powerpc/include/asm/opal-api.h            |   17 +-
 arch/powerpc/include/asm/opal.h                |    8 +-
 arch/powerpc/include/asm/pci-bridge.h          |   25 +-
 arch/powerpc/include/asm/pnv-pci.h             |    7 +
 arch/powerpc/include/asm/ppc-pci.h             |    8 +-
 arch/powerpc/kernel/eeh_dev.c                  |   19 +-
 arch/powerpc/kernel/eeh_driver.c               |   12 +-
 arch/powerpc/kernel/pci-common.c               |   16 +-
 arch/powerpc/kernel/pci-hotplug.c              |   47 +-
 arch/powerpc/kernel/pci_dn.c                   |   85 +-
 arch/powerpc/platforms/maple/pci.c             |   34 +-
 arch/powerpc/platforms/pasemi/pci.c            |    3 -
 arch/powerpc/platforms/powermac/pci.c          |   38 +-
 arch/powerpc/platforms/powernv/Kconfig         |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c   |  173 ++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 1251 +++++++++++++++---------
 arch/powerpc/platforms/powernv/pci.c           |   92 +-
 arch/powerpc/platforms/powernv/pci.h           |   62 +-
 arch/powerpc/platforms/pseries/msi.c           |    4 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
 arch/powerpc/platforms/pseries/setup.c         |    8 +-
 drivers/of/dynamic.c                           |   65 +-
 drivers/of/fdt.c                               |  378 ++++---
 drivers/of/of_private.h                        |    2 +
 drivers/of/overlay.c                           |    8 +-
 drivers/of/unittest.c                          |    6 +-
 drivers/pci/hotplug/Kconfig                    |   12 +
 drivers/pci/hotplug/Makefile                   |    3 +
 drivers/pci/hotplug/pnv_php.c                  |  866 ++++++++++++++++
 drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
 drivers/pci/hotplug/rpaphp_core.c              |    4 +-
 drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
 drivers/pci/setup-bus.c                        |    5 +
 include/linux/of_fdt.h                         |    5 +-
 include/linux/pci.h                            |    1 +
 38 files changed, 2389 insertions(+), 932 deletions(-)
 create mode 100644 drivers/pci/hotplug/pnv_php.c
-- 
2.1.0
^ permalink raw reply	[flat|nested] 157+ messages in thread
* [PATCH v7 01/50] PCI: Add pcibios_setup_bridge()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                   ` (49 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Currently, the PowerPC PowerNV platform uses ppc_md.pcibios_fixup(),
which is called once after PCI probing and resource assignment
are completed, to allocate platform-required resources for PCI devices:
PE#, IO and MMIO mapping, DMA address translation (TCE) table etc.
Obviously, it's not hotplug friendly.
This adds the weak function pcibios_setup_bridge(), which is called by
pci_setup_bridge(). The PowerPC PowerNV platform will override it
to assign the above platform-required resources to newly added PCI
devices, in order to support PCI hotplug in subsequent patches.
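
As a rough user-space illustration of the weak-hook pattern described above (not the kernel code itself): the `demo_*` names and `struct demo_bus` are hypothetical, the type mask is a placeholder, and `__attribute__((weak))` assumes a GCC- or Clang-compatible toolchain.

```c
#include <assert.h>

/* Hypothetical stand-in for struct pci_bus, not the kernel type. */
struct demo_bus {
	int platform_setups;	/* bumped only by a platform override */
	int windows_programmed;	/* bumped by the generic bridge setup */
};

/*
 * Default hook: a no-op. A platform file that provides a strong
 * definition with the same name replaces this one at link time.
 */
__attribute__((weak)) void demo_pcibios_setup_bridge(struct demo_bus *bus,
						      unsigned long type)
{
	(void)bus;
	(void)type;
}

/* Generic path: give the platform its chance first, then program windows. */
static void demo_pci_setup_bridge(struct demo_bus *bus)
{
	unsigned long type = 0x7;	/* placeholder for the IORESOURCE_* mask */

	assert(bus);
	demo_pcibios_setup_bridge(bus, type);
	bus->windows_programmed++;
}
```

With no override linked in, the generic path behaves exactly as before; the hook only matters on platforms that define it.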
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-bus.c | 5 +++++
 include/linux/pci.h     | 1 +
 2 files changed, 6 insertions(+)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 508cc56..a69eae1 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -696,11 +696,16 @@ static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
+void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+}
+
 void pci_setup_bridge(struct pci_bus *bus)
 {
 	unsigned long type = IORESOURCE_IO | IORESOURCE_MEM |
 				  IORESOURCE_PREFETCH;
 
+	pcibios_setup_bridge(bus, type);
 	__pci_setup_bridge(bus, type);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..41343bb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -831,6 +831,7 @@ void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
 void pci_stop_root_bus(struct pci_bus *bus);
 void pci_remove_root_bus(struct pci_bus *bus);
 void pci_setup_cardbus(struct pci_bus *bus);
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
-- 
2.1.0
* [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 01/50] PCI: Add pcibios_setup_bridge() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-05 22:27   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
                   ` (48 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This overrides pcibios_setup_bridge(), which is called to update PCI
bridge windows once PCI resource assignment is completed. Subsequent
patches use it to assign the PE and set up the various (resource)
mappings for the PE.
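
The optional-callback dispatch this patch relies on can be sketched in plain C as follows. All `demo_*` names are hypothetical stand-ins for `struct pci_controller`, its ops table and a platform callback; the counter exists only so the effect is observable.

```c
#include <assert.h>
#include <stddef.h>

struct demo_bus;

/* Hypothetical ops table holding a single optional callback. */
struct demo_controller_ops {
	void (*setup_bridge)(struct demo_bus *bus, unsigned long type);
};

/* Hypothetical host controller; pe_setups counts platform callbacks. */
struct demo_controller {
	struct demo_controller_ops ops;
	int pe_setups;
};

struct demo_bus {
	struct demo_controller *hose;
};

/* Arch-level hook: forward to the controller only if it opted in. */
static void demo_arch_setup_bridge(struct demo_bus *bus, unsigned long type)
{
	assert(bus && bus->hose);
	if (bus->hose->ops.setup_bridge)
		bus->hose->ops.setup_bridge(bus, type);
}

/* Example platform callback: stands in for PE assignment and mapping. */
static void demo_pnv_setup_bridge(struct demo_bus *bus, unsigned long type)
{
	(void)type;
	bus->hose->pe_setups++;
}
```

Controllers that leave the pointer NULL are unaffected, which is why adding the callback does not change behaviour on other platforms.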
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h | 2 ++
 arch/powerpc/kernel/pci-common.c      | 8 ++++++++
 2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 843dd3a2..6076116 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -33,6 +33,8 @@ struct pci_controller_ops {
 
 	/* Called during PCI resource reassignment */
 	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	void		(*setup_bridge)(struct pci_bus *bus,
+					unsigned long type);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
 
 #ifdef CONFIG_PCI_MSI
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..40df3a5 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -124,6 +124,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 	return 1;
 }
 
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+
+	if (hose->controller_ops.setup_bridge)
+		hose->controller_ops.setup_bridge(bus, type);
+}
+
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *phb = pci_bus_to_host(dev->bus);
-- 
2.1.0
* [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 01/50] PCI: Add pcibios_setup_bridge() Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-05 22:32   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops Gavin Shan
                   ` (47 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Each PHB has one instance of "struct pci_controller_ops", which
includes various callbacks called by the PCI subsystem. In the definition
of this struct, some callbacks have explicit names for their arguments,
but the rest don't.
This adds explicit argument names to all the callbacks in
"struct pci_controller_ops" so that the code looks consistent.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 6076116..0f2ff3a 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -21,18 +21,19 @@ struct pci_controller_ops {
 	void		(*dma_dev_setup)(struct pci_dev *dev);
 	void		(*dma_bus_setup)(struct pci_bus *bus);
 
-	int		(*probe_mode)(struct pci_bus *);
+	int		(*probe_mode)(struct pci_bus *bus);
 
 	/* Called when pci_enable_device() is called. Returns true to
 	 * allow assignment/enabling of the device. */
-	bool		(*enable_device_hook)(struct pci_dev *);
+	bool		(*enable_device_hook)(struct pci_dev *dev);
 
-	void		(*disable_device)(struct pci_dev *);
+	void		(*disable_device)(struct pci_dev *dev);
 
-	void		(*release_device)(struct pci_dev *);
+	void		(*release_device)(struct pci_dev *dev);
 
 	/* Called during PCI resource reassignment */
-	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	resource_size_t (*window_alignment)(struct pci_bus *bus,
+					    unsigned long type);
 	void		(*setup_bridge)(struct pci_bus *bus,
 					unsigned long type);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
@@ -46,7 +47,7 @@ struct pci_controller_ops {
 	int             (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
 	u64		(*dma_get_required_mask)(struct pci_dev *dev);
 
-	void		(*shutdown)(struct pci_controller *);
+	void		(*shutdown)(struct pci_controller *hose);
 };
 
 /*
-- 
2.1.0
* [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (2 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-05 22:28   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 05/50] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
                   ` (46 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This cleans up the pnv_pci_ioda_controller_ops struct to indent with
tabs instead of spaces, avoiding complaints from
scripts/checkpatch.pl. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2e2bedb..aa3645c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3064,17 +3064,17 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 }
 
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
-       .dma_dev_setup = pnv_pci_dma_dev_setup,
+	.dma_dev_setup		= pnv_pci_dma_dev_setup,
 #ifdef CONFIG_PCI_MSI
-       .setup_msi_irqs = pnv_setup_msi_irqs,
-       .teardown_msi_irqs = pnv_teardown_msi_irqs,
+	.setup_msi_irqs		= pnv_setup_msi_irqs,
+	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 #endif
-       .enable_device_hook = pnv_pci_enable_device_hook,
-       .window_alignment = pnv_pci_window_alignment,
-       .reset_secondary_bus = pnv_pci_reset_secondary_bus,
-       .dma_set_mask = pnv_pci_ioda_dma_set_mask,
-       .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
-       .shutdown = pnv_pci_ioda_shutdown,
+	.enable_device_hook	= pnv_pci_enable_device_hook,
+	.window_alignment	= pnv_pci_window_alignment,
+	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
+	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
+	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
+	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
-- 
2.1.0
* [PATCH v7 05/50] powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (3 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 06/50] powerpc/powernv: Drop phb->bdfn_to_pe() Gavin Shan
                   ` (45 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Nobody uses this function, so this drops it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 71 -------------------------------
 1 file changed, 71 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index aa3645c..74288ab 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -918,77 +918,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 }
 #endif /* CONFIG_PCI_IOV */
 
-#if 0
-static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct pnv_ioda_pe *pe;
-	int pe_num;
-
-	if (!pdn) {
-		pr_err("%s: Device tree node not associated properly\n",
-			   pci_name(dev));
-		return NULL;
-	}
-	if (pdn->pe_number != IODA_INVALID_PE)
-		return NULL;
-
-	/* PE#0 has been pre-set */
-	if (dev->bus->number == 0)
-		pe_num = 0;
-	else
-		pe_num = pnv_ioda_alloc_pe(phb);
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available, disabling device\n",
-			   pci_name(dev));
-		return NULL;
-	}
-
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
-	 *
-	 * At some point we want to remove the PDN completely anyways
-	 */
-	pe = &phb->ioda.pe_array[pe_num];
-	pci_dev_get(dev);
-	pdn->pcidev = dev;
-	pdn->pe_number = pe_num;
-	pe->pdev = dev;
-	pe->pbus = NULL;
-	pe->tce32_seg = -1;
-	pe->mve_number = -1;
-	pe->rid = dev->bus->number << 8 | pdn->devfn;
-
-	pe_info(pe, "Associated device to PE\n");
-
-	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
-		pdn->pe_number = IODA_INVALID_PE;
-		pe->pdev = NULL;
-		pci_dev_put(dev);
-		return NULL;
-	}
-
-	/* Assign a DMA weight to the device */
-	pe->dma_weight = pnv_ioda_dma_weight(dev);
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
-
-	return pe;
-}
-#endif /* Useful for SRIOV case */
-
 static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *dev;
-- 
2.1.0
* [PATCH v7 06/50] powerpc/powernv: Drop phb->bdfn_to_pe()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (4 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 05/50] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 07/50] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
                   ` (44 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This drops struct pnv_phb::bdfn_to_pe() as nobody uses it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 ---------
 arch/powerpc/platforms/powernv/pci.h      | 1 -
 2 files changed, 10 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 74288ab..968da91 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2978,12 +2978,6 @@ static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 	return true;
 }
 
-static u32 pnv_ioda_bdfn_to_pe(struct pnv_phb *phb, struct pci_bus *bus,
-			       u32 devfn)
-{
-	return phb->ioda.pe_rmap[(bus->number << 8) | devfn];
-}
-
 static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 {
 	struct pnv_phb *phb = hose->private_data;
@@ -3144,9 +3138,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->freeze_pe = pnv_ioda_freeze_pe;
 	phb->unfreeze_pe = pnv_ioda_unfreeze_pe;
 
-	/* Setup RID -> PE mapping function */
-	phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
-
 	/* Setup TCEs */
 	phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup;
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c8ff50e..49e3baf 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -106,7 +106,6 @@ struct pnv_phb {
 			 unsigned int is_64, struct msi_msg *msg);
 	void (*dma_dev_setup)(struct pnv_phb *phb, struct pci_dev *pdev);
 	void (*fixup_phb)(struct pci_controller *hose);
-	u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pci_bus *bus,
 			       unsigned long *pe_bitmap, bool all);
-- 
2.1.0
* [PATCH v7 07/50] powerpc/powernv: Reorder fields in struct pnv_phb
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (5 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 06/50] powerpc/powernv: Drop phb->bdfn_to_pe() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 08/50] powerpc/powernv: Rename PE# " Gavin Shan
                   ` (43 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This groups together the fields in struct pnv_phb that are related to
PE allocation. No logical change.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 49e3baf..d655769 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -142,15 +142,14 @@ struct pnv_phb {
 			unsigned int		io_segsize;
 			unsigned int		io_pci_base;
 
-			/* PE allocation bitmap */
-			unsigned long		*pe_alloc;
-			/* PE allocation mutex */
+			/* PE allocation */
 			struct mutex		pe_alloc_mutex;
+			unsigned long		*pe_alloc;
+			struct pnv_ioda_pe	*pe_array;
 
 			/* M32 & IO segment maps */
 			unsigned int		*m32_segmap;
 			unsigned int		*io_segmap;
-			struct pnv_ioda_pe	*pe_array;
 
 			/* IRQ chip */
 			int			irq_chip_init;
-- 
2.1.0
* [PATCH v7 08/50] powerpc/powernv: Rename PE# fields in struct pnv_phb
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (6 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 07/50] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-16  8:01   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 09/50] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
                   ` (42 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames the fields related to PE number in "struct pnv_phb"
to better reflect their usage, as Alexey suggested. No
logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c    | 56 ++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.c         |  2 +-
 arch/powerpc/platforms/powernv/pci.h         |  4 +-
 4 files changed, 32 insertions(+), 32 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e1c9072..861a7d2 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
 		 * and P7IOC separately. So we should regard
 		 * PE#0 as valid for PHB3 and P7IOC.
 		 */
-		if (phb->ioda.reserved_pe != 0)
+		if (phb->ioda.reserved_pe_idx != 0)
 			eeh_add_flag(EEH_VALID_PE_ZERO);
 
 		break;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 968da91..b4932c3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -134,7 +134,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 
 static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
-	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
 		pr_warn("%s: Invalid PE %d on PHB#%x\n",
 			__func__, pe_no, phb->hose->global_number);
 		return;
@@ -154,8 +154,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 
 	do {
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe, 0);
-		if (pe >= phb->ioda.total_pe)
+					phb->ioda.total_pe_num, 0);
+		if (pe >= phb->ioda.total_pe_num)
 			return IODA_INVALID_PE;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
@@ -209,13 +209,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	 * expected to be 0 or last one of PE capabicity.
 	 */
 	r = &phb->hose->mem_resources[1];
-	if (phb->ioda.reserved_pe == 0)
+	if (phb->ioda.reserved_pe_idx == 0)
 		r->start += phb->ioda.m64_segsize;
-	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
 		r->end -= phb->ioda.m64_segsize;
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
-			phb->ioda.reserved_pe);
+			phb->ioda.reserved_pe_idx);
 
 	return 0;
 
@@ -284,7 +284,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 		return IODA_INVALID_PE;
 
 	/* Allocate bitmap */
-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	pe_alloc = kzalloc(size, GFP_KERNEL);
 	if (!pe_alloc) {
 		pr_warn("%s: Out of memory !\n",
@@ -300,7 +300,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	 * contributed by its child buses. For the case, we needn't
 	 * pick M64 dependent PE#.
 	 */
-	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
+	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
 		kfree(pe_alloc);
 		return IODA_INVALID_PE;
 	}
@@ -311,8 +311,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	 */
 	master_pe = NULL;
 	i = -1;
-	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
-		phb->ioda.total_pe) {
+	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
+		phb->ioda.total_pe_num) {
 		pe = &phb->ioda.pe_array[i];
 
 		if (!master_pe) {
@@ -364,7 +364,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	hose->mem_offset[1] = res->start - pci_addr;
 
 	phb->ioda.m64_size = resource_size(res);
-	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
+	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
 	phb->ioda.m64_base = pci_addr;
 
 	pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
@@ -465,7 +465,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 	s64 rc;
 
 	/* Sanity check on PE number */
-	if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
+	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
 		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
 
 	/*
@@ -1394,9 +1394,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		} else {
 			mutex_lock(&phb->ioda.pe_alloc_mutex);
 			*pdn->pe_num_map = bitmap_find_next_zero_area(
-				phb->ioda.pe_alloc, phb->ioda.total_pe,
+				phb->ioda.pe_alloc, phb->ioda.total_pe_num,
 				0, num_vfs, 0);
-			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
+			if (*pdn->pe_num_map >= phb->ioda.total_pe_num) {
 				mutex_unlock(&phb->ioda.pe_alloc_mutex);
 				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
 				kfree(pdn->pe_num_map);
@@ -2670,7 +2670,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	pdn->m64_single_mode = false;
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
-	mul = phb->ioda.total_pe;
+	mul = phb->ioda.total_pe_num;
 	total_vf_bar_sz = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
@@ -2772,7 +2772,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 			region.end   = res->end - phb->ioda.io_pci_base;
 			index = region.start / phb->ioda.io_segsize;
 
-			while (index < phb->ioda.total_pe &&
+			while (index < phb->ioda.total_pe_num &&
 			       region.start <= region.end) {
 				phb->ioda.io_segmap[index] = pe->pe_number;
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
@@ -2797,7 +2797,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 				       phb->ioda.m32_pci_base;
 			index = region.start / phb->ioda.m32_segsize;
 
-			while (index < phb->ioda.total_pe &&
+			while (index < phb->ioda.total_pe_num &&
 			       region.start <= region.end) {
 				phb->ioda.m32_segmap[index] = pe->pe_number;
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
@@ -3067,13 +3067,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		pr_err("  Failed to map registers !\n");
 
 	/* Initialize more IODA stuff */
-	phb->ioda.total_pe = 1;
+	phb->ioda.total_pe_num = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
 	if (prop32)
-		phb->ioda.total_pe = be32_to_cpup(prop32);
+		phb->ioda.total_pe_num = be32_to_cpup(prop32);
 	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
 	if (prop32)
-		phb->ioda.reserved_pe = be32_to_cpup(prop32);
+		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
 
 	/* Parse 64-bit MMIO range */
 	pnv_ioda_parse_m64_window(phb);
@@ -3082,29 +3082,29 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	/* FW Has already off top 64k of M32 space (MSI space) */
 	phb->ioda.m32_size += 0x10000;
 
-	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe;
+	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe_num;
 	phb->ioda.m32_pci_base = hose->mem_resources[0].start - hose->mem_offset[0];
 	phb->ioda.io_size = hose->pci_io_size;
-	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe;
+	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	m32map_off = size;
-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
+	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
 	if (phb->type == PNV_PHB_IODA1) {
 		iomap_off = size;
-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
+		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
 	}
 	pemap_off = size;
-	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
+	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.m32_segmap = aux + m32map_off;
 	if (phb->type == PNV_PHB_IODA1)
 		phb->ioda.io_segmap = aux + iomap_off;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
+	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
@@ -3123,7 +3123,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 #endif
 
 	pr_info("  %03d (%03d) PE's M32: 0x%x [segment=0x%x]\n",
-		phb->ioda.total_pe, phb->ioda.reserved_pe,
+		phb->ioda.total_pe_num, phb->ioda.reserved_pe_idx,
 		phb->ioda.m32_size, phb->ioda.m32_segsize);
 	if (phb->ioda.m64_size)
 		pr_info("                 M64: 0x%lx [segment=0x%lx]\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index f2dd772..fa99daf 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -385,7 +385,7 @@ static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 		if (phb->type == PNV_PHB_P5IOC2)
 			pe_no = 0;
 		else
-			pe_no = phb->ioda.reserved_pe;
+			pe_no = phb->ioda.reserved_pe_idx;
 	}
 
 	/*
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index d655769..d11f0a5 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -122,8 +122,8 @@ struct pnv_phb {
 
 		struct {
 			/* Global bridge info */
-			unsigned int		total_pe;
-			unsigned int		reserved_pe;
+			unsigned int		total_pe_num;
+			unsigned int		reserved_pe_idx;
 
 			/* 32-bit MMIO window */
 			unsigned int		m32_size;
-- 
2.1.0
* [PATCH v7 09/50] powerpc/powernv: Fix initial IO and M32 segmap
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (7 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 08/50] powerpc/powernv: Rename PE# " Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
                   ` (41 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
There are two arrays for IO and M32 segment maps on every PHB.
The arrays are indexed by segment number, and the value stored in
each element is the number of the PE that the segment is assigned
to. Initially, all elements in both arrays are zero, which wrongly
indicates that every segment is assigned to PE#0.
This initializes all elements of both arrays to IODA_INVALID_PE,
meaning that no segment is assigned to any PE. In order to use
IODA_INVALID_PE (-1) to represent an invalid PE number, the element
type of both arrays is changed from "unsigned int" to "int".
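To illustrate why a zero-initialized map is misleading, here is a minimal user-space sketch (not kernel code). Only IODA_INVALID_PE (-1) comes from the patch; the helpers segmap_init() and segmap_count_owned() are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

#define IODA_INVALID_PE (-1)	/* sentinel value used by the patch */

/* Hypothetical helper: mark every segment as unassigned. */
void segmap_init(int *segmap, size_t nsegs)
{
	size_t i;

	assert(segmap != NULL);
	for (i = 0; i < nsegs; i++)
		segmap[i] = IODA_INVALID_PE;
}

/* Hypothetical helper: count the segments currently mapped to @pe_no. */
size_t segmap_count_owned(const int *segmap, size_t nsegs, int pe_no)
{
	size_t i, count = 0;

	assert(segmap != NULL);
	for (i = 0; i < nsegs; i++)
		if (segmap[i] == pe_no)
			count++;
	return count;
}
```

With a zeroed array, segmap_count_owned(map, n, 0) reports that PE#0 owns every segment; after segmap_init() it reports none, which is the state this patch establishes.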
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++++--
 arch/powerpc/platforms/powernv/pci.h      | 4 ++--
 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b4932c3..7ee7cfe 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3008,7 +3008,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
-	int len;
+	int i, len;
 	u64 phb_id;
 	void *aux;
 	long rc;
@@ -3101,8 +3101,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.m32_segmap = aux + m32map_off;
-	if (phb->type == PNV_PHB_IODA1)
+	for (i = 0; i < phb->ioda.total_pe_num; i++)
+		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+	if (phb->type == PNV_PHB_IODA1) {
 		phb->ioda.io_segmap = aux + iomap_off;
+		for (i = 0; i < phb->ioda.total_pe_num; i++)
+			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+	}
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index d11f0a5..2e01edd 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -148,8 +148,8 @@ struct pnv_phb {
 			struct pnv_ioda_pe	*pe_array;
 
 			/* M32 & IO segment maps */
-			unsigned int		*m32_segmap;
-			unsigned int		*io_segmap;
+			int			*m32_segmap;
+			int			*io_segmap;
 
 			/* IRQ chip */
 			int			irq_chip_init;
-- 
2.1.0
* [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (8 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 09/50] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-05 22:56   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
                   ` (40 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
The original implementation of pnv_ioda_setup_pe_seg() configures
IO and M32 segments with separate but nearly identical loops. They
can be merged by caching @segmap, @segsize and @win in advance. This
shouldn't cause any behavioural changes.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
 1 file changed, 28 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7ee7cfe..553d3f3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2752,8 +2752,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	struct pnv_phb *phb = hose->private_data;
 	struct pci_bus_region region;
 	struct resource *res;
-	int i, index;
-	int rc;
+	unsigned int segsize;
+	int *segmap, index, i;
+	uint16_t win;
+	int64_t rc;
 
 	/*
 	 * NOTE: We only care PCI bus based PE for now. For PCI
@@ -2770,23 +2772,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 		if (res->flags & IORESOURCE_IO) {
 			region.start = res->start - phb->ioda.io_pci_base;
 			region.end   = res->end - phb->ioda.io_pci_base;
-			index = region.start / phb->ioda.io_segsize;
-
-			while (index < phb->ioda.total_pe_num &&
-			       region.start <= region.end) {
-				phb->ioda.io_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping IO "
-					       "segment #%d to PE#%d\n",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
-
-				region.start += phb->ioda.io_segsize;
-				index++;
-			}
+			segsize      = phb->ioda.io_segsize;
+			segmap       = phb->ioda.io_segmap;
+			win          = OPAL_IO_WINDOW_TYPE;
 		} else if ((res->flags & IORESOURCE_MEM) &&
 			   !pnv_pci_is_mem_pref_64(res->flags)) {
 			region.start = res->start -
@@ -2795,23 +2783,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 			region.end   = res->end -
 				       hose->mem_offset[0] -
 				       phb->ioda.m32_pci_base;
-			index = region.start / phb->ioda.m32_segsize;
-
-			while (index < phb->ioda.total_pe_num &&
-			       region.start <= region.end) {
-				phb->ioda.m32_segmap[index] = pe->pe_number;
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping M32 "
-					       "segment#%d to PE#%d",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+			segsize      = phb->ioda.m32_segsize;
+			segmap       = phb->ioda.m32_segmap;
+			win          = OPAL_M32_WINDOW_TYPE;
+		} else {
+			continue;
+		}
 
-				region.start += phb->ioda.m32_segsize;
-				index++;
+		index = region.start / segsize;
+		while (index < phb->ioda.total_pe_num &&
+		       region.start <= region.end) {
+			segmap[index] = pe->pe_number;
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					pe->pe_number, win, 0, index);
+			if (rc != OPAL_SUCCESS) {
+				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+					__func__, rc, win, index,
+					pe->phb->hose->global_number,
+					pe->pe_number);
+				break;
 			}
+
+			region.start += segsize;
+			index++;
 		}
 	}
 }
-- 
2.1.0
* [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (9 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-12  3:30   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption Gavin Shan
                   ` (39 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Currently, the IO and M32 segments are mapped to the corresponding
PE based on the windows of the parent bridge of the PE's primary bus.
That is not going to work once the windows of the root port, or of
the upstream port of the PCIe switch behind the root port, are
extended to the PHB's apertures in order to support hotplug in a
subsequent patch.
This fixes the issue by mapping IO and M32 segments based on the
resources of the PCI devices included in the PE, instead of the
windows of the parent bridge of the PE's primary bus.
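As a rough illustration of how one device resource translates into segment indices, here is a user-space sketch (not kernel code). The type struct seg_span and the function seg_range() are hypothetical, and plain integer division is used here rather than the _ALIGN_DOWN()/_ALIGN_UP() macros in the patch, so treat it as an approximation of the idea, not the patch's exact arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: first segment index and number of segments. */
struct seg_span {
	unsigned int first;
	unsigned int count;
};

/*
 * Given an inclusive offset range [start, end] relative to the window
 * base, return the segments it touches, clamped to @total_segs.
 */
struct seg_span seg_range(uint64_t start, uint64_t end,
			  uint64_t segsize, unsigned int total_segs)
{
	struct seg_span span = { 0, 0 };
	uint64_t first, last;

	assert(segsize != 0);
	if (start > end)
		return span;		/* empty or invalid resource */

	first = start / segsize;
	last = end / segsize;
	if (first >= total_segs)
		return span;		/* entirely outside the window */
	if (last >= total_segs)
		last = total_segs - 1;	/* clamp to the last segment */

	span.first = (unsigned int)first;
	span.count = (unsigned int)(last - first + 1);
	return span;
}
```

Each index in the returned span is a segment that would be mapped to the PE owning the device; a resource that straddles a segment boundary therefore consumes both segments.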
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 131 +++++++++++++++++-------------
 1 file changed, 75 insertions(+), 56 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 553d3f3..4ab93f8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2741,71 +2741,90 @@ truncate_iov:
 }
 #endif /* CONFIG_PCI_IOV */
 
-/*
- * This function is supposed to be called on basis of PE from top
- * to bottom style. So the the I/O or MMIO segment assigned to
- * parent PE could be overrided by its child PEs if necessary.
- */
-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
-				  struct pnv_ioda_pe *pe)
+static int pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
+				  struct resource *res)
 {
-	struct pnv_phb *phb = hose->private_data;
+	struct pnv_phb *phb = pe->phb;
 	struct pci_bus_region region;
-	struct resource *res;
-	unsigned int segsize;
-	int *segmap, index, i;
+	unsigned int index, segsize;
+	int *segmap;
 	uint16_t win;
 	int64_t rc;
 
-	/*
-	 * NOTE: We only care PCI bus based PE for now. For PCI
-	 * device based PE, for example SRIOV sensitive VF should
-	 * be figured out later.
-	 */
-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+	if (!res->parent || !res->flags || res->start > res->end)
+		return 0;
 
-	pci_bus_for_each_resource(pe->pbus, res, i) {
-		if (!res || !res->flags ||
-		    res->start > res->end)
-			continue;
+	if (res->flags & IORESOURCE_IO) {
+		region.start = res->start - phb->ioda.io_pci_base;
+		region.end   = res->end - phb->ioda.io_pci_base;
+		segsize      = phb->ioda.io_segsize;
+		segmap       = phb->ioda.io_segmap;
+		win          = OPAL_IO_WINDOW_TYPE;
+	} else if ((res->flags & IORESOURCE_MEM) &&
+		   !pnv_pci_is_mem_pref_64(res->flags)) {
+		region.start = res->start -
+			       phb->hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		region.end   = res->end -
+			       phb->hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		segsize      = phb->ioda.m32_segsize;
+		segmap       = phb->ioda.m32_segmap;
+		win          = OPAL_M32_WINDOW_TYPE;
+	} else {
+		return 0;
+	}
 
-		if (res->flags & IORESOURCE_IO) {
-			region.start = res->start - phb->ioda.io_pci_base;
-			region.end   = res->end - phb->ioda.io_pci_base;
-			segsize      = phb->ioda.io_segsize;
-			segmap       = phb->ioda.io_segmap;
-			win          = OPAL_IO_WINDOW_TYPE;
-		} else if ((res->flags & IORESOURCE_MEM) &&
-			   !pnv_pci_is_mem_pref_64(res->flags)) {
-			region.start = res->start -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			region.end   = res->end -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			segsize      = phb->ioda.m32_segsize;
-			segmap       = phb->ioda.m32_segmap;
-			win          = OPAL_M32_WINDOW_TYPE;
-		} else {
-			continue;
+	region.start = _ALIGN_DOWN(region.start, segsize);
+	region.end   = _ALIGN_UP(region.end, segsize);
+	index = region.start / segsize;
+	while (index < phb->ioda.total_pe_num && region.start < region.end) {
+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+				pe->pe_number, win, 0, index);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+				__func__, rc, win, index,
+				phb->hose->global_number,
+				pe->pe_number);
+			return -EIO;
 		}
 
-		index = region.start / segsize;
-		while (index < phb->ioda.total_pe_num &&
-		       region.start <= region.end) {
-			segmap[index] = pe->pe_number;
-			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, win, 0, index);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
-					__func__, rc, win, index,
-					pe->phb->hose->global_number,
-					pe->pe_number);
-				break;
-			}
+		segmap[index] = pe->pe_number;
+		region.start += segsize;
+		index++;
+	}
+
+	return 0;
+}
+
+static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *pdev;
+	struct resource *res;
+	int i;
+
+	/* This function only works for bus dependent PE */
+	WARN_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+
+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
+			res = &pdev->resource[i];
+			if (pnv_ioda_setup_one_res(pe, res))
+				return;
+		}
+
+		/*
+		 * If the PE contains all subordinate PCI buses, the
+		 * windows of the child bridges should be mapped to
+		 * the PE as well.
+		 */
+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL && pci_is_bridge(pdev)))
+			continue;
 
-			region.start += segsize;
-			index++;
+		for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
+			if (pnv_ioda_setup_one_res(pe, res))
+				return;
 		}
 	}
 }
@@ -2819,7 +2838,7 @@ static void pnv_pci_ioda_setup_seg(void)
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(hose, pe);
+			pnv_ioda_setup_pe_seg(pe);
 		}
 	}
 }
-- 
2.1.0
* [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (10 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-12  4:18   ` Daniel Axtens
  2015-11-16  8:01   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 13/50] powerpc/powernv: Rename M64 related functions Gavin Shan
                   ` (38 subsequent siblings)
  50 siblings, 2 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Just as we already track M32 segment consumption, this introduces
an array in the PHB to track the mapping between M64 segments and
PE numbers. The information is going to be used to find the M64
segments from the PE number during PCI unplug in subsequent patches.
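The kind of reverse lookup this map enables at unplug time could look roughly like the following user-space sketch. The helper segmap_release_pe() is a hypothetical name, not a function from this series, and the firmware call that would unmap each segment is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define IODA_INVALID_PE (-1)	/* sentinel value used by the series */

/*
 * Hypothetical unplug-time helper: release every segment owned by
 * @pe_no and return how many were released. A real implementation
 * would also have to unmap each segment in firmware; that step is
 * omitted here.
 */
size_t segmap_release_pe(int *segmap, size_t nsegs, int pe_no)
{
	size_t i, released = 0;

	assert(segmap != NULL);
	if (pe_no == IODA_INVALID_PE)
		return 0;	/* nothing is owned by the sentinel */

	for (i = 0; i < nsegs; i++) {
		if (segmap[i] != pe_no)
			continue;
		segmap[i] = IODA_INVALID_PE;
		released++;
	}
	return released;
}
```

Without the map, the segments owned by a departing PE could only be rediscovered by walking its device resources again, which may no longer be available once the devices are gone.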
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++--
 arch/powerpc/platforms/powernv/pci.h      |  3 ++-
 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 4ab93f8..76ce694 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -315,6 +315,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 		phb->ioda.total_pe_num) {
 		pe = &phb->ioda.pe_array[i];
 
+		phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
 		if (!master_pe) {
 			pe->flags |= PNV_IODA_PE_MASTER;
 			INIT_LIST_HEAD(&pe->slaves);
@@ -3018,7 +3019,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int i, len;
@@ -3103,6 +3104,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
+	m64map_off = size;
+	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
 	m32map_off = size;
 	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
 	if (phb->type == PNV_PHB_IODA1) {
@@ -3113,9 +3116,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
+	phb->ioda.m64_segmap = aux + m64map_off;
 	phb->ioda.m32_segmap = aux + m32map_off;
-	for (i = 0; i < phb->ioda.total_pe_num; i++)
+	for (i = 0; i < phb->ioda.total_pe_num; i++) {
+		phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
 		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+	}
 	if (phb->type == PNV_PHB_IODA1) {
 		phb->ioda.io_segmap = aux + iomap_off;
 		for (i = 0; i < phb->ioda.total_pe_num; i++)
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 2e01edd..671fd13 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -147,7 +147,8 @@ struct pnv_phb {
 			unsigned long		*pe_alloc;
 			struct pnv_ioda_pe	*pe_array;
 
-			/* M32 & IO segment maps */
+			/* M64/M32/IO segment maps */
+			int			*m64_segmap;
 			int			*m32_segmap;
 			int			*io_segmap;
 
-- 
2.1.0
* [PATCH v7 13/50] powerpc/powernv: Rename M64 related functions
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (11 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC Gavin Shan
                   ` (37 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames the functions that pick a PE number based on the
consumed M64 segments and that map M64 segments to PEs, as those
functions are going to be shared by IODA1 and IODA2 in the next
patch. No logical changes are introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 76ce694..1f7d985 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -229,7 +229,7 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
+static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
 					 unsigned long *pe_bitmap)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
@@ -256,22 +256,22 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
 	}
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
-				     unsigned long *pe_bitmap,
-				     bool all)
+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
+				    unsigned long *pe_bitmap,
+				    bool all)
 {
 	struct pci_dev *pdev;
 
 	list_for_each_entry(pdev, &bus->devices, bus_list) {
-		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
+		pnv_ioda_reserve_dev_m64_pe(pdev, pe_bitmap);
 
 		if (all && pdev->subordinate)
-			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
-						 pe_bitmap, all);
+			pnv_ioda_reserve_m64_pe(pdev->subordinate,
+						pe_bitmap, all);
 	}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -293,7 +293,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	}
 
 	/* Figure out reserved PE numbers by the PE */
-	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
+	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
 
 	/*
 	 * the current bus might not own M64 window and that's all
@@ -374,8 +374,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
 	phb->init_m64 = pnv_ioda2_init_m64;
-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
 }
 
 static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
-- 
2.1.0
* [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (12 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 13/50] powerpc/powernv: Rename M64 related functions Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-16  8:01   ` Alexey Kardashevskiy
                     ` (2 more replies)
  2015-11-04 13:12 ` [PATCH v7 15/50] powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
                   ` (36 subsequent siblings)
  50 siblings, 3 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This enables the M64 window on P7IOC, which has already been enabled
on PHB3. PHB3 supports 16 M64 BARs, each of which can be owned
exclusively by one particular PE# or divided evenly into 256
segments. In contrast, every P7IOC PHB has 16 M64 BARs, each of
which is divided into 8 segments, so every P7IOC PHB supports 128
M64 segments in total. P7IOC has M64DT, which allows one particular
M64 segment# to be mapped to an arbitrary PE#. PHB3 doesn't have
M64DT, so an M64 segment can only be pinned to a fixed PE#. In order
to support M64 on P7IOC and PHB3 with the same code, we just provide
128 M64 segments on every P7IOC PHB and pin each of them to a fixed
PE#, bypassing the flexibility of M64DT. In turn, we just need a
different phb->init_m64() for P7IOC and PHB3 to support M64.
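The fixed PE# to segment mapping described above can be sketched in user space as follows. The two constants are the ones added to pci.h by this patch; the helper names m64_bar_of_pe(), m64_seg_of_pe() and m64_bar_base() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PNV_IODA1_M64_NUM	16	/* number of M64 BARs (from the patch) */
#define PNV_IODA1_M64_SEGS	8	/* segments per M64 BAR (from the patch) */

/* Hypothetical helper: index of the M64 BAR that serves @pe_no. */
unsigned int m64_bar_of_pe(unsigned int pe_no)
{
	assert(pe_no < PNV_IODA1_M64_NUM * PNV_IODA1_M64_SEGS);
	return pe_no / PNV_IODA1_M64_SEGS;
}

/* Hypothetical helper: segment within that BAR that serves @pe_no. */
unsigned int m64_seg_of_pe(unsigned int pe_no)
{
	assert(pe_no < PNV_IODA1_M64_NUM * PNV_IODA1_M64_SEGS);
	return pe_no % PNV_IODA1_M64_SEGS;
}

/* Hypothetical helper: start address of BAR @index, following the
 * base computation in pnv_ioda1_init_m64(). */
uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz, unsigned int index)
{
	assert(index < PNV_IODA1_M64_NUM);
	return m64_base + (uint64_t)index * PNV_IODA1_M64_SEGS * segsz;
}
```

For example, PE#10 lands in BAR#1, segment#2, and PE#127 in BAR#15, segment#7, which matches the 16 x 8 = 128 segments per PHB stated above.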
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
 arch/powerpc/platforms/powernv/pci.h      |  3 ++
 2 files changed, 86 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1f7d985..bfe69f1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
 	}
 }
 
+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+	struct resource *r;
+	int index;
+
+	/*
+	 * There are 16 M64 BARs, each of which has 8 segments. So
+	 * there are as many M64 segments as the maximum number of
+	 * PEs, which is 128.
+	 */
+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
+		unsigned long base, segsz = phb->ioda.m64_segsize;
+		int64_t rc;
+
+		base = phb->ioda.m64_base +
+		       index * PNV_IODA1_M64_SEGS * segsz;
+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, index, base, 0,
+				PNV_IODA1_M64_SEGS * segsz);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
+				rc, phb->hose->global_number, index);
+			goto fail;
+		}
+
+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, index,
+				OPAL_ENABLE_M64_SPLIT);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
+				rc, phb->hose->global_number, index);
+			goto fail;
+		}
+	}
+
+	/*
+	 * Exclude the segment used by the reserved PE, which
+	 * is expected to be 0 or last supported PE#.
+	 */
+	r = &phb->hose->mem_resources[1];
+	if (phb->ioda.reserved_pe_idx == 0)
+		r->start += phb->ioda.m64_segsize;
+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
+		r->end -= phb->ioda.m64_segsize;
+	else
+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
+			phb->ioda.reserved_pe_idx);
+
+	return 0;
+
+fail:
+	for ( ; index >= 0; index--)
+		opal_pci_phb_mmio_enable(phb->opal_id,
+			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
+
+	return -EIO;
+}
+
 static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
 				    unsigned long *pe_bitmap,
 				    bool all)
@@ -325,6 +383,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 			pe->master = master_pe;
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
+
+		/*
+		 * P7IOC supports M64DT, which helps mapping M64 segment
+		 * to one particular PE#. However, PHB3 has fixed mapping
+		 * between M64 segment and PE#. In order to have same logic
+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
+		 * segment and PE# on P7IOC.
+		 */
+		if (phb->type == PNV_PHB_IODA1) {
+			int64_t rc;
+
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					pe->pe_number, OPAL_M64_WINDOW_TYPE,
+					pe->pe_number / PNV_IODA1_M64_SEGS,
+					pe->pe_number % PNV_IODA1_M64_SEGS);
+			if (rc != OPAL_SUCCESS)
+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
+					__func__, rc, phb->hose->global_number,
+					pe->pe_number);
+		}
 	}
 
 	kfree(pe_alloc);
@@ -339,8 +417,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	const u32 *r;
 	u64 pci_addr;
 
-	/* FIXME: Support M64 for P7IOC */
-	if (phb->type != PNV_PHB_IODA2) {
+	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
 		pr_info("  Not support M64 window\n");
 		return;
 	}
@@ -373,7 +450,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
-	phb->init_m64 = pnv_ioda2_init_m64;
+	if (phb->type == PNV_PHB_IODA1)
+		phb->init_m64 = pnv_ioda1_init_m64;
+	else
+		phb->init_m64 = pnv_ioda2_init_m64;
 	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
 	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 671fd13..c4019ac 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -78,6 +78,9 @@ struct pnv_ioda_pe {
 	struct list_head	list;
 };
 
+#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
+#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
+
 #define PNV_PHB_FLAG_EEH	(1 << 0)
 
 struct pnv_phb {
-- 
2.1.0
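
The fixed mapping between PE# and M64 segment that patch 14/50 enforces on P7IOC can be sketched in standalone C. This is only an illustration under stated assumptions: the helper names (m64_slot_for_pe, m64_bar_base, m64_seg_addr) are hypothetical and do not exist in the kernel, and the assumption that segment N of a split BAR starts N segment sizes above the BAR base is inferred from the BAR size programmed in the patch, not taken from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

#define M64_NUM		16	/* stand-in for PNV_IODA1_M64_NUM  */
#define M64_SEGS	8	/* stand-in for PNV_IODA1_M64_SEGS */

/* Hypothetical pair: which M64 BAR and which segment inside it */
struct m64_slot {
	unsigned int bar;
	unsigned int seg;
};

/* Fixed mapping: PE# N always owns segment (N % 8) of BAR (N / 8) */
static struct m64_slot m64_slot_for_pe(unsigned int pe)
{
	struct m64_slot s;

	assert(pe < M64_NUM * M64_SEGS);	/* at most 128 PEs */
	s.bar = pe / M64_SEGS;
	s.seg = pe % M64_SEGS;
	return s;
}

/* Start address of one M64 BAR, each BAR covering 8 segments */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	return m64_base + (uint64_t)bar * M64_SEGS * segsz;
}

/* Start address of the M64 segment owned by a PE (assumed layout) */
static uint64_t m64_seg_addr(uint64_t m64_base, uint64_t segsz,
			     unsigned int pe)
{
	struct m64_slot s = m64_slot_for_pe(pe);

	return m64_bar_base(m64_base, segsz, s.bar) + (uint64_t)s.seg * segsz;
}
```

Under that layout the segment address reduces to m64_base + pe * segsz, which is why reserving PE#0 or the last PE# only trims one segment from the start or end of the M64 resource.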
* [PATCH v7 15/50] powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (13 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 16/50] powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE Gavin Shan
                   ` (35 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
as it's the counterpart of IODA2's pnv_pci_ioda2_setup_dma_pe().
No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index bfe69f1..8a19454 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1845,9 +1845,10 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.free = pnv_ioda2_table_free,
 };
 
-static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
-				      struct pnv_ioda_pe *pe, unsigned int base,
-				      unsigned int segs)
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+				       struct pnv_ioda_pe *pe,
+				       unsigned int base,
+				       unsigned int segs)
 {
 
 	struct page *tce_mem = NULL;
@@ -2435,7 +2436,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 		if (phb->type == PNV_PHB_IODA1) {
 			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
 				pe->dma_weight, segs);
-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
+			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
 		} else {
 			pe_info(pe, "Assign DMA32 space\n");
 			segs = 0;
-- 
2.1.0
* [PATCH v7 16/50] powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (14 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 15/50] powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3 Gavin Shan
                   ` (34 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Currently, there is one macro (TCE32_TABLE_SIZE) representing the
TCE table size for one DMA32 segment. The constant representing
the DMA32 segment size (1 << 28) is still hard-coded in the code.
This defines PNV_IODA1_DMA32_SEGSIZE to represent the size of one
DMA32 segment. The TCE table size can be calculated from it because
the page size is fixed at 4KB, so all the related calculations depend
on one macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 27 ++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |  1 +
 2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8a19454..5a08e20 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -48,9 +48,6 @@
 #include "powernv.h"
 #include "pci.h"
 
-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
-
 #define POWERNV_IOMMU_DEFAULT_LEVELS	1
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
@@ -1853,7 +1850,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 
 	struct page *tce_mem = NULL;
 	struct iommu_table *tbl;
-	unsigned int i;
+	unsigned int tce32_segsz, i;
 	int64_t rc;
 	void *addr;
 
@@ -1873,29 +1870,31 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	/* Grab a 32-bit TCE table */
 	pe->tce32_seg = base;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
-		(base << 28), ((base + segs) << 28) - 1);
+		base * PNV_IODA1_DMA32_SEGSIZE,
+		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
 
 	/* XXX Currently, we allocate one big contiguous table for the
 	 * TCEs. We only really need one chunk per 256M of TCE space
 	 * (ie per segment) but that's an optimization for later, it
 	 * requires some added smarts with our get/put_tce implementation
 	 */
+	tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
 	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-				   get_order(TCE32_TABLE_SIZE * segs));
+				   get_order(tce32_segsz * segs));
 	if (!tce_mem) {
 		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
 		goto fail;
 	}
 	addr = page_address(tce_mem);
-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
+	memset(addr, 0, tce32_segsz * segs);
 
 	/* Configure HW */
 	for (i = 0; i < segs; i++) {
 		rc = opal_pci_map_pe_dma_window(phb->opal_id,
 					      pe->pe_number,
 					      base + i, 1,
-					      __pa(addr) + TCE32_TABLE_SIZE * i,
-					      TCE32_TABLE_SIZE, 0x1000);
+					      __pa(addr) + tce32_segsz * i,
+					      tce32_segsz, 0x1000);
 		if (rc) {
 			pe_err(pe, " Failed to configure 32-bit TCE table,"
 			       " err %ld\n", rc);
@@ -1904,8 +1903,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	}
 
 	/* Setup linux iommu table */
-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
-				  base << 28, IOMMU_PAGE_SHIFT_4K);
+	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
+				  base * PNV_IODA1_DMA32_SEGSIZE,
+				  IOMMU_PAGE_SHIFT_4K);
 
 	/* OPAL variant of P7IOC SW invalidated TCEs */
 	if (phb->ioda.tce_inval_reg)
@@ -1935,7 +1935,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	if (pe->tce32_seg >= 0)
 		pe->tce32_seg = -1;
 	if (tce_mem)
-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
+		__free_pages(tce_mem, get_order(tce32_segsz * segs));
 	if (tbl) {
 		pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
 		iommu_free_table(tbl, "pnv");
@@ -3216,7 +3216,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	mutex_init(&phb->ioda.pe_list_mutex);
 
 	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
+	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
+				PNV_IODA1_DMA32_SEGSIZE;
 
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c4019ac..46927ff 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -80,6 +80,7 @@ struct pnv_ioda_pe {
 
 #define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
 #define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
+#define PNV_IODA1_DMA32_SEGSIZE	0x10000000
 
 #define PNV_PHB_FLAG_EEH	(1 << 0)
 
-- 
2.1.0
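
The arithmetic behind patch 16/50 can be checked with a small standalone sketch. The macro and function names below are stand-ins chosen for illustration (they are not kernel symbols); the only values taken from the patch are the 256MB segment size, the 4KB page shift and the 8-byte TCE entry.

```c
#include <assert.h>
#include <stdint.h>

#define DMA32_SEGSIZE	0x10000000u	/* stand-in for PNV_IODA1_DMA32_SEGSIZE */
#define PAGE_SHIFT_4K	12		/* stand-in for IOMMU_PAGE_SHIFT_4K */
#define TCE_SHIFT	3		/* each TCE entry is 8 bytes */

/* Bytes of TCE table needed to map one DMA32 segment with 4KB pages */
static uint32_t tce32_segsz(void)
{
	return DMA32_SEGSIZE >> (PAGE_SHIFT_4K - TCE_SHIFT);
}

/* Bytes of TCE table for a PE that owns `segs` segments */
static uint64_t tce32_table_bytes(unsigned int segs)
{
	return (uint64_t)tce32_segsz() * segs;
}

/* First DMA address covered by segments [base, base + segs) */
static uint64_t dma32_first(unsigned int base)
{
	return (uint64_t)base * DMA32_SEGSIZE;
}

/* Last DMA address covered by segments [base, base + segs) */
static uint64_t dma32_last(unsigned int base, unsigned int segs)
{
	assert(segs > 0);
	return (uint64_t)(base + segs) * DMA32_SEGSIZE - 1;
}
```

The shift by (12 - 3) gives the same 512KB per segment as the removed TCE32_TABLE_SIZE expression ((0x10000000 / 0x1000) * 8), which is consistent with the "no logical changes" claim.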
* [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (15 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 16/50] powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  1:07   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list Gavin Shan
                   ` (33 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
In pnv_ioda_setup_dma(), it's unnecessary to calculate the DMA32
segments for PEs on PHB3 as the whole available DMA32 space can
be assigned to one specific PE on PHB3.
This splits pnv_ioda_setup_dma() into pnv_pci_ioda1_setup_dma() and
pnv_pci_ioda2_setup_dma() in order to avoid calculating DMA32
segments for PEs on PHB3. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 41 ++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5a08e20..4c2e023 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2383,7 +2383,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
+static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 {
 	struct pci_controller *hose = phb->hose;
 	unsigned int residual, remaining, segs, tw, base;
@@ -2428,26 +2428,30 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 				segs = remaining;
 		}
 
-		/*
-		 * For IODA2 compliant PHB3, we needn't care about the weight.
-		 * The all available 32-bits DMA space will be assigned to
-		 * the specific PE.
-		 */
-		if (phb->type == PNV_PHB_IODA1) {
-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-				pe->dma_weight, segs);
-			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
-		} else {
-			pe_info(pe, "Assign DMA32 space\n");
-			segs = 0;
-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
-		}
+		pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
+			pe->dma_weight, segs);
+		pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
 
 		remaining -= segs;
 		base += segs;
 	}
 }
 
+static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
+{
+	struct pnv_ioda_pe *pe;
+
+	pnv_pci_ioda_setup_opal_tce_kill(phb);
+
+	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
+		if (!pe->dma_weight)
+			continue;
+
+		pe_info(pe, "Assign DMA32 space\n");
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+	}
+}
+
 #ifdef CONFIG_PCI_MSI
 static void pnv_ioda2_msi_eoi(struct irq_data *d)
 {
@@ -2931,10 +2935,13 @@ static void pnv_pci_ioda_setup_DMA(void)
 	struct pnv_phb *phb;
 
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		pnv_ioda_setup_dma(hose->private_data);
+		phb = hose->private_data;
+		if (phb->type == PNV_PHB_IODA1)
+			pnv_pci_ioda1_setup_dma(phb);
+		else
+			pnv_pci_ioda2_setup_dma(phb);
 
 		/* Mark the PHB initialization done */
-		phb = hose->private_data;
 		phb->initialized = 1;
 	}
 }
-- 
2.1.0
* [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (16 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3 Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  1:54   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption Gavin Shan
                   ` (32 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
PEs are put into the PHB's DMA32 list (phb->ioda.pe_dma_list) according
to their DMA32 weight. The PEs on the list are iterated to set up
their TCE32 tables at system boot time. The list is used only once,
so there is no good reason for it to survive.
This moves the logic calculating the DMA32 weight of the PHB and PEs to
pnv_pci_ioda1_setup_dma() and drops the PHB's DMA32 list.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 150 ++++++++++++++----------------
 arch/powerpc/platforms/powernv/pci.h      |  19 ----
 2 files changed, 68 insertions(+), 101 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 4c2e023..20ebe6e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -891,44 +891,6 @@ out:
 	return 0;
 }
 
-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
-				       struct pnv_ioda_pe *pe)
-{
-	struct pnv_ioda_pe *lpe;
-
-	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
-		if (lpe->dma_weight < pe->dma_weight) {
-			list_add_tail(&pe->dma_link, &lpe->dma_link);
-			return;
-		}
-	}
-	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
-}
-
-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
-{
-	/* This is quite simplistic. The "base" weight of a device
-	 * is 10. 0 means no DMA is to be accounted for it.
-	 */
-
-	/* If it's a bridge, no DMA */
-	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-		return 0;
-
-	/* Reduce the weight of slow USB controllers */
-	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
-	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
-	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		return 3;
-
-	/* Increase the weight of RAID (includes Obsidian) */
-	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-		return 15;
-
-	/* Default */
-	return 10;
-}
-
 #ifdef CONFIG_PCI_IOV
 static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 {
@@ -1009,7 +971,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 			continue;
 		}
 		pdn->pe_number = pe->pe_number;
-		pe->dma_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
 	}
@@ -1046,10 +1007,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
-	pe->tce32_seg = -1;
 	pe->mve_number = -1;
 	pe->rid = bus->busn_res.start << 8;
-	pe->dma_weight = 0;
 
 	if (all)
 		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
@@ -1071,17 +1030,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
-
-	/* Account for one DMA PE if at least one DMA capable device exist
-	 * below the bridge
-	 */
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
 }
 
 static void pnv_ioda_setup_PEs(struct pci_bus *bus)
@@ -1389,7 +1337,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->flags = PNV_IODA_PE_VF;
 		pe->pbus = NULL;
 		pe->parent_dev = pdev;
-		pe->tce32_seg = -1;
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
@@ -1842,6 +1789,47 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.free = pnv_ioda2_table_free,
 };
 
+static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
+{
+	unsigned int *weight = (unsigned int *)data;
+
+	/* This is quite simplistic. The "base" weight of a device
+	 * is 10. 0 means no DMA is to be accounted for it.
+	 */
+
+	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
+		return 0;
+
+	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
+	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
+	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
+		*weight += 3;
+	else if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
+		*weight += 15;
+	else
+		*weight += 10;
+
+	return 0;
+}
+
+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
+{
+	unsigned int weight = 0;
+
+	if ((pe->flags & PNV_IODA_PE_DEV) && pe->pdev) {
+		pnv_pci_ioda_dev_dma_weight(pe->pdev, &weight);
+	} else if ((pe->flags & PNV_IODA_PE_BUS) && pe->pbus) {
+		struct pci_dev *pdev;
+
+		list_for_each_entry(pdev, &pe->pbus->devices, bus_list)
+			pnv_pci_ioda_dev_dma_weight(pdev, &weight);
+	} else if ((pe->flags & PNV_IODA_PE_BUS_ALL) && pe->pbus) {
+		pci_walk_bus(pe->pbus, pnv_pci_ioda_dev_dma_weight, &weight);
+	}
+
+	return weight;
+}
+
 static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 				       struct pnv_ioda_pe *pe,
 				       unsigned int base,
@@ -1858,17 +1846,12 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
-	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
-		return;
-
 	tbl = pnv_pci_table_alloc(phb->hose->node);
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
 			pe->pe_number);
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
 	/* Grab a 32-bit TCE table */
-	pe->tce32_seg = base;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		base * PNV_IODA1_DMA32_SEGSIZE,
 		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
@@ -1932,8 +1915,6 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
 	if (tce_mem)
 		__free_pages(tce_mem, get_order(tce32_segsz * segs));
 	if (tbl) {
@@ -2344,10 +2325,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 {
 	int64_t rc;
 
-	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
-		return;
-
 	/* TVE #1 is selected by PCI address bit 59 */
 	pe->tce_bypass_base = 1ull << 59;
 
@@ -2355,7 +2332,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 			pe->pe_number);
 
 	/* The PE will reserve all possible 32-bits space */
-	pe->tce32_seg = 0;
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		phb->ioda.m32_pci_base);
 
@@ -2371,11 +2347,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 #endif
 
 	rc = pnv_pci_ioda2_setup_default_config(pe);
-	if (rc) {
-		if (pe->tce32_seg >= 0)
-			pe->tce32_seg = -1;
+	if (rc)
 		return;
-	}
 
 	if (pe->flags & PNV_IODA_PE_DEV)
 		iommu_add_device(&pe->pdev->dev);
@@ -2386,24 +2359,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 {
 	struct pci_controller *hose = phb->hose;
-	unsigned int residual, remaining, segs, tw, base;
+	unsigned int weight, total_weight, dma_pe_count;
+	unsigned int residual, remaining, segs, base;
 	struct pnv_ioda_pe *pe;
 
+	total_weight = 0;
+	dma_pe_count = 0;
+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+		weight = pnv_pci_ioda_pe_dma_weight(pe);
+		if (weight > 0)
+			dma_pe_count++;
+
+		total_weight += weight;
+	}
+
 	/* If we have more PE# than segments available, hand out one
 	 * per PE until we run out and let the rest fail. If not,
 	 * then we assign at least one segment per PE, plus more based
 	 * on the amount of devices under that PE
 	 */
-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
+	if (dma_pe_count > phb->ioda.tce32_count)
 		residual = 0;
 	else
-		residual = phb->ioda.tce32_count -
-			phb->ioda.dma_pe_count;
+		residual = phb->ioda.tce32_count - dma_pe_count;
 
 	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
 		hose->global_number, phb->ioda.tce32_count);
 	pr_info("PCI: %d PE# for a total weight of %d\n",
-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
+		dma_pe_count, total_weight);
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
@@ -2412,24 +2395,26 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 	 * weight
 	 */
 	remaining = phb->ioda.tce32_count;
-	tw = phb->ioda.dma_weight;
 	base = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma_weight)
+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+		weight = pnv_pci_ioda_pe_dma_weight(pe);
+		if (!weight)
 			continue;
+
 		if (!remaining) {
 			pe_warn(pe, "No DMA32 resources available\n");
 			continue;
 		}
 		segs = 1;
 		if (residual) {
-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
+			segs += ((weight * residual)  + (total_weight / 2)) /
+				total_weight;
 			if (segs > remaining)
 				segs = remaining;
 		}
 
 		pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-			pe->dma_weight, segs);
+			weight, segs);
 		pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
 
 		remaining -= segs;
@@ -2440,11 +2425,13 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
 {
 	struct pnv_ioda_pe *pe;
+	unsigned int weight;
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma_weight)
+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+		weight = pnv_pci_ioda_pe_dma_weight(pe);
+		if (!weight)
 			continue;
 
 		pe_info(pe, "Assign DMA32 space\n");
@@ -3218,7 +3205,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
-	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 	mutex_init(&phb->ioda.pe_list_mutex);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 46927ff..2038ef2 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -49,14 +49,7 @@ struct pnv_ioda_pe {
 	/* PE number */
 	unsigned int		pe_number;
 
-	/* "Weight" assigned to the PE for the sake of DMA resource
-	 * allocations
-	 */
-	unsigned int		dma_weight;
-
 	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
-	int			tce32_seg;
-	int			tce32_segcount;
 	struct iommu_table_group table_group;
 
 	/* 64-bit TCE bypass region */
@@ -74,7 +67,6 @@ struct pnv_ioda_pe {
 	struct list_head	slaves;
 
 	/* Link in list of PE#s */
-	struct list_head	dma_link;
 	struct list_head	list;
 };
 
@@ -175,17 +167,6 @@ struct pnv_phb {
 			/* 32-bit TCE tables allocation */
 			unsigned long		tce32_count;
 
-			/* Total "weight" for the sake of DMA resources
-			 * allocation
-			 */
-			unsigned int		dma_weight;
-			unsigned int		dma_pe_count;
-
-			/* Sorted list of used PE's, sorted at
-			 * boot for resource allocation purposes
-			 */
-			struct list_head	pe_dma_list;
-
 			/* TCE cache invalidate registers (physical and
 			 * remapped)
 			 */
-- 
2.1.0
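
The weight-based distribution that patch 18/50 keeps in pnv_pci_ioda1_setup_dma() can be illustrated in isolation. The helper below is a hypothetical standalone function, not kernel code; it only reproduces the per-PE arithmetic visible in the diff (one base segment plus a rounded share of the residual segments, capped by what remains).

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the arithmetic in pnv_pci_ioda1_setup_dma():
 * every PE with a non-zero weight gets one segment, plus a rounded share of
 * the residual segments proportional to its weight.
 */
static unsigned int dma32_segs_for_pe(unsigned int weight,
				      unsigned int total_weight,
				      unsigned int residual,
				      unsigned int remaining)
{
	unsigned int segs;

	/* No DMA-capable device in the PE, or no segment left to give */
	if (!weight || !remaining)
		return 0;

	assert(total_weight >= weight);

	segs = 1;
	if (residual) {
		/* Adding total_weight / 2 rounds to the nearest integer */
		segs += (weight * residual + total_weight / 2) / total_weight;
		if (segs > remaining)
			segs = remaining;
	}
	return segs;
}
```

As a worked case, assume 8 DMA32 segments and two PEs with weights 10 and 30: the residual is 8 - 2 = 6, the first PE gets 1 + (60 + 20) / 40 = 3 segments, and the second would get 1 + (180 + 20) / 40 = 6 but is capped at the 5 segments still remaining.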
* [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (17 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  0:28   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation Gavin Shan
                   ` (31 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Similar to the mechanism tracking consumed IO/M32/M64 segments,
this introduces an array for each PHB to track the consumed DMA32
segments, which are going to be released at PCI unplug time.
The index of the array is the DMA32 segment number while the value
stored in the element is the assigned PE number.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++++++---------
 arch/powerpc/platforms/powernv/pci.h      |  7 ++++---
 2 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 20ebe6e..fdbc012 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1885,6 +1885,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 		}
 	}
 
+	/* Setup TCE32 segment mapping */
+	for (i = base; i < base + segs; i++)
+		phb->ioda.dma32_segmap[i] = pe->pe_number;
+
 	/* Setup linux iommu table */
 	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
 				  base * PNV_IODA1_DMA32_SEGSIZE,
@@ -2378,13 +2382,13 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 	 * then we assign at least one segment per PE, plus more based
 	 * on the amount of devices under that PE
 	 */
-	if (dma_pe_count > phb->ioda.tce32_count)
+	if (dma_pe_count > phb->ioda.dma32_count)
 		residual = 0;
 	else
-		residual = phb->ioda.tce32_count - dma_pe_count;
+		residual = phb->ioda.dma32_count - dma_pe_count;
 
 	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.tce32_count);
+		hose->global_number, phb->ioda.dma32_count);
 	pr_info("PCI: %d PE# for a total weight of %d\n",
 		dma_pe_count, total_weight);
 
@@ -2394,7 +2398,7 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 	 * out one base segment plus any residual segments based on
 	 * weight
 	 */
-	remaining = phb->ioda.tce32_count;
+	remaining = phb->ioda.dma32_count;
 	base = 0;
 	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
 		weight = pnv_pci_ioda_pe_dma_weight(pe);
@@ -3094,7 +3098,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, m64map_off, m32map_off, pemap_off;
+	unsigned long iomap_off = 0, dma32map_off = 0;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int i, len;
@@ -3177,6 +3182,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
 
+	/* Calculate how many 32-bit TCE segments we have */
+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
+				PNV_IODA1_DMA32_SEGSIZE;
+
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	m64map_off = size;
@@ -3186,6 +3195,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->type == PNV_PHB_IODA1) {
 		iomap_off = size;
 		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
+		dma32map_off = size;
+		size += phb->ioda.dma32_count *
+			sizeof(phb->ioda.dma32_segmap[0]);
 	}
 	pemap_off = size;
 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
@@ -3201,6 +3213,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		phb->ioda.io_segmap = aux + iomap_off;
 		for (i = 0; i < phb->ioda.total_pe_num; i++)
 			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+
+		phb->ioda.dma32_segmap = aux + dma32map_off;
+		for (i = 0; i < phb->ioda.dma32_count; i++)
+			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
 	}
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
@@ -3208,10 +3224,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 	mutex_init(&phb->ioda.pe_list_mutex);
 
-	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
-				PNV_IODA1_DMA32_SEGSIZE;
-
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
 					 window_type,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 2038ef2..0802fcd 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -148,6 +148,10 @@ struct pnv_phb {
 			int			*m32_segmap;
 			int			*io_segmap;
 
+			/* DMA32 segment maps - IODA1 only */
+			unsigned long		dma32_count;
+			int			*dma32_segmap;
+
 			/* IRQ chip */
 			int			irq_chip_init;
 			struct irq_chip		irq_chip;
@@ -164,9 +168,6 @@ struct pnv_phb {
 			 */
 			unsigned char		pe_rmap[0x10000];
 
-			/* 32-bit TCE tables allocation */
-			unsigned long		tce32_count;
-
 			/* TCE cache invalidate registers (physical and
 			 * remapped)
 			 */
-- 
2.1.0
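
The tracking array described in patch 19/50 (index is the DMA32 segment number, value is the owning PE number) can be sketched as follows. All names are hypothetical, INVALID_PE is assumed to be -1 as a stand-in for IODA_INVALID_PE, and the release helper is not part of this patch: it only shows what the array makes possible at unplug time.

```c
#include <assert.h>

#define INVALID_PE	(-1)	/* assumed stand-in for IODA_INVALID_PE */

/* Mark every DMA32 segment as free */
static void dma32_segmap_init(int *segmap, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		segmap[i] = INVALID_PE;
}

/* Record that segments [base, base + segs) now belong to PE# pe */
static void dma32_segmap_assign(int *segmap, unsigned int count,
				unsigned int base, unsigned int segs, int pe)
{
	unsigned int i;

	assert(base + segs <= count);
	for (i = base; i < base + segs; i++)
		segmap[i] = pe;
}

/* Hypothetical unplug path: free all segments owned by PE# pe */
static unsigned int dma32_segmap_release(int *segmap, unsigned int count,
					 int pe)
{
	unsigned int i, freed = 0;

	for (i = 0; i < count; i++) {
		if (segmap[i] == pe) {
			segmap[i] = INVALID_PE;
			freed++;
		}
	}
	return freed;
}
```

Storing the owner per segment means a release only needs the PE number, without the PE having to remember its own base and segment count.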
* [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (18 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-20  3:14   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity Gavin Shan
                   ` (30 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
In the current implementation, the DMA32 segments required by one specific
PE aren't calculated independently from the information held in the PE.
That conflicts with the PCI hotplug design, which is PE-centric: a
PE's DMA32 segments should be calculated from the information held in
the PE alone.
This moves the logic calculating a PE's consumed DMA32 segments from
pnv_pci_ioda1_setup_dma() to pnv_pci_ioda1_setup_dma_pe() so that a PE's
DMA32 segments are calculated/allocated from the information held in
the PE (DMA32 weight). The logic is also improved: we try to allocate
as many DMA32 segments as we can. It's acceptable for fewer DMA32
segments than expected to be allocated.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 119 ++++++++++++++----------------
 1 file changed, 57 insertions(+), 62 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fdbc012..0e66c4d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1830,15 +1830,23 @@ static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
 	return weight;
 }
 
+static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
+{
+	unsigned int weight = 0;
+
+	pci_walk_bus(phb->hose->bus, pnv_pci_ioda_dev_dma_weight, &weight);
+	return weight;
+}
+
 static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
-				       struct pnv_ioda_pe *pe,
-				       unsigned int base,
-				       unsigned int segs)
+				       struct pnv_ioda_pe *pe)
 {
 
 	struct page *tce_mem = NULL;
 	struct iommu_table *tbl;
-	unsigned int tce32_segsz, i;
+	unsigned int weight, total_weight;
+	unsigned int tce32_segsz, base, segs, i;
+	bool found;
 	int64_t rc;
 	void *addr;
 
@@ -1846,12 +1854,55 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
+	total_weight = pnv_pci_ioda_total_dma_weight(phb);
+	weight = pnv_pci_ioda_pe_dma_weight(pe);
+	if (!total_weight || !weight)
+		return;
+
+	segs = (weight * phb->ioda.dma32_count) / total_weight;
+	if (!segs)
+		segs = 1;
+
+	/*
+	 * Allocate contiguous DMA32 segments. We begin with the expected
+	 * number of segments. On each failed attempt, the number of DMA32
+	 * segments to be allocated is decreased by one, until a run of
+	 * free segments is found or no segments are left to try.
+	 */
+	while (segs) {
+		found = false;
+		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
+			for (i = base; i < base + segs; i++) {
+				if (phb->ioda.dma32_segmap[i] !=
+				    IODA_INVALID_PE)
+					break;
+			}
+
+			if (i >= base + segs) {
+				found = true;
+				break;
+			}
+		}
+
+		if (found)
+			break;
+
+		segs--;
+	}
+
+	if (!segs) {
+		pe_warn(pe, "No available DMA32 resource\n");
+		return;
+	}
+
 	tbl = pnv_pci_table_alloc(phb->hose->node);
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
 			pe->pe_number);
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
 	/* Grab a 32-bit TCE table */
+	pe_info(pe, "DMA weight %d (%d), assigned %d DMA32 segments\n",
+		weight, total_weight, segs);
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		base * PNV_IODA1_DMA32_SEGSIZE,
 		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
@@ -2362,68 +2413,12 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
 {
-	struct pci_controller *hose = phb->hose;
-	unsigned int weight, total_weight, dma_pe_count;
-	unsigned int residual, remaining, segs, base;
 	struct pnv_ioda_pe *pe;
 
-	total_weight = 0;
-	dma_pe_count = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-		weight = pnv_pci_ioda_pe_dma_weight(pe);
-		if (weight > 0)
-			dma_pe_count++;
-
-		total_weight += weight;
-	}
-
-	/* If we have more PE# than segments available, hand out one
-	 * per PE until we run out and let the rest fail. If not,
-	 * then we assign at least one segment per PE, plus more based
-	 * on the amount of devices under that PE
-	 */
-	if (dma_pe_count > phb->ioda.dma32_count)
-		residual = 0;
-	else
-		residual = phb->ioda.dma32_count - dma_pe_count;
-
-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.dma32_count);
-	pr_info("PCI: %d PE# for a total weight of %d\n",
-		dma_pe_count, total_weight);
-
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
-	/* Walk our PE list and configure their DMA segments, hand them
-	 * out one base segment plus any residual segments based on
-	 * weight
-	 */
-	remaining = phb->ioda.dma32_count;
-	base = 0;
-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-		weight = pnv_pci_ioda_pe_dma_weight(pe);
-		if (!weight)
-			continue;
-
-		if (!remaining) {
-			pe_warn(pe, "No DMA32 resources available\n");
-			continue;
-		}
-		segs = 1;
-		if (residual) {
-			segs += ((weight * residual)  + (total_weight / 2)) /
-				total_weight;
-			if (segs > remaining)
-				segs = remaining;
-		}
-
-		pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
-			weight, segs);
-		pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
-
-		remaining -= segs;
-		base += segs;
-	}
+	list_for_each_entry(pe, &phb->ioda.pe_list, list)
+		pnv_pci_ioda1_setup_dma_pe(phb, pe);
 }
 
 static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
-- 
2.1.0
* [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (19 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  0:29   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe() Gavin Shan
                   ` (29 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Each PHB maintains an array that translates the 2-byte Request ID
(RID) to a PE#, with the assumption that the PE# takes one byte,
meaning that we can't have more than 256 PEs. However,
pci_dn->pe_number already uses 4 bytes for the PE#.
This extends the PE# capacity so that each entry is 4 bytes long.
Then we can reuse IODA_INVALID_PE to check whether the PE# stored
in phb->ioda.pe_rmap[] is valid.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
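A minimal sketch of the idea, outside the kernel: with int entries the
reverse map can hold PE numbers above 255 and can use a negative
sentinel for "unmapped", which a one-byte entry cannot. RID_COUNT,
INVALID_PE, rmap_init(), rid_of() and truncate_to_byte() are
hypothetical names; INVALID_PE is assumed to play the role of
IODA_INVALID_PE.

```c
#define RID_COUNT	0x10000	/* one entry per 16-bit {bus, devfn} RID */
#define INVALID_PE	(-1)	/* hypothetical stand-in for IODA_INVALID_PE */

static int pe_rmap[RID_COUNT];

/* Mark every RID as unmapped; zero would be a valid PE#. */
static void rmap_init(void)
{
	unsigned int rid;

	for (rid = 0; rid < RID_COUNT; rid++)
		pe_rmap[rid] = INVALID_PE;
}

/* Build the 16-bit RID from bus number and devfn. */
static unsigned int rid_of(unsigned int bus, unsigned int devfn)
{
	return ((bus & 0xff) << 8) | (devfn & 0xff);
}

/* What a one-byte entry would have stored for this PE#. */
static unsigned char truncate_to_byte(int pe_no)
{
	return (unsigned char)pe_no;
}
```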
 arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
 arch/powerpc/platforms/powernv/pci.h      | 7 ++-----
 2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0e66c4d..ef93a01 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -766,7 +766,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	/* Clear the reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = 0;
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
 	/* Release from all parents PELT-V */
 	while (parent) {
@@ -3164,6 +3164,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (prop32)
 		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
 
+	/* Invalidate RID to PE# mapping */
+	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
+		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
+
 	/* Parse 64-bit MMIO range */
 	pnv_ioda_parse_m64_window(phb);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 0802fcd..5df945f 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -162,11 +162,8 @@ struct pnv_phb {
 			struct list_head	pe_list;
 			struct mutex            pe_list_mutex;
 
-			/* Reverse map of PEs, will have to extend if
-			 * we are to support more than 256 PEs, indexed
-			 * bus { bus, devfn }
-			 */
-			unsigned char		pe_rmap[0x10000];
+			/* Reverse map of PEs, indexed by {bus, devfn} */
+			int			pe_rmap[0x10000];
 
 			/* TCE cache invalidate registers (physical and
 			 * remapped)
-- 
2.1.0
* [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (20 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  0:30   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
                   ` (28 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This introduces pnv_ioda_init_pe() to initialize the specified PE
instance (phb->ioda.pe_array[x]). It's used by pnv_ioda_alloc_pe()
and pnv_ioda_reserve_pe(). No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ef93a01..488e0f8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -129,6 +129,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
+{
+	phb->ioda.pe_array[pe_no].phb = phb;
+	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+
+	return &phb->ioda.pe_array[pe_no];
+}
+
 static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
 	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
@@ -141,8 +149,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 		pr_debug("%s: PE %d was reserved on PHB#%x\n",
 			 __func__, pe_no, phb->hose->global_number);
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+	pnv_ioda_init_pe(phb, pe_no);
 }
 
 static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
@@ -156,8 +163,7 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 			return IODA_INVALID_PE;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
-	phb->ioda.pe_array[pe].phb = phb;
-	phb->ioda.pe_array[pe].pe_number = pe;
+	pnv_ioda_init_pe(phb, pe);
 	return pe;
 }
 
-- 
2.1.0
* [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (21 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  5:08   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 24/50] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
                   ` (27 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
In the current implementation, the PEs that are allocated or picked
from the reserved list are identified by PE number, and the PE
instance eventually has to be looked up from that number. We have
the same issue when a PE is released.
This makes pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe() return the
PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
returns the reserved/allocated PE instance for use in subsequent
patches. On the other hand, pnv_ioda_free_pe() takes the PE instance
(not the number) as its argument. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
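A small sketch of the interface shape described above, using a toy
pool rather than the kernel structures: the allocator hands back the
instance (or NULL), and the release path takes the instance. Because
the PE number then lives only inside the structure, it has to be read
before the structure is wiped. struct pe, pe_array[], pe_used[],
pe_alloc() and pe_free() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PE_POOL_SIZE 4	/* hypothetical pool size */

struct pe {
	int number;
	void *owner;	/* placeholder for per-PE state */
};

static struct pe pe_array[PE_POOL_SIZE];
static bool pe_used[PE_POOL_SIZE];

/* Return the allocated instance itself, or NULL when the pool is full. */
static struct pe *pe_alloc(void)
{
	int i;

	for (i = 0; i < PE_POOL_SIZE; i++) {
		if (!pe_used[i]) {
			pe_used[i] = true;
			pe_array[i].number = i;
			return &pe_array[i];
		}
	}

	return NULL;
}

/* Take the instance; read its number before wiping the structure. */
static void pe_free(struct pe *pe)
{
	int number = pe->number;

	memset(pe, 0, sizeof(*pe));
	pe_used[number] = false;
}
```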
 arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.h      |  2 +-
 2 files changed, 46 insertions(+), 37 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 488e0f8..ae82df1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -152,7 +152,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 	pnv_ioda_init_pe(phb, pe_no);
 }
 
-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe;
 
@@ -160,19 +160,21 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
 					phb->ioda.total_pe_num, 0);
 		if (pe >= phb->ioda.total_pe_num)
-			return IODA_INVALID_PE;
+			return NULL;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
-	pnv_ioda_init_pe(phb, pe);
-	return pe;
+	return pnv_ioda_init_pe(phb, pe);
 }
 
-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
 {
-	WARN_ON(phb->ioda.pe_array[pe].pdev);
+	struct pnv_phb *phb = pe->phb;
+	unsigned int pe_num = pe->pe_number;
+
+	WARN_ON(pe->pdev);
 
-	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
-	clear_bit(pe, phb->ioda.pe_alloc);
+	memset(pe, 0, sizeof(struct pnv_ioda_pe));
+	clear_bit(pe_num, phb->ioda.pe_alloc);
 }
 
 /* The default M64 BAR is shared by all PEs */
@@ -332,7 +333,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
 	}
 }
 
-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -342,7 +343,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 
 	/* Root bus shouldn't use M64 */
 	if (pci_is_root_bus(bus))
-		return IODA_INVALID_PE;
+		return NULL;
 
 	/* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
@@ -350,7 +351,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	if (!pe_alloc) {
 		pr_warn("%s: Out of memory !\n",
 			__func__);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* Figure out reserved PE numbers by the PE */
@@ -363,7 +364,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	 */
 	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
 		kfree(pe_alloc);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/*
@@ -409,7 +410,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	}
 
 	kfree(pe_alloc);
-	return master_pe->pe_number;
+	return master_pe;
 }
 
 static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
@@ -988,28 +989,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
  * subordinate PCI devices and buses. The second type of PE is normally
  * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
  */
-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
-	int pe_num = IODA_INVALID_PE;
+	struct pnv_ioda_pe *pe = NULL;
 
 	/* Check if PE is determined by M64 */
 	if (phb->pick_m64_pe)
-		pe_num = phb->pick_m64_pe(bus, all);
+		pe = phb->pick_m64_pe(bus, all);
 
 	/* The PE number isn't pinned by M64 */
-	if (pe_num == IODA_INVALID_PE)
-		pe_num = pnv_ioda_alloc_pe(phb);
+	if (!pe)
+		pe = pnv_ioda_alloc_pe(phb);
 
-	if (pe_num == IODA_INVALID_PE) {
+	if (!pe) {
 		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
-		return;
+		return NULL;
 	}
 
-	pe = &phb->ioda.pe_array[pe_num];
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
@@ -1018,17 +1017,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	if (all)
 		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
-			bus->busn_res.start, bus->busn_res.end, pe_num);
+			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
 	else
 		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
-			bus->busn_res.start, pe_num);
+			bus->busn_res.start, pe->pe_number);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
+		pnv_ioda_free_pe(pe);
 		pe->pbus = NULL;
-		return;
+		return NULL;
 	}
 
 	/* Associate it with all child devices */
@@ -1036,6 +1034,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
+
+	return pe;
 }
 
 static void pnv_ioda_setup_PEs(struct pci_bus *bus)
@@ -1267,7 +1267,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 
 		pnv_ioda_deconfigure_pe(phb, pe);
 
-		pnv_ioda_free_pe(phb, pe->pe_number);
+		pnv_ioda_free_pe(pe);
 	}
 }
 
@@ -1276,6 +1276,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
+	struct pnv_ioda_pe    *pe;
 	struct pci_dn         *pdn;
 	struct pci_sriov      *iov;
 	u16                    num_vfs, i;
@@ -1300,8 +1301,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 		/* Release PE numbers */
 		if (pdn->m64_single_mode) {
 			for (i = 0; i < num_vfs; i++) {
-				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
-					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
+					continue;
+
+				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
+				pnv_ioda_free_pe(pe);
 			}
 		} else
 			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
@@ -1354,9 +1358,8 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 		if (pnv_ioda_configure_pe(phb, pe)) {
 			/* XXX What do we do here ? */
-			if (pe_num)
-				pnv_ioda_free_pe(phb, pe_num);
 			pe->pdev = NULL;
+			pnv_ioda_free_pe(pe);
 			continue;
 		}
 
@@ -1374,6 +1377,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
+	struct pnv_ioda_pe    *pe;
 	struct pci_dn         *pdn;
 	int                    ret;
 	u16                    i;
@@ -1416,11 +1420,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		/* Calculate available PE for required VFs */
 		if (pdn->m64_single_mode) {
 			for (i = 0; i < num_vfs; i++) {
-				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
-				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
+				pe = pnv_ioda_alloc_pe(phb);
+				if (!pe) {
 					ret = -EBUSY;
 					goto m64_failed;
 				}
+
+				pdn->pe_num_map[i] = pe->pe_number;
 			}
 		} else {
 			mutex_lock(&phb->ioda.pe_alloc_mutex);
@@ -1465,8 +1471,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 m64_failed:
 	if (pdn->m64_single_mode) {
 		for (i = 0; i < num_vfs; i++) {
-			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
-				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
+				continue;
+
+			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
+			pnv_ioda_free_pe(pe);
 		}
 	} else
 		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 5df945f..e55ab0e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -105,7 +105,7 @@ struct pnv_phb {
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pci_bus *bus,
 			       unsigned long *pe_bitmap, bool all);
-	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
+	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
 	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
-- 
2.1.0
* [PATCH v7 24/50] powerpc/powernv: Allocate PE# in reverse order
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (22 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus Gavin Shan
                   ` (26 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
The PE number for a particular PE can be allocated dynamically or
reserved according to the M64 (64-bit prefetchable) segments
consumed by the PE. The M64 resources, and hence their segments
and PE numbers, are assigned/reserved in ascending order. The PE
numbers are allocated dynamically in ascending order as well.
That isn't a problem today, as the PE numbers are reserved and
then allocated all at once, in that order. However, it will
introduce conflicts when PCI hotplug is supported: the PE number to
be reserved for a newly added PE might already have been assigned.
To resolve the above conflict, this forces PE numbers to be
allocated dynamically in reverse order. With this patch applied,
PE numbers are reserved in ascending order, but allocated
dynamically in reverse order.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
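A toy sketch of the two allocation directions, assuming a small bool
array in place of the phb->ioda.pe_alloc bitmap: explicit reservation
of a given PE# (as for M64-pinned PEs, which grow from the bottom) and
dynamic allocation scanning from the top, so the two populations meet
only when the space is nearly exhausted. TOTAL_PE, pe_alloc_map[],
pe_reserve() and pe_alloc_reverse() are hypothetical names.

```c
#include <stdbool.h>

#define TOTAL_PE 8	/* hypothetical PE capacity */

static bool pe_alloc_map[TOTAL_PE];

/* Reserve a specific PE#, as done for M64-pinned PEs. */
static bool pe_reserve(int pe_no)
{
	if (pe_no < 0 || pe_no >= TOTAL_PE || pe_alloc_map[pe_no])
		return false;

	pe_alloc_map[pe_no] = true;
	return true;
}

/*
 * Allocate dynamically from the top. The index is signed so that
 * the loop terminates once it steps below zero.
 */
static int pe_alloc_reverse(void)
{
	int pe_no;

	for (pe_no = TOTAL_PE - 1; pe_no >= 0; pe_no--) {
		if (!pe_alloc_map[pe_no]) {
			pe_alloc_map[pe_no] = true;
			return pe_no;
		}
	}

	return -1;
}
```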
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ae82df1..eea1c96 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -154,16 +154,14 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 
 static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
-	unsigned long pe;
+	long pe;
 
-	do {
-		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe_num, 0);
-		if (pe >= phb->ioda.total_pe_num)
-			return NULL;
-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+	for (pe = phb->ioda.total_pe_num - 1; pe >= 0; pe--) {
+		if (!test_and_set_bit(pe, phb->ioda.pe_alloc))
+			return pnv_ioda_init_pe(phb, pe);
+	}
 
-	return pnv_ioda_init_pe(phb, pe);
+	return NULL;
 }
 
 static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
-- 
2.1.0
* [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (23 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 24/50] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  6:04   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
                   ` (25 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
We're going to reserve/assign PEs when pcibios_setup_bridge() is
called. The function won't be called for the root bus as it doesn't
have a parent bridge. However, the root bus still needs to be
covered by a PE.
This reserves, for the root bus, the PE number that is adjacent to
the reserved one.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
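The selection rule can be sketched in isolation as below. This is an
illustration under assumptions, not the kernel function:
root_pe_index() and INVALID_PE are hypothetical names, and the guard
for fewer than two PEs is an addition that the patch itself does not
have.

```c
#define INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */

/*
 * Pick the root bus PE# next to the reserved PE# when the reserved
 * one sits at either end of the PE space; otherwise give up.
 */
static int root_pe_index(int reserved_pe, int total_pe)
{
	if (total_pe < 2)
		return INVALID_PE;	/* guard added for this sketch */
	if (reserved_pe == 0)
		return 1;
	if (reserved_pe == total_pe - 1)
		return reserved_pe - 1;

	return INVALID_PE;
}
```

With the root bus PE placed next to the reserved one, both sit at the
same end of the PE space, which is why the M64 window is trimmed by
two segments at that end.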
 arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++++++++++++++++++++++---------
 arch/powerpc/platforms/powernv/pci.h      |  1 +
 2 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index eea1c96..5e6745f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -207,14 +207,14 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	set_bit(phb->ioda.m64_bar_idx, &phb->ioda.m64_bar_alloc);
 
 	/*
-	 * Strip off the segment used by the reserved PE, which is
-	 * expected to be 0 or last one of PE capabicity.
+	 * Exclude the segments for the reserved and root bus PEs,
+	 * which are the first two or the last two PEs.
 	 */
 	r = &phb->hose->mem_resources[1];
 	if (phb->ioda.reserved_pe_idx == 0)
-		r->start += phb->ioda.m64_segsize;
+		r->start += (2 * phb->ioda.m64_segsize);
 	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
-		r->end -= phb->ioda.m64_segsize;
+		r->end -= (2 * phb->ioda.m64_segsize);
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe_idx);
@@ -294,14 +294,14 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 	}
 
 	/*
-	 * Exclude the segment used by the reserved PE, which
-	 * is expected to be 0 or last supported PE#.
+	 * Exclude the segments for the reserved and root bus PEs,
+	 * which are the first two or the last two PEs.
 	 */
 	r = &phb->hose->mem_resources[1];
 	if (phb->ioda.reserved_pe_idx == 0)
-		r->start += phb->ioda.m64_segsize;
+		r->start += (2 * phb->ioda.m64_segsize);
 	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
-		r->end -= phb->ioda.m64_segsize;
+		r->end -= (2 * phb->ioda.m64_segsize);
 	else
 		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe_idx);
@@ -3231,7 +3231,22 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
 	}
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
+
+	/*
+	 * Choose PE number for root bus, which shouldn't have
+	 * M64 resources consumed by its child devices. Pick
+	 * the PE number adjacent to the reserved one if possible.
+	 */
+	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe_idx);
+	if (phb->ioda.reserved_pe_idx == 0) {
+		phb->ioda.root_pe_idx = 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
+	} else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1)) {
+		phb->ioda.root_pe_idx = phb->ioda.reserved_pe_idx - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
+	} else {
+		phb->ioda.root_pe_idx = IODA_INVALID_PE;
+	}
 
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 	mutex_init(&phb->ioda.pe_list_mutex);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index e55ab0e..a8ba97f 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -120,6 +120,7 @@ struct pnv_phb {
 			/* Global bridge info */
 			unsigned int		total_pe_num;
 			unsigned int		reserved_pe_idx;
+			unsigned int		root_pe_idx;
 
 			/* 32-bit MMIO window */
 			unsigned int		m32_size;
-- 
2.1.0
* [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (24 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-17  7:57   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs Gavin Shan
                   ` (24 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup(), except those used by SRIOV VFs. The
function is called once, after PCI probing and resource
assignment are completed, so it isn't hotplug friendly.
This creates PEs dynamically in ppc_md.pcibios_setup_bridge(), which
is called both during system bootup and on PCI hotplug, when a PCI
bridge's windows are updated after resource assignment/reassignment
is done. For the partial hotplug case, where not all PCI devices
belonging to the PE are unplugged and plugged again, we just need to
unbind/bind the affected PCI devices from/to the corresponding PE
without creating a new one.
As there is no upstream bridge for the root bus, which needs to be
covered by a PE, we have to create the PE for the root bus in
ppc_md.pcibios_setup_bridge() before any other PEs can be created,
as the PE for the root bus is the ancestor of all the others.
On the other hand, the windows of the root port, or of the upstream
port of the PCIe switch behind the root port, are extended to the
PHB's apertures to accommodate the additional resources needed by
newly plugged devices, based on the fact that a hotpluggable slot is
behind the root port or a downstream port of the PCIe switch behind
the root port. The extension of those PCI bridges' windows is done
in ppc_md.pcibios_setup_bridge() as well.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
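The order in which a PE is chosen for a bus, as it reads from the
pnv_ioda_setup_bus_PE() hunk in this patch, can be sketched as a pure
decision function. Everything here is illustrative: choose_bus_pe(),
enum pe_source and INVALID_PE are hypothetical names, and each source
is reduced to "a PE number or INVALID_PE" instead of the real lookups.

```c
#include <stdbool.h>

#define INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */

enum pe_source {
	PE_REUSED,		/* bus already mapped: partial hotplug */
	PE_ROOT_RESERVED,	/* root bus takes its reserved PE# */
	PE_M64_PINNED,		/* PE# dictated by M64 segments */
	PE_DYNAMIC,		/* any free PE# */
	PE_NONE,		/* nothing available */
};

/*
 * Decide where the PE for a bus comes from. Each PE# argument is
 * INVALID_PE when that source has nothing to offer.
 */
static enum pe_source choose_bus_pe(int mapped_pe, bool is_root_bus,
				    int root_pe, int m64_pe,
				    int dynamic_pe, int *pe_no)
{
	if (mapped_pe != INVALID_PE) {
		*pe_no = mapped_pe;
		return PE_REUSED;
	}
	if (is_root_bus && root_pe != INVALID_PE) {
		*pe_no = root_pe;
		return PE_ROOT_RESERVED;
	}
	if (m64_pe != INVALID_PE) {
		*pe_no = m64_pe;
		return PE_M64_PINNED;
	}
	if (dynamic_pe != INVALID_PE) {
		*pe_no = dynamic_pe;
		return PE_DYNAMIC;
	}

	*pe_no = INVALID_PE;
	return PE_NONE;
}
```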
 arch/powerpc/platforms/powernv/pci-ioda.c | 240 +++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.h      |   1 +
 2 files changed, 138 insertions(+), 103 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5e6745f..0bb0056 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -975,6 +975,15 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
+
+		/*
+		 * In partial hotplug case, the PCI device might be still
+		 * associated with the PE and needn't be attached to the
+		 * PE again.
+		 */
+		if (pdn->pe_number != IODA_INVALID_PE)
+			continue;
+
 		pdn->pe_number = pe->pe_number;
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
@@ -992,9 +1001,26 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
 	struct pnv_ioda_pe *pe = NULL;
+	int pe_num;
+
+	/*
+	 * In partial hotplug case, the PE instance might be still alive.
+	 * We should reuse it instead of allocating a new one.
+	 */
+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
+	if (pe_num != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pe_num];
+		pnv_ioda_setup_same_PE(bus, pe);
+		return NULL;
+	}
+
+	/* PE number for root bus should have been reserved */
+	if (pci_is_root_bus(bus) &&
+	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
 
 	/* Check if PE is determined by M64 */
-	if (phb->pick_m64_pe)
+	if (!pe && phb->pick_m64_pe)
 		pe = phb->pick_m64_pe(bus, all);
 
 	/* The PE number isn't pinned by M64 */
@@ -1036,46 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	return pe;
 }
 
-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	pnv_ioda_setup_bus_PE(bus, false);
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (dev->subordinate) {
-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
-				pnv_ioda_setup_bus_PE(dev->subordinate, true);
-			else
-				pnv_ioda_setup_PEs(dev->subordinate);
-		}
-	}
-}
-
-/*
- * Configure PEs so that the downstream PCI buses and devices
- * could have their associated PE#. Unfortunately, we didn't
- * figure out the way to identify the PLX bridge yet. So we
- * simply put the PCI bus and the subordinate behind the root
- * port to PE# here. The game rule here is expected to be changed
- * as soon as we can detected PLX bridge correctly.
- */
-static void pnv_pci_ioda_setup_PEs(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-
-		/* M64 layout might affect PE allocation */
-		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(hose->bus, NULL, true);
-
-		pnv_ioda_setup_PEs(hose->bus);
-	}
-}
-
 #ifdef CONFIG_PCI_IOV
 static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
 {
@@ -2391,8 +2377,13 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 				       struct pnv_ioda_pe *pe)
 {
+	unsigned int weight;
 	int64_t rc;
 
+	weight = pnv_pci_ioda_pe_dma_weight(pe);
+	if (!weight)
+		return;
+
 	/* TVE #1 is selected by PCI address bit 59 */
 	pe->tce_bypass_base = 1ull << 59;
 
@@ -2424,33 +2415,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 }
 
-static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
-{
-	struct pnv_ioda_pe *pe;
-
-	pnv_pci_ioda_setup_opal_tce_kill(phb);
-
-	list_for_each_entry(pe, &phb->ioda.pe_list, list)
-		pnv_pci_ioda1_setup_dma_pe(phb, pe);
-}
-
-static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
-{
-	struct pnv_ioda_pe *pe;
-	unsigned int weight;
-
-	pnv_pci_ioda_setup_opal_tce_kill(phb);
-
-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-		weight = pnv_pci_ioda_pe_dma_weight(pe);
-		if (!weight)
-			continue;
-
-		pe_info(pe, "Assign DMA32 space\n");
-		pnv_pci_ioda2_setup_dma_pe(phb, pe);
-	}
-}
-
 #ifdef CONFIG_PCI_MSI
 static void pnv_ioda2_msi_eoi(struct irq_data *d)
 {
@@ -2914,37 +2878,6 @@ static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
 	}
 }
 
-static void pnv_pci_ioda_setup_seg(void)
-{
-	struct pci_controller *tmp, *hose;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(pe);
-		}
-	}
-}
-
-static void pnv_pci_ioda_setup_DMA(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-		if (phb->type == PNV_PHB_IODA1)
-			pnv_pci_ioda1_setup_dma(phb);
-		else
-			pnv_pci_ioda2_setup_dma(phb);
-
-		/* Mark the PHB initialization done */
-		phb->initialized = 1;
-	}
-}
-
 static void pnv_pci_ioda_create_dbgfs(void)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -2955,6 +2888,9 @@ static void pnv_pci_ioda_create_dbgfs(void)
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 
+		/* Notify initialization of PHB done */
+		phb->initialized = 1;
+
 		sprintf(name, "PCI%04x", hose->global_number);
 		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
 		if (!phb->dbgfs)
@@ -2966,10 +2902,6 @@ static void pnv_pci_ioda_create_dbgfs(void)
 
 static void pnv_pci_ioda_fixup(void)
 {
-	pnv_pci_ioda_setup_PEs();
-	pnv_pci_ioda_setup_seg();
-	pnv_pci_ioda_setup_DMA();
-
 	pnv_pci_ioda_create_dbgfs();
 
 #ifdef CONFIG_EEH
@@ -3019,6 +2951,104 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+/*
+ * We are updating root port or the upstream port of the
+ * bridge behind the root port with PHB's windows in order
+ * to accommodate the changes on required resources during
+ * PCI (slot) hotplug, which is connected to either root
+ * port or the downstream ports of PCIe switch behind the
+ * root port.
+ */
+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
+					   unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct resource *r, *w;
+	int i;
+
+	/* Check if we need to apply a fixup to the bridge's windows */
+	if (!pci_is_root_bus(bridge->bus) &&
+	    !pci_is_root_bus(bridge->bus->self->bus))
+		return;
+
+	/* Fixup the resources */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
+		if (!r->flags || !r->parent)
+			continue;
+
+		w = NULL;
+		if (r->flags & type & IORESOURCE_IO)
+			w = &hose->io_resource;
+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
+			 (type & IORESOURCE_PREFETCH) &&
+			 phb->ioda.m64_segsize)
+			w = &hose->mem_resources[1];
+		else if (r->flags & type & IORESOURCE_MEM)
+			w = &hose->mem_resources[0];
+
+		r->start = w->start;
+		r->end = w->end;
+	}
+}
+
+static void pnv_pci_setup_bridge(struct pci_bus *bus,
+				 unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct pnv_ioda_pe *pe;
+	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
+
+	/* The PE for the root bus should be realized before any other */
+	if (!phb->ioda.root_pe_populated) {
+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
+		if (pe) {
+			phb->ioda.root_pe_idx = pe->pe_number;
+			phb->ioda.root_pe_populated = true;
+		}
+	}
+
+	/* Extend bridge's windows if necessary */
+	pnv_pci_fixup_bridge_resources(bus, type);
+
+	/* Don't assign a PE to a PCI bus that has no subordinate devices */
+	if (list_empty(&bus->devices))
+		return;
+
+	/* Reserve PEs according to used M64 resources */
+	if (phb->reserve_m64_pe)
+		phb->reserve_m64_pe(bus, NULL, all);
+
+	/*
+	 * Assign PE. We might run here because of partial hotplug.
+	 * For the case, we just pick up the existing PE and should
+	 * not allocate resources again.
+	 */
+	pe = pnv_ioda_setup_bus_PE(bus, all);
+	if (!pe)
+		return;
+
+	/* Setup MMIO mapping */
+	pnv_ioda_setup_pe_seg(pe);
+
+	/* Setup DMA */
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		pnv_pci_ioda1_setup_dma_pe(phb, pe);
+		break;
+	case PNV_PHB_IODA2:
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+		break;
+	default:
+		pr_warn("%s: No DMA for PHB#%d (type %d)\n",
+			__func__, phb->hose->global_number, phb->type);
+	}
+}
+
 #ifdef CONFIG_PCI_IOV
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
@@ -3095,6 +3125,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 #endif
 	.enable_device_hook	= pnv_pci_enable_device_hook,
 	.window_alignment	= pnv_pci_window_alignment,
+	.setup_bridge		= pnv_pci_setup_bridge,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
 	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
@@ -3168,6 +3199,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->regs == NULL)
 		pr_err("  Failed to map registers !\n");
 
+	/* Initialize TCE kill register */
+	pnv_pci_ioda_setup_opal_tce_kill(phb);
+
 	/* Initialize more IODA stuff */
 	phb->ioda.total_pe_num = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index a8ba97f..ef5271a 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -121,6 +121,7 @@ struct pnv_phb {
 			unsigned int		total_pe_num;
 			unsigned int		reserved_pe_idx;
 			unsigned int		root_pe_idx;
+			bool			root_pe_populated;
 
 			/* 32-bit MMIO window */
 			unsigned int		m32_size;
-- 
2.1.0
* [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (25 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-18  2:23   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
                   ` (23 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This adds a reference count to each PE, representing the number of PCI
devices associated with the PE. The reference count is increased when
a PCI device joins the PE and decreased when one leaves it. Once the
count drops to zero, the PE and the resources it uses (IO, MMIO, DMA,
PELTM, PELTV) are released, which is what PCI hot unplug requires.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
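The get/put lifecycle described in the commit message can be sketched in
plain user-space C. This is only an illustration under stated assumptions:
struct demo_pe, demo_pe_get(), demo_pe_put() and demo_pe_release() are
hypothetical stand-ins, not the kernel structures or functions; the real
implementation is in the diff that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnv_ioda_pe; not the kernel type. */
struct demo_pe {
	int device_count;	/* number of devices attached to the PE */
	bool released;		/* set once the resources have been freed */
};

/* Stand-in for the release path (DMA, segments, PELTM/PELTV, PE number). */
static void demo_pe_release(struct demo_pe *pe)
{
	pe->released = true;
}

/* A device joins the PE: take one reference. */
static struct demo_pe *demo_pe_get(struct demo_pe *pe)
{
	if (!pe)
		return NULL;

	pe->device_count++;
	return pe;
}

/* A device leaves the PE: drop one reference, release on the last one. */
static void demo_pe_put(struct demo_pe *pe)
{
	if (!pe)
		return;

	assert(pe->device_count > 0);
	pe->device_count--;
	if (pe->device_count == 0)
		demo_pe_release(pe);
}
```

In this sketch the release happens exactly once, on the put that drops the
last reference, which mirrors how the patch ties PE teardown to the last
device leaving the PE.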
 arch/powerpc/platforms/powernv/pci-ioda.c | 245 ++++++++++++++++++++++++++----
 arch/powerpc/platforms/powernv/pci.h      |   1 +
 2 files changed, 218 insertions(+), 28 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0bb0056..dcffce5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -129,6 +129,215 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct iommu_table *tbl;
+	int start, count, i;
+	int64_t rc;
+
+	/* Search for the used DMA32 segments */
+	start = -1;
+	count = 0;
+	for (i = 0; i < phb->ioda.dma32_count; i++) {
+		if (phb->ioda.dma32_segmap[i] != pe->pe_number)
+			continue;
+
+		count++;
+		if (start < 0)
+			start = i;
+	}
+
+	if (!count)
+		return;
+
+	/* Unlink IOMMU table from group */
+	tbl = pe->table_group.tables[0];
+	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
+	if (pe->table_group.group) {
+		iommu_group_put(pe->table_group.group);
+		WARN_ON(pe->table_group.group);
+	}
+
+	/* Release IOMMU table */
+	pnv_pci_ioda2_table_free_pages(tbl);
+	iommu_free_table(tbl, of_node_full_name(pci_bus_to_OF_node(pe->pbus)));
+
+	/* Disable TVE */
+	for (i = start; i < start + count; i++) {
+		rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
+						i, 0, 0ul, 0ul, 0ul);
+		if (rc)
+			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
+				rc, i);
+
+		phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
+	}
+}
+
+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
+		int num);
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
+
+static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+	struct iommu_table *tbl;
+	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
+	int64_t rc;
+
+	if (!weight)
+		return;
+
+	tbl = pe->table_group.tables[0];
+	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
+	if (rc)
+		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+
+	pnv_pci_ioda2_set_bypass(pe, false);
+	if (pe->table_group.group) {
+		iommu_group_put(pe->table_group.group);
+		WARN_ON(pe->table_group.group);
+	}
+
+	pnv_pci_ioda2_table_free_pages(tbl);
+	iommu_free_table(tbl, "pnv");
+}
+
+static void pnv_ioda_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		pnv_pci_ioda1_release_dma_pe(pe);
+		break;
+	case PNV_PHB_IODA2:
+		pnv_pci_ioda2_release_dma_pe(pe);
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
+static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win)
+{
+	struct pnv_phb *phb = pe->phb;
+	int index, *segmap = NULL;
+	int64_t rc;
+
+	switch (win) {
+	case OPAL_IO_WINDOW_TYPE:
+		segmap = phb->ioda.io_segmap;
+		break;
+	case OPAL_M32_WINDOW_TYPE:
+		segmap = phb->ioda.m32_segmap;
+		break;
+	case OPAL_M64_WINDOW_TYPE:
+		if (phb->type != PNV_PHB_IODA1)
+			return;
+		segmap = phb->ioda.m64_segmap;
+		break;
+	default:
+		return;
+	}
+
+	for (index = 0; index < phb->ioda.total_pe_num; index++) {
+		if (segmap[index] != pe->pe_number)
+			continue;
+
+		if (win == OPAL_M64_WINDOW_TYPE)
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					phb->ioda.reserved_pe_idx, win,
+					index / PNV_IODA1_M64_SEGS,
+					index % PNV_IODA1_M64_SEGS);
+		else
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					phb->ioda.reserved_pe_idx, win,
+					0, index);
+
+		if (rc != OPAL_SUCCESS)
+			pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
+				rc, win, index);
+
+		segmap[index] = IODA_INVALID_PE;
+	}
+}
+
+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	int win;
+
+	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {
+		if (phb->type == PNV_PHB_IODA2 && win == OPAL_IO_WINDOW_TYPE)
+			continue;
+
+		pnv_ioda_release_window(pe, win);
+	}
+}
+
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb,
+				   struct pnv_ioda_pe *pe);
+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe);
+static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_ioda_pe *tmp, *slave;
+
+	/* Release slave PEs in compound PE */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
+			pnv_ioda_release_pe(slave);
+	}
+
+	/* Remove the PE from the list */
+	list_del(&pe->list);
+
+	/* Release resources */
+	pnv_ioda_release_dma_pe(pe);
+	pnv_ioda_release_pe_seg(pe);
+	pnv_ioda_deconfigure_pe(pe->phb, pe);
+
+	pnv_ioda_free_pe(pe);
+}
+
+static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
+{
+	if (!pe)
+		return NULL;
+
+	pe->device_count++;
+	return pe;
+}
+
+static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
+{
+	if (!pe)
+		return;
+
+	pe->device_count--;
+	WARN_ON(pe->device_count < 0);
+	if (pe->device_count == 0)
+		pnv_ioda_release_pe(pe);
+}
+
+static void pnv_pci_release_device(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pnv_ioda_pe *pe;
+
+	if (pdev->is_virtfn)
+		return;
+
+	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
+		return;
+
+	pe = &phb->ioda.pe_array[pdn->pe_number];
+	pnv_ioda_pe_put(pe);
+}
+
 static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
 {
 	phb->ioda.pe_array[pe_no].phb = phb;
@@ -724,7 +933,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 	return 0;
 }
 
-#ifdef CONFIG_PCI_IOV
 static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
@@ -759,9 +967,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 		}
 		rid_end = pe->rid + (count << 8);
 	} else {
+#ifdef CONFIG_PCI_IOV
 		if (pe->flags & PNV_IODA_PE_VF)
 			parent = pe->parent_dev;
 		else
+#endif
 			parent = pe->pdev->bus->self;
 		bcomp = OpalPciBusAll;
 		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
@@ -799,11 +1009,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	pe->pbus = NULL;
 	pe->pdev = NULL;
+#ifdef CONFIG_PCI_IOV
 	pe->parent_dev = NULL;
+#endif
 
 	return 0;
 }
-#endif /* CONFIG_PCI_IOV */
 
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
@@ -985,6 +1196,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 			continue;
 
 		pdn->pe_number = pe->pe_number;
+		pnv_ioda_pe_get(pe);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
 	}
@@ -1047,9 +1259,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 			bus->busn_res.start, pe->pe_number);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		pnv_ioda_free_pe(pe);
 		pe->pbus = NULL;
+		pnv_ioda_release_pe(pe);
 		return NULL;
 	}
 
@@ -1199,29 +1410,6 @@ m64_failed:
 	return -EBUSY;
 }
 
-static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
-		int num);
-static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
-
-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
-{
-	struct iommu_table    *tbl;
-	int64_t               rc;
-
-	tbl = pe->table_group.tables[0];
-	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
-	if (rc)
-		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
-
-	pnv_pci_ioda2_set_bypass(pe, false);
-	if (pe->table_group.group) {
-		iommu_group_put(pe->table_group.group);
-		BUG_ON(pe->table_group.group);
-	}
-	pnv_pci_ioda2_table_free_pages(tbl);
-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
-}
-
 static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 {
 	struct pci_bus        *bus;
@@ -1242,7 +1430,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 		if (pe->parent_dev != pdev)
 			continue;
 
-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
+		pnv_pci_ioda2_release_dma_pe(pe);
 
 		/* Remove from list */
 		mutex_lock(&phb->ioda.pe_list_mutex);
@@ -3124,6 +3312,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 #endif
 	.enable_device_hook	= pnv_pci_enable_device_hook,
+	.release_device		= pnv_pci_release_device,
 	.window_alignment	= pnv_pci_window_alignment,
 	.setup_bridge		= pnv_pci_setup_bridge,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index ef5271a..3bb10de 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -30,6 +30,7 @@ struct pnv_phb;
 struct pnv_ioda_pe {
 	unsigned long		flags;
 	struct pnv_phb		*phb;
+	int			device_count;
 
 	/* A PE can be associated with a single device or an
 	 * entire bus (& children). In the former case, pdev
-- 
2.1.0
* [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add, remove}_pci_devices()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (26 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-18  2:43   ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add,remove}_pci_devices() Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
                   ` (22 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames pcibios_{add,remove}_pci_devices() to
pci_{add,remove}_pci_devices() to avoid conflicts with the names of
the weak functions in the PCI subsystem, which carry the "pcibios"
prefix. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |  4 ++--
 arch/powerpc/kernel/eeh_driver.c      | 12 ++++++------
 arch/powerpc/kernel/pci-hotplug.c     | 15 +++++++--------
 drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
 drivers/pci/hotplug/rpaphp_core.c     |  4 ++--
 drivers/pci/hotplug/rpaphp_pci.c      |  2 +-
 6 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 0f2ff3a..c2360c8 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -260,10 +260,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
+extern void pci_remove_pci_devices(struct pci_bus *bus);
 
 /** Discover new pci devices under this bus, and add them */
-extern void pcibios_add_pci_devices(struct pci_bus *bus);
+extern void pci_add_pci_devices(struct pci_bus *bus);
 
 
 extern void isa_bridge_find_early(struct pci_controller *hose);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 80dfe89..f884aa7 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -560,12 +560,12 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	 * We don't remove the corresponding PE instances because
 	 * we need the information afterwords. The attached EEH
 	 * devices are expected to be attached soon when calling
-	 * into pcibios_add_pci_devices().
+	 * into pci_add_pci_devices().
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
 		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(bus);
+		pci_remove_pci_devices(bus);
 		pci_unlock_rescan_remove();
 	} else if (frozen_bus) {
 		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
@@ -617,13 +617,13 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		 * rebuilt when adding PCI devices.
 		 */
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(bus);
+		pci_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(frozen_bus);
+		pci_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -812,7 +812,7 @@ perm_error:
 		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
 		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(frozen_bus);
+		pci_remove_pci_devices(frozen_bus);
 		pci_unlock_rescan_remove();
 	}
 }
@@ -895,7 +895,7 @@ static void eeh_handle_special_event(void)
 				bus = eeh_pe_bus_get(phb_pe);
 				eeh_pe_dev_traverse(pe,
 					eeh_report_failure, NULL);
-				pcibios_remove_pci_devices(bus);
+				pci_remove_pci_devices(bus);
 			}
 			pci_unlock_rescan_remove();
 		}
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7f9ed0c..3f62821 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -38,20 +38,20 @@ void pcibios_release_device(struct pci_dev *dev)
 }
 
 /**
- * pcibios_remove_pci_devices - remove all devices under this bus
+ * pci_remove_pci_devices - remove all devices under this bus
  * @bus: the indicated PCI bus
  *
  * Remove all of the PCI devices under this bus both from the
  * linux pci device tree, and from the powerpc EEH address cache.
  */
-void pcibios_remove_pci_devices(struct pci_bus *bus)
+void pci_remove_pci_devices(struct pci_bus *bus)
 {
 	struct pci_dev *dev, *tmp;
 	struct pci_bus *child_bus;
 
 	/* First go down child busses */
 	list_for_each_entry(child_bus, &bus->children, node)
-		pcibios_remove_pci_devices(child_bus);
+		pci_remove_pci_devices(child_bus);
 
 	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 		 pci_domain_nr(bus),  bus->number);
@@ -60,11 +60,10 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 		pci_stop_and_remove_bus_device(dev);
 	}
 }
-
-EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
+EXPORT_SYMBOL_GPL(pci_remove_pci_devices);
 
 /**
- * pcibios_add_pci_devices - adds new pci devices to bus
+ * pci_add_pci_devices - adds new pci devices to bus
  * @bus: the indicated PCI bus
  *
  * This routine will find and fixup new pci devices under
@@ -74,7 +73,7 @@ EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
  * is how this routine differs from other, similar pcibios
  * routines.)
  */
-void pcibios_add_pci_devices(struct pci_bus * bus)
+void pci_add_pci_devices(struct pci_bus *bus)
 {
 	int slotno, mode, pass, max;
 	struct pci_dev *dev;
@@ -114,4 +113,4 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 	}
 	pcibios_finish_adding_to_bus(bus);
 }
-EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
+EXPORT_SYMBOL_GPL(pci_add_pci_devices);
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index e12bafd..ebd283b 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -381,7 +381,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
 	}
 
 	/* Remove all devices below slot */
-	pcibios_remove_pci_devices(bus);
+	pci_remove_pci_devices(bus);
 
 	/* Unmap PCI IO space */
 	if (pcibios_unmap_io_space(bus)) {
diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index f2945fa..3034693 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -405,7 +405,7 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
 
 	if (state == PRESENT) {
 		pci_lock_rescan_remove();
-		pcibios_add_pci_devices(slot->bus);
+		pci_add_pci_devices(slot->bus);
 		pci_unlock_rescan_remove();
 		slot->state = CONFIGURED;
 	} else if (state == EMPTY) {
@@ -427,7 +427,7 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
 		return -EINVAL;
 
 	pci_lock_rescan_remove();
-	pcibios_remove_pci_devices(slot->bus);
+	pci_remove_pci_devices(slot->bus);
 	pci_unlock_rescan_remove();
 	vm_unmap_aliases();
 
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 9243f3e7..256066c 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -116,7 +116,7 @@ int rpaphp_enable_slot(struct slot *slot)
 		}
 
 		if (list_empty(&bus->devices))
-			pcibios_add_pci_devices(bus);
+			pci_add_pci_devices(bus);
 
 		if (!list_empty(&bus->devices)) {
 			info->adapter_status = CONFIGURED;
-- 
2.1.0
* [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (27 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-18  3:59   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 30/50] powerpc/pci: Move pci_find_bus_by_node() around Gavin Shan
                   ` (21 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
avoid conflicts with the names of the weak functions in the PCI
subsystem, which carry the "pcibios" prefix. No logical changes
introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h      | 2 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c | 5 ++---
 drivers/pci/hotplug/rpadlpar_core.c        | 6 +++---
 drivers/pci/hotplug/rpaphp_pci.c           | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c2360c8..28385cb 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -257,7 +257,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 #endif
 
 /** Find the bus corresponding to the indicated device node */
-extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
+extern struct pci_bus *pci_find_bus_by_node(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
 extern void pci_remove_pci_devices(struct pci_bus *bus);
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 5d4a3df..aee22b4 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -54,8 +54,7 @@ find_bus_among_children(struct pci_bus *bus,
 	return child;
 }
 
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
 {
 	struct pci_dn *pdn = dn->data;
 
@@ -64,7 +63,7 @@ pcibios_find_pci_bus(struct device_node *dn)
 
 	return find_bus_among_children(pdn->phb->bus, dn);
 }
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
 
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index ebd283b..9aa392b 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -176,7 +176,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
 	struct pci_dev *dev;
 	struct pci_controller *phb;
 
-	if (pcibios_find_pci_bus(dn))
+	if (pci_find_bus_by_node(dn))
 		return -EINVAL;
 
 	/* Add pci bus */
@@ -213,7 +213,7 @@ static int dlpar_remove_phb(char *drc_name, struct device_node *dn)
 	struct pci_dn *pdn;
 	int rc = 0;
 
-	if (!pcibios_find_pci_bus(dn))
+	if (!pci_find_bus_by_node(dn))
 		return -EINVAL;
 
 	/* If pci slot is hotpluggable, use hotplug to remove it */
@@ -357,7 +357,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
 
 	pci_lock_rescan_remove();
 
-	bus = pcibios_find_pci_bus(dn);
+	bus = pci_find_bus_by_node(dn);
 	if (!bus) {
 		ret = -EINVAL;
 		goto out;
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 256066c..e7dd573 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
 	if (rc)
 		return rc;
 
-	bus = pcibios_find_pci_bus(slot->dn);
+	bus = pci_find_bus_by_node(slot->dn);
 	if (!bus) {
 		err("%s: no pci_bus for dn %s\n", __func__, slot->dn->full_name);
 		return -EINVAL;
-- 
2.1.0
* [PATCH v7 30/50] powerpc/pci: Move pci_find_bus_by_node() around
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (28 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 31/50] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
                   ` (20 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This moves pci_find_bus_by_node() from
arch/powerpc/platforms/pseries/pci_dlpar.c to
arch/powerpc/kernel/pci-hotplug.c so that the function can be used by
both the pSeries and PowerNV platforms. The following cleanups are
applied as well. No functional changes introduced.
   * Remove variable "busdn" in find_bus_among_children()
   * Use PCI_DN() to convert device node to pci_dn
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
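The lookup being moved is a depth-first search of the bus tree for the bus
whose device node matches. A minimal user-space sketch of that search
follows, assuming a simplified tree: struct demo_bus, its node_id field
(standing in for the device node pointer) and demo_find_bus() are
hypothetical names used only for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_bus; not the kernel type. */
struct demo_bus {
	int node_id;			/* stands in for the device node */
	struct demo_bus *first_child;	/* head of the child bus list */
	struct demo_bus *next_sibling;	/* next bus under the same parent */
};

/* Depth-first search: return the bus matching node_id, or NULL. */
static struct demo_bus *demo_find_bus(struct demo_bus *bus, int node_id)
{
	struct demo_bus *tmp, *found;

	if (!bus)
		return NULL;
	if (bus->node_id == node_id)
		return bus;

	for (tmp = bus->first_child; tmp; tmp = tmp->next_sibling) {
		found = demo_find_bus(tmp, node_id);
		if (found)
			return found;
	}

	return NULL;
}

/* Usage example: root(0) has children 1 and 2; bus 2 has child 3. */
static void demo_find_bus_example(void)
{
	struct demo_bus leaf = { 3, NULL, NULL };
	struct demo_bus second = { 2, &leaf, NULL };
	struct demo_bus first = { 1, NULL, &second };
	struct demo_bus root = { 0, &first, NULL };

	assert(demo_find_bus(&root, 3) == &leaf);
	assert(demo_find_bus(&root, 9) == NULL);
}
```

As in the moved function, the search starts at the root bus of the PHB and
stops at the first match, so the caller only has to check for NULL.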
 arch/powerpc/kernel/pci-hotplug.c          | 29 ++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pci_dlpar.c | 31 ------------------------------
 2 files changed, 29 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 3f62821..96e2cc3 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,35 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+static struct pci_bus *find_bus_among_children(struct pci_bus *bus,
+					       struct device_node *dn)
+{
+	struct pci_bus *child = NULL;
+	struct pci_bus *tmp;
+
+	if (pci_bus_to_OF_node(bus) == dn)
+		return bus;
+
+	list_for_each_entry(tmp, &bus->children, node) {
+		child = find_bus_among_children(tmp, dn);
+		if (child)
+			break;
+	}
+
+	return child;
+}
+
+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
+{
+	struct pci_dn *pdn = PCI_DN(dn);
+
+	if (!pdn  || !pdn->phb || !pdn->phb->bus)
+		return NULL;
+
+	return find_bus_among_children(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index aee22b4..906dbaa 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,37 +34,6 @@
 
 #include "pseries.h"
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-                        struct device_node *dn)
-{
-	struct pci_bus *child = NULL;
-	struct pci_bus *tmp;
-	struct device_node *busdn;
-
-	busdn = pci_bus_to_OF_node(bus);
-	if (busdn == dn)
-		return bus;
-
-	list_for_each_entry(tmp, &bus->children, node) {
-		child = find_bus_among_children(tmp, dn);
-		if (child)
-			break;
-	};
-	return child;
-}
-
-struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
-{
-	struct pci_dn *pdn = dn->data;
-
-	if (!pdn  || !pdn->phb || !pdn->phb->bus)
-		return NULL;
-
-	return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
-- 
2.1.0
* [PATCH v7 31/50] powerpc/pci: Export pci_add_device_node_info()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (29 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 30/50] powerpc/pci: Move pci_find_bus_by_node() around Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 32/50] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
                   ` (19 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames update_dn_pci_info() to pci_add_device_node_info(),
adjusts its parameter and return types accordingly, and exports it.
The function creates the pdn (struct pci_dn) for the indicated device
node. A new function, add_pdn(), is a thin wrapper around
pci_add_device_node_info() for use as the callback passed to
traverse_pci_devices(). No logical changes are introduced.
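The split between a typed helper and a callback-shaped wrapper can be
sketched in userspace as below. All type and function names here are
hypothetical stand-ins (struct hose for struct pci_controller, and so
on), and the static error token merely plays the role that
ERR_PTR(-ENOMEM) plays in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct hose { int id; };		/* stand-in for struct pci_controller */
struct pdn  { struct hose *phb; };	/* stand-in for struct pci_dn */
struct node { void *data; };		/* stand-in for struct device_node */

/* Typed helper: create the pdn for @dn and hand it back to the caller. */
struct pdn *add_device_node_info(struct hose *hose, struct node *dn)
{
	struct pdn *pdn;

	assert(hose != NULL && dn != NULL);

	pdn = calloc(1, sizeof(*pdn));
	if (!pdn)
		return NULL;

	pdn->phb = hose;
	dn->data = pdn;
	return pdn;
}

/*
 * Callback-shaped wrapper: the traversal treats a non-NULL return as
 * "stop", so success maps to NULL and failure to an error token.
 */
void *add_pdn(struct node *dn, void *data)
{
	static int enomem;	/* hypothetical token, not a kernel ERR_PTR */
	struct hose *hose = data;

	return add_device_node_info(hose, dn) ? NULL : &enomem;
}
```

Direct callers get the new pdn back and no longer need to re-read
dn->data, while the traversal keeps its generic void * callback
signature.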
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h  |  3 ++-
 arch/powerpc/kernel/pci_dn.c           | 30 +++++++++++++++++++-----------
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 3 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 28385cb..7e0c67d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -235,7 +235,8 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
 extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
-extern void *update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
+					       struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..36ae515 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -265,13 +265,9 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 #endif /* CONFIG_PCI_IOV */
 }
 
-/*
- * Traverse_func that inits the PCI fields of the device node.
- * NOTE: this *must* be done before read/write config to the device.
- */
-void *update_dn_pci_info(struct device_node *dn, void *data)
+struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
+					struct device_node *dn)
 {
-	struct pci_controller *phb = data;
 	const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
 	const __be32 *regs;
 	struct device_node *parent;
@@ -282,7 +278,7 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 		return NULL;
 	dn->data = pdn;
 	pdn->node = dn;
-	pdn->phb = phb;
+	pdn->phb = hose;
 #ifdef CONFIG_PPC_POWERNV
 	pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -314,8 +310,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	if (pdn->parent)
 		list_add_tail(&pdn->list, &pdn->parent->child_list);
 
-	return NULL;
+	return pdn;
 }
+EXPORT_SYMBOL_GPL(pci_add_device_node_info);
 
 /*
  * Traverse a device tree stopping each PCI device in the tree.
@@ -415,6 +412,18 @@ void *traverse_pci_dn(struct pci_dn *root,
 	return NULL;
 }
 
+static void *add_pdn(struct device_node *dn, void *data)
+{
+	struct pci_controller *hose = data;
+	struct pci_dn *pdn;
+
+	pdn = pci_add_device_node_info(hose, dn);
+	if (!pdn)
+		return ERR_PTR(-ENOMEM);
+
+	return NULL;
+}
+
 /** 
  * pci_devs_phb_init_dynamic - setup pci devices under this PHB
  * phb: pci-to-host bridge (top-level bridge connecting to cpu)
@@ -429,8 +438,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	struct pci_dn *pdn;
 
 	/* PHB nodes themselves must not match */
-	update_dn_pci_info(dn, phb);
-	pdn = dn->data;
+	pdn = pci_add_device_node_info(phb, dn);
 	if (pdn) {
 		pdn->devfn = pdn->busno = -1;
 		pdn->vendor_id = pdn->device_id = pdn->class_code = 0;
@@ -439,7 +447,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, update_dn_pci_info, phb);
+	traverse_pci_devices(dn, add_pdn, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 9e524c2..6c274cb 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -264,7 +264,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 		pdn = parent ? PCI_DN(parent) : NULL;
 		if (pdn) {
 			/* Create pdn and EEH device */
-			update_dn_pci_info(np, pdn->phb);
+			pci_add_device_node_info(pdn->phb, np);
 			eeh_dev_init(PCI_DN(np), pdn->phb);
 		}
 
-- 
2.1.0
* [PATCH v7 32/50] powerpc/pci: Introduce pci_remove_device_node_info()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (30 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 31/50] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
                   ` (18 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This implements and exports pci_remove_device_node_info(), which
removes the pdn (struct pci_dn) of the indicated device node. The
function will be used by the PowerNV PCI hotplug driver.
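As a rough illustration of the teardown order (unlink from the parent,
clear the node's back-pointer, then free), here is a userspace sketch.
The singly linked child list and the names pdn_add()/pdn_remove() are
assumptions made for the example; the kernel code uses list_head,
of_node_put() and kfree() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pdn;

struct node { struct pdn *data; };	/* stand-in for struct device_node */

struct pdn {				/* stand-in for struct pci_dn */
	struct node *node;
	struct pdn *parent;
	struct pdn *first_child;
	struct pdn *next;		/* next sibling under the same parent */
};

/* Create a pdn for @dn and link it under @parent (may be NULL). */
struct pdn *pdn_add(struct node *dn, struct pdn *parent)
{
	struct pdn *pdn = calloc(1, sizeof(*pdn));

	if (!pdn)
		return NULL;

	pdn->node = dn;
	pdn->parent = parent;
	if (parent) {
		pdn->next = parent->first_child;
		parent->first_child = pdn;
	}
	dn->data = pdn;
	return pdn;
}

/* Remove the pdn of @dn; a NULL node or a node without pdn is a no-op. */
void pdn_remove(struct node *dn)
{
	struct pdn *pdn = dn ? dn->data : NULL;
	struct pdn **link;

	if (!pdn)
		return;

	/* Mirrors the WARN_ON(): children are expected to go first. */
	assert(pdn->first_child == NULL);

	if (pdn->parent) {
		for (link = &pdn->parent->first_child; *link;
		     link = &(*link)->next) {
			if (*link == pdn) {
				*link = pdn->next;
				break;
			}
		}
	}

	dn->data = NULL;
	free(pdn);
}
```

Clearing dn->data before the free keeps a later PCI_DN()-style lookup
on the same node from returning a stale pointer.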
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |  1 +
 arch/powerpc/kernel/pci_dn.c          | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 7e0c67d..b8ce4f4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -237,6 +237,7 @@ extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
 extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 					       struct device_node *dn);
+extern void pci_remove_device_node_info(struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 36ae515..7f877a4 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -314,6 +314,29 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 }
 EXPORT_SYMBOL_GPL(pci_add_device_node_info);
 
+void pci_remove_device_node_info(struct device_node *dn)
+{
+	struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+
+	if (edev)
+		edev->pdn = NULL;
+#endif
+
+	if (!pdn)
+		return;
+
+	WARN_ON(!list_empty(&pdn->child_list));
+	list_del(&pdn->list);
+	if (pdn->parent)
+		of_node_put(pdn->parent->node);
+
+	dn->data = NULL;
+	kfree(pdn);
+}
+EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
+
 /*
  * Traverse a device tree stopping each PCI device in the tree.
  * This is done depth first.  As each node is processed, a "pre"
-- 
2.1.0
* [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (31 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 32/50] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-18  3:14   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 34/50] powerpc/pci: Delay populating pdn Gavin Shan
                   ` (17 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames traverse_pci_devices() to pci_traverse_device_nodes().
The function traverses all subordinate device nodes of the specified
one. The cleanups below are also applied to the function. No logical
changes are introduced.
   * Rename "pre" to "fn".
   * Avoid the assignment in an if condition reported by checkpatch.pl.
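The traversal contract (visit every subordinate node, stop as soon as
the callback returns non-NULL) can be sketched in userspace as follows.
"struct node" and the two callbacks are hypothetical, and the sketch
descends into every child, whereas the kernel function only descends
below PCI bridges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node. */
struct node {
	struct node *parent, *child, *sibling;
};

/*
 * Walk all nodes below @start depth first. The walk stops early and
 * returns the callback's value as soon as @fn returns non-NULL.
 */
void *traverse_nodes(struct node *start,
		     void *(*fn)(struct node *, void *), void *data)
{
	struct node *dn, *next;
	void *ret;

	assert(start != NULL);

	for (dn = start->child; dn; dn = next) {
		if (fn) {
			ret = fn(dn, data);
			if (ret)
				return ret;
		}

		if (dn->child) {
			next = dn->child;	/* go down */
		} else if (dn->sibling) {
			next = dn->sibling;	/* go sideways */
		} else {
			/* climb until an ancestor has a sibling left */
			do {
				dn = dn->parent;
				if (dn == start)
					return NULL;
			} while (!dn->sibling);
			next = dn->sibling;
		}
	}

	return NULL;
}

/* Example callback: count every visited node, never stop the walk. */
void *count_node(struct node *dn, void *data)
{
	(void)dn;
	++*(int *)data;
	return NULL;
}

/* Example callback: stop the walk at the node passed in @data. */
void *match_node(struct node *dn, void *data)
{
	return dn == data ? dn : NULL;
}
```

Splitting "ret = fn(dn, data)" from the test of ret, as the patch does,
keeps the early-exit rule visible without an assignment hidden in the
condition.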
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
 arch/powerpc/kernel/pci_dn.c         | 14 +++++++++-----
 arch/powerpc/platforms/pseries/msi.c |  4 ++--
 3 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index ca0c5bf..8753e4e 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
 struct device_node;
 struct pci_dn;
 
-typedef void *(*traverse_func)(struct device_node *me, void *data);
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data);
+void *pci_traverse_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *),
+				void *data);
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 7f877a4..aa4110f 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -355,8 +355,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
  * one of these nodes we also assume its siblings are non-pci for
  * performance.
  */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data)
+void *pci_traverse_device_nodes(struct device_node *start,
+				void *(*fn)(struct device_node *, void *),
+				void *data)
 {
 	struct device_node *dn, *nextdn;
 	void *ret;
@@ -371,8 +372,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 		if (classp)
 			class = of_read_number(classp, 1);
 
-		if (pre && ((ret = pre(dn, data)) != NULL))
-			return ret;
+		if (fn) {
+			ret = fn(dn, data);
+			if (ret)
+				return ret;
+		}
 
 		/* If we are a PCI bridge, go down */
 		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
@@ -470,7 +474,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, add_pdn, phb);
+	pci_traverse_device_nodes(dn, add_pdn, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 272e9ec..543a638 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	memset(&counts, 0, sizeof(struct msi_counts));
 
 	/* Work out how many devices we have below this PE */
-	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
+	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
 
 	if (counts.num_devices == 0) {
 		pr_err("rtas_msi: found 0 devices under PE for %s\n",
@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
 	/* else, we have some more calculating to do */
 	counts.requestor = pci_device_to_OF_node(dev);
 	counts.request = request;
-	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
+	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
 
 	/* If the quota isn't an integer multiple of the total, we can
 	 * use the remainder as spare MSIs for anyone that wants them. */
-- 
2.1.0
* [PATCH v7 34/50] powerpc/pci: Delay populating pdn
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (32 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-18  4:24   ` Alexey Kardashevskiy
  2015-11-04 13:12 ` [PATCH v7 35/50] powerpc/pci: Don't scan empty slot Gavin Shan
                   ` (16 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
The pdn (struct pci_dn) instances are allocated from memblock or
bootmem when the PCI controllers (hoses) are created in setup_arch().
PCI hotplug, which will be supported by subsequent patches, releases
PCI device nodes and their corresponding pdn on an unplug event.
Memory chunks allocated from memblock or bootmem are hard to reuse
after being released.
This delays creating the pdn until core_initcall(pci_devs_phb_init) so
that they are allocated from slab. In turn, their memory can be reused
after being released without problems. Since the pdn and eeh_dev have
the same life cycle, the eeh_dev is created when the pdn is populated,
so no separate initcall is needed to create the eeh_dev. Creating the
PHB PEs (eeh_dev_phb_init) is delayed a bit, from core_initcall() to
core_initcall_sync().
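The "same life cycle" rule can be pictured with the userspace sketch
below: the companion object is allocated together with the pdn, and the
pdn is rolled back if that second allocation fails. All names are
hypothetical stand-ins, fail_edev_alloc is a knob that exists only to
exercise the failure path, and clearing the node's data pointer on
failure is an assumption of the sketch rather than something taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pdn;

struct edev { struct pdn *pdn; };	/* stand-in for struct eeh_dev */
struct pdn  { struct edev *edev; };	/* stand-in for struct pci_dn */
struct node { struct pdn *data; };	/* stand-in for struct device_node */

/* Hypothetical test knob: force the edev allocation to fail. */
int fail_edev_alloc;

/* Allocate the edev and cross-link it with @pdn. */
static struct edev *edev_init(struct pdn *pdn)
{
	struct edev *edev;

	if (fail_edev_alloc)
		return NULL;

	edev = calloc(1, sizeof(*edev));
	if (!edev)
		return NULL;

	pdn->edev = edev;
	edev->pdn = pdn;
	return edev;
}

/* Create the pdn and its edev together, or neither of them. */
struct pdn *add_node_info(struct node *dn)
{
	struct pdn *pdn;

	assert(dn != NULL);

	pdn = calloc(1, sizeof(*pdn));
	if (!pdn)
		return NULL;
	dn->data = pdn;

	if (!edev_init(pdn)) {
		/* assumption: leave no stale back-pointer behind */
		dn->data = NULL;
		free(pdn);
		return NULL;
	}

	return pdn;
}
```

With both objects created in one place, every path that adds a pdn
(boot-time traversal or a later hot-add) gets the matching edev without
a second pass over the tree.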
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h         |  2 +-
 arch/powerpc/include/asm/ppc-pci.h     |  2 --
 arch/powerpc/kernel/eeh_dev.c          | 19 ++++-------------
 arch/powerpc/kernel/pci_dn.c           | 20 ++++++++++++++++--
 arch/powerpc/platforms/maple/pci.c     | 34 ++++++++++++++++++------------
 arch/powerpc/platforms/pasemi/pci.c    |  3 ---
 arch/powerpc/platforms/powermac/pci.c  | 38 +++++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci.c   |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  6 +-----
 9 files changed, 69 insertions(+), 58 deletions(-)
diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c5eb86f..27352f4 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -268,7 +268,7 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-void *eeh_dev_init(struct pci_dn *pdn, void *data);
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 8753e4e..0f73de0 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -39,8 +39,6 @@ void *pci_traverse_device_nodes(struct device_node *start,
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
-
-extern void pci_devs_phb_init(void);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index aabba94..1c4bc35 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -44,14 +44,13 @@
 /**
  * eeh_dev_init - Create EEH device according to OF node
  * @pdn: PCI device node
- * @data: PHB
  *
  * It will create EEH device according to the given OF node. The function
  * might be called by PCI emunation, DR, PHB hotplug.
  */
-void *eeh_dev_init(struct pci_dn *pdn, void *data)
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
 {
-	struct pci_controller *phb = data;
+	struct pci_controller *phb = pdn->phb;
 	struct eeh_dev *edev;
 
 	/* Allocate EEH device */
@@ -68,7 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
 	edev->phb = phb;
 	INIT_LIST_HEAD(&edev->list);
 
-	return NULL;
+	return edev;
 }
 
 /**
@@ -80,16 +79,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
  */
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
-	struct pci_dn *root = phb->pci_data;
-
 	/* EEH PE for PHB */
 	eeh_phb_pe_create(phb);
-
-	/* EEH device for PHB */
-	eeh_dev_init(root, phb);
-
-	/* EEH devices for children OF nodes */
-	traverse_pci_dn(root, eeh_dev_init, phb);
 }
 
 /**
@@ -105,9 +96,7 @@ static int __init eeh_dev_phb_init(void)
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		eeh_dev_phb_init_dynamic(phb);
 
-	pr_info("EEH: devices created\n");
-
 	return 0;
 }
 
-core_initcall(eeh_dev_phb_init);
+core_initcall_sync(eeh_dev_phb_init);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index aa4110f..581612c 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -272,8 +272,11 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 	const __be32 *regs;
 	struct device_node *parent;
 	struct pci_dn *pdn;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev;
+#endif
 
-	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
 		return NULL;
 	dn->data = pdn;
@@ -302,6 +305,15 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 	/* Extended config space */
 	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
 
+	/* Create EEH device */
+#ifdef CONFIG_EEH
+	edev = eeh_dev_init(pdn);
+	if (!edev) {
+		kfree(pdn);
+		return NULL;
+	}
+#endif
+
 	/* Attach to parent node */
 	INIT_LIST_HEAD(&pdn->child_list);
 	INIT_LIST_HEAD(&pdn->list);
@@ -486,15 +498,19 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
  * pci device found underneath.  This routine runs once,
  * early in the boot sequence.
  */
-void __init pci_devs_phb_init(void)
+static int __init pci_devs_phb_init(void)
 {
 	struct pci_controller *phb, *tmp;
 
 	/* This must be done first so the device nodes have valid pci info! */
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		pci_devs_phb_init_dynamic(phb);
+
+	return 0;
 }
 
+core_initcall(pci_devs_phb_init);
+
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn;
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index a923230..a2f89e6 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
 	DBG(" <- maple_pci_irq_fixup\n");
 }
 
+static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions hopefully.
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+
 void __init maple_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -605,19 +625,7 @@ void __init maple_pci_init(void)
 	if (ht && maple_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */ 
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions hopefully.
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
+	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
 
 	/* Tell pci.c to not change any resource allocations.  */
 	pci_add_flags(PCI_PROBE_ONLY);
diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index f3a68a0..10c4e8f 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -229,9 +229,6 @@ void __init pas_pci_init(void)
 			of_node_get(np);
 
 	of_node_put(root);
-
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
 }
 
 void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
index 59ab16f..6e06c3b 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
 #endif /* CONFIG_PPC32 */
 }
 
+#ifdef CONFIG_PPC64
+static int pmac_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumptions for now. We should do something better in the
+	 * future though
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 void __init pmac_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -914,20 +937,7 @@ void __init pmac_pci_init(void)
 	if (ht && pmac_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions for now. We should do something better in the
-	 * future though
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
+	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
 	/* pmac_check_ht_link(); */
 
 #else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index fa99daf..d8832ea 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -807,9 +807,6 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
 		pnv_pci_init_ioda2_phb(np);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
 	/* Configure IOMMU DMA hooks */
 	set_pci_dma_ops(&dma_iommu_ops);
 }
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 6c274cb..bdf93a1 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -262,11 +262,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	case OF_RECONFIG_ATTACH_NODE:
 		parent = of_get_parent(np);
 		pdn = parent ? PCI_DN(parent) : NULL;
-		if (pdn) {
-			/* Create pdn and EEH device */
+		if (pdn)
 			pci_add_device_node_info(pdn->phb, np);
-			eeh_dev_init(PCI_DN(np), pdn->phb);
-		}
 
 		of_node_put(parent);
 		break;
@@ -489,7 +486,6 @@ static void __init find_and_init_phbs(void)
 	}
 
 	of_node_put(root);
-	pci_devs_phb_init();
 
 	/*
 	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
-- 
2.1.0
* [PATCH v7 35/50] powerpc/pci: Don't scan empty slot
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (33 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 34/50] powerpc/pci: Delay populating pdn Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 36/50] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
                   ` (15 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
In the hotplug case, pcibios_add_pci_devices() is called to rescan the
specified PCI bus, which might not have any child devices. Accessing
the bus's child device node then crashes the kernel without exception.
This adds a condition to skip scanning a PCI bus that doesn't have any
subordinate devices, in order to avoid the kernel crash.
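A minimal userspace sketch of the guard, assuming hypothetical
stand-in types: the scan helper dereferences the first child and its
pdn, so the caller checks both before calling it.

```c
#include <assert.h>
#include <stddef.h>

struct pdn { int devfn; };		/* stand-in for struct pci_dn */

struct node {				/* stand-in for struct device_node */
	struct node *child, *sibling;
	struct pdn *data;
};

/* Count the nodes directly below @dn; only valid on a populated slot. */
static int scan_children(const struct node *dn)
{
	const struct node *c;
	int n = 0;

	/* the legacy probe relies on both of these being present */
	assert(dn->child != NULL && dn->child->data != NULL);

	for (c = dn->child; c; c = c->sibling)
		n++;

	return n;
}

/* Rescan a slot, skipping the scan entirely when the slot is empty. */
int rescan_slot(const struct node *dn)
{
	if (!dn || !dn->child || !dn->child->data)
		return 0;

	return scan_children(dn);
}
```

An empty slot then reports zero devices instead of reaching the code
that dereferences the missing child.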
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 96e2cc3..825b39c 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -120,7 +120,8 @@ void pci_add_pci_devices(struct pci_bus *bus)
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
 		of_rescan_bus(dn, bus);
-	} else if (mode == PCI_PROBE_NORMAL) {
+	} else if (mode == PCI_PROBE_NORMAL &&
+		   dn->child && PCI_DN(dn->child)) {
 		/*
 		 * Use legacy probe. In the partial hotplug case, we
 		 * probably have grandchildren devices unplugged. So
-- 
2.1.0
* [PATCH v7 36/50] powerpc/pci: Update bridge windows on PCI plug
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (34 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 35/50] powerpc/pci: Don't scan empty slot Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
                   ` (14 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
On a PCI plug event, the PCI slot's subordinate devices are scanned
and their (IO and MMIO) resources are assigned. Platform-dependent
resources (PE#, IO/MMIO/DMA windows) are allocated or created when the
windows of the slot's upstream bridge are updated.
This updates the windows of the hot-plugged slot's upstream bridge
in pcibios_finish_adding_to_bus() so that the platform resources
(PE#, IO/MMIO/DMA segments) are allocated or created accordingly.
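The decision the hunk below makes can be written down as a small pure
function. This is only a sketch: the enum, the types and
pick_assign_path() are hypothetical, and the real code calls
pci_assign_unassigned_bridge_resources() or
pci_assign_unassigned_bus_resources() instead of returning a value.

```c
#include <assert.h>
#include <stddef.h>

enum assign_path {
	ASSIGN_NONE,	/* PCI_PROBE_ONLY: leave resources alone */
	ASSIGN_BRIDGE,	/* bus sits behind a bridge: resize its windows */
	ASSIGN_BUS,	/* root bus: no upstream bridge to update */
};

struct dev { int id; };			/* stand-in for struct pci_dev */
struct bus { struct dev *self; };	/* stand-in for struct pci_bus */

/* Pick the assignment path for a bus that just had devices added. */
enum assign_path pick_assign_path(const struct bus *bus, int probe_only)
{
	assert(bus != NULL);

	if (probe_only)
		return ASSIGN_NONE;

	return bus->self ? ASSIGN_BRIDGE : ASSIGN_BUS;
}
```

Going through the bridge path is what triggers the window update on
the slot's upstream bridge, and with it the platform-specific setup
described above.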
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 40df3a5..be9e515 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1444,8 +1444,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
 	/* Allocate bus and devices resources */
 	pcibios_allocate_bus_resources(bus);
 	pcibios_claim_one_bus(bus);
-	if (!pci_has_flag(PCI_PROBE_ONLY))
-		pci_assign_unassigned_bus_resources(bus);
+	if (!pci_has_flag(PCI_PROBE_ONLY)) {
+		if (bus->self)
+			pci_assign_unassigned_bridge_resources(bus->self);
+		else
+			pci_assign_unassigned_bus_resources(bus);
+	}
 
 	/* Fixup EEH */
 	eeh_add_device_tree_late(bus);
-- 
2.1.0
* [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (35 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 36/50] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-12  5:11   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
                   ` (13 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This drops unnecessary nested if statements in pnv_eeh_reset() to
improve code readability. With early returns in place, the local
variable "ret" is no longer needed and is dropped as well. No logical
changes are introduced.
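The refactoring pattern is the usual switch from a nested if/else that
fills in a result variable to early returns. A toy userspace sketch,
with hypothetical names and return codes, shows that both shapes give
the same answers:

```c
#include <assert.h>

enum { PE_PHB, PE_BUS };			/* hypothetical PE kinds */
enum { RESET_PHB = 1, RESET_ROOT, RESET_BRIDGE };	/* hypothetical results */

/* Before: nested branches funnel into a single "ret". */
int reset_nested(int pe_type, int below_root)
{
	int ret;

	if (pe_type == PE_PHB) {
		ret = RESET_PHB;
	} else {
		if (below_root)
			ret = RESET_ROOT;
		else
			ret = RESET_BRIDGE;
	}

	return ret;
}

/* After: each case returns as soon as it is decided. */
int reset_flat(int pe_type, int below_root)
{
	if (pe_type == PE_PHB)
		return RESET_PHB;

	if (below_root)
		return RESET_ROOT;

	return RESET_BRIDGE;
}

/* Check that both variants agree on every input; returns 1 if so. */
int variants_agree(void)
{
	int t, r;

	for (t = PE_PHB; t <= PE_BUS; t++)
		for (r = 0; r <= 1; r++)
			if (reset_nested(t, r) != reset_flat(t, r))
				return 0;

	assert(reset_flat(PE_PHB, 0) == RESET_PHB);
	return 1;
}
```

The flat version also loses one level of indentation, which is what
lets the long comment and the opal_pci_reset() call in the patch fit
more comfortably on the line.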
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 61 ++++++++++++----------------
 1 file changed, 27 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 861a7d2..a7d84a4 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -907,8 +907,9 @@ void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 {
 	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb;
 	struct pci_bus *bus;
-	int ret;
+	int64_t rc;
 
 	/*
 	 * For PHB reset, we always have complete reset. For those PEs whose
@@ -924,43 +925,35 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 	 * reset. The side effect is that EEH core has to clear the frozen
 	 * state explicitly after BAR restore.
 	 */
-	if (pe->type & EEH_PE_PHB) {
-		ret = pnv_eeh_phb_reset(hose, option);
-	} else {
-		struct pnv_phb *phb;
-		s64 rc;
+	if (pe->type & EEH_PE_PHB)
+		return pnv_eeh_phb_reset(hose, option);
 
-		/*
-		 * The frozen PE might be caused by PAPR error injection
-		 * registers, which are expected to be cleared after hitting
-		 * frozen PE as stated in the hardware spec. Unfortunately,
-		 * that's not true on P7IOC. So we have to clear it manually
-		 * to avoid recursive EEH errors during recovery.
-		 */
-		phb = hose->private_data;
-		if (phb->model == PNV_PHB_MODEL_P7IOC &&
-		    (option == EEH_RESET_HOT ||
-		    option == EEH_RESET_FUNDAMENTAL)) {
-			rc = opal_pci_reset(phb->opal_id,
-					    OPAL_RESET_PHB_ERROR,
-					    OPAL_ASSERT_RESET);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld clearing "
-					"error injection registers\n",
-					__func__, rc);
-				return -EIO;
-			}
+	/*
+	 * The frozen PE might be caused by PAPR error injection
+	 * registers, which are expected to be cleared after hitting
+	 * frozen PE as stated in the hardware spec. Unfortunately,
+	 * that's not true on P7IOC. So we have to clear it manually
+	 * to avoid recursive EEH errors during recovery.
+	 */
+	phb = hose->private_data;
+	if (phb->model == PNV_PHB_MODEL_P7IOC &&
+	    (option == EEH_RESET_HOT || option == EEH_RESET_FUNDAMENTAL)) {
+		rc = opal_pci_reset(phb->opal_id,
+				    OPAL_RESET_PHB_ERROR,
+				    OPAL_ASSERT_RESET);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld clearing error injection\n",
+				__func__, rc);
+			return -EIO;
 		}
-
-		bus = eeh_pe_bus_get(pe);
-		if (pci_is_root_bus(bus) ||
-			pci_is_root_bus(bus->parent))
-			ret = pnv_eeh_root_reset(hose, option);
-		else
-			ret = pnv_eeh_bridge_reset(bus->self, option);
 	}
 
-	return ret;
+	bus = eeh_pe_bus_get(pe);
+	if (pci_is_root_bus(bus) ||
+	    pci_is_root_bus(bus->parent))
+		return pnv_eeh_root_reset(hose, option);
+
+	return pnv_eeh_bridge_reset(bus->self, option);
 }
 
 /**
-- 
2.1.0
* [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (36 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-12 22:59   ` Daniel Axtens
  2015-11-04 13:12 ` [PATCH v7 39/50] powerpc/powernv: Fundamental reset " Gavin Shan
                   ` (12 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
When pnv_pci_reset_secondary_bus() is called to reset the indicated
secondary bus, that bus can't be the root bus. So the function
needn't handle the root bus.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index a7d84a4..c69b6a1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -880,16 +880,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-	struct pci_controller *hose;
-
-	if (pci_is_root_bus(dev->bus)) {
-		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
-	} else {
-		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
-	}
+	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
 /**
-- 
2.1.0
* [PATCH v7 39/50] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (37 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-12  6:15   ` Gavin Shan
                     ` (2 more replies)
  2015-11-04 13:12 ` [PATCH v7 40/50] powerpc/powernv: Support PCI slot ID Gavin Shan
                   ` (11 subsequent siblings)
  50 siblings, 3 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
In pnv_pci_reset_secondary_bus(), we should issue a fundamental
reset if any subordinate device of the specified bridge requests
one. Otherwise, the device might not come up after the reset.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
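Note (illustrative only, not part of the patch): the callback added here
relies on pci_walk_bus() stopping as soon as the callback returns a
non-zero value. A minimal user-space sketch of that early-exit pattern
follows. struct fake_dev, walk_devs() and needs_fundamental_reset() are
hypothetical stand-ins, and the walk only models a flat list of devices
rather than a real bus hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the flag matters here. */
struct fake_dev {
	int needs_freset;
};

/* Accumulate the flag; a non-zero return asks the walker to stop. */
int dev_reset_type(struct fake_dev *dev, void *data)
{
	int *freset = data;

	*freset |= dev->needs_freset;
	return *freset;
}

/*
 * Simplified model of a bus walk: visit devices in order and stop at
 * the first non-zero callback return. Returns the number of devices
 * that were visited.
 */
size_t walk_devs(struct fake_dev *devs, size_t n,
		 int (*cb)(struct fake_dev *, void *), void *data)
{
	size_t i;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		if (cb(&devs[i], data))
			return i + 1;
	}

	return n;
}

/* Returns 1 if any device asks for a fundamental reset, 0 otherwise. */
int needs_fundamental_reset(struct fake_dev *devs, size_t n)
{
	int freset = 0;

	walk_devs(devs, n, dev_reset_type, &freset);
	return freset;
}
```

With three devices where only the second sets the flag, the walk visits
two devices and then stops, which is the behaviour the comment in
pnv_pci_dev_reset_type() describes.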
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c69b6a1..ab8b93e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -878,9 +878,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
+{
+	int *freset = data;
+
+	/*
+	 * Stop the iteration as soon as any PCI device
+	 * requests a fundamental reset.
+	 */
+	*freset |= pdev->needs_freset;
+	return *freset;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+	int option, freset = 0;
+
+	if (dev->subordinate)
+		pci_walk_bus(dev->subordinate,
+			     pnv_pci_dev_reset_type, &freset);
+
+	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
+	pnv_eeh_bridge_reset(dev, option);
 	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
-- 
2.1.0
* [PATCH v7 40/50] powerpc/powernv: Support PCI slot ID
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (38 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 39/50] powerpc/powernv: Fundamental reset " Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 41/50] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
                   ` (10 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
PowerNV platforms run on top of skiboot firmware that includes
changes to support PCI slots. PCI slots are identified by the PHB's
ID or by the combination of that and the PCI slot ID.
This changes the EEH PowerNV backend to support PCI slots:
   * Rename arguments of opal_pci_reset() and opal_pci_poll().
   * One more argument (PCI slot's state) added to opal_pci_poll().
   * Drop pnv_eeh_phb_poll() and introduce an enhanced, similar
     function pnv_pci_poll() that will be used by PowerNV hotplug
     backends.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h              |  4 +--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++++++----------------------
 arch/powerpc/platforms/powernv/pci.c         | 21 ++++++++++++++
 arch/powerpc/platforms/powernv/pci.h         |  1 +
 4 files changed, 32 insertions(+), 36 deletions(-)
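Note (illustrative only, not part of the patch): the control flow of
pnv_pci_poll() can be modelled in user space as below. A positive return
value is treated as "wait this many milliseconds and poll again", zero
as success, and anything negative as failure. poll_until_done(),
fake_poll(), fake_delay() and FW_SUCCESS are hypothetical names used for
this sketch; the firmware call and the delay are passed in as function
pointers so the loop can run without OPAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FW_SUCCESS 0	/* stands in for OPAL_SUCCESS */

typedef int64_t (*poll_fn)(uint64_t id, uint8_t *state);
typedef void (*delay_fn)(int64_t ms);

/* Same shape as pnv_pci_poll(): wait, poll again, then map the result. */
int poll_until_done(uint64_t id, int64_t rval, uint8_t *state,
		    poll_fn poll, delay_fn delay)
{
	assert(poll != NULL && delay != NULL);

	while (rval > 0) {
		delay(rval);
		rval = poll(id, state);
	}

	/* One extra poll fetches the state when the caller asked for it. */
	if (rval == FW_SUCCESS && state)
		rval = poll(id, state);

	return (rval == FW_SUCCESS) ? 0 : -EIO;
}

/* Hypothetical firmware stub: "retry in 10 ms" a few times, then done. */
int fake_polls_left = 2;
int64_t fake_slept_ms;

int64_t fake_poll(uint64_t id, uint8_t *state)
{
	(void)id;
	if (fake_polls_left > 0) {
		fake_polls_left--;
		return 10;
	}
	if (state)
		*state = 1;
	return FW_SUCCESS;
}

void fake_delay(int64_t ms)
{
	fake_slept_ms += ms;
}
```

Passing the return value of the initial request as rval means a request
that completes immediately skips the loop, and a request that fails
immediately is reported as -EIO without polling at all.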
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 8001159..11ee20e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -130,7 +130,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
@@ -147,7 +147,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
 			    __be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *state);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ab8b93e..e533535 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -729,28 +729,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
-{
-	s64 rc = OPAL_HARDWARE;
-
-	while (1) {
-		rc = opal_pci_poll(phb->opal_id);
-		if (rc <= 0)
-			break;
-
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * rc);
-		else
-			msleep(rc);
-	}
-
-	return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc = OPAL_HARDWARE;
+	int ret;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
@@ -765,8 +748,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 		rc = opal_pci_reset(phb->opal_id,
 				    OPAL_RESET_PHB_COMPLETE,
 				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
 
 	/*
 	 * Poll state of the PHB until the request is done
@@ -774,24 +755,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 	 * reset followed by hot reset on root bus. So we also
 	 * need the PCI bus settlement delay.
 	 */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE) {
+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+	if (option == EEH_RESET_DEACTIVATE && !ret) {
 		if (system_state < SYSTEM_RUNNING)
 			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
 		else
 			msleep(EEH_PE_RST_SETTLE_TIME);
 	}
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
+	return ret;
 }
 
 static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc = OPAL_HARDWARE;
+	int ret;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
@@ -813,18 +792,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 		rc = opal_pci_reset(phb->opal_id,
 				    OPAL_RESET_PCI_HOT,
 				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
 
 	/* Poll state of the PHB until the request is done */
-	rc = pnv_eeh_phb_poll(phb);
-	if (option == EEH_RESET_DEACTIVATE)
+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+	if (option == EEH_RESET_DEACTIVATE && !ret)
 		msleep(EEH_PE_RST_SETTLE_TIME);
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
+	return ret;
 }
 
 static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index d8832ea..ae0f0c1 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -44,6 +44,27 @@
 #define cfg_dbg(fmt...)	do { } while(0)
 //#define cfg_dbg(fmt...)	printk(fmt)
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
+{
+	while (rval > 0) {
+		if (system_state < SYSTEM_RUNNING)
+			udelay(1000 * rval);
+		else
+			msleep(rval);
+
+		rval = opal_pci_poll(id, state);
+	}
+
+	/*
+	 * The caller expects to retrieve additional
+	 * information if the last argument isn't NULL.
+	 */
+	if (rval == OPAL_SUCCESS && state)
+		rval = opal_pci_poll(id, state);
+
+	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
+}
+
 #ifdef CONFIG_PCI_MSI
 int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3bb10de..30b0bfc 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -195,6 +195,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
 		unsigned long *hpa, enum dma_data_direction *direction);
 extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state);
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
 				unsigned char *log_buff);
 int pnv_pci_cfg_read(struct pci_dn *pdn,
-- 
2.1.0
* [PATCH v7 41/50] powerpc/powernv: Use firmware PCI slot reset infrastructure
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (39 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 40/50] powerpc/powernv: Support PCI slot ID Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 42/50] powerpc/powernv: Functions to get/set PCI slot status Gavin Shan
                   ` (9 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
The skiboot firmware might provide the PCI slot reset capability,
which is identified by the property "ibm,reset-by-firmware" on the
device node associated with the PCI slot.
This checks the property. If it exists, the reset request is routed
to firmware. Otherwise, the reset is done by the kernel as before.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)
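Note (illustrative only, not part of the patch): the slot ID handed to
opal_pci_reset() is built from bit 60, the bus number, the devfn and the
PHB's OPAL ID. The sketch below reproduces that layout in user space.
slot_id() is a hypothetical helper, and the assumption that the PHB ID
fits in the low 16 bits is inferred from the shifts in the patch, not
from a firmware specification. The explicit 64-bit casts are a choice of
this sketch: they keep a bus number of 0x80 or above from being shifted
as a signed int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose a slot ID with the layout used in the patch:
 *   bit 60      marks the ID as a slot ID
 *   bits 31..24 bus number
 *   bits 23..16 devfn
 *   low bits    PHB ID (assumed to fit in 16 bits)
 */
uint64_t slot_id(uint8_t bus, uint8_t devfn, uint16_t phb_id)
{
	uint64_t id = UINT64_C(1) << 60;

	id |= (uint64_t)bus << 24;
	id |= (uint64_t)devfn << 16;
	id |= phb_id;

	return id;
}
```

For example, bus 0x80, devfn 0x08 on PHB 1 yields 0x1000000080080001
with this helper.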
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e533535..086d153 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -801,7 +801,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 	return ret;
 }
 
-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 {
 	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -852,6 +852,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
+	uint64_t id = (0x1ul << 60);
+	uint8_t scope;
+	int64_t rc;
+
+	/*
+	 * If the firmware can't handle it, we will issue hot reset
+	 * on the secondary bus despite the requested reset type.
+	 */
+	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+		return __pnv_eeh_bridge_reset(pdev, option);
+
+	/* The firmware can handle the request */
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	default:
+		dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
+			 __func__, option);
+		return -EINVAL;
+	}
+
+	hose = pci_bus_to_host(pdev->bus);
+	phb = hose->private_data;
+	id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
+	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+	return pnv_pci_poll(id, rc, NULL);
+}
+
 static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
 {
 	int *freset = data;
-- 
2.1.0
* [PATCH v7 42/50] powerpc/powernv: Functions to get/set PCI slot status
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (40 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 41/50] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 43/50] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
                   ` (8 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This exports 4 functions, which are based on the corresponding OPAL
APIs, to get/set PCI slot status. Those functions are going to
be used by the PowerNV PCI hotplug driver:
   pnv_pci_get_device_tree()    opal_get_device_tree()
   pnv_pci_get_presence_state() opal_pci_get_presence_state()
   pnv_pci_get_power_state()    opal_pci_get_power_state()
   pnv_pci_set_power_state()    opal_pci_set_power_state()
Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
unregister}() to allow registration and unregistration of a PCI hotplug
notifier, which the PowerNV PCI hotplug driver will use to receive PCI
hotplug messages from skiboot firmware.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h            | 17 ++++++-
 arch/powerpc/include/asm/opal.h                |  4 ++
 arch/powerpc/include/asm/pnv-pci.h             |  7 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
 arch/powerpc/platforms/powernv/pci.c           | 66 ++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 8374afe..fe3e458 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -157,7 +157,11 @@
 #define OPAL_LEDS_GET_INDICATOR			114
 #define OPAL_LEDS_SET_INDICATOR			115
 #define OPAL_CEC_REBOOT2			116
-#define OPAL_LAST				116
+#define OPAL_GET_DEVICE_TREE			117
+#define OPAL_PCI_GET_PRESENCE_STATE		118
+#define OPAL_PCI_GET_POWER_STATE		119
+#define OPAL_PCI_SET_POWER_STATE		120
+#define OPAL_LAST				120
 
 /* Device tree flags */
 
@@ -343,6 +347,16 @@ enum OpalPciResetState {
 	OPAL_ASSERT_RESET   = 1
 };
 
+enum OpalPciSlotPresentenceState {
+	OPAL_PCI_SLOT_EMPTY	= 0,
+	OPAL_PCI_SLOT_PRESENT	= 1
+};
+
+enum OpalPciSlotPowerState {
+	OPAL_PCI_SLOT_POWER_OFF	= 0,
+	OPAL_PCI_SLOT_POWER_ON	= 1
+};
+
 enum OpalSlotLedType {
 	OPAL_SLOT_LED_TYPE_ID = 0,	/* IDENTIFY LED */
 	OPAL_SLOT_LED_TYPE_FAULT = 1,	/* FAULT LED */
@@ -377,6 +391,7 @@ enum opal_msg_type {
 	OPAL_MSG_DPO,
 	OPAL_MSG_PRD,
 	OPAL_MSG_OCC,
+	OPAL_MSG_PCI_HOTPLUG,
 	OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 11ee20e..47f200e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -208,6 +208,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
 		uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
 		uint64_t token);
+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 6f77f71..d9d095b 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,13 @@
 #include <linux/pci.h>
 #include <misc/cxl-base.h>
 
+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 			   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index b7a464f..55f1fd4 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -301,3 +301,7 @@ OPAL_CALL(opal_flash_erase,			OPAL_FLASH_ERASE);
 OPAL_CALL(opal_prd_msg,				OPAL_PRD_MSG);
 OPAL_CALL(opal_leds_get_ind,			OPAL_LEDS_GET_INDICATOR);
 OPAL_CALL(opal_leds_set_ind,			OPAL_LEDS_SET_INDICATOR);
+OPAL_CALL(opal_get_device_tree,			OPAL_GET_DEVICE_TREE);
+OPAL_CALL(opal_pci_get_presence_state,		OPAL_PCI_GET_PRESENCE_STATE);
+OPAL_CALL(opal_pci_get_power_state,		OPAL_PCI_GET_POWER_STATE);
+OPAL_CALL(opal_pci_set_power_state,		OPAL_PCI_SET_POWER_STATE);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index ae0f0c1..71c648e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -65,6 +65,72 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *state)
 	return (rval == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
+int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
+		return -ENXIO;
+
+	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_device_tree);
+
+int pnv_pci_get_presence_state(uint64_t id, uint8_t *state)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
+		return -ENXIO;
+
+	rc = opal_pci_get_presence_state(id, state);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_state);
+
+int pnv_pci_get_power_state(uint64_t id, uint8_t *state)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
+		return -ENXIO;
+
+	rc = opal_pci_get_power_state(id, state);
+	return pnv_pci_poll(id, rc, state);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_power_state);
+
+int pnv_pci_set_power_state(uint64_t id, uint8_t state)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_SET_POWER_STATE))
+		return -ENXIO;
+
+	rc = opal_pci_set_power_state(id, state);
+	return pnv_pci_poll(id, rc, NULL);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_set_power_state);
+
+int pnv_pci_hotplug_notifier_register(struct notifier_block *nb)
+{
+	return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_register);
+
+int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb)
+{
+	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_unregister);
+
 #ifdef CONFIG_PCI_MSI
 int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
-- 
2.1.0
* [PATCH v7 43/50] powerpc/powernv: Select OF_DYNAMIC
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (41 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 42/50] powerpc/powernv: Functions to get/set PCI slot status Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 44/50] drivers/of: Split unflatten_dt_node() Gavin Shan
                   ` (7 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
The device tree will be changed dynamically by the PowerNV PCI hotplug
driver. This selects CONFIG_OF_DYNAMIC to support that.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 604190c..e7b1ad7 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,6 +18,7 @@ config PPC_POWERNV
 	select CPU_FREQ_GOV_ONDEMAND
 	select CPU_FREQ_GOV_CONSERVATIVE
 	select PPC_DOORBELL
+	select OF_DYNAMIC
 	default y
 
 config OPAL_PRD
-- 
2.1.0
* [PATCH v7 44/50] drivers/of: Split unflatten_dt_node()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (42 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 43/50] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 18:43   ` Rob Herring
  2015-11-04 13:12 ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
                   ` (6 subsequent siblings)
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
The function unflatten_dt_node() is called recursively to unflatten
device nodes and properties in the FDT blob. It is complicated and
hard to understand.
This splits the function into 3 functions: populate_properties(),
populate_node() and unflatten_dt_node(). populate_properties(),
which is called by populate_node(), creates properties for the
indicated device node. populate_node() creates the device node
from the FDT blob. unflatten_dt_node() gets the offset in the FDT
blob for the next device node and then calls populate_node(). No
logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 275 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 160 insertions(+), 115 deletions(-)
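Note (illustrative only, not part of the patch): when a node has no
"name" property, populate_properties() recreates one from the node's
path by taking the text after the last '/' and before the unit address
that starts at '@'. A user-space sketch of that derivation follows.
derive_node_name() is a hypothetical helper that copies into a caller
buffer instead of allocating from the unflatten memory chunk, and it
checks for a missing '@' explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the node name (last path component, without the "@unit" part)
 * into buf. Returns the length including the terminating NUL, or 0
 * if buf is too small.
 */
size_t derive_node_name(const char *nodename, char *buf, size_t buflen)
{
	const char *p = nodename, *ps = nodename, *pa = NULL;
	size_t len;

	assert(nodename != NULL && buf != NULL);

	while (*p) {
		if (*p == '@')
			pa = p;		/* last '@' seen so far */
		else if (*p == '/')
			ps = p + 1;	/* start of the last component */
		p++;
	}

	/* No '@' in the last component: the name runs to the end. */
	if (pa == NULL || pa < ps)
		pa = p;

	len = (size_t)(pa - ps) + 1;
	if (len > buflen)
		return 0;

	memcpy(buf, ps, len - 1);
	buf[len - 1] = '\0';

	return len;
}
```

For a path such as "/pci@3fffe40000000/ethernet@0" the helper produces
"ethernet"; for "/a@1/b", where the only '@' belongs to a parent
component, it produces "b".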
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 6e82bc42..173b036 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -160,39 +160,127 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
 	return res;
 }
 
-/**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
- * @blob: The parent device tree blob
- * @mem: Memory chunk to use for allocating device nodes and properties
- * @poffset: pointer to node in flat tree
- * @dad: Parent struct device_node
- * @nodepp: The device_node tree created by the call
- * @fpsize: Size of the node path up at the current depth.
- * @dryrun: If true, do not allocate device nodes but still calculate needed
- * memory size
- */
-static void * unflatten_dt_node(const void *blob,
-				void *mem,
-				int *poffset,
-				struct device_node *dad,
-				struct device_node **nodepp,
-				unsigned long fpsize,
+static void populate_properties(const void *blob,
+				int offset,
+				void **mem,
+				struct device_node *np,
+				const char *nodename,
 				bool dryrun)
 {
-	const __be32 *p;
+	struct property *pp, **pprev = NULL;
+	int cur;
+	bool has_name = false;
+
+	pprev = &np->properties;
+	cur = fdt_first_property_offset(blob, offset);
+	while (cur >= 0) {
+		const __be32 *val;
+		const char *pname;
+		u32 sz;
+
+		val = fdt_getprop_by_offset(blob, cur, &pname, &sz);
+		if (!val) {
+			pr_warn("%s: Cannot locate property at 0x%x\n",
+				__func__, cur);
+			goto next;
+		}
+
+		if (!pname) {
+			pr_warn("%s: Cannot find property name at 0x%x\n",
+				__func__, cur);
+			goto next;
+		} else if (!strcmp(pname, "name")) {
+			has_name = true;
+		}
+
+		pp = unflatten_dt_alloc(mem, sizeof(struct property),
+					__alignof__(struct property));
+		if (!dryrun) {
+			/* We accept flattened tree phandles either in
+			 * ePAPR-style "phandle" properties, or the
+			 * legacy "linux,phandle" properties.  If both
+			 * appear and have different values, things
+			 * will get weird. Don't do that.
+			 */
+			if (!strcmp(pname, "phandle") ||
+			    !strcmp(pname, "linux,phandle")) {
+				if (!np->phandle)
+					np->phandle = be32_to_cpup(val);
+			}
+
+			/* And we process the "ibm,phandle" property
+			 * used in pSeries dynamic device tree
+			 * stuff
+			 */
+			if (!strcmp(pname, "ibm,phandle"))
+				np->phandle = be32_to_cpup(val);
+
+			pp->name   = (char *)pname;
+			pp->length = sz;
+			pp->value  = (__be32 *)val;
+			*pprev     = pp;
+			pprev      = &pp->next;
+		}
+next:
+		cur = fdt_next_property_offset(blob, cur);
+	}
+
+	/* With version 0x10 we may not have the name property,
+	 * recreate it here from the unit name if absent
+	 */
+	if (!has_name) {
+		const char *p = nodename, *ps = p, *pa = NULL;
+		int len;
+
+		while (*p) {
+			if ((*p) == '@')
+				pa = p;
+			else if ((*p) == '/')
+				ps = p + 1;
+			p++;
+		}
+
+		if (pa < ps)
+			pa = p;
+		len = (pa - ps) + 1;
+		pp = unflatten_dt_alloc(mem, sizeof(struct property) + len,
+					__alignof__(struct property));
+		if (!dryrun) {
+			pp->name   = "name";
+			pp->length = len;
+			pp->value  = pp + 1;
+			*pprev     = pp;
+			pprev      = &pp->next;
+			memcpy(pp->value, ps, len - 1);
+			((char *)pp->value)[len - 1] = 0;
+			pr_debug("fixed up name for %s -> %s\n",
+				 nodename, (char *)pp->value);
+		}
+	}
+
+	if (!dryrun)
+		*pprev = NULL;
+}
+
+static unsigned long populate_node(const void *blob,
+				   int offset,
+				   void **mem,
+				   struct device_node *dad,
+				   unsigned long fpsize,
+				   struct device_node **pnp,
+				   bool dryrun)
+{
 	struct device_node *np;
-	struct property *pp, **prev_pp = NULL;
 	const char *pathp;
 	unsigned int l, allocl;
-	static int depth = 0;
-	int old_depth;
-	int offset;
-	int has_name = 0;
-	int new_format = 0;
+	bool new_format = false;
+	char *fname;
 
-	pathp = fdt_get_name(blob, *poffset, &l);
-	if (!pathp)
-		return mem;
+	pathp = fdt_get_name(blob, offset, &l);
+	if (!pathp) {
+		*pnp = NULL;
+		return 0;
+	}
 
 	allocl = ++l;
 
@@ -202,7 +290,7 @@ static void * unflatten_dt_node(const void *blob,
 	 * not '/'.
 	 */
 	if ((*pathp) != '/') {
-		new_format = 1;
+		new_format = true;
 		if (fpsize == 0) {
 			/* root node: special case. fpsize accounts for path
 			 * plus terminating zero. root node only has '/', so
@@ -222,112 +310,38 @@ static void * unflatten_dt_node(const void *blob,
 		}
 	}
 
-	np = unflatten_dt_alloc(&mem, sizeof(struct device_node) + allocl,
+	np = unflatten_dt_alloc(mem, sizeof(struct device_node) + allocl,
 				__alignof__(struct device_node));
 	if (!dryrun) {
-		char *fn;
 		of_node_init(np);
-		np->full_name = fn = ((char *)np) + sizeof(*np);
+		np->full_name = fname = ((char *)np) + sizeof(*np);
 		if (new_format) {
-			/* rebuild full path for new format */
+			/* Rebuild full path for new format */
 			if (dad && dad->parent) {
-				strcpy(fn, dad->full_name);
+				strcpy(fname, dad->full_name);
 #ifdef DEBUG
-				if ((strlen(fn) + l + 1) != allocl) {
+				if ((strlen(fname) + l + 1) != allocl) {
 					pr_debug("%s: p: %d, l: %d, a: %d\n",
-						pathp, (int)strlen(fn),
-						l, allocl);
+						 pathp, (int)strlen(fname),
+						 l, allocl);
 				}
 #endif
-				fn += strlen(fn);
+				fname += strlen(fname);
 			}
-			*(fn++) = '/';
+			*(fname++) = '/';
 		}
-		memcpy(fn, pathp, l);
+		memcpy(fname, pathp, l);
 
-		prev_pp = &np->properties;
-		if (dad != NULL) {
+		if (dad) {
 			np->parent = dad;
 			np->sibling = dad->child;
 			dad->child = np;
 		}
 	}
-	/* process properties */
-	for (offset = fdt_first_property_offset(blob, *poffset);
-	     (offset >= 0);
-	     (offset = fdt_next_property_offset(blob, offset))) {
-		const char *pname;
-		u32 sz;
 
-		if (!(p = fdt_getprop_by_offset(blob, offset, &pname, &sz))) {
-			offset = -FDT_ERR_INTERNAL;
-			break;
-		}
-
-		if (pname == NULL) {
-			pr_info("Can't find property name in list !\n");
-			break;
-		}
-		if (strcmp(pname, "name") == 0)
-			has_name = 1;
-		pp = unflatten_dt_alloc(&mem, sizeof(struct property),
-					__alignof__(struct property));
-		if (!dryrun) {
-			/* We accept flattened tree phandles either in
-			 * ePAPR-style "phandle" properties, or the
-			 * legacy "linux,phandle" properties.  If both
-			 * appear and have different values, things
-			 * will get weird.  Don't do that. */
-			if ((strcmp(pname, "phandle") == 0) ||
-			    (strcmp(pname, "linux,phandle") == 0)) {
-				if (np->phandle == 0)
-					np->phandle = be32_to_cpup(p);
-			}
-			/* And we process the "ibm,phandle" property
-			 * used in pSeries dynamic device tree
-			 * stuff */
-			if (strcmp(pname, "ibm,phandle") == 0)
-				np->phandle = be32_to_cpup(p);
-			pp->name = (char *)pname;
-			pp->length = sz;
-			pp->value = (__be32 *)p;
-			*prev_pp = pp;
-			prev_pp = &pp->next;
-		}
-	}
-	/* with version 0x10 we may not have the name property, recreate
-	 * it here from the unit name if absent
-	 */
-	if (!has_name) {
-		const char *p1 = pathp, *ps = pathp, *pa = NULL;
-		int sz;
-
-		while (*p1) {
-			if ((*p1) == '@')
-				pa = p1;
-			if ((*p1) == '/')
-				ps = p1 + 1;
-			p1++;
-		}
-		if (pa < ps)
-			pa = p1;
-		sz = (pa - ps) + 1;
-		pp = unflatten_dt_alloc(&mem, sizeof(struct property) + sz,
-					__alignof__(struct property));
-		if (!dryrun) {
-			pp->name = "name";
-			pp->length = sz;
-			pp->value = pp + 1;
-			*prev_pp = pp;
-			prev_pp = &pp->next;
-			memcpy(pp->value, ps, sz - 1);
-			((char *)pp->value)[sz - 1] = 0;
-			pr_debug("fixed up name for %s -> %s\n", pathp,
-				(char *)pp->value);
-		}
-	}
+	/* Populate the properties */
+	populate_properties(blob, offset, mem, np, pathp, dryrun);
 	if (!dryrun) {
-		*prev_pp = NULL;
 		np->name = of_get_property(np, "name", NULL);
 		np->type = of_get_property(np, "device_type", NULL);
 
@@ -337,6 +351,37 @@ static void * unflatten_dt_node(const void *blob,
 			np->type = "<NULL>";
 	}
 
+	*pnp = np;
+	return fpsize;
+}
+
+/**
+ * unflatten_dt_node - Alloc and populate a device_node from the flat tree
+ * @blob: The parent device tree blob
+ * @mem: Memory chunk to use for allocating device nodes and properties
+ * @poffset: pointer to node in flat tree
+ * @dad: Parent struct device_node
+ * @nodepp: The device_node tree created by the call
+ * @fpsize: Size of the node path up at the current depth.
+ * @dryrun: If true, do not allocate device nodes but still calculate needed
+ * memory size
+ */
+static void *unflatten_dt_node(const void *blob,
+			       void *mem,
+			       int *poffset,
+			       struct device_node *dad,
+			       struct device_node **nodepp,
+			       unsigned long fpsize,
+			       bool dryrun)
+{
+	struct device_node *np;
+	static int depth;
+	int old_depth;
+
+	fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
+	if (!fpsize)
+		return mem;
+
 	old_depth = depth;
 	*poffset = fdt_next_node(blob, *poffset, &depth);
 	if (depth < 0)
-- 
2.1.0
* [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (43 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 44/50] drivers/of: Split unflatten_dt_node() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 16:07   ` Rob Herring
  2015-12-06 20:28   ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Rob Herring
  2015-11-04 13:12 ` [PATCH v7 46/50] drivers/of: Rename unflatten_dt_node() Gavin Shan
                   ` (5 subsequent siblings)
  50 siblings, 2 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
In the current implementation, unflatten_dt_node() is called recursively
to unflatten the device nodes in the FDT blob, which puts pressure on the
limited kernel stack.
This avoids the recursion, so that all device nodes are unflattened in a
single call to unflatten_dt_node(). Two arrays are introduced to track
the parent path size and the device node at each level of depth; they
are used when unflattening the device nodes at the next level of depth.
Also, the parameters "poffset" and "fpsize" become unused and are
dropped.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 94 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 56 insertions(+), 38 deletions(-)
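
The depth-indexed bookkeeping described in the changelog can be sketched in
userspace as follows. This is only an illustration under stated assumptions,
not the kernel code: struct flat_entry, struct node and build_tree() are
hypothetical stand-ins for the FDT walk, struct device_node and
unflatten_dt_node(), and the input is assumed to arrive in pre-order with an
explicit depth, much as fdt_next_node() reports it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64

/* Hypothetical stand-in for one node of a flattened tree, in pre-order. */
struct flat_entry {
	const char *name;
	int depth;		/* 1 for the root, 2 for its children, ... */
};

/* Hypothetical stand-in for struct device_node. */
struct node {
	const char *name;
	size_t path_len;	/* path length; the root counts as 0 */
	struct node *parent;
	struct node *child;	/* most recently linked child comes first */
	struct node *sibling;
};

/*
 * Link the pre-order entries into a tree without recursion. Returns the
 * number of nodes linked, or -1 if an entry is too deep or skips a level.
 */
static int build_tree(const struct flat_entry *in, size_t n, struct node *out)
{
	struct node *nps[MAX_DEPTH] = { NULL };	/* last node seen per depth */
	size_t fpsizes[MAX_DEPTH] = { 0 };	/* its path length */
	int prev = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int d = in[i].depth;
		struct node *np = &out[i];

		if (d < 1 || d >= MAX_DEPTH || d > prev + 1)
			return -1;

		np->name = in[i].name;
		np->parent = nps[d - 1];	/* NULL above the root */
		np->child = NULL;
		np->sibling = NULL;
		np->path_len = (d == 1) ? 0 :
			fpsizes[d - 1] + 1 + strlen(np->name);

		/* Prepend to the parent's child list, as the patch does. */
		if (np->parent) {
			np->sibling = np->parent->child;
			np->parent->child = np;
		}

		/* Publish this level for the nodes one level deeper. */
		nps[d] = np;
		fpsizes[d] = np->path_len;
		prev = d;
	}

	return (int)n;
}
```

Because children are prepended, each child list ends up in reverse order,
which is why the patch follows the walk with reverse_nodes().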
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 173b036..f4793d0 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -355,61 +355,82 @@ static unsigned long populate_node(const void *blob,
 	return fpsize;
 }
 
+static void reverse_nodes(struct device_node *parent)
+{
+	struct device_node *child, *next;
+
+	/* Depth-first: reverse each child's subtree first */
+	child = parent->child;
+	while (child) {
+		reverse_nodes(child);
+
+		child = child->sibling;
+	}
+
+	/* Reverse the nodes in the child list */
+	child = parent->child;
+	parent->child = NULL;
+	while (child) {
+		next = child->sibling;
+
+		child->sibling = parent->child;
+		parent->child = child;
+		child = next;
+	}
+}
+
 /**
  * unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
- * @poffset: pointer to node in flat tree
  * @dad: Parent struct device_node
  * @nodepp: The device_node tree created by the call
- * @fpsize: Size of the node path up at the current depth.
  * @dryrun: If true, do not allocate device nodes but still calculate needed
  * memory size
  */
 static void *unflatten_dt_node(const void *blob,
 			       void *mem,
-			       int *poffset,
 			       struct device_node *dad,
 			       struct device_node **nodepp,
-			       unsigned long fpsize,
 			       bool dryrun)
 {
-	struct device_node *np;
-	static int depth;
-	int old_depth;
-
-	fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
-	if (!fpsize)
-		return mem;
+	struct device_node *root;
+	int offset = 0, depth = 0;
+	unsigned long fpsizes[64];
+	struct device_node *nps[64];
 
-	old_depth = depth;
-	*poffset = fdt_next_node(blob, *poffset, &depth);
-	if (depth < 0)
-		depth = 0;
-	while (*poffset > 0 && depth > old_depth)
-		mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
-					fpsize, dryrun);
+	if (nodepp)
+		*nodepp = NULL;
+
+	root = dad;
+	fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
+	nps[depth++] = dad;
+	while (offset >= 0 && depth < 64) {
+		fpsizes[depth] = populate_node(blob, offset, &mem,
+					       nps[depth - 1],
+					       fpsizes[depth - 1],
+					       &nps[depth], dryrun);
+		if (!fpsizes[depth])
+			return mem;
+
+		if (!dryrun && nodepp && !*nodepp)
+			*nodepp = nps[depth];
+		if (!dryrun && !root)
+			root = nps[depth];
+
+		offset = fdt_next_node(blob, offset, &depth);
+	}
 
-	if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
-		pr_err("unflatten: error %d processing FDT\n", *poffset);
+	if (offset < 0 && offset != -FDT_ERR_NOTFOUND)
+		pr_err("%s: Error %d processing FDT\n",
+		       __func__, offset);
 
 	/*
 	 * Reverse the child list. Some drivers assumes node order matches .dts
 	 * node order
 	 */
-	if (!dryrun && np->child) {
-		struct device_node *child = np->child;
-		np->child = NULL;
-		while (child) {
-			struct device_node *next = child->sibling;
-			child->sibling = np->child;
-			np->child = child;
-			child = next;
-		}
-	}
-
-	if (nodepp)
-		*nodepp = np;
+	if (!dryrun)
+		reverse_nodes(root);
 
 	return mem;
 }
@@ -431,7 +452,6 @@ static void __unflatten_device_tree(const void *blob,
 			     void * (*dt_alloc)(u64 size, u64 align))
 {
 	unsigned long size;
-	int start;
 	void *mem;
 
 	pr_debug(" -> unflatten_device_tree()\n");
@@ -452,8 +472,7 @@ static void __unflatten_device_tree(const void *blob,
 	}
 
 	/* First pass, scan for size */
-	start = 0;
-	size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
+	size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
 	size = ALIGN(size, 4);
 
 	pr_debug("  size is %lx, allocating...\n", size);
@@ -467,8 +486,7 @@ static void __unflatten_device_tree(const void *blob,
 	pr_debug("  unflattening %p...\n", mem);
 
 	/* Second pass, do actual unflattening */
-	start = 0;
-	unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
+	unflatten_dt_node(blob, mem, NULL, mynodes, false);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
-- 
2.1.0
* [PATCH v7 46/50] drivers/of: Rename unflatten_dt_node()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (44 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 47/50] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
                   ` (4 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This renames unflatten_dt_node() to unflatten_dt_nodes() as it
populates multiple device nodes from the FDT blob. No functional
change is introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index f4793d0..559ce49 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -380,7 +380,7 @@ static void reverse_nodes(struct device_node *parent)
 }
 
 /**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
+ * unflatten_dt_nodes - Alloc and populate device nodes from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
  * @dad: Parent struct device_node
@@ -388,11 +388,11 @@ static void reverse_nodes(struct device_node *parent)
  * @dryrun: If true, do not allocate device nodes but still calculate needed
  * memory size
  */
-static void *unflatten_dt_node(const void *blob,
-			       void *mem,
-			       struct device_node *dad,
-			       struct device_node **nodepp,
-			       bool dryrun)
+static void *unflatten_dt_nodes(const void *blob,
+				void *mem,
+				struct device_node *dad,
+				struct device_node **nodepp,
+				bool dryrun)
 {
 	struct device_node *root;
 	int offset = 0, depth = 0;
@@ -472,7 +472,7 @@ static void __unflatten_device_tree(const void *blob,
 	}
 
 	/* First pass, scan for size */
-	size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
+	size = (unsigned long)unflatten_dt_nodes(blob, NULL, NULL, NULL, true);
 	size = ALIGN(size, 4);
 
 	pr_debug("  size is %lx, allocating...\n", size);
@@ -486,7 +486,7 @@ static void __unflatten_device_tree(const void *blob,
 	pr_debug("  unflattening %p...\n", mem);
 
 	/* Second pass, do actual unflattening */
-	unflatten_dt_node(blob, mem, NULL, mynodes, false);
+	unflatten_dt_nodes(blob, mem, NULL, mynodes, false);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
-- 
2.1.0
* [PATCH v7 47/50] drivers/of: Specify parent node in of_fdt_unflatten_tree()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (45 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 46/50] drivers/of: Rename unflatten_dt_node() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 48/50] drivers/of: Return allocated memory from of_fdt_unflatten_tree() Gavin Shan
                   ` (3 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This adds one more argument to of_fdt_unflatten_tree() to specify
the parent node of the FDT blob that is going to be unflattened.
As a result, the function can be used by the PowerNV PCI hotplug
driver to unflatten an FDT blob that represents a device sub-tree.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c       | 14 ++++++++++----
 drivers/of/unittest.c  |  2 +-
 include/linux/of_fdt.h |  1 +
 3 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 559ce49..8c8228e 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -443,11 +443,13 @@ static void *unflatten_dt_nodes(const void *blob,
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
  * @blob: The blob to expand
+ * @dad: Parent device node
  * @mynodes: The device_node tree created by the call
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
  */
 static void __unflatten_device_tree(const void *blob,
+			     struct device_node *dad,
 			     struct device_node **mynodes,
 			     void * (*dt_alloc)(u64 size, u64 align))
 {
@@ -472,7 +474,7 @@ static void __unflatten_device_tree(const void *blob,
 	}
 
 	/* First pass, scan for size */
-	size = (unsigned long)unflatten_dt_nodes(blob, NULL, NULL, NULL, true);
+	size = (unsigned long)unflatten_dt_nodes(blob, NULL, dad, NULL, true);
 	size = ALIGN(size, 4);
 
 	pr_debug("  size is %lx, allocating...\n", size);
@@ -486,7 +488,7 @@ static void __unflatten_device_tree(const void *blob,
 	pr_debug("  unflattening %p...\n", mem);
 
 	/* Second pass, do actual unflattening */
-	unflatten_dt_nodes(blob, mem, NULL, mynodes, false);
+	unflatten_dt_nodes(blob, mem, dad, mynodes, false);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
@@ -501,6 +503,9 @@ static void *kernel_tree_alloc(u64 size, u64 align)
 
 /**
  * of_fdt_unflatten_tree - create tree of device_nodes from flat blob
+ * @blob: Flat device tree blob
+ * @dad: Parent device node
+ * @mynodes: The device tree created by the call
  *
  * unflattens the device-tree passed by the firmware, creating the
  * tree of struct device_node. It also fills the "name" and "type"
@@ -508,9 +513,10 @@ static void *kernel_tree_alloc(u64 size, u64 align)
  * can be used.
  */
 void of_fdt_unflatten_tree(const unsigned long *blob,
+			struct device_node *dad,
 			struct device_node **mynodes)
 {
-	__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
+	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
 
@@ -1163,7 +1169,7 @@ bool __init early_init_dt_scan(void *params)
  */
 void __init unflatten_device_tree(void)
 {
-	__unflatten_device_tree(initial_boot_params, &of_root,
+	__unflatten_device_tree(initial_boot_params, NULL, &of_root,
 				early_init_dt_alloc_memory_arch);
 
 	/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 9f71770b6..bafcf66 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -907,7 +907,7 @@ static int __init unittest_data_add(void)
 			"not running tests\n", __func__);
 		return -ENOMEM;
 	}
-	of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
+	of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
 	if (!unittest_data_node) {
 		pr_warn("%s: No tree to attach; not running tests\n", __func__);
 		return -ENODATA;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index df9ef38..3644960 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
 extern void of_fdt_unflatten_tree(const unsigned long *blob,
+			       struct device_node *dad,
 			       struct device_node **mynodes);
 
 /* TBD: Temporary export of fdt globals - remove when code fully merged */
-- 
2.1.0
* [PATCH v7 48/50] drivers/of: Return allocated memory from of_fdt_unflatten_tree()
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (46 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 47/50] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 13:12 ` [PATCH v7 49/50] drivers/of: Export OF changeset functions Gavin Shan
                   ` (2 subsequent siblings)
  50 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This returns the allocated memory chunk that stores the unflattened
device tree from of_fdt_unflatten_tree(), so that the chunk can be
released on demand by the PowerNV PCI hotplug driver.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 drivers/of/fdt.c       | 25 ++++++++++++++++---------
 include/linux/of_fdt.h |  6 +++---
 2 files changed, 19 insertions(+), 12 deletions(-)
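
The "size it with a dry run, allocate once, then hand the chunk back to the
caller" pattern this patch relies on can be sketched in userspace as below.
It is a sketch under assumptions, not the kernel implementation: struct rec,
bump_alloc(), populate() and build_records() are hypothetical names standing
in for struct device_node, unflatten_dt_alloc(), unflatten_dt_nodes() and
of_fdt_unflatten_tree(), and malloc()/free() stand in for the kernel
allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record stored in the chunk. */
struct rec {
	char name[16];
	struct rec *next;
};

/*
 * Bump allocator: round *mem up to align, hand out that address and
 * advance *mem by size. Starting from 0, *mem ends up as the byte count.
 */
static void *bump_alloc(uintptr_t *mem, size_t size, size_t align)
{
	uintptr_t res;

	*mem = (*mem + align - 1) & ~(uintptr_t)(align - 1);
	res = *mem;
	*mem += size;
	return (void *)res;
}

/* One pass over the input; records are written only when !dryrun. */
static uintptr_t populate(const char *const *names, size_t n, uintptr_t mem,
			  struct rec **head, int dryrun)
{
	struct rec *prev = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct rec *r = bump_alloc(&mem, sizeof(*r),
					   _Alignof(struct rec));

		if (dryrun)
			continue;	/* only the size matters here */

		strncpy(r->name, names[i], sizeof(r->name) - 1);
		r->name[sizeof(r->name) - 1] = '\0';
		r->next = NULL;
		if (prev)
			prev->next = r;
		else
			*head = r;
		prev = r;
	}

	return mem;
}

/*
 * First pass sizes the chunk, second pass fills it. The chunk is returned
 * so the caller can release everything with one free(); NULL on failure.
 */
static void *build_records(const char *const *names, size_t n,
			   struct rec **head)
{
	size_t size = populate(names, n, 0, NULL, 1);
	void *chunk;

	*head = NULL;
	if (!size)
		return NULL;

	chunk = malloc(size);
	if (!chunk)
		return NULL;

	populate(names, n, (uintptr_t)chunk, head, 0);
	return chunk;
}
```

Returning the chunk rather than only the list head is what lets a caller
such as a hotplug driver drop the whole sub-tree later without tracking
individual allocations.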
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 8c8228e..b0a5708 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -447,11 +447,14 @@ static void *unflatten_dt_nodes(const void *blob,
  * @mynodes: The device_node tree created by the call
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
+ *
+ * Returns NULL on failure or the memory chunk containing the unflattened
+ * device tree on success.
  */
-static void __unflatten_device_tree(const void *blob,
-			     struct device_node *dad,
-			     struct device_node **mynodes,
-			     void * (*dt_alloc)(u64 size, u64 align))
+static void *__unflatten_device_tree(const void *blob,
+				     struct device_node *dad,
+				     struct device_node **mynodes,
+				     void *(*dt_alloc)(u64 size, u64 align))
 {
 	unsigned long size;
 	void *mem;
@@ -460,7 +463,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	if (!blob) {
 		pr_debug("No device tree pointer\n");
-		return;
+		return NULL;
 	}
 
 	pr_debug("Unflattening device tree:\n");
@@ -470,7 +473,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	if (fdt_check_header(blob)) {
 		pr_err("Invalid device tree blob header\n");
-		return;
+		return NULL;
 	}
 
 	/* First pass, scan for size */
@@ -494,6 +497,7 @@ static void __unflatten_device_tree(const void *blob,
 			   be32_to_cpup(mem + size));
 
 	pr_debug(" <- unflatten_device_tree()\n");
+	return mem;
 }
 
 static void *kernel_tree_alloc(u64 size, u64 align)
@@ -511,10 +515,13 @@ static void *kernel_tree_alloc(u64 size, u64 align)
  * tree of struct device_node. It also fills the "name" and "type"
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
+ *
+ * Returns NULL on failure or the memory chunk containing the unflattened
+ * device tree on success.
  */
-void of_fdt_unflatten_tree(const unsigned long *blob,
-			struct device_node *dad,
-			struct device_node **mynodes)
+void *of_fdt_unflatten_tree(const unsigned long *blob,
+			    struct device_node *dad,
+			    struct device_node **mynodes)
 {
-	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
+	return __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
 }
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 3644960..b87b26a7 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -37,9 +37,9 @@ extern bool of_fdt_is_big_endian(const void *blob,
 				 unsigned long node);
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
-extern void of_fdt_unflatten_tree(const unsigned long *blob,
-			       struct device_node *dad,
-			       struct device_node **mynodes);
+extern void *of_fdt_unflatten_tree(const unsigned long *blob,
+				   struct device_node *dad,
+				   struct device_node **mynodes);
 
 /* TBD: Temporary export of fdt globals - remove when code fully merged */
 extern int __initdata dt_root_addr_cells;
-- 
2.1.0
* [PATCH v7 49/50] drivers/of: Export OF changeset functions
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (47 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 48/50] drivers/of: Return allocated memory from of_fdt_unflatten_tree() Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-04 16:12   ` Rob Herring
  2016-01-13 13:54   ` [v7,49/50] " Wolfram Sang
  2015-11-04 13:12 ` [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
  2015-11-09  3:09 ` [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
  50 siblings, 2 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
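The PowerNV PCI hotplug driver is going to use OF changesets to
manage changes to the device sub-tree, so this exports the OF
changeset functions. of_changeset_apply() and of_changeset_revert()
now take of_mutex themselves; internal callers such as the overlay
code switch to the new __of_changeset_apply() and
__of_changeset_revert() variants, which do not take the lock.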
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/dynamic.c    | 65 ++++++++++++++++++++++++++++++++++---------------
 drivers/of/of_private.h |  2 ++
 drivers/of/overlay.c    |  8 +++---
 drivers/of/unittest.c   |  4 ---
 4 files changed, 52 insertions(+), 27 deletions(-)
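
The split between a locking public entry point and a lock-free internal
helper, as done for apply and revert in the diff below, can be sketched in
userspace like this. Everything here is a hypothetical model: tree_lock
plays the role of of_mutex, tree_value stands in for the live tree, and the
"_locked" suffix replaces the kernel's leading double underscore, which is
reserved in userspace C. A negative running total is used as a made-up
error condition.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for of_mutex and the live tree. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static int tree_value;

struct changeset {
	const int *deltas;	/* entries, applied in order */
	size_t count;
};

/*
 * Caller must hold tree_lock. Applies the entries in order; on error the
 * entries applied so far are undone, newest first.
 */
static int changeset_apply_locked(const struct changeset *cs)
{
	size_t i;

	for (i = 0; i < cs->count; i++) {
		if (tree_value + cs->deltas[i] < 0) {
			while (i-- > 0)
				tree_value -= cs->deltas[i];
			return -1;
		}
		tree_value += cs->deltas[i];
	}

	return 0;
}

/* Caller must hold tree_lock. Undoes all entries, newest first. */
static int changeset_revert_locked(const struct changeset *cs)
{
	size_t i = cs->count;

	while (i-- > 0)
		tree_value -= cs->deltas[i];

	return 0;
}

/* Public entry points: take the lock, then defer to the helpers. */
int changeset_apply(const struct changeset *cs)
{
	int ret;

	pthread_mutex_lock(&tree_lock);
	ret = changeset_apply_locked(cs);
	pthread_mutex_unlock(&tree_lock);

	return ret;
}

int changeset_revert(const struct changeset *cs)
{
	int ret;

	pthread_mutex_lock(&tree_lock);
	ret = changeset_revert_locked(cs);
	pthread_mutex_unlock(&tree_lock);

	return ret;
}
```

A caller that already holds the lock for a larger operation would call the
"_locked" helpers directly, which mirrors how the overlay code is switched
to the double-underscore variants.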
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 53826b8..c647bd1 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
 	memset(ocs, 0, sizeof(*ocs));
 	INIT_LIST_HEAD(&ocs->entries);
 }
+EXPORT_SYMBOL_GPL(of_changeset_init);
 
 /**
  * of_changeset_destroy - Destroy a changeset
@@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
 	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
 		__of_changeset_entry_destroy(ce);
 }
+EXPORT_SYMBOL_GPL(of_changeset_destroy);
 
-/**
- * of_changeset_apply - Applies a changeset
- *
- * @ocs:	changeset pointer
- *
- * Applies a changeset to the live tree.
- * Any side-effects of live tree state changes are applied here on
- * sucess, like creation/destruction of devices and side-effects
- * like creation of sysfs properties and directories.
- * Returns 0 on success, a negative error value in case of an error.
- * On error the partially applied effects are reverted.
- */
-int of_changeset_apply(struct of_changeset *ocs)
+int __of_changeset_apply(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce;
 	int ret;
@@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
 }
 
 /**
- * of_changeset_revert - Reverts an applied changeset
+ * of_changeset_apply - Applies a changeset
  *
  * @ocs:	changeset pointer
  *
- * Reverts a changeset returning the state of the tree to what it
- * was before the application.
- * Any side-effects like creation/destruction of devices and
- * removal of sysfs properties and directories are applied.
+ * Applies a changeset to the live tree.
+ * Any side-effects of live tree state changes are applied here on
+ * success, like creation/destruction of devices and side-effects
+ * like creation of sysfs properties and directories.
  * Returns 0 on success, a negative error value in case of an error.
+ * On error the partially applied effects are reverted.
  */
-int of_changeset_revert(struct of_changeset *ocs)
+int of_changeset_apply(struct of_changeset *ocs)
+{
+	int ret;
+
+	mutex_lock(&of_mutex);
+	ret = __of_changeset_apply(ocs);
+	mutex_unlock(&of_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_changeset_apply);
+
+int __of_changeset_revert(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce;
 	int ret;
@@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
 }
 
 /**
+ * of_changeset_revert - Reverts an applied changeset
+ *
+ * @ocs:	changeset pointer
+ *
+ * Reverts a changeset returning the state of the tree to what it
+ * was before the application.
+ * Any side-effects like creation/destruction of devices and
+ * removal of sysfs properties and directories are applied.
+ * Returns 0 on success, a negative error value in case of an error.
+ */
+int of_changeset_revert(struct of_changeset *ocs)
+{
+	int ret;
+
+	mutex_lock(&of_mutex);
+	ret = __of_changeset_revert(ocs);
+	mutex_unlock(&of_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_changeset_revert);
+
+/**
  * of_changeset_action - Perform a changeset action
  *
  * @ocs:	changeset pointer
@@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
 	list_add_tail(&ce->node, &ocs->entries);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(of_changeset_action);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 8e882e7..829469f 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -45,6 +45,8 @@ static inline struct device_node *kobj_to_device_node(struct kobject *kobj)
 extern int of_property_notify(int action, struct device_node *np,
 			      struct property *prop, struct property *old_prop);
 extern void of_node_release(struct kobject *kobj);
+extern int __of_changeset_apply(struct of_changeset *ocs);
+extern int __of_changeset_revert(struct of_changeset *ocs);
 #else /* CONFIG_OF_DYNAMIC */
 static inline int of_property_notify(int action, struct device_node *np,
 				     struct property *prop, struct property *old_prop)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 24e025f..804ea33 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
 	}
 
 	/* apply the changeset */
-	err = of_changeset_apply(&ov->cset);
+	err = __of_changeset_apply(&ov->cset);
 	if (err) {
-		pr_err("%s: of_changeset_apply() failed for tree@%s\n",
+		pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
 				__func__, tree->full_name);
 		goto err_revert_overlay;
 	}
@@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
 
 
 	list_del(&ov->node);
-	of_changeset_revert(&ov->cset);
+	__of_changeset_revert(&ov->cset);
 	of_free_overlay_info(ov);
 	idr_remove(&ov_idr, id);
 	of_changeset_destroy(&ov->cset);
@@ -539,7 +539,7 @@ int of_overlay_destroy_all(void)
 	/* the tail of list is guaranteed to be safe to remove */
 	list_for_each_entry_safe_reverse(ov, ovn, &ov_list, node) {
 		list_del(&ov->node);
-		of_changeset_revert(&ov->cset);
+		__of_changeset_revert(&ov->cset);
 		of_free_overlay_info(ov);
 		idr_remove(&ov_idr, ov->id);
 		kfree(ov);
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index bafcf66..dad3fd2 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -526,18 +526,14 @@ static void __init of_unittest_changeset(void)
 	unittest(!of_changeset_add_property(&chgset, parent, ppadd), "fail add prop\n");
 	unittest(!of_changeset_update_property(&chgset, parent, ppupdate), "fail update prop\n");
 	unittest(!of_changeset_remove_property(&chgset, parent, ppremove), "fail remove prop\n");
-	mutex_lock(&of_mutex);
 	unittest(!of_changeset_apply(&chgset), "apply failed\n");
-	mutex_unlock(&of_mutex);
 
 	/* Make sure node names are constructed correctly */
 	unittest((np = of_find_node_by_path("/testcase-data/changeset/n2/n21")),
 		 "'%s' not added\n", n21->full_name);
 	of_node_put(np);
 
-	mutex_lock(&of_mutex);
 	unittest(!of_changeset_revert(&chgset), "revert failed\n");
-	mutex_unlock(&of_mutex);
 
 	of_changeset_destroy(&chgset);
 #endif
-- 
2.1.0
* [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (48 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 49/50] drivers/of: Export OF changeset functions Gavin Shan
@ 2015-11-04 13:12 ` Gavin Shan
  2015-11-18  7:33   ` Alexey Kardashevskiy
  2015-11-09  3:09 ` [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 13:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
This adds a standalone driver to support PCI hotplug on the PowerPC PowerNV
platform, which runs on top of skiboot firmware. The firmware identifies
hotpluggable slots and marks their device tree nodes with the properties
"ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans the device
tree nodes to create and register PCI hotplug slots accordingly.
If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have the property "ibm,reset-by-firmware". In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't export the capability to access attention
LEDs yet; support for that is still to be done.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 MAINTAINERS                   |   6 +
 drivers/pci/hotplug/Kconfig   |  12 +
 drivers/pci/hotplug/Makefile  |   3 +
 drivers/pci/hotplug/pnv_php.c | 866 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 887 insertions(+)
 create mode 100644 drivers/pci/hotplug/pnv_php.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 9f6685f..10088f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7931,6 +7931,12 @@ L:	linux-pci@vger.kernel.org
 S:	Supported
 F:	Documentation/PCI/pci-error-recovery.txt
 
+PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
+M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
+L:	linux-pci@vger.kernel.org
+S:	Supported
+F:	drivers/pci/hotplug/pnv_php.c
+
 PCI SUBSYSTEM
 M:	Bjorn Helgaas <bhelgaas@google.com>
 L:	linux-pci@vger.kernel.org
diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..167c8ce 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run a PowerPC PowerNV platform that supports
+	  PCI hotplug.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called pnv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index b616e75..e33cdda 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+pnv-php-objs		:=	pnv_php.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
new file mode 100644
index 0000000..415e9b9
--- /dev/null
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -0,0 +1,866 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/module.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+#include <asm/ppc-pci.h>
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+struct pnv_php_slot {
+	struct hotplug_slot		php_slot;
+	struct hotplug_slot_info	php_slot_info;
+	uint64_t			id;
+	char				*name;
+	int				slot_no;
+	struct kref			kref;
+	int				state;
+#define PNV_PHP_STATE_INIT		0
+#define PNV_PHP_STATE_REGISTER		1
+#define PNV_PHP_STATE_POPULATED		2
+	struct device_node		*dn;
+	struct pci_dev			*pdev;
+	struct pci_bus			*bus;
+	bool				power_state_check;
+	int				power_state_confirmed;
+#define PNV_PHP_POWER_CONFIRMED_INVALID	0
+#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
+#define PNV_PHP_POWER_CONFIRMED_FAIL	2
+	struct opal_msg			*msg;
+	void				*fdt;
+	void				*dt;
+	struct of_changeset		ocs;
+	struct work_struct		work;
+	wait_queue_head_t		queue;
+	struct pnv_php_slot		*parent;
+	struct list_head		children;
+	struct list_head		link;
+};
+
+static LIST_HEAD(pnv_php_slot_list);
+static DEFINE_SPINLOCK(pnv_php_lock);
+
+static void pnv_php_register(struct device_node *dn);
+static void pnv_php_unregister_one(struct device_node *dn);
+static void pnv_php_unregister(struct device_node *dn);
+
+static inline struct pnv_php_slot *pnv_php_get_slot(struct pnv_php_slot *slot)
+{
+	if (slot) {
+		kref_get(&slot->kref);
+		return slot;
+	}
+
+	return NULL;
+}
+
+static void pnv_php_free_slot(struct kref *kref)
+{
+	struct pnv_php_slot *slot = container_of(kref,
+						 struct pnv_php_slot,
+						 kref);
+
+	WARN_ON(!list_empty(&slot->children));
+	kfree(slot->name);
+	kfree(slot);
+}
+
+static inline void pnv_php_put_slot(struct pnv_php_slot *slot)
+{
+	if (!slot)
+		return;
+
+	kref_put(&slot->kref, pnv_php_free_slot);
+}
+
+static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
+					  struct pnv_php_slot *slot)
+{
+	struct pnv_php_slot *target, *tmp;
+
+	if (slot->dn == dn)
+		return pnv_php_get_slot(slot);
+
+	list_for_each_entry(tmp, &slot->children, link) {
+		target = pnv_php_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
+{
+	struct pnv_php_slot *slot, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
+		slot = pnv_php_match(dn, tmp);
+		if (slot) {
+			spin_unlock_irqrestore(&pnv_php_lock, flags);
+			return slot;
+		}
+	}
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	return NULL;
+}
+
+/*
+ * Remove pdn for all children of the indicated device node.
+ * The function should remove pdn in a depth-first manner.
+ */
+static void pnv_php_rmv_pdns(struct device_node *dn)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(dn, child) {
+		pnv_php_rmv_pdns(child);
+
+		pci_remove_device_node_info(child);
+	}
+}
+
+/*
+ * Remove all child nodes of the indicated device nodes. The
+ * function should remove device nodes in depth-first manner.
+ */
+static int pnv_php_rmv_device_nodes(struct device_node *parent)
+{
+	struct device_node *dn, *child;
+	int ret = 0;
+
+	for_each_child_of_node(parent, dn) {
+		ret = pnv_php_rmv_device_nodes(dn);
+		if (ret)
+			return ret;
+
+		child = of_get_next_child(dn, NULL);
+		if (child) {
+			pr_err("%s: Alive children of node <%s>\n",
+			       __func__, of_node_full_name(dn));
+			of_node_put(child);
+			of_node_put(dn);
+			return -EBUSY;
+		}
+
+		of_detach_node(dn);
+		of_node_put(dn);
+	}
+
+	return 0;
+}
+
+/*
+ * The function processes the message sent by firmware
+ * to remove all device tree nodes beneath the slot's
+ * nodes and the associated auxiliary data.
+ */
+static void pnv_php_handle_poweroff(struct pnv_php_slot *slot)
+{
+	int ret;
+
+	pnv_php_rmv_pdns(slot->dn);
+
+	/*
+	 * If the device sub-tree was created from an OF changeset, simply
+	 * revert that. Otherwise, the device nodes in the sub-tree
+	 * need to be iterated and detached.
+	 */
+	if (slot->fdt) {
+		of_changeset_destroy(&slot->ocs);
+		kfree(slot->dt);
+		kfree(slot->fdt);
+		slot->dt = NULL;
+		slot->dn->child = NULL;
+		slot->fdt = NULL;
+		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_SUCCESS;
+		goto confirm;
+	}
+
+	ret = pnv_php_rmv_device_nodes(slot->dn);
+	if (!ret) {
+		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_SUCCESS;
+	} else {
+		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
+		dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
+			 ret);
+	}
+
+confirm:
+	wake_up(&slot->queue);
+}
+
+static int pnv_php_populate_changeset(struct of_changeset *ocs,
+				      struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	for_each_child_of_node(dn, child) {
+		ret = of_changeset_attach_node(ocs, child);
+		if (!ret)
+			ret = pnv_php_populate_changeset(ocs, child);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
+{
+	struct pci_controller *hose = (struct pci_controller *)data;
+	struct pci_dn *pdn;
+
+	pdn = pci_add_device_node_info(hose, dn);
+	if (!pdn)
+		return ERR_PTR(-ENOMEM);
+
+	return NULL;
+}
+
+static void pnv_php_add_pdns(struct pnv_php_slot *slot)
+{
+	struct pci_controller *hose = pci_bus_to_host(slot->bus);
+
+	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
+}
+
+static void pnv_php_handle_poweron(struct pnv_php_slot *slot)
+{
+	void *fdt, *dt;
+	uint64_t len;
+	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
+	int ret;
+
+	/* We don't know the FDT blob size, so try with incrementally
+	 * larger memory chunks.
+	 */
+	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
+		fdt = kzalloc(len, GFP_KERNEL);
+		if (!fdt)
+			break;
+
+		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
+		if (!ret)
+			break;
+
+		kfree(fdt);
+	}
+
+	if (!fdt || len > 0x10000) {
+		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
+		confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
+		goto out;
+	}
+	/* Unflatten device tree blob */
+	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
+	if (!dt) {
+		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
+		goto free_fdt;
+	}
+
+	/* Initialize and apply the changeset */
+	of_changeset_init(&slot->ocs);
+	ret = pnv_php_populate_changeset(&slot->ocs, slot->dn);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
+			 ret);
+		goto free_dt;
+	}
+
+	slot->dn->child = NULL;
+	ret = of_changeset_apply(&slot->ocs);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
+			 ret);
+		goto destroy_changeset;
+	}
+
+	/* Add device node firmware data */
+	pnv_php_add_pdns(slot);
+	slot->fdt = fdt;
+	slot->dt = dt;
+	goto out;
+
+destroy_changeset:
+	of_changeset_destroy(&slot->ocs);
+free_dt:
+	kfree(dt);
+	slot->dn->child = NULL;
+free_fdt:
+	kfree(fdt);
+	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
+out:
+	/* Confirm status change */
+	slot->power_state_confirmed = confirm;
+	wake_up(&slot->queue);
+}
+
+static void pnv_php_work(struct work_struct *data)
+{
+	struct pnv_php_slot *slot = container_of(data,
+						 struct pnv_php_slot, work);
+	uint64_t event = be64_to_cpu(slot->msg->params[0]);
+
+	if (event == OPAL_PCI_SLOT_POWER_OFF)
+		pnv_php_handle_poweroff(slot);
+	else
+		pnv_php_handle_poweron(slot);
+
+	pnv_php_put_slot(slot);
+}
+
+static int pnv_php_handle_msg(struct notifier_block *nb,
+			      unsigned long type,
+			      void *message)
+{
+	phandle h;
+	struct device_node *dn;
+	struct pnv_php_slot *slot;
+	struct opal_msg *msg = message;
+
+	if (type != OPAL_MSG_PCI_HOTPLUG) {
+		pr_warn("%s: Invalid message %ld received!\n",
+			__func__, type);
+		return NOTIFY_DONE;
+	}
+
+	h = (phandle)be64_to_cpu(msg->params[1]);
+	dn = of_find_node_by_phandle(h);
+	if (!dn) {
+		pr_warn("%s: No device node for phandle 0x%x\n",
+			__func__, h);
+		return NOTIFY_DONE;
+	}
+
+	slot = pnv_php_find_slot(dn);
+	if (!slot) {
+		pr_warn("%s: No slot found for node <%s>\n",
+			__func__, of_node_full_name(dn));
+		of_node_put(dn);
+		return NOTIFY_DONE;
+	}
+	of_node_put(dn);
+
+	slot->msg = msg;
+	schedule_work(&slot->work);
+	return NOTIFY_OK;
+}
+
+static int pnv_php_set_power_state(struct hotplug_slot *php_slot, u8 state)
+{
+	struct pnv_php_slot *slot = php_slot->private;
+	int ret;
+
+	slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
+	ret = pnv_pci_set_power_state(slot->id, state);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
+			 ret, state ? "on" : "off");
+		return ret;
+	}
+
+	/* Continue to PCI probing once the device-tree is finalized. The
+	 * device-tree might already have been updated completely at this
+	 * point, so we don't always have to wait for that.
+	 */
+	if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
+		return 0;
+	else if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
+		return -EBUSY;
+
+	ret = wait_event_timeout(slot->queue,
+				 slot->power_state_confirmed, 10 * HZ);
+	if (!ret) {
+		dev_warn(&slot->pdev->dev, "Error %d waiting for power-%s\n",
+			 ret, state ? "on" : "off");
+		return -EBUSY;
+	}
+
+	if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
+		return 0;
+
+	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
+		 slot->power_state_confirmed, state ? "on" : "off");
+	return -EBUSY;
+}
+
+static int pnv_php_get_power_state(struct hotplug_slot *php_slot, u8 *state)
+{
+	struct pnv_php_slot *slot = php_slot->private;
+	uint8_t power_state;
+	int ret;
+
+	/*
+	 * Retrieve power status from firmware. If we fail
+	 * to get that, the power status falls back to
+	 * being on.
+	 */
+	ret = pnv_pci_get_power_state(slot->id, &power_state);
+	if (ret) {
+		*state = OPAL_PCI_SLOT_POWER_ON;
+		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
+			 ret);
+	} else {
+		*state = power_state;
+		php_slot->info->power_status = power_state;
+	}
+
+	return 0;
+}
+
+static int pnv_php_get_adapter_state(struct hotplug_slot *php_slot, u8 *state)
+{
+	struct pnv_php_slot *slot = php_slot->private;
+	uint8_t presence;
+	int ret;
+
+	/*
+	 * Retrieve presence status from firmware. If we can't
+	 * get that, it falls back to being empty.
+	 */
+	ret = pnv_pci_get_presence_state(slot->id, &presence);
+	if (ret >= 0) {
+		*state = presence;
+		php_slot->info->adapter_status = presence;
+		ret = 0;
+	} else {
+		*state = OPAL_PCI_SLOT_EMPTY;
+		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
+			 ret);
+	}
+
+	return ret;
+}
+
+static int pnv_php_set_attention_state(struct hotplug_slot *php_slot, u8 state)
+{
+	/* FIXME: Make it real once firmware supports it */
+	php_slot->info->attention_status = state;
+
+	return 0;
+}
+
+static int pnv_php_enable(struct pnv_php_slot *slot, bool rescan)
+{
+	struct hotplug_slot *php_slot = &slot->php_slot;
+	uint8_t presence, power_status;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (slot->state != PNV_PHP_STATE_REGISTER)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
+	if (ret)
+		return ret;
+
+	/* Proceed if there is nothing behind the slot */
+	if (presence == OPAL_PCI_SLOT_EMPTY)
+		goto scan;
+
+	/*
+	 * If we detect something behind the slot, we need to
+	 * make sure the power supply to the slot is on. Otherwise,
+	 * the slot's downstream PCIe link would be down.
+	 *
+	 * On the first time, we don't change the power status to
+	 * boost system boot with assumption that the firmware
+	 * supplies consistent slot power status: empty slot always
+	 * has its power off and non-empty slot has its power on.
+	 */
+	if (!slot->power_state_check) {
+		slot->power_state_check = true;
+		goto scan;
+	}
+
+	/* Check the power status. Scan the slot if that's already on */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret)
+		return ret;
+
+	if (power_status == OPAL_PCI_SLOT_POWER_ON)
+		goto scan;
+
+	/* Power is off, turn it on and then scan the slot */
+	ret = pnv_php_set_power_state(php_slot, OPAL_PCI_SLOT_POWER_ON);
+	if (ret)
+		return ret;
+
+scan:
+	if (presence == OPAL_PCI_SLOT_PRESENT) {
+		if (rescan) {
+			pci_lock_rescan_remove();
+			pci_add_pci_devices(slot->bus);
+			pci_unlock_rescan_remove();
+		}
+
+		/* Rescan for child hotpluggable slots */
+		slot->state = PNV_PHP_STATE_POPULATED;
+		if (rescan)
+			pnv_php_register(slot->dn);
+	} else {
+		slot->state = PNV_PHP_STATE_POPULATED;
+	}
+
+	return 0;
+}
+
+static int pnv_php_enable_slot(struct hotplug_slot *php_slot)
+{
+	struct pnv_php_slot *slot = container_of(php_slot,
+						 struct pnv_php_slot,
+						 php_slot);
+
+	return pnv_php_enable(slot, true);
+}
+
+static int pnv_php_disable_slot(struct hotplug_slot *php_slot)
+{
+	struct pnv_php_slot *slot = php_slot->private;
+	uint8_t power_state;
+	int ret;
+
+	if (slot->state != PNV_PHP_STATE_POPULATED)
+		return 0;
+
+	/* Remove all devices behind the slot */
+	pci_lock_rescan_remove();
+	pci_remove_pci_devices(slot->bus);
+	pci_unlock_rescan_remove();
+
+	/* Detach the child hotpluggable slots */
+	pnv_php_unregister(slot->dn);
+
+	/*
+	 * Check the power status and turn it off if necessary. If we
+	 * fail to get the power status, the power will be forced to
+	 * be off.
+	 */
+	ret = php_slot->ops->get_power_status(php_slot, &power_state);
+	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
+		ret = pnv_php_set_power_state(php_slot,
+					      OPAL_PCI_SLOT_POWER_OFF);
+		if (ret)
+			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
+				 ret);
+	}
+
+	/* Update slot state */
+	slot->state = PNV_PHP_STATE_REGISTER;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= pnv_php_get_power_state,
+	.get_adapter_status	= pnv_php_get_adapter_state,
+	.set_attention_status	= pnv_php_set_attention_state,
+	.enable_slot		= pnv_php_enable_slot,
+	.disable_slot		= pnv_php_disable_slot,
+};
+
+static void pnv_php_release(struct hotplug_slot *hp_slot)
+{
+	struct pnv_php_slot *slot = hp_slot->private;
+	unsigned long flags;
+
+	/* Remove from global or child list */
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	list_del(&slot->link);
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	/* Detach from parent before dropping our own reference */
+	pnv_php_put_slot(slot->parent);
+	pnv_php_put_slot(slot);
+}
+
+static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound Id, which
+	 * consists of a 16-bit PHB Id, a 16-bit bus/slot/function
+	 * number, and a compound indicator.
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return -ENXIO;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return -ENXIO;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
+{
+	struct pnv_php_slot *slot;
+	struct pci_bus *bus;
+	const char *label;
+	uint64_t id;
+
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	if (pnv_php_get_slot_id(dn, &id))
+		return NULL;
+
+	bus = pci_find_bus_by_node(dn);
+	if (!bus)
+		return NULL;
+
+	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
+	if (!slot)
+		return NULL;
+
+	slot->name = kstrdup(label, GFP_KERNEL);
+	if (!slot->name) {
+		kfree(slot);
+		return NULL;
+	}
+
+	if (dn->child && PCI_DN(dn->child))
+		slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		slot->slot_no = -1;   /* Placeholder slot */
+
+	kref_init(&slot->kref);
+	slot->state	            = PNV_PHP_STATE_INIT;
+	slot->dn	            = dn;
+	slot->pdev	            = bus->self;
+	slot->bus	            = bus;
+	slot->id	            = id;
+	slot->power_state_check     = false;
+	slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
+	slot->php_slot.ops          = &php_slot_ops;
+	slot->php_slot.info         = &slot->php_slot_info;
+	slot->php_slot.release      = pnv_php_release;
+	slot->php_slot.private      = slot;
+
+	INIT_WORK(&slot->work, pnv_php_work);
+	init_waitqueue_head(&slot->queue);
+	INIT_LIST_HEAD(&slot->children);
+	INIT_LIST_HEAD(&slot->link);
+
+	return slot;
+}
+
+static int pnv_php_register_slot(struct pnv_php_slot *slot)
+{
+	struct pnv_php_slot *parent;
+	struct device_node *dn = slot->dn;
+	unsigned long flags;
+	int ret;
+
+	/* Check if the slot exists or not */
+	parent = pnv_php_find_slot(slot->dn);
+	if (parent) {
+		pnv_php_put_slot(parent);
+		return -EEXIST;
+	}
+
+	/* Register PCI slot */
+	ret = pci_hp_register(&slot->php_slot, slot->bus,
+			      slot->slot_no, slot->name);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
+			 ret);
+		return ret;
+	}
+
+	/* Attach to the parent's child list or global list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = pnv_php_find_slot(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+	}
+
+	spin_lock_irqsave(&pnv_php_lock, flags);
+	slot->parent = parent;
+	if (parent)
+		list_add_tail(&slot->link, &parent->children);
+	else
+		list_add_tail(&slot->link, &pnv_php_slot_list);
+	spin_unlock_irqrestore(&pnv_php_lock, flags);
+
+	slot->state = PNV_PHP_STATE_REGISTER;
+	return 0;
+}
+
+static int pnv_php_register_one(struct device_node *dn)
+{
+	struct pnv_php_slot *slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's a hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	slot = pnv_php_alloc_slot(dn);
+	if (!slot)
+		return -ENODEV;
+
+	ret = pnv_php_register_slot(slot);
+	if (ret)
+		goto free_slot;
+
+	ret = pnv_php_enable(slot, false);
+	if (ret)
+		goto unregister_slot;
+
+	return 0;
+
+unregister_slot:
+	pnv_php_unregister_one(slot->dn);
+free_slot:
+	pnv_php_put_slot(slot);
+	return ret;
+}
+
+static void pnv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/*
+	 * The parent slots should be registered before their
+	 * child slots.
+	 */
+	for_each_child_of_node(dn, child) {
+		pnv_php_register_one(child);
+		pnv_php_register(child);
+	}
+}
+
+static void pnv_php_unregister_one(struct device_node *dn)
+{
+	struct pnv_php_slot *slot;
+
+	slot = pnv_php_find_slot(dn);
+	if (!slot)
+		return;
+
+	pnv_php_put_slot(slot);
+	pci_hp_deregister(&slot->php_slot);
+}
+
+static void pnv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/* The child slots should go before their parent slots */
+	for_each_child_of_node(dn, child) {
+		pnv_php_unregister(child);
+		pnv_php_unregister_one(child);
+	}
+}
+
+static struct notifier_block php_msg_nb = {
+	.notifier_call	= pnv_php_handle_msg,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int __init pnv_php_init(void)
+{
+	struct device_node *dn;
+	int ret;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Register hotplug message handler */
+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
+	if (ret) {
+		pr_warn("%s: Error %d registering hotplug notifier\n",
+			__func__, ret);
+		return ret;
+	}
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		pnv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		pnv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit pnv_php_exit(void)
+{
+	struct device_node *dn;
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		pnv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		pnv_php_unregister(dn);
+
+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
+}
+
+module_init(pnv_php_init);
+module_exit(pnv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.1.0
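For readers following pnv_php_get_slot_id() in the patch above, the
compound slot Id layout can be sketched in plain C. This is only a minimal
illustration, not the driver code: compose_slot_id() is a hypothetical
helper, the reg and PHB Id values are made up, and the PHB Id is assumed
to fit in the low 16 bits as the comment in the patch describes.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): build the compound slot Id
 * from the first cell of the "reg" property and the PHB Id.
 */
static uint64_t compose_slot_id(uint32_t reg, uint64_t phb_id)
{
	uint64_t id = 1ull << 63;			/* compound indicator */

	/* Bus and devfn live in bits 8..23 of reg; move them to bits 16..31 */
	id |= ((uint64_t)(reg & 0x00ffff00)) << 8;

	/* PHB Id goes into the low bits (assumed to fit in 16 bits) */
	id |= phb_id;

	return id;
}
```

With reg = 0x00010800 (bus 1, devfn 8) and PHB Id 2, the helper returns
0x8000000001080002.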
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-11-04 13:12 ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
@ 2015-11-04 16:07   ` Rob Herring
  2015-11-04 23:23     ` Gavin Shan
  2016-05-13  7:16     ` Geert Uytterhoeven
  2015-12-06 20:28   ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Rob Herring
  1 sibling, 2 replies; 157+ messages in thread
From: Rob Herring @ 2015-11-04 16:07 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> In the current implementation, unflatten_dt_node() is called recursively
> to unflatten device nodes in the FDT blob. This puts stress on the limited
> stack capacity.
Did you actually hit a problem?
Now we have a max depth of 64. Seems like that should be plenty... Any
idea how this compares to when we run out of stack space?
> This avoids calling the function recursively, meaning the device
> nodes are unflattened in one call to unflatten_dt_node(): two arrays
> are introduced to track the parent path size and the device node of
> the current level of depth, which will be used by the device node on the
> next level of depth to be unflattened. Also, the parameters "poffset" and
> "fpsize" are unused and dropped.
Yay. I'm happy to see parameters removed instead of added to this function.
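To make the "two arrays indexed by depth" idea concrete, here is a minimal
user-space sketch. It is not the kernel code: struct sketch_node,
sketch_link_nodes() and SKETCH_MAX_DEPTH are hypothetical names, and the
input is assumed to be an array of nodes in preorder with an explicit
depth, standing in for what fdt_next_node() would report.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEPTH 64	/* assumed limit, mirrors the 64 in the patch */

struct sketch_node {
	const char *name;
	int depth;			/* 1 = top-level node, 2 = its child, ... */
	struct sketch_node *parent;	/* filled in by the walk */
	size_t path_len;		/* length of the "/a/b/c" style full path */
};

/*
 * Link nodes[] (given in preorder) to their parents without recursion.
 * nps[d] and fpsizes[d] remember the most recent node seen at depth d and
 * the length of its full path, so a node at depth d only needs slot d - 1.
 * Returns the number of nodes linked, or -1 on a malformed or too deep input.
 */
static int sketch_link_nodes(struct sketch_node *nodes, int count)
{
	struct sketch_node *nps[SKETCH_MAX_DEPTH] = { NULL };
	size_t fpsizes[SKETCH_MAX_DEPTH] = { 0 };
	int prev_depth = 0;
	int i;

	for (i = 0; i < count; i++) {
		int d = nodes[i].depth;

		/* Reject nodes that are too deep or that skip a level */
		if (d < 1 || d >= SKETCH_MAX_DEPTH || d > prev_depth + 1)
			return -1;

		nodes[i].parent = nps[d - 1];	/* NULL for top-level nodes */
		nodes[i].path_len = fpsizes[d - 1] + 1 + strlen(nodes[i].name);

		nps[d] = &nodes[i];
		fpsizes[d] = nodes[i].path_len;
		prev_depth = d;
	}

	return count;
}
```

The stack cost is two fixed-size arrays regardless of how deep the tree
is, which is the trade the patch makes against one stack frame per level.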
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 94 +++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 56 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 173b036..f4793d0 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -355,61 +355,82 @@ static unsigned long populate_node(const void *blob,
>         return fpsize;
>  }
>
> +static void reverse_nodes(struct device_node *parent)
> +{
> +       struct device_node *child, *next;
> +
> +       /* Depth first */
> +       child = parent->child;
> +       while (child) {
> +               reverse_nodes(child);
> +
> +               child = child->sibling;
> +       }
> +
> +       /* Reverse the nodes in the child list */
> +       child = parent->child;
> +       parent->child = NULL;
> +       while (child) {
> +               next = child->sibling;
> +
> +               child->sibling = parent->child;
> +               parent->child = child;
> +               child = next;
> +       }
> +}
> +
>  /**
>   * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>   * @blob: The parent device tree blob
>   * @mem: Memory chunk to use for allocating device nodes and properties
> - * @poffset: pointer to node in flat tree
>   * @dad: Parent struct device_node
>   * @nodepp: The device_node tree created by the call
> - * @fpsize: Size of the node path up at the current depth.
>   * @dryrun: If true, do not allocate device nodes but still calculate needed
>   * memory size
>   */
>  static void *unflatten_dt_node(const void *blob,
>                                void *mem,
> -                              int *poffset,
>                                struct device_node *dad,
>                                struct device_node **nodepp,
> -                              unsigned long fpsize,
>                                bool dryrun)
We can probably further simplify things by returning an int with
negative being errors and positive being the size. Also, dryrun can be
dropped and implied by mem and/or nodepp being NULL.
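A rough user-space sketch of that calling convention, assuming nothing
about the real unflatten_dt_node() internals: sketch_pack() is a
hypothetical helper that returns the size on success, a negative
errno-style value on error, and treats a NULL buffer as the dry run.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pack count strings (with their NUL terminators)
 * into mem. A NULL mem means "dry run": nothing is written and only the
 * required size is computed. Returns the size in bytes on success or a
 * negative errno-style value on error.
 */
static int sketch_pack(const char *const *names, int count, char *mem, int cap)
{
	int size = 0;
	int i;

	if (!names || count < 0)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		int len = (int)strlen(names[i]) + 1;

		if (mem) {
			/* Real pass: refuse to write past the buffer */
			if (size + len > cap)
				return -ENOSPC;
			memcpy(mem + size, names[i], (size_t)len);
		}
		size += len;
	}

	return size;
}
```

A caller would first pass mem == NULL to learn the size, allocate that
much, and then call again with the buffer, so no separate dryrun flag is
needed.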
>  {
> -       struct device_node *np;
> -       static int depth;
> -       int old_depth;
> -
> -       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
> -       if (!fpsize)
> -               return mem;
> +       struct device_node *root;
> +       int offset = 0, depth = 0;
> +       unsigned long fpsizes[64];
> +       struct device_node *nps[64];
Use a define here.
>
> -       old_depth = depth;
> -       *poffset = fdt_next_node(blob, *poffset, &depth);
> -       if (depth < 0)
> -               depth = 0;
> -       while (*poffset > 0 && depth > old_depth)
> -               mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
> -                                       fpsize, dryrun);
> +       if (nodepp)
> +               *nodepp = NULL;
> +
> +       root = dad;
> +       fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
> +       nps[depth++] = dad;
> +       while (offset >= 0 && depth < 64) {
> +               fpsizes[depth] = populate_node(blob, offset, &mem,
> +                                              nps[depth - 1],
> +                                              fpsizes[depth - 1],
> +                                              &nps[depth], dryrun);
> +               if (!fpsizes[depth])
> +                       return mem;
> +
> +               if (!dryrun && nodepp && !*nodepp)
> +                       *nodepp = nps[depth];
> +               if (!dryrun && !root)
> +                       root = nps[depth];
> +
> +               offset = fdt_next_node(blob, offset, &depth);
> +       }
>
> -       if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
> -               pr_err("unflatten: error %d processing FDT\n", *poffset);
> +       if (offset < 0 && offset != -FDT_ERR_NOTFOUND)
> +               pr_err("%s: Error %d processing FDT\n",
> +                      __func__, offset);
What about depth == 64 case? I think the behavior should be a WARN and
ignore those nodes so we at least can continue to boot and see the
error. Of course, if there is a phandle pointing to ignored nodes, we
have to handle that too.
>
>         /*
>          * Reverse the child list. Some drivers assumes node order matches .dts
>          * node order
>          */
> -       if (!dryrun && np->child) {
> -               struct device_node *child = np->child;
> -               np->child = NULL;
> -               while (child) {
> -                       struct device_node *next = child->sibling;
> -                       child->sibling = np->child;
> -                       np->child = child;
> -                       child = next;
> -               }
> -       }
> -
> -       if (nodepp)
> -               *nodepp = np;
> +       if (!dryrun)
> +               reverse_nodes(root);
>
>         return mem;
>  }
> @@ -431,7 +452,6 @@ static void __unflatten_device_tree(const void *blob,
>                              void * (*dt_alloc)(u64 size, u64 align))
>  {
>         unsigned long size;
> -       int start;
>         void *mem;
>
>         pr_debug(" -> unflatten_device_tree()\n");
> @@ -452,8 +472,7 @@ static void __unflatten_device_tree(const void *blob,
>         }
>
>         /* First pass, scan for size */
> -       start = 0;
> -       size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
> +       size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
>         size = ALIGN(size, 4);
>
>         pr_debug("  size is %lx, allocating...\n", size);
> @@ -467,8 +486,7 @@ static void __unflatten_device_tree(const void *blob,
>         pr_debug("  unflattening %p...\n", mem);
>
>         /* Second pass, do actual unflattening */
> -       start = 0;
> -       unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
> +       unflatten_dt_node(blob, mem, NULL, mynodes, false);
>         if (be32_to_cpup(mem + size) != 0xdeadbeef)
>                 pr_warning("End of tree marker overwritten: %08x\n",
>                            be32_to_cpup(mem + size));
> --
> 2.1.0
>
* Re: [PATCH v7 49/50] drivers/of: Export OF changeset functions
  2015-11-04 13:12 ` [PATCH v7 49/50] drivers/of: Export OF changeset functions Gavin Shan
@ 2015-11-04 16:12   ` Rob Herring
  2015-11-04 23:23     ` Gavin Shan
  2016-01-13 13:54   ` [v7,49/50] " Wolfram Sang
  1 sibling, 1 reply; 157+ messages in thread
From: Rob Herring @ 2015-11-04 16:12 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> The PowerNV PCI hotplug driver is going to use the OF changeset
> to manage the changed device sub-tree. This exports those OF
> changeset functions for that.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Rob Herring <robh@kernel.org>
> ---
>  drivers/of/dynamic.c    | 65 ++++++++++++++++++++++++++++++++++---------------
>  drivers/of/of_private.h |  2 ++
>  drivers/of/overlay.c    |  8 +++---
>  drivers/of/unittest.c   |  4 ---
>  4 files changed, 52 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 53826b8..c647bd1 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
>         memset(ocs, 0, sizeof(*ocs));
>         INIT_LIST_HEAD(&ocs->entries);
>  }
> +EXPORT_SYMBOL_GPL(of_changeset_init);
>
>  /**
>   * of_changeset_destroy - Destroy a changeset
> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
>         list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>                 __of_changeset_entry_destroy(ce);
>  }
> +EXPORT_SYMBOL_GPL(of_changeset_destroy);
>
> -/**
> - * of_changeset_apply - Applies a changeset
> - *
> - * @ocs:       changeset pointer
> - *
> - * Applies a changeset to the live tree.
> - * Any side-effects of live tree state changes are applied here on
> - * sucess, like creation/destruction of devices and side-effects
> - * like creation of sysfs properties and directories.
> - * Returns 0 on success, a negative error value in case of an error.
> - * On error the partially applied effects are reverted.
> - */
> -int of_changeset_apply(struct of_changeset *ocs)
> +int __of_changeset_apply(struct of_changeset *ocs)
>  {
>         struct of_changeset_entry *ce;
>         int ret;
> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
>  }
>
>  /**
> - * of_changeset_revert - Reverts an applied changeset
> + * of_changeset_apply - Applies a changeset
>   *
>   * @ocs:       changeset pointer
>   *
> - * Reverts a changeset returning the state of the tree to what it
> - * was before the application.
> - * Any side-effects like creation/destruction of devices and
> - * removal of sysfs properties and directories are applied.
> + * Applies a changeset to the live tree.
> + * Any side-effects of live tree state changes are applied here on
> + * success, like creation/destruction of devices and side-effects
> + * like creation of sysfs properties and directories.
>   * Returns 0 on success, a negative error value in case of an error.
> + * On error the partially applied effects are reverted.
>   */
> -int of_changeset_revert(struct of_changeset *ocs)
> +int of_changeset_apply(struct of_changeset *ocs)
> +{
> +       int ret;
> +
> +       mutex_lock(&of_mutex);
> +       ret = __of_changeset_apply(ocs);
> +       mutex_unlock(&of_mutex);
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(of_changeset_apply);
> +
> +int __of_changeset_revert(struct of_changeset *ocs)
>  {
>         struct of_changeset_entry *ce;
>         int ret;
> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
>  }
>
>  /**
> + * of_changeset_revert - Reverts an applied changeset
> + *
> + * @ocs:       changeset pointer
> + *
> + * Reverts a changeset returning the state of the tree to what it
> + * was before the application.
> + * Any side-effects like creation/destruction of devices and
> + * removal of sysfs properties and directories are applied.
> + * Returns 0 on success, a negative error value in case of an error.
> + */
> +int of_changeset_revert(struct of_changeset *ocs)
> +{
> +       int ret;
> +
> +       mutex_lock(&of_mutex);
> +       ret = __of_changeset_revert(ocs);
> +       mutex_unlock(&of_mutex);
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(of_changeset_revert);
> +
> +/**
>   * of_changeset_action - Perform a changeset action
>   *
>   * @ocs:       changeset pointer
> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
>         list_add_tail(&ce->node, &ocs->entries);
>         return 0;
>  }
> +EXPORT_SYMBOL_GPL(of_changeset_action);
> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> index 8e882e7..829469f 100644
> --- a/drivers/of/of_private.h
> +++ b/drivers/of/of_private.h
> @@ -45,6 +45,8 @@ static inline struct device_node *kobj_to_device_node(struct kobject *kobj)
>  extern int of_property_notify(int action, struct device_node *np,
>                               struct property *prop, struct property *old_prop);
>  extern void of_node_release(struct kobject *kobj);
> +extern int __of_changeset_apply(struct of_changeset *ocs);
> +extern int __of_changeset_revert(struct of_changeset *ocs);
>  #else /* CONFIG_OF_DYNAMIC */
>  static inline int of_property_notify(int action, struct device_node *np,
>                                      struct property *prop, struct property *old_prop)
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 24e025f..804ea33 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
>         }
>
>         /* apply the changeset */
> -       err = of_changeset_apply(&ov->cset);
> +       err = __of_changeset_apply(&ov->cset);
>         if (err) {
> -               pr_err("%s: of_changeset_apply() failed for tree@%s\n",
> +               pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
>                                 __func__, tree->full_name);
>                 goto err_revert_overlay;
>         }
> @@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
>
>
>         list_del(&ov->node);
> -       of_changeset_revert(&ov->cset);
> +       __of_changeset_revert(&ov->cset);
>         of_free_overlay_info(ov);
>         idr_remove(&ov_idr, id);
>         of_changeset_destroy(&ov->cset);
> @@ -539,7 +539,7 @@ int of_overlay_destroy_all(void)
>         /* the tail of list is guaranteed to be safe to remove */
>         list_for_each_entry_safe_reverse(ov, ovn, &ov_list, node) {
>                 list_del(&ov->node);
> -               of_changeset_revert(&ov->cset);
> +               __of_changeset_revert(&ov->cset);
>                 of_free_overlay_info(ov);
>                 idr_remove(&ov_idr, ov->id);
>                 kfree(ov);
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index bafcf66..dad3fd2 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -526,18 +526,14 @@ static void __init of_unittest_changeset(void)
>         unittest(!of_changeset_add_property(&chgset, parent, ppadd), "fail add prop\n");
>         unittest(!of_changeset_update_property(&chgset, parent, ppupdate), "fail update prop\n");
>         unittest(!of_changeset_remove_property(&chgset, parent, ppremove), "fail remove prop\n");
> -       mutex_lock(&of_mutex);
>         unittest(!of_changeset_apply(&chgset), "apply failed\n");
> -       mutex_unlock(&of_mutex);
>
>         /* Make sure node names are constructed correctly */
>         unittest((np = of_find_node_by_path("/testcase-data/changeset/n2/n21")),
>                  "'%s' not added\n", n21->full_name);
>         of_node_put(np);
>
> -       mutex_lock(&of_mutex);
>         unittest(!of_changeset_revert(&chgset), "revert failed\n");
> -       mutex_unlock(&of_mutex);
>
>         of_changeset_destroy(&chgset);
>  #endif
> --
> 2.1.0
>
* Re: [PATCH v7 44/50] drivers/of: Split unflatten_dt_node()
  2015-11-04 13:12 ` [PATCH v7 44/50] drivers/of: Split unflatten_dt_node() Gavin Shan
@ 2015-11-04 18:43   ` Rob Herring
  2015-11-04 23:05     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Rob Herring @ 2015-11-04 18:43 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> The function unflatten_dt_node() is called recursively to unflatten
> device nodes and properties in the FDT blob. It looks complicated
> and is hard to understand.
>
> This splits the function into 3 functions: populate_properties(),
> populate_node() and unflatten_dt_node(). populate_properties(),
> which is called by populate_node(), creates properties for the
> indicated device node. The latter creates the device nodes
> from the FDT blob. unflatten_dt_node() gets the offset in the FDT
> blob for the next device node and then calls populate_node(). No
> logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 275 ++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 160 insertions(+), 115 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 6e82bc42..173b036 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -160,39 +160,127 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
>         return res;
>  }
>
> -/**
> - * unflatten_dt_node - Alloc and populate a device_node from the flat tree
> - * @blob: The parent device tree blob
> - * @mem: Memory chunk to use for allocating device nodes and properties
> - * @poffset: pointer to node in flat tree
> - * @dad: Parent struct device_node
> - * @nodepp: The device_node tree created by the call
> - * @fpsize: Size of the node path up at the current depth.
> - * @dryrun: If true, do not allocate device nodes but still calculate needed
> - * memory size
> - */
> -static void * unflatten_dt_node(const void *blob,
> -                               void *mem,
> -                               int *poffset,
> -                               struct device_node *dad,
> -                               struct device_node **nodepp,
> -                               unsigned long fpsize,
> +static void populate_properties(const void *blob,
> +                               int offset,
> +                               void **mem,
> +                               struct device_node *np,
> +                               const char *nodename,
>                                 bool dryrun)
I'd like to make dryrun implicit. It is basically a function of NULL
or near NULL pointers.
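A minimal sketch of one way to read this suggestion, not code from the
patch: derive the flag once from the caller's base pointer and let the
helpers keep using it. bump_alloc() and build() are hypothetical
stand-ins for unflatten_dt_alloc() and the unflatten entry point, and
the user-space types are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for unflatten_dt_alloc(): align the cursor,
 * hand out the aligned address, then advance the cursor by size. */
static uintptr_t bump_alloc(uintptr_t *cur, size_t size, size_t align)
{
	uintptr_t res = (*cur + align - 1) & ~(uintptr_t)(align - 1);

	*cur = res + size;
	return res;
}

/* Hypothetical entry point. A NULL base means "only measure"; the
 * flag is derived once here instead of being passed in by the caller.
 * Returns the number of bytes consumed from base. */
static size_t build(void *base, const char *name)
{
	bool dryrun = (base == NULL);
	uintptr_t start = (uintptr_t)base, cur = start;
	size_t len = strlen(name) + 1;
	uintptr_t p = bump_alloc(&cur, len, 8);

	/* Only touch memory when a real buffer was supplied. */
	if (!dryrun)
		memcpy((void *)p, name, len);

	return cur - start;
}
```

Calling build(NULL, ...) first gives the size, and a second call with
a suitably aligned buffer of that size fills it in. Only the unmoved
base pointer is tested, so no threshold is needed.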
>  {
> -       const __be32 *p;
> +       struct property *pp, **pprev = NULL;
> +       int cur;
> +       bool has_name = false;
> +
> +       pprev = &np->properties;
> +       cur = fdt_first_property_offset(blob, offset);
> +       while (cur >= 0) {
This could be better written as a for loop to avoid the gotos:
for (cur = fdt_first_property_offset(blob, offset);
     cur >= 0;
     cur = fdt_next_property_offset(blob, cur))
> +               const __be32 *val;
> +               const char *pname;
> +               u32 sz;
> +
> +               val = fdt_getprop_by_offset(blob, cur, &pname, &sz);
> +               if (!val) {
> +                       pr_warn("%s: Cannot locate property at 0x%x\n",
> +                               __func__, cur);
> +                       goto next;
> +               }
> +
> +               if (!pname) {
> +                       pr_warn("%s: Cannot find property name at 0x%x\n",
> +                               __func__, cur);
> +                       goto next;
> +               } else if (!strcmp(pname, "name")) {
> +                       has_name = true;
> +               }
> +
> +               pp = unflatten_dt_alloc(mem, sizeof(struct property),
> +                                       __alignof__(struct property));
> +               if (!dryrun) {
Then:
if (dryrun)
  continue;
to save some indentation and vertical space.
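Put together, the two suggestions give a loop of roughly the shape
below. This is a user-space sketch, not the patch: first_prop() and
next_prop() are hypothetical stand-ins that mimic the convention of
returning a negative offset when no property is left, and
count_props() is a made-up caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical property table; a NULL name marks an entry to skip. */
static const char *const props[] = { "compatible", NULL, "name", "reg" };
#define NPROPS ((int)(sizeof(props) / sizeof(props[0])))

/* Stand-ins for the first/next offset helpers: negative means "done". */
static int first_prop(void) { return NPROPS > 0 ? 0 : -1; }
static int next_prop(int cur) { return cur + 1 < NPROPS ? cur + 1 : -1; }

/* Returns how many entries need storage and, through linked, how many
 * were actually filled in (always 0 when dryrun is set). */
static int count_props(bool dryrun, int *linked)
{
	int cur, sized = 0;

	*linked = 0;
	for (cur = first_prop(); cur >= 0; cur = next_prop(cur)) {
		if (!props[cur])
			continue;	/* replaces "goto next" */

		sized++;		/* size is accounted in both modes */
		if (dryrun)
			continue;	/* nothing to fill in */

		(*linked)++;
	}

	return sized;
}
```

The advance step lives in the for header, so every continue still
moves to the next property and the "next:" label is not needed.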
> +                       /* We accept flattened tree phandles either in
> +                        * ePAPR-style "phandle" properties, or the
> +                        * legacy "linux,phandle" properties.  If both
> +                        * appear and have different values, things
> +                        * will get weird. Don't do that.
> +                        */
> +                       if (!strcmp(pname, "phandle") ||
> +                           !strcmp(pname, "linux,phandle")) {
> +                               if (!np->phandle)
> +                                       np->phandle = be32_to_cpup(val);
> +                       }
> +
> +                       /* And we process the "ibm,phandle" property
> +                        * used in pSeries dynamic device tree
> +                        * stuff
> +                        */
> +                       if (!strcmp(pname, "ibm,phandle"))
> +                               np->phandle = be32_to_cpup(val);
> +
> +                       pp->name   = (char *)pname;
> +                       pp->length = sz;
> +                       pp->value  = (__be32 *)val;
> +                       *pprev     = pp;
> +                       pprev      = &pp->next;
> +               }
> +next:
> +               cur = fdt_next_property_offset(blob, cur);
> +       }
> +
> +       /* With version 0x10 we may not have the name property,
> +        * recreate it here from the unit name if absent
> +        */
> +       if (!has_name) {
> +               const char *p = nodename, *ps = p, *pa = NULL;
> +               int len;
> +
> +               while (*p) {
> +                       if ((*p) == '@')
> +                               pa = p;
> +                       else if ((*p) == '/')
> +                               ps = p + 1;
> +                       p++;
> +               }
> +
> +               if (pa < ps)
> +                       pa = p;
> +               len = (pa - ps) + 1;
> +               pp = unflatten_dt_alloc(mem, sizeof(struct property) + len,
> +                                       __alignof__(struct property));
> +               if (!dryrun) {
> +                       pp->name   = "name";
> +                       pp->length = len;
> +                       pp->value  = pp + 1;
> +                       *pprev     = pp;
> +                       pprev      = &pp->next;
> +                       memcpy(pp->value, ps, len - 1);
> +                       ((char *)pp->value)[len - 1] = 0;
> +                       pr_debug("fixed up name for %s -> %s\n",
> +                                nodename, (char *)pp->value);
> +               }
> +       }
> +
> +       if (!dryrun)
> +               *pprev = NULL;
> +}
> +
> +static unsigned long populate_node(const void *blob,
> +                                  int offset,
> +                                  void **mem,
> +                                  struct device_node *dad,
> +                                  unsigned long fpsize,
> +                                  struct device_node **pnp,
> +                                  bool dryrun)
I think dryrun could be implied here too.
> +{
>         struct device_node *np;
> -       struct property *pp, **prev_pp = NULL;
>         const char *pathp;
>         unsigned int l, allocl;
> -       static int depth = 0;
> -       int old_depth;
> -       int offset;
> -       int has_name = 0;
> -       int new_format = 0;
> +       bool new_format = false;
> +       char *fname;
>
> -       pathp = fdt_get_name(blob, *poffset, &l);
> -       if (!pathp)
> -               return mem;
> +       pathp = fdt_get_name(blob, offset, &l);
> +       if (!pathp) {
> +               *pnp = NULL;
Can't pnp be NULL?
> +               return 0;
> +       }
>
>         allocl = ++l;
>
> @@ -202,7 +290,7 @@ static void * unflatten_dt_node(const void *blob,
>          * not '/'.
>          */
>         if ((*pathp) != '/') {
> -               new_format = 1;
> +               new_format = true;
>                 if (fpsize == 0) {
>                         /* root node: special case. fpsize accounts for path
>                          * plus terminating zero. root node only has '/', so
> @@ -222,112 +310,38 @@ static void * unflatten_dt_node(const void *blob,
>                 }
>         }
>
> -       np = unflatten_dt_alloc(&mem, sizeof(struct device_node) + allocl,
> +       np = unflatten_dt_alloc(mem, sizeof(struct device_node) + allocl,
>                                 __alignof__(struct device_node));
>         if (!dryrun) {
> -               char *fn;
>                 of_node_init(np);
> -               np->full_name = fn = ((char *)np) + sizeof(*np);
> +               np->full_name = fname = ((char *)np) + sizeof(*np);
If you kept "fn" that would cut down the diff and make it a bit easier
to review.
>                 if (new_format) {
> -                       /* rebuild full path for new format */
> +                       /* Rebuild full path for new format */
>                         if (dad && dad->parent) {
> -                               strcpy(fn, dad->full_name);
> +                               strcpy(fname, dad->full_name);
>  #ifdef DEBUG
> -                               if ((strlen(fn) + l + 1) != allocl) {
> +                               if ((strlen(fname) + l + 1) != allocl) {
>                                         pr_debug("%s: p: %d, l: %d, a: %d\n",
> -                                               pathp, (int)strlen(fn),
> -                                               l, allocl);
> +                                                pathp, (int)strlen(fn),
This won't compile if enabled (should be fname).
> +                                                l, allocl);
>                                 }
>  #endif
> -                               fn += strlen(fn);
> +                               fname += strlen(fname);
>                         }
> -                       *(fn++) = '/';
> +                       *(fname++) = '/';
>                 }
> -               memcpy(fn, pathp, l);
> +               memcpy(fname, pathp, l);
>
> -               prev_pp = &np->properties;
> -               if (dad != NULL) {
> +               if (dad) {
>                         np->parent = dad;
>                         np->sibling = dad->child;
>                         dad->child = np;
>                 }
>         }
> -       /* process properties */
> -       for (offset = fdt_first_property_offset(blob, *poffset);
> -            (offset >= 0);
> -            (offset = fdt_next_property_offset(blob, offset))) {
> -               const char *pname;
> -               u32 sz;
>
> -               if (!(p = fdt_getprop_by_offset(blob, offset, &pname, &sz))) {
> -                       offset = -FDT_ERR_INTERNAL;
> -                       break;
> -               }
> -
> -               if (pname == NULL) {
> -                       pr_info("Can't find property name in list !\n");
> -                       break;
> -               }
> -               if (strcmp(pname, "name") == 0)
> -                       has_name = 1;
> -               pp = unflatten_dt_alloc(&mem, sizeof(struct property),
> -                                       __alignof__(struct property));
> -               if (!dryrun) {
> -                       /* We accept flattened tree phandles either in
> -                        * ePAPR-style "phandle" properties, or the
> -                        * legacy "linux,phandle" properties.  If both
> -                        * appear and have different values, things
> -                        * will get weird.  Don't do that. */
> -                       if ((strcmp(pname, "phandle") == 0) ||
> -                           (strcmp(pname, "linux,phandle") == 0)) {
> -                               if (np->phandle == 0)
> -                                       np->phandle = be32_to_cpup(p);
> -                       }
> -                       /* And we process the "ibm,phandle" property
> -                        * used in pSeries dynamic device tree
> -                        * stuff */
> -                       if (strcmp(pname, "ibm,phandle") == 0)
> -                               np->phandle = be32_to_cpup(p);
> -                       pp->name = (char *)pname;
> -                       pp->length = sz;
> -                       pp->value = (__be32 *)p;
> -                       *prev_pp = pp;
> -                       prev_pp = &pp->next;
> -               }
> -       }
> -       /* with version 0x10 we may not have the name property, recreate
> -        * it here from the unit name if absent
> -        */
> -       if (!has_name) {
> -               const char *p1 = pathp, *ps = pathp, *pa = NULL;
> -               int sz;
> -
> -               while (*p1) {
> -                       if ((*p1) == '@')
> -                               pa = p1;
> -                       if ((*p1) == '/')
> -                               ps = p1 + 1;
> -                       p1++;
> -               }
> -               if (pa < ps)
> -                       pa = p1;
> -               sz = (pa - ps) + 1;
> -               pp = unflatten_dt_alloc(&mem, sizeof(struct property) + sz,
> -                                       __alignof__(struct property));
> -               if (!dryrun) {
> -                       pp->name = "name";
> -                       pp->length = sz;
> -                       pp->value = pp + 1;
> -                       *prev_pp = pp;
> -                       prev_pp = &pp->next;
> -                       memcpy(pp->value, ps, sz - 1);
> -                       ((char *)pp->value)[sz - 1] = 0;
> -                       pr_debug("fixed up name for %s -> %s\n", pathp,
> -                               (char *)pp->value);
> -               }
> -       }
> +       /* Populate the properties */
Kind of a useless comment.
> +       populate_properties(blob, offset, mem, np, pathp, dryrun);
>         if (!dryrun) {
> -               *prev_pp = NULL;
>                 np->name = of_get_property(np, "name", NULL);
>                 np->type = of_get_property(np, "device_type", NULL);
>
> @@ -337,6 +351,37 @@ static void * unflatten_dt_node(const void *blob,
>                         np->type = "<NULL>";
>         }
>
> +       *pnp = np;
> +       return fpsize;
> +}
> +
> +/**
> + * unflatten_dt_node - Alloc and populate a device_node from the flat tree
> + * @blob: The parent device tree blob
> + * @mem: Memory chunk to use for allocating device nodes and properties
> + * @poffset: pointer to node in flat tree
> + * @dad: Parent struct device_node
> + * @nodepp: The device_node tree created by the call
> + * @fpsize: Size of the node path up at the current depth.
> + * @dryrun: If true, do not allocate device nodes but still calculate needed
> + * memory size
> + */
> +static void *unflatten_dt_node(const void *blob,
> +                              void *mem,
> +                              int *poffset,
> +                              struct device_node *dad,
> +                              struct device_node **nodepp,
> +                              unsigned long fpsize,
> +                              bool dryrun)
> +{
> +       struct device_node *np;
> +       static int depth;
> +       int old_depth;
> +
> +       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
Doesn't this give a warning assigning a ptr to long?
Looks like np can be moved internal to populate_node.
> +       if (!fpsize)
> +               return mem;
> +
>         old_depth = depth;
>         *poffset = fdt_next_node(blob, *poffset, &depth);
>         if (depth < 0)
> --
> 2.1.0
>
* Re: [PATCH v7 44/50] drivers/of: Split unflatten_dt_node()
  2015-11-04 18:43   ` Rob Herring
@ 2015-11-04 23:05     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 23:05 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Wed, Nov 04, 2015 at 12:43:08PM -0600, Rob Herring wrote:
>On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> The function unflatten_dt_node() is called recursively to unflatten
>> device nodes and properties in the FDT blob. It looks complicated
>> and is hard to understand.
>>
>> This splits the function into 3 functions: populate_properties(),
>> populate_node() and unflatten_dt_node(). populate_properties(),
>> which is called by populate_node(), creates properties for the
>> indicated device node. The latter creates the device nodes
>> from the FDT blob. unflatten_dt_node() gets the offset in the FDT
>> blob for the next device node and then calls populate_node(). No
>> logical changes introduced.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c | 275 ++++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 160 insertions(+), 115 deletions(-)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index 6e82bc42..173b036 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -160,39 +160,127 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
>>         return res;
>>  }
>>
>> -/**
>> - * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>> - * @blob: The parent device tree blob
>> - * @mem: Memory chunk to use for allocating device nodes and properties
>> - * @poffset: pointer to node in flat tree
>> - * @dad: Parent struct device_node
>> - * @nodepp: The device_node tree created by the call
>> - * @fpsize: Size of the node path up at the current depth.
>> - * @dryrun: If true, do not allocate device nodes but still calculate needed
>> - * memory size
>> - */
>> -static void * unflatten_dt_node(const void *blob,
>> -                               void *mem,
>> -                               int *poffset,
>> -                               struct device_node *dad,
>> -                               struct device_node **nodepp,
>> -                               unsigned long fpsize,
>> +static void populate_properties(const void *blob,
>> +                               int offset,
>> +                               void **mem,
>> +                               struct device_node *np,
>> +                               const char *nodename,
>>                                 bool dryrun)
>
>I'd like to make dryrun implicit. It is basically a function of NULL
>or near NULL pointers.
>
[1] The condition would be something like below:
    if ((unsigned long)(*mem) < limit)
        dryrun = true;
    else
        dryrun = false;
The question is how to choose a sane @limit in practice. In the !dryrun case,
the memory is allocated from memblock at system boot time, or from slab once
the system is up, and memblock can assign memory in a bottom-up fashion.
If @limit is too small, the condition won't be reliable in the !dryrun case.
If @limit is too large, a !dryrun case can be mistaken for a dryrun case.
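To make the concern concrete, here is a small user-space sketch, not
kernel code. It assumes the dry run starts its cursor at NULL and
advances it by each allocation size, so the cursor ends up holding the
total size. bump(), looks_like_dryrun(), guess_after() and the numbers
used with them are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator: advance the cursor and
 * return the address it had before the call (alignment omitted). */
static uintptr_t bump(uintptr_t *cur, size_t size)
{
	uintptr_t res = *cur;

	*cur += size;
	return res;
}

/* The check from [1]: guess the mode from the cursor value alone. */
static bool looks_like_dryrun(uintptr_t cur, uintptr_t limit)
{
	return cur < limit;
}

/* Run n allocations of size bytes starting at start, then report
 * what the check guesses once the cursor has moved. */
static bool guess_after(uintptr_t start, size_t n, size_t size,
			uintptr_t limit)
{
	uintptr_t cur = start;
	size_t i;

	for (i = 0; i < n; i++)
		bump(&cur, size);

	return looks_like_dryrun(cur, limit);
}
```

With a limit of 0x1000, guess_after(0, 100, 64, 0x1000) is a dry run
whose cursor has reached 6400 and is no longer recognized as one. With
a limit of 0x100000, guess_after(0x80000, 1, 64, 0x100000) is a real
buffer at a low address that is taken for a dry run.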
>>  {
>> -       const __be32 *p;
>> +       struct property *pp, **pprev = NULL;
>> +       int cur;
>> +       bool has_name = false;
>> +
>> +       pprev = &np->properties;
>> +       cur = fdt_first_property_offset(blob, offset);
>> +       while (cur >= 0) {
>
>This could be better written as a for loop to avoid the gotos:
>
>for (cur = fdt_first_property_offset(blob, offset);
>     cur >= 0;
>     cur = fdt_next_property_offset(blob, cur))
>
Thanks, the changes will be included in the next revision.
>> +               const __be32 *val;
>> +               const char *pname;
>> +               u32 sz;
>> +
>> +               val = fdt_getprop_by_offset(blob, cur, &pname, &sz);
>> +               if (!val) {
>> +                       pr_warn("%s: Cannot locate property at 0x%x\n",
>> +                               __func__, cur);
>> +                       goto next;
>> +               }
>> +
>> +               if (!pname) {
>> +                       pr_warn("%s: Cannot find property name at 0x%x\n",
>> +                               __func__, cur);
>> +                       goto next;
>> +               } else if (!strcmp(pname, "name")) {
>> +                       has_name = true;
>> +               }
>> +
>> +               pp = unflatten_dt_alloc(mem, sizeof(struct property),
>> +                                       __alignof__(struct property));
>> +               if (!dryrun) {
>
>Then:
>
>if (dryrun)
>  continue;
>
>to save some indentation and vertical code.
>
Good idea, it will be included in the next revision.
>> +                       /* We accept flattened tree phandles either in
>> +                        * ePAPR-style "phandle" properties, or the
>> +                        * legacy "linux,phandle" properties.  If both
>> +                        * appear and have different values, things
>> +                        * will get weird. Don't do that.
>> +                        */
>> +                       if (!strcmp(pname, "phandle") ||
>> +                           !strcmp(pname, "linux,phandle")) {
>> +                               if (!np->phandle)
>> +                                       np->phandle = be32_to_cpup(val);
>> +                       }
>> +
>> +                       /* And we process the "ibm,phandle" property
>> +                        * used in pSeries dynamic device tree
>> +                        * stuff
>> +                        */
>> +                       if (!strcmp(pname, "ibm,phandle"))
>> +                               np->phandle = be32_to_cpup(val);
>> +
>> +                       pp->name   = (char *)pname;
>> +                       pp->length = sz;
>> +                       pp->value  = (__be32 *)val;
>> +                       *pprev     = pp;
>> +                       pprev      = &pp->next;
>> +               }
>> +next:
>> +               cur = fdt_next_property_offset(blob, cur);
>> +       }
>> +
>> +       /* With version 0x10 we may not have the name property,
>> +        * recreate it here from the unit name if absent
>> +        */
>> +       if (!has_name) {
>> +               const char *p = nodename, *ps = p, *pa = NULL;
>> +               int len;
>> +
>> +               while (*p) {
>> +                       if ((*p) == '@')
>> +                               pa = p;
>> +                       else if ((*p) == '/')
>> +                               ps = p + 1;
>> +                       p++;
>> +               }
>> +
>> +               if (pa < ps)
>> +                       pa = p;
>> +               len = (pa - ps) + 1;
>> +               pp = unflatten_dt_alloc(mem, sizeof(struct property) + len,
>> +                                       __alignof__(struct property));
>> +               if (!dryrun) {
>> +                       pp->name   = "name";
>> +                       pp->length = len;
>> +                       pp->value  = pp + 1;
>> +                       *pprev     = pp;
>> +                       pprev      = &pp->next;
>> +                       memcpy(pp->value, ps, len - 1);
>> +                       ((char *)pp->value)[len - 1] = 0;
>> +                       pr_debug("fixed up name for %s -> %s\n",
>> +                                nodename, (char *)pp->value);
>> +               }
>> +       }
>> +
>> +       if (!dryrun)
>> +               *pprev = NULL;
>> +}
>> +
>> +static unsigned long populate_node(const void *blob,
>> +                                  int offset,
>> +                                  void **mem,
>> +                                  struct device_node *dad,
>> +                                  unsigned long fpsize,
>> +                                  struct device_node **pnp,
>> +                                  bool dryrun)
>
>I think dryrun could be implied here too.
>
I don't think so. Please refer to the explanation at [1].
>> +{
>>         struct device_node *np;
>> -       struct property *pp, **prev_pp = NULL;
>>         const char *pathp;
>>         unsigned int l, allocl;
>> -       static int depth = 0;
>> -       int old_depth;
>> -       int offset;
>> -       int has_name = 0;
>> -       int new_format = 0;
>> +       bool new_format = false;
>> +       char *fname;
>>
>> -       pathp = fdt_get_name(blob, *poffset, &l);
>> -       if (!pathp)
>> -               return mem;
>> +       pathp = fdt_get_name(blob, offset, &l);
>> +       if (!pathp) {
>> +               *pnp = NULL;
>
>Can't pnp be NULL?
>
It can't be NULL in either the dryrun or the !dryrun case.
>> +               return 0;
>> +       }
>>
>>         allocl = ++l;
>>
>> @@ -202,7 +290,7 @@ static void * unflatten_dt_node(const void *blob,
>>          * not '/'.
>>          */
>>         if ((*pathp) != '/') {
>> -               new_format = 1;
>> +               new_format = true;
>>                 if (fpsize == 0) {
>>                         /* root node: special case. fpsize accounts for path
>>                          * plus terminating zero. root node only has '/', so
>> @@ -222,112 +310,38 @@ static void * unflatten_dt_node(const void *blob,
>>                 }
>>         }
>>
>> -       np = unflatten_dt_alloc(&mem, sizeof(struct device_node) + allocl,
>> +       np = unflatten_dt_alloc(mem, sizeof(struct device_node) + allocl,
>>                                 __alignof__(struct device_node));
>>         if (!dryrun) {
>> -               char *fn;
>>                 of_node_init(np);
>> -               np->full_name = fn = ((char *)np) + sizeof(*np);
>> +               np->full_name = fname = ((char *)np) + sizeof(*np);
>
>If you kept "fn" that would cut down the diff and make it a bit easier
>to review.
>
Agreed, I'll drop the rename in the next revision. I may send a separate
patch to do the renaming after this one, so that this patch avoids
unrelated code changes.
>>                 if (new_format) {
>> -                       /* rebuild full path for new format */
>> +                       /* Rebuild full path for new format */
>>                         if (dad && dad->parent) {
>> -                               strcpy(fn, dad->full_name);
>> +                               strcpy(fname, dad->full_name);
>>  #ifdef DEBUG
>> -                               if ((strlen(fn) + l + 1) != allocl) {
>> +                               if ((strlen(fname) + l + 1) != allocl) {
>>                                         pr_debug("%s: p: %d, l: %d, a: %d\n",
>> -                                               pathp, (int)strlen(fn),
>> -                                               l, allocl);
>> +                                                pathp, (int)strlen(fn),
>
>This won't compile if enabled (should be fname).
>
Indeed, I didn't even try to compile this piece of debugging code. I will
change it accordingly in the next revision.
>> +                                                l, allocl);
>>                                 }
>>  #endif
>> -                               fn += strlen(fn);
>> +                               fname += strlen(fname);
>>                         }
>> -                       *(fn++) = '/';
>> +                       *(fname++) = '/';
>>                 }
>> -               memcpy(fn, pathp, l);
>> +               memcpy(fname, pathp, l);
>>
>> -               prev_pp = &np->properties;
>> -               if (dad != NULL) {
>> +               if (dad) {
>>                         np->parent = dad;
>>                         np->sibling = dad->child;
>>                         dad->child = np;
>>                 }
>>         }
>> -       /* process properties */
>> -       for (offset = fdt_first_property_offset(blob, *poffset);
>> -            (offset >= 0);
>> -            (offset = fdt_next_property_offset(blob, offset))) {
>> -               const char *pname;
>> -               u32 sz;
>>
>> -               if (!(p = fdt_getprop_by_offset(blob, offset, &pname, &sz))) {
>> -                       offset = -FDT_ERR_INTERNAL;
>> -                       break;
>> -               }
>> -
>> -               if (pname == NULL) {
>> -                       pr_info("Can't find property name in list !\n");
>> -                       break;
>> -               }
>> -               if (strcmp(pname, "name") == 0)
>> -                       has_name = 1;
>> -               pp = unflatten_dt_alloc(&mem, sizeof(struct property),
>> -                                       __alignof__(struct property));
>> -               if (!dryrun) {
>> -                       /* We accept flattened tree phandles either in
>> -                        * ePAPR-style "phandle" properties, or the
>> -                        * legacy "linux,phandle" properties.  If both
>> -                        * appear and have different values, things
>> -                        * will get weird.  Don't do that. */
>> -                       if ((strcmp(pname, "phandle") == 0) ||
>> -                           (strcmp(pname, "linux,phandle") == 0)) {
>> -                               if (np->phandle == 0)
>> -                                       np->phandle = be32_to_cpup(p);
>> -                       }
>> -                       /* And we process the "ibm,phandle" property
>> -                        * used in pSeries dynamic device tree
>> -                        * stuff */
>> -                       if (strcmp(pname, "ibm,phandle") == 0)
>> -                               np->phandle = be32_to_cpup(p);
>> -                       pp->name = (char *)pname;
>> -                       pp->length = sz;
>> -                       pp->value = (__be32 *)p;
>> -                       *prev_pp = pp;
>> -                       prev_pp = &pp->next;
>> -               }
>> -       }
>> -       /* with version 0x10 we may not have the name property, recreate
>> -        * it here from the unit name if absent
>> -        */
>> -       if (!has_name) {
>> -               const char *p1 = pathp, *ps = pathp, *pa = NULL;
>> -               int sz;
>> -
>> -               while (*p1) {
>> -                       if ((*p1) == '@')
>> -                               pa = p1;
>> -                       if ((*p1) == '/')
>> -                               ps = p1 + 1;
>> -                       p1++;
>> -               }
>> -               if (pa < ps)
>> -                       pa = p1;
>> -               sz = (pa - ps) + 1;
>> -               pp = unflatten_dt_alloc(&mem, sizeof(struct property) + sz,
>> -                                       __alignof__(struct property));
>> -               if (!dryrun) {
>> -                       pp->name = "name";
>> -                       pp->length = sz;
>> -                       pp->value = pp + 1;
>> -                       *prev_pp = pp;
>> -                       prev_pp = &pp->next;
>> -                       memcpy(pp->value, ps, sz - 1);
>> -                       ((char *)pp->value)[sz - 1] = 0;
>> -                       pr_debug("fixed up name for %s -> %s\n", pathp,
>> -                               (char *)pp->value);
>> -               }
>> -       }
>> +       /* Populate the properties */
>
>Kind of a useless comment.
>
Agreed, I'll drop it in the next revision.
>> +       populate_properties(blob, offset, mem, np, pathp, dryrun);
>>         if (!dryrun) {
>> -               *prev_pp = NULL;
>>                 np->name = of_get_property(np, "name", NULL);
>>                 np->type = of_get_property(np, "device_type", NULL);
>>
>> @@ -337,6 +351,37 @@ static void * unflatten_dt_node(const void *blob,
>>                         np->type = "<NULL>";
>>         }
>>
>> +       *pnp = np;
>> +       return fpsize;
>> +}
>> +
>> +/**
>> + * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>> + * @blob: The parent device tree blob
>> + * @mem: Memory chunk to use for allocating device nodes and properties
>> + * @poffset: pointer to node in flat tree
>> + * @dad: Parent struct device_node
>> + * @nodepp: The device_node tree created by the call
>> + * @fpsize: Size of the node path up at the current depth.
>> + * @dryrun: If true, do not allocate device nodes but still calculate needed
>> + * memory size
>> + */
>> +static void *unflatten_dt_node(const void *blob,
>> +                              void *mem,
>> +                              int *poffset,
>> +                              struct device_node *dad,
>> +                              struct device_node **nodepp,
>> +                              unsigned long fpsize,
>> +                              bool dryrun)
>> +{
>> +       struct device_node *np;
>> +       static int depth;
>> +       int old_depth;
>> +
>> +       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
>
>Doesn't this give a warning assigning a ptr to long?
>
>Looks like np can be moved internal to populate_node.
>
No warning here: @fpsize and the return value of populate_node() are both
"unsigned long". And @np can't be moved into populate_node(): the next patch
uses it in this function to track the device node unflattened at the last
level of depth.
>
>> +       if (!fpsize)
>> +               return mem;
>> +
>>         old_depth = depth;
>>         *poffset = fdt_next_node(blob, *poffset, &depth);
>>         if (depth < 0)
Thanks,
Gavin
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-11-04 16:07   ` Rob Herring
@ 2015-11-04 23:23     ` Gavin Shan
  2015-11-04 23:26       ` Gavin Shan
  2016-05-13  7:16     ` Geert Uytterhoeven
  1 sibling, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 23:23 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Wed, Nov 04, 2015 at 10:07:50AM -0600, Rob Herring wrote:
>On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> In current implementation, unflatten_dt_node() is called recursively
>> to unflatten device nodes in FDT blob. It's stress to limited stack
>> capacity.
>
>Did you actually hit a problem?
>
>Now we have a max depth of 64. Seems like that should be plenty... Any
>idea how this compares to when we run out of stack space?
>
When I rebased the last revision (v6), in particular the patch below, onto
4.3-rc6, the kernel wouldn't boot on P7 and P8 boxes. On P7 boxes, the stack
overran according to the printed kernel messages. On P8 boxes, /bin/init in
the initramfs image couldn't be loaded or executed properly, potentially
because of memory corruption. That's why I reworked it to avoid calling
unflatten_dt_node() recursively.
The max depth of 64 wasn't selected based on stack usage. My thinking was
that when the device tree is converted to the human-friendly *.dts format,
each line is indented with TABs. If the device tree were 64 levels deep, each
line for the leaf nodes in the *.dts would have to be wrapped across multiple
lines. That's why I chose 64; maybe 32 is enough. Have you seen a device tree
more than 16 levels deep in the field? :-)
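To make the approach concrete, here is a minimal userspace sketch of walking
a flattened tree without recursion. It is not the kernel code: struct
flat_node, walk_flat_tree(), MAX_DEPTH and the (depth, name) input format are
hypothetical stand-ins for a real FDT blob. Only the idea is the same: a small
array indexed by depth remembers the parent's full-path length, so stack usage
stays constant however deep the tree is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64	/* hypothetical cap, mirroring the 64 used in the patch */

/* Hypothetical stand-in for one node of a flattened tree (depth-first order). */
struct flat_node {
	int depth;		/* 1 for the root, 2 for its children, ... */
	const char *name;	/* unit name, "" for the root */
};

/*
 * Compute the full-path length of every node in one pass, without recursion.
 * plen[d] holds the path length of the most recent node seen at depth d, so
 * a node at depth d finds its parent's value in plen[d - 1].
 * Returns 0 on success, -1 if the input is malformed or too deep.
 */
int walk_flat_tree(const struct flat_node *nodes, size_t count, size_t *out_len)
{
	size_t plen[MAX_DEPTH];
	int prev_depth = 0;
	size_t i;

	assert(nodes && out_len);

	for (i = 0; i < count; i++) {
		int d = nodes[i].depth;

		/* A node may go at most one level deeper than its predecessor. */
		if (d < 1 || d > prev_depth + 1 || d >= MAX_DEPTH)
			return -1;

		if (d == 1)
			plen[d] = 0;	/* root: empty prefix, printed as "/" */
		else
			plen[d] = plen[d - 1] + 1 + strlen(nodes[i].name);

		out_len[i] = plen[d];
		prev_depth = d;
	}
	return 0;
}
```

The per-depth array costs a fixed MAX_DEPTH * sizeof(size_t) bytes of stack,
whereas the recursive version pays one full stack frame per level.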
>> This avoids calling the function recursively, meaning the device
>> nodes are unflattened in one call on unflatten_dt_node(): two arrays
>> are introduced to track the parent path size and the device node of
>> current level of depth, which will be used by the device node on next
>> level of depth to be unflattened. Also, the parameter "poffset" and
>> "fpsize" are unused and dropped.
>
>Yay. I'm happy to see parameters removed instead of added to this function.
>
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c | 94 +++++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 56 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index 173b036..f4793d0 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -355,61 +355,82 @@ static unsigned long populate_node(const void *blob,
>>         return fpsize;
>>  }
>>
>> +static void reverse_nodes(struct device_node *parent)
>> +{
>> +       struct device_node *child, *next;
>> +
>> +       /* In-depth first */
>> +       child = parent->child;
>> +       while (child) {
>> +               reverse_nodes(child);
>> +
>> +               child = child->sibling;
>> +       }
>> +
>> +       /* Reverse the nodes in the child list */
>> +       child = parent->child;
>> +       parent->child = NULL;
>> +       while (child) {
>> +               next = child->sibling;
>> +
>> +               child->sibling = parent->child;
>> +               parent->child = child;
>> +               child = next;
>> +       }
>> +}
>> +
>>  /**
>>   * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>>   * @blob: The parent device tree blob
>>   * @mem: Memory chunk to use for allocating device nodes and properties
>> - * @poffset: pointer to node in flat tree
>>   * @dad: Parent struct device_node
>>   * @nodepp: The device_node tree created by the call
>> - * @fpsize: Size of the node path up at the current depth.
>>   * @dryrun: If true, do not allocate device nodes but still calculate needed
>>   * memory size
>>   */
>>  static void *unflatten_dt_node(const void *blob,
>>                                void *mem,
>> -                              int *poffset,
>>                                struct device_node *dad,
>>                                struct device_node **nodepp,
>> -                              unsigned long fpsize,
>>                                bool dryrun)
>
>We can probably further simplify things by returning an int with
>negative being errors and positive being the size. Also, dryrun can be
>dropped and implied by mem and/or nodepp being NULL.
>
Yeah, I think it's reasonable to return the size from this function. "dryrun"
can be dropped and implied by a NULL @mem. @nodepp can't be NULL. I may
address this with a separate patch in the next revision.
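A minimal sketch of how a NULL @mem could stand in for "dryrun", assuming a
bump allocator along the lines of unflatten_dt_alloc(). The names struct bump
and bump_alloc() are hypothetical, and this is userspace code rather than a
proposed patch: with no backing buffer the allocator only advances an offset,
so the first pass yields the size that the second pass needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator state: base == NULL means "size-only pass". */
struct bump {
	unsigned char *base;	/* backing buffer, or NULL for a dry run */
	size_t used;		/* bytes consumed so far, including padding */
};

/*
 * Reserve @size bytes at an offset aligned to @align (a power of two).
 * The offset is relative to b->base, so the buffer itself is assumed to be
 * aligned for the strictest type stored in it.
 * Always advances b->used; returns a usable pointer only when a buffer is
 * attached, otherwise NULL. Callers must not write through NULL.
 */
void *bump_alloc(struct bump *b, size_t size, size_t align)
{
	size_t start;

	assert(b && align && (align & (align - 1)) == 0);

	start = (b->used + align - 1) & ~(align - 1);
	b->used = start + size;

	return b->base ? b->base + start : NULL;
}
```

Running the same sequence of bump_alloc() calls twice, first with base == NULL
and then with a buffer of the reported size, gives the two passes without a
separate flag.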
>>  {
>> -       struct device_node *np;
>> -       static int depth;
>> -       int old_depth;
>> -
>> -       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
>> -       if (!fpsize)
>> -               return mem;
>> +       struct device_node *root;
>> +       int offset = 0, depth = 0;
>> +       unsigned long fpsizes[64];
>> +       struct device_node *nps[64];
>
>Use a define here.
>
Fair enough, I'll do that in the next revision. I'm not good at naming. Would
"FDT_MAX_DEPTH" be a good one?
>>
>> -       old_depth = depth;
>> -       *poffset = fdt_next_node(blob, *poffset, &depth);
>> -       if (depth < 0)
>> -               depth = 0;
>> -       while (*poffset > 0 && depth > old_depth)
>> -               mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
>> -                                       fpsize, dryrun);
>> +       if (nodepp)
>> +               *nodepp = NULL;
>> +
>> +       root = dad;
>> +       fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
>> +       nps[depth++] = dad;
>> +       while (offset >= 0 && depth < 64) {
>> +               fpsizes[depth] = populate_node(blob, offset, &mem,
>> +                                              nps[depth - 1],
>> +                                              fpsizes[depth - 1],
>> +                                              &nps[depth], dryrun);
>> +               if (!fpsizes[depth])
>> +                       return mem;
>> +
>> +               if (!dryrun && nodepp && !*nodepp)
>> +                       *nodepp = nps[depth];
>> +               if (!dryrun && !root)
>> +                       root = nps[depth];
>> +
>> +               offset = fdt_next_node(blob, offset, &depth);
>> +       }
>>
>> -       if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
>> -               pr_err("unflatten: error %d processing FDT\n", *poffset);
>> +       if (offset < 0 && offset != -FDT_ERR_NOTFOUND)
>> +               pr_err("%s: Error %d processing FDT\n",
>> +                      __func__, offset);
>
>What about depth == 64 case? I think the behavior should be a WARN and
>ignore those nodes so we at least can continue to boot and see the
>error. Of course, if there is a phandle pointing to ignored nodes, we
>have to handle that too.
>
Yeah, I'll add a WARN_ON(depth >= 64) in the next revision. Sorry, I didn't
get the second part of your comment: when depth > 64, the system won't work
properly, though it might boot up. Why does a phandle pointing to an ignored
node have to be dropped?
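For the depth == 64 case, one possible reading of "WARN and ignore those
nodes" is sketched below in plain userspace C. This is an assumption about the
intended behaviour, not the next revision of the patch: FDT_MAX_DEPTH is only
the name floated above, count_kept_nodes() is hypothetical, the depth array
stands in for what fdt_next_node() would report, and fprintf() stands in for
WARN_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FDT_MAX_DEPTH 64	/* name suggested in this thread, value from the patch */

/*
 * @depths lists the depth of each node in depth-first order (root is 1).
 * Nodes at depth >= @max_depth are skipped instead of aborting the walk;
 * their descendants are deeper still, so the whole subtree is skipped too.
 * Warns once, stores the number of skipped nodes in @skipped, and returns
 * the number of nodes that were kept.
 */
size_t count_kept_nodes(const int *depths, size_t count, int max_depth,
			size_t *skipped)
{
	size_t kept = 0, i;
	int warned = 0;

	assert(depths && skipped && max_depth > 0);
	*skipped = 0;

	for (i = 0; i < count; i++) {
		if (depths[i] >= max_depth) {
			if (!warned) {
				fprintf(stderr,
					"node %zu: depth %d is beyond the limit, ignoring\n",
					i, depths[i]);
				warned = 1;
			}
			(*skipped)++;
			continue;
		}
		kept++;
	}
	return kept;
}
```

A caller would pass FDT_MAX_DEPTH as @max_depth; the parameter only exists so
the behaviour can be exercised with a small limit.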
>>
>>         /*
>>          * Reverse the child list. Some drivers assumes node order matches .dts
>>          * node order
>>          */
>> -       if (!dryrun && np->child) {
>> -               struct device_node *child = np->child;
>> -               np->child = NULL;
>> -               while (child) {
>> -                       struct device_node *next = child->sibling;
>> -                       child->sibling = np->child;
>> -                       np->child = child;
>> -                       child = next;
>> -               }
>> -       }
>> -
>> -       if (nodepp)
>> -               *nodepp = np;
>> +       if (!dryrun)
>> +               reverse_nodes(root);
>>
>>         return mem;
>>  }
>> @@ -431,7 +452,6 @@ static void __unflatten_device_tree(const void *blob,
>>                              void * (*dt_alloc)(u64 size, u64 align))
>>  {
>>         unsigned long size;
>> -       int start;
>>         void *mem;
>>
>>         pr_debug(" -> unflatten_device_tree()\n");
>> @@ -452,8 +472,7 @@ static void __unflatten_device_tree(const void *blob,
>>         }
>>
>>         /* First pass, scan for size */
>> -       start = 0;
>> -       size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
>> +       size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
>>         size = ALIGN(size, 4);
>>
>>         pr_debug("  size is %lx, allocating...\n", size);
>> @@ -467,8 +486,7 @@ static void __unflatten_device_tree(const void *blob,
>>         pr_debug("  unflattening %p...\n", mem);
>>
>>         /* Second pass, do actual unflattening */
>> -       start = 0;
>> -       unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
>> +       unflatten_dt_node(blob, mem, NULL, mynodes, false);
>>         if (be32_to_cpup(mem + size) != 0xdeadbeef)
>>                 pr_warning("End of tree marker overwritten: %08x\n",
>>                            be32_to_cpup(mem + size));
Thanks,
Gavin
* Re: [PATCH v7 49/50] drivers/of: Export OF changeset functions
  2015-11-04 16:12   ` Rob Herring
@ 2015-11-04 23:23     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 23:23 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, devicetree@vger.kernel.org, Frank Rowand, aik,
	linux-pci@vger.kernel.org, Pantelis Antoniou, Grant Likely,
	Bjorn Helgaas, linuxppc-dev
On Wed, Nov 04, 2015 at 10:12:00AM -0600, Rob Herring wrote:
>On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> The PowerNV PCI hotplug driver is going to use the OF changeset
>> to manage the changed device sub-tree. This exports those OF
>> changeset functions for that.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>Acked-by: Rob Herring <robh@kernel.org>
>
Rob, thank you for the quick response :-)
Thanks,
Gavin
>> ---
>>  drivers/of/dynamic.c    | 65 ++++++++++++++++++++++++++++++++++---------------
>>  drivers/of/of_private.h |  2 ++
>>  drivers/of/overlay.c    |  8 +++---
>>  drivers/of/unittest.c   |  4 ---
>>  4 files changed, 52 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index 53826b8..c647bd1 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
>>         memset(ocs, 0, sizeof(*ocs));
>>         INIT_LIST_HEAD(&ocs->entries);
>>  }
>> +EXPORT_SYMBOL_GPL(of_changeset_init);
>>
>>  /**
>>   * of_changeset_destroy - Destroy a changeset
>> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
>>         list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>>                 __of_changeset_entry_destroy(ce);
>>  }
>> +EXPORT_SYMBOL_GPL(of_changeset_destroy);
>>
>> -/**
>> - * of_changeset_apply - Applies a changeset
>> - *
>> - * @ocs:       changeset pointer
>> - *
>> - * Applies a changeset to the live tree.
>> - * Any side-effects of live tree state changes are applied here on
>> - * sucess, like creation/destruction of devices and side-effects
>> - * like creation of sysfs properties and directories.
>> - * Returns 0 on success, a negative error value in case of an error.
>> - * On error the partially applied effects are reverted.
>> - */
>> -int of_changeset_apply(struct of_changeset *ocs)
>> +int __of_changeset_apply(struct of_changeset *ocs)
>>  {
>>         struct of_changeset_entry *ce;
>>         int ret;
>> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
>>  }
>>
>>  /**
>> - * of_changeset_revert - Reverts an applied changeset
>> + * of_changeset_apply - Applies a changeset
>>   *
>>   * @ocs:       changeset pointer
>>   *
>> - * Reverts a changeset returning the state of the tree to what it
>> - * was before the application.
>> - * Any side-effects like creation/destruction of devices and
>> - * removal of sysfs properties and directories are applied.
>> + * Applies a changeset to the live tree.
>> + * Any side-effects of live tree state changes are applied here on
>> + * success, like creation/destruction of devices and side-effects
>> + * like creation of sysfs properties and directories.
>>   * Returns 0 on success, a negative error value in case of an error.
>> + * On error the partially applied effects are reverted.
>>   */
>> -int of_changeset_revert(struct of_changeset *ocs)
>> +int of_changeset_apply(struct of_changeset *ocs)
>> +{
>> +       int ret;
>> +
>> +       mutex_lock(&of_mutex);
>> +       ret = __of_changeset_apply(ocs);
>> +       mutex_unlock(&of_mutex);
>> +
>> +       return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_changeset_apply);
>> +
>> +int __of_changeset_revert(struct of_changeset *ocs)
>>  {
>>         struct of_changeset_entry *ce;
>>         int ret;
>> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
>>  }
>>
>>  /**
>> + * of_changeset_revert - Reverts an applied changeset
>> + *
>> + * @ocs:       changeset pointer
>> + *
>> + * Reverts a changeset returning the state of the tree to what it
>> + * was before the application.
>> + * Any side-effects like creation/destruction of devices and
>> + * removal of sysfs properties and directories are applied.
>> + * Returns 0 on success, a negative error value in case of an error.
>> + */
>> +int of_changeset_revert(struct of_changeset *ocs)
>> +{
>> +       int ret;
>> +
>> +       mutex_lock(&of_mutex);
>> +       ret = __of_changeset_revert(ocs);
>> +       mutex_unlock(&of_mutex);
>> +
>> +       return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_changeset_revert);
>> +
>> +/**
>>   * of_changeset_action - Perform a changeset action
>>   *
>>   * @ocs:       changeset pointer
>> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
>>         list_add_tail(&ce->node, &ocs->entries);
>>         return 0;
>>  }
>> +EXPORT_SYMBOL_GPL(of_changeset_action);
>> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
>> index 8e882e7..829469f 100644
>> --- a/drivers/of/of_private.h
>> +++ b/drivers/of/of_private.h
>> @@ -45,6 +45,8 @@ static inline struct device_node *kobj_to_device_node(struct kobject *kobj)
>>  extern int of_property_notify(int action, struct device_node *np,
>>                               struct property *prop, struct property *old_prop);
>>  extern void of_node_release(struct kobject *kobj);
>> +extern int __of_changeset_apply(struct of_changeset *ocs);
>> +extern int __of_changeset_revert(struct of_changeset *ocs);
>>  #else /* CONFIG_OF_DYNAMIC */
>>  static inline int of_property_notify(int action, struct device_node *np,
>>                                      struct property *prop, struct property *old_prop)
>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
>> index 24e025f..804ea33 100644
>> --- a/drivers/of/overlay.c
>> +++ b/drivers/of/overlay.c
>> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
>>         }
>>
>>         /* apply the changeset */
>> -       err = of_changeset_apply(&ov->cset);
>> +       err = __of_changeset_apply(&ov->cset);
>>         if (err) {
>> -               pr_err("%s: of_changeset_apply() failed for tree@%s\n",
>> +               pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
>>                                 __func__, tree->full_name);
>>                 goto err_revert_overlay;
>>         }
>> @@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
>>
>>
>>         list_del(&ov->node);
>> -       of_changeset_revert(&ov->cset);
>> +       __of_changeset_revert(&ov->cset);
>>         of_free_overlay_info(ov);
>>         idr_remove(&ov_idr, id);
>>         of_changeset_destroy(&ov->cset);
>> @@ -539,7 +539,7 @@ int of_overlay_destroy_all(void)
>>         /* the tail of list is guaranteed to be safe to remove */
>>         list_for_each_entry_safe_reverse(ov, ovn, &ov_list, node) {
>>                 list_del(&ov->node);
>> -               of_changeset_revert(&ov->cset);
>> +               __of_changeset_revert(&ov->cset);
>>                 of_free_overlay_info(ov);
>>                 idr_remove(&ov_idr, ov->id);
>>                 kfree(ov);
>> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
>> index bafcf66..dad3fd2 100644
>> --- a/drivers/of/unittest.c
>> +++ b/drivers/of/unittest.c
>> @@ -526,18 +526,14 @@ static void __init of_unittest_changeset(void)
>>         unittest(!of_changeset_add_property(&chgset, parent, ppadd), "fail add prop\n");
>>         unittest(!of_changeset_update_property(&chgset, parent, ppupdate), "fail update prop\n");
>>         unittest(!of_changeset_remove_property(&chgset, parent, ppremove), "fail remove prop\n");
>> -       mutex_lock(&of_mutex);
>>         unittest(!of_changeset_apply(&chgset), "apply failed\n");
>> -       mutex_unlock(&of_mutex);
>>
>>         /* Make sure node names are constructed correctly */
>>         unittest((np = of_find_node_by_path("/testcase-data/changeset/n2/n21")),
>>                  "'%s' not added\n", n21->full_name);
>>         of_node_put(np);
>>
>> -       mutex_lock(&of_mutex);
>>         unittest(!of_changeset_revert(&chgset), "revert failed\n");
>> -       mutex_unlock(&of_mutex);
>>
>>         of_changeset_destroy(&chgset);
>>  #endif
>> --
>> 2.1.0
>>
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-11-04 23:23     ` Gavin Shan
@ 2015-11-04 23:26       ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-04 23:26 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Rob Herring, linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Thu, Nov 05, 2015 at 10:23:15AM +1100, Gavin Shan wrote:
>On Wed, Nov 04, 2015 at 10:07:50AM -0600, Rob Herring wrote:
>>On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>> In current implementation, unflatten_dt_node() is called recursively
>>> to unflatten device nodes in FDT blob. It's stress to limited stack
>>> capacity.
>>
>>Did you actually hit a problem?
>>
>>Now we have a max depth of 64. Seems like that should be plenty... Any
>>idea how this compares to when we run out of stack space?
>>
>
>When I rebased last revision (v6), particular below patch, to 4.3.rc6,
>the kernel won't boot in P7 and P8 boxes. On P7 boxes, the stack overruns
>according to the printed kernel messages. On P8 boxes, the /bin/init in
>initramfs image can't be loaded/executed properly and it's potentially
>caused by memory corruption. That's why I reworked it to avoid recursive
>calling to unflatten_dt_node().
>
I missed the link to the patch; here it is:
https://patchwork.ozlabs.org/patch/504512/
>The max depth "64" wasn't selected based on the stack usage. I was thinking
>the device tree is converted to friendly *.dts format and it's using TAB
>as the prefix for each line. If the device tree has 64 depth, Each line
>in *.dts for leaf nodes have to be wrapped and spanning multiple lines.
>That's why I choosed 64, maybe 32 is enough. Did you see a device-tree
>that has more than 16 depth in field? :-)
>
>>> This avoids calling the function recursively, meaning the device
>>> nodes are unflattened in one call on unflatten_dt_node(): two arrays
>>> are introduced to track the parent path size and the device node of
>>> current level of depth, which will be used by the device node on next
>>> level of depth to be unflattened. Also, the parameter "poffset" and
>>> "fpsize" are unused and dropped.
>>
>>Yay. I'm happy to see parameters removed instead of added to this function.
>>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  drivers/of/fdt.c | 94 +++++++++++++++++++++++++++++++++-----------------------
>>>  1 file changed, 56 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>>> index 173b036..f4793d0 100644
>>> --- a/drivers/of/fdt.c
>>> +++ b/drivers/of/fdt.c
>>> @@ -355,61 +355,82 @@ static unsigned long populate_node(const void *blob,
>>>         return fpsize;
>>>  }
>>>
>>> +static void reverse_nodes(struct device_node *parent)
>>> +{
>>> +       struct device_node *child, *next;
>>> +
>>> +       /* In-depth first */
>>> +       child = parent->child;
>>> +       while (child) {
>>> +               reverse_nodes(child);
>>> +
>>> +               child = child->sibling;
>>> +       }
>>> +
>>> +       /* Reverse the nodes in the child list */
>>> +       child = parent->child;
>>> +       parent->child = NULL;
>>> +       while (child) {
>>> +               next = child->sibling;
>>> +
>>> +               child->sibling = parent->child;
>>> +               parent->child = child;
>>> +               child = next;
>>> +       }
>>> +}
>>> +
>>>  /**
>>>   * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>>>   * @blob: The parent device tree blob
>>>   * @mem: Memory chunk to use for allocating device nodes and properties
>>> - * @poffset: pointer to node in flat tree
>>>   * @dad: Parent struct device_node
>>>   * @nodepp: The device_node tree created by the call
>>> - * @fpsize: Size of the node path up at the current depth.
>>>   * @dryrun: If true, do not allocate device nodes but still calculate needed
>>>   * memory size
>>>   */
>>>  static void *unflatten_dt_node(const void *blob,
>>>                                void *mem,
>>> -                              int *poffset,
>>>                                struct device_node *dad,
>>>                                struct device_node **nodepp,
>>> -                              unsigned long fpsize,
>>>                                bool dryrun)
>>
>>We can probably further simplify things by returning an int with
>>negative being errors and positive being the size. Also, dryrun can be
>>dropped and implied by mem and/or nodepp being NULL.
>>
>
>Yeah, I think it's reasonable to return "size" from this function. "dryrun"
>can be dropped and implied by NULL @mem. @nodepp can't be NULL. I perhaps
>have separate patch to address it in next revision.
>
>>>  {
>>> -       struct device_node *np;
>>> -       static int depth;
>>> -       int old_depth;
>>> -
>>> -       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
>>> -       if (!fpsize)
>>> -               return mem;
>>> +       struct device_node *root;
>>> +       int offset = 0, depth = 0;
>>> +       unsigned long fpsizes[64];
>>> +       struct device_node *nps[64];
>>
>>Use a define here.
>>
>
>Fair enough, will do in next revision. I'm not good at naming. Would
>"FDT_MAX_DEPTH" is a good one?
>
>>>
>>> -       old_depth = depth;
>>> -       *poffset = fdt_next_node(blob, *poffset, &depth);
>>> -       if (depth < 0)
>>> -               depth = 0;
>>> -       while (*poffset > 0 && depth > old_depth)
>>> -               mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
>>> -                                       fpsize, dryrun);
>>> +       if (nodepp)
>>> +               *nodepp = NULL;
>>> +
>>> +       root = dad;
>>> +       fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
>>> +       nps[depth++] = dad;
>>> +       while (offset >= 0 && depth < 64) {
>>> +               fpsizes[depth] = populate_node(blob, offset, &mem,
>>> +                                              nps[depth - 1],
>>> +                                              fpsizes[depth - 1],
>>> +                                              &nps[depth], dryrun);
>>> +               if (!fpsizes[depth])
>>> +                       return mem;
>>> +
>>> +               if (!dryrun && nodepp && !*nodepp)
>>> +                       *nodepp = nps[depth];
>>> +               if (!dryrun && !root)
>>> +                       root = nps[depth];
>>> +
>>> +               offset = fdt_next_node(blob, offset, &depth);
>>> +       }
>>>
>>> -       if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
>>> -               pr_err("unflatten: error %d processing FDT\n", *poffset);
>>> +       if (offset < 0 && offset != -FDT_ERR_NOTFOUND)
>>> +               pr_err("%s: Error %d processing FDT\n",
>>> +                      __func__, offset);
>>
>>What about depth == 64 case? I think the behavior should be a WARN and
>>ignore those nodes so we at least can continue to boot and see the
>>error. Of course, if there is a phandle pointing to ignored nodes, we
>>have to handle that too.
>>
>
>Yeah, I'll have a WARN_ON(depth >= 64) in next revision. Sorry, I didn't
>get the 2nd part of your comments: When depth > 64, the system won't work.
>It might boot up. Why the phandle pointing to the ignored node has to be
>dropped? 
>
>>>
>>>         /*
>>>          * Reverse the child list. Some drivers assumes node order matches .dts
>>>          * node order
>>>          */
>>> -       if (!dryrun && np->child) {
>>> -               struct device_node *child = np->child;
>>> -               np->child = NULL;
>>> -               while (child) {
>>> -                       struct device_node *next = child->sibling;
>>> -                       child->sibling = np->child;
>>> -                       np->child = child;
>>> -                       child = next;
>>> -               }
>>> -       }
>>> -
>>> -       if (nodepp)
>>> -               *nodepp = np;
>>> +       if (!dryrun)
>>> +               reverse_nodes(root);
>>>
>>>         return mem;
>>>  }
>>> @@ -431,7 +452,6 @@ static void __unflatten_device_tree(const void *blob,
>>>                              void * (*dt_alloc)(u64 size, u64 align))
>>>  {
>>>         unsigned long size;
>>> -       int start;
>>>         void *mem;
>>>
>>>         pr_debug(" -> unflatten_device_tree()\n");
>>> @@ -452,8 +472,7 @@ static void __unflatten_device_tree(const void *blob,
>>>         }
>>>
>>>         /* First pass, scan for size */
>>> -       start = 0;
>>> -       size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
>>> +       size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
>>>         size = ALIGN(size, 4);
>>>
>>>         pr_debug("  size is %lx, allocating...\n", size);
>>> @@ -467,8 +486,7 @@ static void __unflatten_device_tree(const void *blob,
>>>         pr_debug("  unflattening %p...\n", mem);
>>>
>>>         /* Second pass, do actual unflattening */
>>> -       start = 0;
>>> -       unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
>>> +       unflatten_dt_node(blob, mem, NULL, mynodes, false);
>>>         if (be32_to_cpup(mem + size) != 0xdeadbeef)
>>>                 pr_warning("End of tree marker overwritten: %08x\n",
>>>                            be32_to_cpup(mem + size));
>
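The depth == 64 case raised above could be handled roughly as follows. This is a minimal userspace sketch under stated assumptions, not the code from the patch: `MAX_DEPTH`, `struct walk_state`, `visit_node()` and `walk()` are hypothetical names, and the node stream is a plain array of depths rather than an FDT blob. Nodes that are too deep are counted and ignored so the walk can continue, where the kernel would WARN.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64	/* hypothetical limit, mirroring the depth limit discussed above */

struct walk_state {
	int stack[MAX_DEPTH];	/* stand-in for the per-depth node pointers */
	size_t visited;		/* nodes recorded on the stack */
	size_t skipped;		/* nodes ignored because they were too deep */
};

/*
 * Record one node at the given depth. Returns 0 on success and -1 when
 * the depth is out of range and the node was ignored.
 */
static int visit_node(struct walk_state *ws, int id, int depth)
{
	if (depth < 0 || depth >= MAX_DEPTH) {
		ws->skipped++;	/* the kernel would WARN here instead */
		return -1;
	}
	ws->stack[depth] = id;
	ws->visited++;
	return 0;
}

/* Walk a flat list of node depths, skipping anything beyond MAX_DEPTH. */
static void walk(struct walk_state *ws, const int *depths, size_t n)
{
	size_t i;

	assert(ws && depths);
	ws->visited = 0;
	ws->skipped = 0;
	for (i = 0; i < n; i++)
		visit_node(ws, (int)i, depths[i]);
}
```

The sketch does not address the follow-up question about phandles that point at ignored nodes; that would need separate handling.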
>Thanks,
>Gavin
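For reference, the child-list reversal removed in the hunk above can be tried out in isolation. This is a sketch only: `struct node` is a simplified stand-in for `struct device_node`, and `reverse_children()` is a hypothetical name covering a single level of the tree. The `reverse_nodes()` helper from the patch is not shown in this message and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct device_node: only the tree links. */
struct node {
	int id;
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* Reverse the order of np's children in place. */
static void reverse_children(struct node *np)
{
	struct node *child, *next;

	assert(np);
	child = np->child;
	np->child = NULL;
	while (child) {
		next = child->sibling;
		child->sibling = np->child;	/* push onto the front of the new list */
		np->child = child;
		child = next;
	}
}
```

Because each newly unflattened child is linked at the head of the list, one reversal per parent restores the .dts order that some drivers rely on.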
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge()
  2015-11-04 13:12 ` [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
@ 2015-11-05 22:27   ` Daniel Axtens
  2015-11-05 23:44     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-05 22:27 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>  
> +void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
Very much a nit-pick, but I thought we were trying to move towards using
phb instead of hose in new code?
Apart from that this looks good. I would probably have merged it with
the previous patch, but I know Alexey has been suggesting a lot of
splitting and merging previously, so whatever he prefers here is OK.
> +
> +	if (hose->controller_ops.setup_bridge)
> +		hose->controller_ops.setup_bridge(bus, type);
> +}
> +
>  void pcibios_reset_secondary_bus(struct pci_dev *dev)
>  {
>  	struct pci_controller *phb = pci_bus_to_host(dev->bus);
> -- 
> 2.1.0
>
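The override quoted above follows the usual optional-callback pattern: the generic entry point calls the platform hook only when one is installed. A minimal standalone sketch, assuming simplified stand-in types (`struct controller`, `struct controller_ops`, `count_hook()` and the return value are hypothetical and exist only to make the behaviour observable):

```c
#include <assert.h>
#include <stddef.h>

struct bus;	/* opaque stand-in for struct pci_bus */

/* Stand-in for struct pci_controller_ops: every hook is optional. */
struct controller_ops {
	void (*setup_bridge)(struct bus *bus, unsigned long type);
};

struct controller {
	struct controller_ops ops;
};

/* Example hook that just counts its invocations. */
static unsigned long hook_calls;

static void count_hook(struct bus *bus, unsigned long type)
{
	(void)bus;
	(void)type;
	hook_calls++;
}

/* Call the platform hook only if the platform provided one. */
static int setup_bridge(struct controller *hose, struct bus *bus,
			unsigned long type)
{
	assert(hose);
	if (!hose->ops.setup_bridge)
		return 0;	/* no hook: nothing to do */
	hose->ops.setup_bridge(bus, type);
	return 1;		/* hook was invoked */
}
```

Platforms that do not set the hook are unaffected, which is what lets the generic and platform patches be merged independently.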
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops
  2015-11-04 13:12 ` [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops Gavin Shan
@ 2015-11-05 22:28   ` Daniel Axtens
  2015-11-06  1:09     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-05 22:28 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: devicetree, aik, linux-pci, panto, Gavin Shan, grant.likely,
	robherring2, bhelgaas, frowand.list
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> This cleans up the pnv_pci_ioda_controller_ops struct to use tabs
> instead of spaces for indentation, to avoid complaints from
> scripts/checkpatch.pl. No logical changes introduced.
Oh, that was my code :/ Sorry I missed that, thanks for cleaning it up!
Reviewed-by: Daniel Axtens <dja@axtens.net>
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 2e2bedb..aa3645c 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3064,17 +3064,17 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>  }
>  
>  static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
> -       .dma_dev_setup = pnv_pci_dma_dev_setup,
> +	.dma_dev_setup		= pnv_pci_dma_dev_setup,
>  #ifdef CONFIG_PCI_MSI
> -       .setup_msi_irqs = pnv_setup_msi_irqs,
> -       .teardown_msi_irqs = pnv_teardown_msi_irqs,
> +	.setup_msi_irqs		= pnv_setup_msi_irqs,
> +	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
>  #endif
> -       .enable_device_hook = pnv_pci_enable_device_hook,
> -       .window_alignment = pnv_pci_window_alignment,
> -       .reset_secondary_bus = pnv_pci_reset_secondary_bus,
> -       .dma_set_mask = pnv_pci_ioda_dma_set_mask,
> -       .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
> -       .shutdown = pnv_pci_ioda_shutdown,
> +	.enable_device_hook	= pnv_pci_enable_device_hook,
> +	.window_alignment	= pnv_pci_window_alignment,
> +	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> +	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
> +	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
> +	.shutdown		= pnv_pci_ioda_shutdown,
>  };
>  
>  static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> -- 
> 2.1.0
>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops
  2015-11-04 13:12 ` [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
@ 2015-11-05 22:32   ` Daniel Axtens
  2015-11-05 23:45     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-05 22:32 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> Each PHB has one instance of "struct pci_controller_ops", which
> includes various callbacks called by the PCI subsystem. In the definition
> of this struct, some callbacks have explicit names for their arguments,
> but the rest don't.
>
> This adds all explicit names of the arguments to the callbacks in
> "struct pci_controller_ops" so that the code looks consistent.
Thank you very much for doing this - I should have done it the first
time I created pci_controller_ops.
They all look good, with one nit-pick:
> -	void		(*shutdown)(struct pci_controller *);
> +	void		(*shutdown)(struct pci_controller *hose);
I think we're trying to move from hose to phb in new code.
Once that is fixed:
  Reviewed-by: Daniel Axtens <dja@axtens.net>
Regards,
Daniel
>  };
>  
>  /*
> -- 
> 2.1.0
>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2015-11-04 13:12 ` [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
@ 2015-11-05 22:56   ` Daniel Axtens
  2015-11-05 23:52     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-05 22:56 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> The original implementation of pnv_ioda_setup_pe_seg() configures
> IO and M32 segments with separate logic, which can be merged
> by caching @segmap, @segsize, @win in advance. This shouldn't
> cause any behavioural changes.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 34 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 7ee7cfe..553d3f3 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2752,8 +2752,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>  	struct pnv_phb *phb = hose->private_data;
>  	struct pci_bus_region region;
>  	struct resource *res;
> -	int i, index;
> -	int rc;
> +	unsigned int segsize;
> +	int *segmap, index, i;
> +	uint16_t win;
> +	int64_t rc;
Good catch! Opal return codes are 64 bit and that should be explicit
in the type. However, I seem to remember that we preferred a different
type for 64 bit ints in the kernel. I think it's s64, and there are some
other uses of that in pci_ioda.c for return codes.
(I'm actually surprised that's not picked up as a compiler
warning. Maybe that's something to look at in future.)
The rest of the patch looks good on casual inspection - to be sure I'll
test the entire series on a machine. (hopefully, time permitting!)
Regards,
Daniel
>  
>  	/*
>  	 * NOTE: We only care PCI bus based PE for now. For PCI
> @@ -2770,23 +2772,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>  		if (res->flags & IORESOURCE_IO) {
>  			region.start = res->start - phb->ioda.io_pci_base;
>  			region.end   = res->end - phb->ioda.io_pci_base;
> -			index = region.start / phb->ioda.io_segsize;
> -
> -			while (index < phb->ioda.total_pe_num &&
> -			       region.start <= region.end) {
> -				phb->ioda.io_segmap[index] = pe->pe_number;
> -				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
> -				if (rc != OPAL_SUCCESS) {
> -					pr_err("%s: OPAL error %d when mapping IO "
> -					       "segment #%d to PE#%d\n",
> -					       __func__, rc, index, pe->pe_number);
> -					break;
> -				}
> -
> -				region.start += phb->ioda.io_segsize;
> -				index++;
> -			}
> +			segsize      = phb->ioda.io_segsize;
> +			segmap       = phb->ioda.io_segmap;
> +			win          = OPAL_IO_WINDOW_TYPE;
>  		} else if ((res->flags & IORESOURCE_MEM) &&
>  			   !pnv_pci_is_mem_pref_64(res->flags)) {
>  			region.start = res->start -
> @@ -2795,23 +2783,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>  			region.end   = res->end -
>  				       hose->mem_offset[0] -
>  				       phb->ioda.m32_pci_base;
> -			index = region.start / phb->ioda.m32_segsize;
> -
> -			while (index < phb->ioda.total_pe_num &&
> -			       region.start <= region.end) {
> -				phb->ioda.m32_segmap[index] = pe->pe_number;
> -				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
> -				if (rc != OPAL_SUCCESS) {
> -					pr_err("%s: OPAL error %d when mapping M32 "
> -					       "segment#%d to PE#%d",
> -					       __func__, rc, index, pe->pe_number);
> -					break;
> -				}
> +			segsize      = phb->ioda.m32_segsize;
> +			segmap       = phb->ioda.m32_segmap;
> +			win          = OPAL_M32_WINDOW_TYPE;
> +		} else {
> +			continue;
> +		}
>  
> -				region.start += phb->ioda.m32_segsize;
> -				index++;
> +		index = region.start / segsize;
> +		while (index < phb->ioda.total_pe_num &&
> +		       region.start <= region.end) {
> +			segmap[index] = pe->pe_number;
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +					pe->pe_number, win, 0, index);
> +			if (rc != OPAL_SUCCESS) {
> +				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
> +					__func__, rc, win, index,
> +					pe->phb->hose->global_number,
> +					pe->pe_number);
> +				break;
>  			}
> +
> +			region.start += segsize;
> +			index++;
>  		}
>  	}
>  }
> -- 
> 2.1.0
>
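The shape of the simplification is easier to see outside the diff. The following is a standalone sketch, not the kernel code: `struct phb_sketch`, `enum win_type`, `TOTAL_PE_NUM` and `map_segments()` are hypothetical names, and the OPAL call is left out. The window type only selects which map and segment size to use; the mapping loop is shared.

```c
#include <assert.h>
#include <stdint.h>

#define TOTAL_PE_NUM 8	/* hypothetical number of segments/PEs */

enum win_type { WIN_IO, WIN_M32 };	/* stand-ins for the OPAL window types */

struct phb_sketch {
	unsigned int io_segsize, m32_segsize;
	int io_segmap[TOTAL_PE_NUM], m32_segmap[TOTAL_PE_NUM];
};

/*
 * Assign every segment overlapped by [start, end] to pe_number.
 * Returns the number of segments mapped.
 */
static int map_segments(struct phb_sketch *phb, enum win_type win,
			uint64_t start, uint64_t end, int pe_number)
{
	unsigned int segsize;
	int *segmap, index, mapped = 0;

	/* Cache the per-window parameters first ... */
	if (win == WIN_IO) {
		segsize = phb->io_segsize;
		segmap = phb->io_segmap;
	} else {
		segsize = phb->m32_segsize;
		segmap = phb->m32_segmap;
	}
	assert(segsize > 0);

	/* ... then run one loop for both window types. */
	index = (int)(start / segsize);
	while (index < TOTAL_PE_NUM && start <= end) {
		segmap[index] = pe_number;	/* the kernel also calls into OPAL here */
		start += segsize;
		index++;
		mapped++;
	}
	return mapped;
}
```

With a 0x1000 IO segment size, the range 0x1000..0x2fff covers segments 1 and 2, so both are assigned to the given PE.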
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge()
  2015-11-05 22:27   ` Daniel Axtens
@ 2015-11-05 23:44     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-05 23:44 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Fri, Nov 06, 2015 at 09:27:42AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>>  
>> +void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
>> +{
>> +	struct pci_controller *hose = pci_bus_to_host(bus);
>Very much a nit-pick, but I thought we were trying to move towards using
>phb instead of hose in new code?
>
Taking the PowerNV platform as an example, "hose" means "struct pci_controller",
while "phb" means "struct pnv_phb". There is no move towards using "phb"
to represent "struct pci_controller".
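To make the naming distinction concrete, here is a minimal sketch with simplified stand-in types. The `_sketch` structs and `hose_to_phb()` are hypothetical; the real kernel structures carry many more fields. The only relationship assumed is the one visible elsewhere in this series, where the platform data is reached through `hose->private_data`.

```c
#include <assert.h>
#include <stddef.h>

/* Platform-specific state; stand-in for struct pnv_phb. */
struct pnv_phb_sketch {
	unsigned long opal_id;
};

/* Generic controller; stand-in for struct pci_controller. */
struct pci_controller_sketch {
	int global_number;
	void *private_data;	/* points at the platform's PHB state */
};

/* "hose" is the generic controller, "phb" is the PowerNV view of it. */
static struct pnv_phb_sketch *hose_to_phb(struct pci_controller_sketch *hose)
{
	assert(hose);
	return hose->private_data;
}
```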
>Apart from that this looks good. I would probably have merged it with
>the previous patch, but I know Alexey has been suggesting a lot of
>splitting and merging previously, so whatever he prefers here is OK.
>
I'd like to keep them separate as they're for different subsystems: the
generic PCI subsystem and the PowerPC subsystem. Their respective maintainers
can then pick them up as they see fit.
>> +
>> +	if (hose->controller_ops.setup_bridge)
>> +		hose->controller_ops.setup_bridge(bus, type);
>> +}
>> +
>>  void pcibios_reset_secondary_bus(struct pci_dev *dev)
>>  {
>>  	struct pci_controller *phb = pci_bus_to_host(dev->bus);
>> -- 
>> 2.1.0
>>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops
  2015-11-05 22:32   ` Daniel Axtens
@ 2015-11-05 23:45     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-05 23:45 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Fri, Nov 06, 2015 at 09:32:57AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> Each PHB has one instance of "struct pci_controller_ops", which
>> includes various callbacks called by the PCI subsystem. In the definition
>> of this struct, some callbacks have explicit names for their arguments,
>> but the rest don't.
>>
>> This adds all explicit names of the arguments to the callbacks in
>> "struct pci_controller_ops" so that the code looks consistent.
>
>Thank you very much for doing this - I should have done it the first
>time I created pci_controller_ops.
>
>They all look good, with one nit-pick:
>
>> -	void		(*shutdown)(struct pci_controller *);
>> +	void		(*shutdown)(struct pci_controller *hose);
>
>I think we're trying to move from hose to phb in new code.
>
Nope, there is no such move, as I explained in my previous reply:
"hose" is for pci_controller, while "phb" represents pnv_phb on the PowerNV
platform.
>Once that is fixed:
>  Reviewed-by: Daniel Axtens <dja@axtens.net>
>
Thanks,
Gavin
>
>>  };
>>  
>>  /*
>> -- 
>> 2.1.0
>>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2015-11-05 22:56   ` Daniel Axtens
@ 2015-11-05 23:52     ` Gavin Shan
  2015-11-16  8:01       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-05 23:52 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Fri, Nov 06, 2015 at 09:56:06AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> The original implementation of pnv_ioda_setup_pe_seg() configures
>> IO and M32 segments with separate logic, which can be merged
>> by caching @segmap, @segsize, @win in advance. This shouldn't
>> cause any behavioural changes.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
>>  1 file changed, 28 insertions(+), 34 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 7ee7cfe..553d3f3 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -2752,8 +2752,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  	struct pnv_phb *phb = hose->private_data;
>>  	struct pci_bus_region region;
>>  	struct resource *res;
>> -	int i, index;
>> -	int rc;
>> +	unsigned int segsize;
>> +	int *segmap, index, i;
>> +	uint16_t win;
>> +	int64_t rc;
>
>Good catch! Opal return codes are 64 bit and that should be explicit
>in the type. However, I seem to remember that we preferred a different
>type for 64 bit ints in the kernel. I think it's s64, and there are some
>other uses of that in pci_ioda.c for return codes.
>
Both int64_t and s64 are fine. I used s64 for the OPAL return value, but
Alexey likes "int64_t", which is ok to me as well. I won't change it back
to s64 :-)
>(I'm actually surprised that's not picked up as a compiler
>warning. Maybe that's something to look at in future.)
>
Indeed, I didn't see a warning from gcc.
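GCC only reports implicit narrowing like this under -Wconversion, which is not part of -Wall, so the silence is expected. A small userspace sketch of the effect follows. `fake_opal_call()` and its return value are invented for illustration, and the sketch assumes `int` is narrower than 64 bits; the value is deliberately artificial so the loss is visible.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) < sizeof(int64_t),
	      "sketch assumes int is narrower than 64 bits");

/* Hypothetical firmware call that returns a 64-bit status code. */
static int64_t fake_opal_call(void)
{
	return ((int64_t)1 << 32) + 5;	/* does not fit in 32 bits */
}

/* Stores the 64-bit result in an int: compiles quietly, loses information. */
static int narrow_rc(void)
{
	int rc = fake_opal_call();	/* implicit narrowing conversion */

	return rc;
}

/* Keeps the full width, as the patch does by declaring rc as int64_t. */
static int64_t wide_rc(void)
{
	int64_t rc = fake_opal_call();

	return rc;
}
```

Declaring `rc` with the 64-bit type also lets the format string use `%lld` consistently, as the new pr_warn() in the patch does.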
>The rest of the patch looks good on casual inspection - to be sure I'll
>test the entire series on a machine. (hopefully, time permitting!)
>
I ran scripts/checkpatch.pl on the patchset. Only one warning came from
[PATCH 44/50], but I won't bother to change that as the warning comes
from the original code. If you want to test this patchset, you need
to run it on Tuleta, where hotpluggable PCI slots are supported.
Thanks,
Gavin
>>  
>>  	/*
>>  	 * NOTE: We only care PCI bus based PE for now. For PCI
>> @@ -2770,23 +2772,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  		if (res->flags & IORESOURCE_IO) {
>>  			region.start = res->start - phb->ioda.io_pci_base;
>>  			region.end   = res->end - phb->ioda.io_pci_base;
>> -			index = region.start / phb->ioda.io_segsize;
>> -
>> -			while (index < phb->ioda.total_pe_num &&
>> -			       region.start <= region.end) {
>> -				phb->ioda.io_segmap[index] = pe->pe_number;
>> -				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>> -					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>> -				if (rc != OPAL_SUCCESS) {
>> -					pr_err("%s: OPAL error %d when mapping IO "
>> -					       "segment #%d to PE#%d\n",
>> -					       __func__, rc, index, pe->pe_number);
>> -					break;
>> -				}
>> -
>> -				region.start += phb->ioda.io_segsize;
>> -				index++;
>> -			}
>> +			segsize      = phb->ioda.io_segsize;
>> +			segmap       = phb->ioda.io_segmap;
>> +			win          = OPAL_IO_WINDOW_TYPE;
>>  		} else if ((res->flags & IORESOURCE_MEM) &&
>>  			   !pnv_pci_is_mem_pref_64(res->flags)) {
>>  			region.start = res->start -
>> @@ -2795,23 +2783,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  			region.end   = res->end -
>>  				       hose->mem_offset[0] -
>>  				       phb->ioda.m32_pci_base;
>> -			index = region.start / phb->ioda.m32_segsize;
>> -
>> -			while (index < phb->ioda.total_pe_num &&
>> -			       region.start <= region.end) {
>> -				phb->ioda.m32_segmap[index] = pe->pe_number;
>> -				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>> -					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>> -				if (rc != OPAL_SUCCESS) {
>> -					pr_err("%s: OPAL error %d when mapping M32 "
>> -					       "segment#%d to PE#%d",
>> -					       __func__, rc, index, pe->pe_number);
>> -					break;
>> -				}
>> +			segsize      = phb->ioda.m32_segsize;
>> +			segmap       = phb->ioda.m32_segmap;
>> +			win          = OPAL_M32_WINDOW_TYPE;
>> +		} else {
>> +			continue;
>> +		}
>>  
>> -				region.start += phb->ioda.m32_segsize;
>> -				index++;
>> +		index = region.start / segsize;
>> +		while (index < phb->ioda.total_pe_num &&
>> +		       region.start <= region.end) {
>> +			segmap[index] = pe->pe_number;
>> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>> +					pe->pe_number, win, 0, index);
>> +			if (rc != OPAL_SUCCESS) {
>> +				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>> +					__func__, rc, win, index,
>> +					pe->phb->hose->global_number,
>> +					pe->pe_number);
>> +				break;
>>  			}
>> +
>> +			region.start += segsize;
>> +			index++;
>>  		}
>>  	}
>>  }
>> -- 
>> 2.1.0
>>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops
  2015-11-05 22:28   ` Daniel Axtens
@ 2015-11-06  1:09     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-06  1:09 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, devicetree, aik, linux-pci, panto,
	grant.likely, robherring2, bhelgaas, frowand.list
On Fri, Nov 06, 2015 at 09:28:20AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> This cleans up the pnv_pci_ioda_controller_ops struct to use tabs
>> instead of spaces for indentation, to avoid complaints from
>> scripts/checkpatch.pl. No logical changes introduced.
>
>Oh, that was my code :/ Sorry I missed that, thanks for cleaning it up!
>
>Reviewed-by: Daniel Axtens <dja@axtens.net>
>
nah, that's fine, Daniel. You might be saying: I didn't review your code
carefully enough, which leaves me the chance to clean it up :-)
Thanks,
Gavin
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 2e2bedb..aa3645c 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -3064,17 +3064,17 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>  }
>>  
>>  static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>> -       .dma_dev_setup = pnv_pci_dma_dev_setup,
>> +	.dma_dev_setup		= pnv_pci_dma_dev_setup,
>>  #ifdef CONFIG_PCI_MSI
>> -       .setup_msi_irqs = pnv_setup_msi_irqs,
>> -       .teardown_msi_irqs = pnv_teardown_msi_irqs,
>> +	.setup_msi_irqs		= pnv_setup_msi_irqs,
>> +	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
>>  #endif
>> -       .enable_device_hook = pnv_pci_enable_device_hook,
>> -       .window_alignment = pnv_pci_window_alignment,
>> -       .reset_secondary_bus = pnv_pci_reset_secondary_bus,
>> -       .dma_set_mask = pnv_pci_ioda_dma_set_mask,
>> -       .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
>> -       .shutdown = pnv_pci_ioda_shutdown,
>> +	.enable_device_hook	= pnv_pci_enable_device_hook,
>> +	.window_alignment	= pnv_pci_window_alignment,
>> +	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>> +	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
>> +	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
>> +	.shutdown		= pnv_pci_ioda_shutdown,
>>  };
>>  
>>  static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>> -- 
>> 2.1.0
>>
>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 00/50] powerpc/powernv: PCI hotplug support
  2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (49 preceding siblings ...)
  2015-11-04 13:12 ` [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
@ 2015-11-09  3:09 ` Gavin Shan
  2015-11-09  4:24   ` Pramod Sudheendra
  50 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-09  3:09 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, aik, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On Thu, Nov 05, 2015 at 12:12:00AM +1100, Gavin Shan wrote:
>This series of patches rebases on powerpc/next branch, plus below additional
>patches:
>
>   https://patchwork.ozlabs.org/patch/534804/   (PATCH[1/1] Andrew's EEH fix)
>   https://patchwork.ozlabs.org/patch/534154/   (PATCH[7/7] Richard's SRIOV Rework)
>   commit 3b0e21e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
>
As requested by Alexey, here is the repo on GitHub:
https://github.com/gwshan/pnv-pci-hotplug.git
>This series of patches intends to support PCI slots for the PowerPC PowerNV platform,
>which runs on top of skiboot firmware. The patchset requires corresponding
>changes to skiboot firmware, which have been sent to skiboot@lists.ozlabs.org
>for review. The PCI slots are exposed by skiboot through device node properties,
>and the kernel uses those properties to populate PCI slots accordingly.
>
>The original PCI infrastructure on PowerNV platform can't support hotplug
>because the PE is assigned during PHB fixup time, which is called only once
>during system boot. For this reason, the PCI infrastructure on PowerNV platform
>has been substantially reworked. After that, the PE and its corresponding resources
>(IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
>the PCI bridge's resources, which might decide the PE# assigned to the PE (e.g. M64
>resources, on P8 strictly speaking). Each PE maintains a reference count,
>which is (number of child PCI devices + 1). That means that when the last child PCI
>device leaves the PE, the PE and its included resources are released and put
>back into the free pool. With this design, the PE is released when the EEH PE
>is released. PATCH[1 - 27] are related to this part.
>
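The reference-count rule described above can be sketched as follows. This is an illustration under assumptions, not the implementation from PATCH[1 - 27]: `struct pe_sketch` and the `pe_*()` helpers are hypothetical, and the sketch assumes the base reference (the "+ 1") is dropped automatically once the last child device has left.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PE with the reference count described in the cover letter. */
struct pe_sketch {
	int refcount;	/* number of child devices + 1 while in use */
	bool in_use;	/* false once returned to the free pool */
};

static void pe_init(struct pe_sketch *pe)
{
	pe->refcount = 1;	/* the "+ 1": the PE's base reference */
	pe->in_use = true;
}

/* A child PCI device joins the PE. */
static void pe_add_device(struct pe_sketch *pe)
{
	assert(pe->in_use);
	pe->refcount++;
}

/* A child PCI device leaves; the last one out releases the PE. */
static void pe_remove_device(struct pe_sketch *pe)
{
	assert(pe->in_use && pe->refcount > 1);
	pe->refcount--;
	if (pe->refcount == 1) {	/* only the base reference is left */
		pe->refcount = 0;
		pe->in_use = false;	/* resources go back to the free pool */
	}
}
```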
>From skiboot's perspective, the PCI slot provides (hot/fundamental/complete)
>resets to EEH. The kernel learns whether skiboot supports the various resets on a
>particular PCI slot through the device-tree node. If it does, EEH will utilize the
>functionality provided by skiboot. Besides, the device-tree nodes have to change
>in order to support PCI hotplug. For example, when a PCI adapter is inserted into
>a slot, its device-tree node should be added to the system dynamically. Conversely,
>the device-tree node should be removed from the system when the PCI adapter is going
>offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes,
>they should be added/removed accordingly during PCI hotplug. PATCH[28 - 43]
>do the related work.
>
>The OF driver is changed to support unflattening an FDT blob for a sub-tree, which
>is covered by PATCH[44 - 49].
>
>The last one, PATCH[50], is the standalone PCI hotplug driver for PowerPC PowerNV
>platform. 
>
>Changelog
>=========
>v7:
>   * Reworked revision to some extent.
>   * Rebased to powerpc/next repository.
>   * Reorder/split/merge/drop according - Alexey.
>   * Defined macros and use array to track IO/M32/M64/DMA32 segments - Alexey.
>   * Merged 3 files to one for the hotplug driver - Alexey.
>   * As part of OPAL API, defined macros for PCI slot power state, hotplug
>     message type. Defined macros for PCI slot power confirmed state in
>     hotplug driver.
>   * Misc comments from Alexey.
>   * Reworked unflatten_dt_node() to avoid recursive function calls.
>   * Use EXPORT_SYMBOL_GPL() and document function's input/output - Rob/Frank.
>v6:
>   * Patch reorder, split, squash - Alexey.
>   * Minor coding style - Alexey.
>   * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
>   * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
>   * Concurrent depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
>   * Replace overlay with of_changeset - Grant
>v5:
>   * Rebased to 4.1.rc6 and some unmerged patches as below:
>     Alexey's DDW patchset (v11);
>     Gavin's EEH error injection support (in mpe's next branch);
>     Richard's EEH cleanup patches (in mpe's next branch);
>     Richard's EEH support for VF (v7);
>     Gavin's misc EEH fixes for 4.2;
>   * The revision bases on skiboot corresponding patches (v7):
>     https://patchwork.ozlabs.org/patch/480437/
>   * Utilize OF overlay to update device-tree with help of newly introduced
>     OPAL API opal_get_overlay_dt().
>   * Split patches for easy review according to aik's comments.
>   * Fix coding style from checkpatch.pl as pointed out by aik.
>   * Code cleanup and misc fixup according to aik's input.
>v4:
>   * Rebased to 4.1.RC1
>   * Added API to unflatten FDT blob to device node sub-tree, which is attached
>     the indicated parent device node. The original mechanism based on formatted
>     string stream has been dropped.
>   * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
>     was picked up and sent to linux-ppc@ separately for review as Richard's "VF EEH
>     Support" depends on that.
>v3:
>   * Rebased to 4.1.RC0
>   * PowerNV PCI infrastructure is totally refactored in order to support PCI
>     hotplug. The PowerNV hotplug driver is also reworked a lot because of
>     the changes in skiboot in order to support PCI hotplug.
>
>Gavin Shan (50):
>  PCI: Add pcibios_setup_bridge()
>  powerpc/pci: Override pcibios_setup_bridge()
>  powerpc/pci: Cleanup on struct pci_controller_ops
>  powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops
>  powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
>  powerpc/powernv: Drop phb->bdfn_to_pe()
>  powerpc/powernv: Reorder fields in struct pnv_phb
>  powerpc/powernv: Rename PE# fields in struct pnv_phb
>  powerpc/powernv: Fix initial IO and M32 segmap
>  powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
>  powerpc/powernv: IO and M32 mapping based on PCI device resources
>  powerpc/powernv: Track M64 segment consumption
>  powerpc/powernv: Rename M64 related functions
>  powerpc/powernv: M64 support on P7IOC
>  powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe()
>  powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE
>  powerpc/powernv: Avoid calculating DMA32 segments on PHB3
>  powerpc/powernv: Remove DMA32 PE list
>  powerpc/powernv: Track DMA32 segment consumption
>  powerpc/powernv: Improve DMA32 segment calculation
>  powerpc/powernv: Increase PE# capacity
>  powerpc/powernv: Introduce pnv_ioda_init_pe()
>  powerpc/powernv: Use PE instead of number during setup and release
>  powerpc/powernv: Allocate PE# in reverse order
>  powerpc/powernv: Reserve PE for root bus
>  powerpc/powernv: Create PEs at PCI hot plugging time
>  powerpc/powernv: Dynamically release PEs
>  powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
>  powerpc/pci: Rename pcibios_find_pci_bus()
>  powerpc/pci: Move pci_find_bus_by_node() around
>  powerpc/pci: Export pci_add_device_node_info()
>  powerpc/pci: Introduce pci_remove_device_node_info()
>  powerpc/pci: Export pci_traverse_device_nodes()
>  powerpc/pci: Delay populating pdn
>  powerpc/pci: Don't scan empty slot
>  powerpc/pci: Update bridge windows on PCI plug
>  powerpc/powernv: Simplify pnv_eeh_reset()
>  powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
>  powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
>  powerpc/powernv: Support PCI slot ID
>  powerpc/powernv: Use firmware PCI slot reset infrastructure
>  powerpc/powernv: Functions to get/set PCI slot status
>  powerpc/powernv: Select OF_DYNAMIC
>  drivers/of: Split unflatten_dt_node()
>  drivers/of: Avoid recursively calling unflatten_dt_node()
>  drivers/of: Rename unflatten_dt_node()
>  drivers/of: Specify parent node in of_fdt_unflatten_tree()
>  drivers/of: Return allocated memory from of_fdt_unflatten_tree()
>  drivers/of: Export OF changeset functions
>  PCI/hotplug: PowerPC PowerNV PCI hotplug driver
>
> MAINTAINERS                                    |    6 +
> arch/powerpc/include/asm/eeh.h                 |    2 +-
> arch/powerpc/include/asm/opal-api.h            |   17 +-
> arch/powerpc/include/asm/opal.h                |    8 +-
> arch/powerpc/include/asm/pci-bridge.h          |   25 +-
> arch/powerpc/include/asm/pnv-pci.h             |    7 +
> arch/powerpc/include/asm/ppc-pci.h             |    8 +-
> arch/powerpc/kernel/eeh_dev.c                  |   19 +-
> arch/powerpc/kernel/eeh_driver.c               |   12 +-
> arch/powerpc/kernel/pci-common.c               |   16 +-
> arch/powerpc/kernel/pci-hotplug.c              |   47 +-
> arch/powerpc/kernel/pci_dn.c                   |   85 +-
> arch/powerpc/platforms/maple/pci.c             |   34 +-
> arch/powerpc/platforms/pasemi/pci.c            |    3 -
> arch/powerpc/platforms/powermac/pci.c          |   38 +-
> arch/powerpc/platforms/powernv/Kconfig         |    1 +
> arch/powerpc/platforms/powernv/eeh-powernv.c   |  173 ++--
> arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
> arch/powerpc/platforms/powernv/pci-ioda.c      | 1251 +++++++++++++++---------
> arch/powerpc/platforms/powernv/pci.c           |   92 +-
> arch/powerpc/platforms/powernv/pci.h           |   62 +-
> arch/powerpc/platforms/pseries/msi.c           |    4 +-
> arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
> arch/powerpc/platforms/pseries/setup.c         |    8 +-
> drivers/of/dynamic.c                           |   65 +-
> drivers/of/fdt.c                               |  378 ++++---
> drivers/of/of_private.h                        |    2 +
> drivers/of/overlay.c                           |    8 +-
> drivers/of/unittest.c                          |    6 +-
> drivers/pci/hotplug/Kconfig                    |   12 +
> drivers/pci/hotplug/Makefile                   |    3 +
> drivers/pci/hotplug/pnv_php.c                  |  866 ++++++++++++++++
> drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
> drivers/pci/hotplug/rpaphp_core.c              |    4 +-
> drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
> drivers/pci/setup-bus.c                        |    5 +
> include/linux/of_fdt.h                         |    5 +-
> include/linux/pci.h                            |    1 +
> 38 files changed, 2389 insertions(+), 932 deletions(-)
> create mode 100644 drivers/pci/hotplug/pnv_php.c
>
>-- 
>2.1.0
>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 00/50] powerpc/powernv: PCI hotplug support
  2015-11-09  3:09 ` [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2015-11-09  4:24   ` Pramod Sudheendra
  2015-11-09  4:29     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Pramod Sudheendra @ 2015-11-09  4:24 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, aik, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
> On Nov 8, 2015, at 7:09 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> 
> On Thu, Nov 05, 2015 at 12:12:00AM +1100, Gavin Shan wrote:
>> This series of patches rebases on powerpc/next branch, plus below additional
>> patches:
>> 
>>  https://patchwork.ozlabs.org/patch/534804/   (PATCH[1/1] Andrew's EEH fix)
>>  https://patchwork.ozlabs.org/patch/534154/   (PATCH[7/7] Richard's SRIOV Rework)
>>  commit 3b0e21e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
>> 
> 
> As asked by Alexey, here is the repo on github:
> 
> https://github.com/gwshan/pnv-pci-hotplug.git
Don’t see that link working.
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 00/50] powerpc/powernv: PCI hotplug support
  2015-11-09  4:24   ` Pramod Sudheendra
@ 2015-11-09  4:29     ` Gavin Shan
  2015-11-09  6:43       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-09  4:29 UTC (permalink / raw)
  To: Pramod Sudheendra
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Sun, Nov 08, 2015 at 08:24:37PM -0800, Pramod Sudheendra wrote:
>> On Nov 8, 2015, at 7:09 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Thu, Nov 05, 2015 at 12:12:00AM +1100, Gavin Shan wrote:
>>> This series of patches rebases on powerpc/next branch, plus below additional
>>> patches:
>>> 
>>>  https://patchwork.ozlabs.org/patch/534804/   (PATCH[1/1] Andrew's EEH fix)
>>>  https://patchwork.ozlabs.org/patch/534154/   (PATCH[7/7] Richard's SRIOV Rework)
>>>  commit 3b0e21e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
>>> 
>> 
>> As asked by Alexey, here is the repo on github:
>> 
>> https://github.com/gwshan/pnv-pci-hotplug.git
>Don’t see that link working. 
Yeah, I dropped that repo before it was completely populated, as I was told
it's disallowed by my employer. I have to push it to an IBM internal git
server, where it's only visible inside IBM. Sorry for the inconvenience...
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 00/50] powerpc/powernv: PCI hotplug support
  2015-11-09  4:29     ` Gavin Shan
@ 2015-11-09  6:43       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 157+ messages in thread
From: Benjamin Herrenschmidt @ 2015-11-09  6:43 UTC (permalink / raw)
  To: Gavin Shan, Pramod Sudheendra
  Cc: linuxppc-dev, linux-pci, devicetree, mpe, aik, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On Mon, 2015-11-09 at 15:29 +1100, Gavin Shan wrote:
> 
> Yeah, I dropped that before it's populated completely as I was told
> it's disallowed by my employer. I have to push it into IBM internal
> git server and it's only visible to IBM. Sorry for the inconvenience...
I think that's a misinterpretation of the rule. I'll sort that out
tomorrow; there should be no problem publishing that tree on GitHub as
long as you take a couple of precautions.
Cheers,
Ben.
> > > 
> > > > This series of patches intends to support PCI slots on the PowerPC
> > > > PowerNV platform, which runs on top of skiboot firmware. The patchset
> > > > requires corresponding changes in skiboot, which have been sent to
> > > > skiboot@lists.ozlabs.org for review. The PCI slots are exposed by
> > > > skiboot through device node properties, and the kernel uses those
> > > > properties to populate PCI slots accordingly.
> > > > 
> > > > The original PCI infrastructure on the PowerNV platform can't support
> > > > hotplug because the PE is assigned at PHB fixup time, which happens
> > > > only once during system boot. For this reason, the PCI infrastructure
> > > > on the PowerNV platform has been reworked a lot. After that, the PE
> > > > and its corresponding resources (IODT, M32DT, M64 segments, DMA32 and
> > > > bypass window) are assigned when the PCI bridge's resources are
> > > > updated, which might decide the PE# assigned to the PE (e.g. M64
> > > > resources, on P8 strictly speaking). Each PE maintains a reference
> > > > count, which is (number of child PCI devices + 1). This means that
> > > > when the last child PCI device leaves the PE, the PE and its resources
> > > > are released and put back into the free pool. With this design, the PE
> > > > is released when the EEH PE is released. PATCH[1 - 27] are related to
> > > > this part.
> > > > 
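For readers following along, the reference-counting scheme described in the
paragraph above can be modelled in a few lines. This is only a toy sketch
under stated assumptions, not code from the series: the names PE, PEPool,
attach() and detach() are hypothetical, and it assumes the base reference is
dropped as soon as the last child device leaves, which is one reading of the
description above.

```python
class PE:
    """Toy model of a PowerNV PE (hypothetical, not the kernel structure)."""

    def __init__(self, number):
        self.number = number
        # The PE holds one base reference, so the count is always
        # (number of child PCI devices + 1) while the PE is in use.
        self.refcount = 1
        self.devices = set()


class PEPool:
    """Free pool of PE numbers plus the PEs currently in use."""

    def __init__(self, total):
        self.free = set(range(total))
        self.active = {}

    def alloc(self):
        # Highest free number first, mirroring the patch title
        # "Allocate PE# in reverse order".
        number = max(self.free)
        self.free.remove(number)
        pe = PE(number)
        self.active[number] = pe
        return pe

    def attach(self, pe, dev):
        pe.devices.add(dev)
        pe.refcount += 1

    def detach(self, pe, dev):
        pe.devices.remove(dev)
        pe.refcount -= 1
        if pe.refcount == 1:
            # Assumption: only the base reference is left, so drop it
            # and return the PE number to the free pool.
            pe.refcount = 0
            del self.active[pe.number]
            self.free.add(pe.number)
```

Attaching two devices to a freshly allocated PE takes the count to 3. Detaching
both returns the PE number to the free pool, where a later hotplug event can
pick it up again.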
> > > > From skiboot's perspective, the PCI slot provides
> > > > (hot/fundamental/complete) resets to EEH. The kernel learns whether
> > > > skiboot supports the various resets on a particular PCI slot through
> > > > its device-tree node. If it does, EEH uses the functionality provided
> > > > by skiboot. Besides, the device-tree nodes have to change in order to
> > > > support PCI hotplug. For example, when a PCI adapter is inserted into
> > > > a slot, its device-tree node should be added to the system
> > > > dynamically. Conversely, the device-tree node should be removed from
> > > > the system when the PCI adapter is going offline. Since pci_dn and
> > > > eeh_dev have the same life cycle as PCI device nodes, they should be
> > > > added/removed accordingly during PCI hotplug. PATCH[28 - 43] do the
> > > > related work.
> > > > 
> > > > The OF driver is changed to support unflattening an FDT blob for a
> > > > sub-tree, which is covered by PATCH[44 - 49].
> > > > 
> > > > The last one, PATCH[50], is the standalone PCI hotplug driver for the
> > > > PowerPC PowerNV platform.
> > > > 
> > > > Changelog
> > > > =========
> > > > v7:
> > > >  * Reworked revision to some extent.
> > > >  * Rebased to powerpc/next repository.
> > > >  * Reorder/split/merge/drop accordingly - Alexey.
> > > >  * Defined macros and use array to track IO/M32/M64/DMA32 segments
> > > >    - Alexey.
> > > >  * Merged 3 files to one for the hotplug driver - Alexey.
> > > >  * As part of OPAL API, defined macros for PCI slot power state,
> > > >    hotplug message type. Defined macros for PCI slot power confirmed
> > > >    state in hotplug driver.
> > > >  * Misc comments from Alexey.
> > > >  * Reworked unflatten_dt_node() to avoid recursive function calls.
> > > >  * Use EXPORT_SYMBOL_GPL() and document function's input/output
> > > >    - Rob/Frank.
> > > > v6:
> > > >  * Patch reorder, split, squash - Alexey.
> > > >  * Minor coding style - Alexey.
> > > >  * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
> > > >  * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
> > > >  * Concurrent depth as parameter passed to __unflatten_dt_node()
> > > >    - Grant / Alexey
> > > >  * Replace overlay with of_changeset - Grant
> > > > v5:
> > > >  * Rebased to 4.1.rc6 and some unmerged patches as below:
> > > >    Alexey's DDW patchset (v11);
> > > >    Gavin's EEH error injection support (in mpe's next branch);
> > > >    Richard's EEH cleanup patches (in mpe's next branch);
> > > >    Richard's EEH support for VF (v7);
> > > >    Gavin's misc EEH fixes for 4.2;
> > > >  * The revision bases on skiboot corresponding patches (v7):
> > > >    https://patchwork.ozlabs.org/patch/480437/
> > > >  * Utilize OF overlay to update device-tree with help of newly
> > > >    introduced OPAL API opal_get_overlay_dt().
> > > >  * Split patches for easy review according to aik's comments.
> > > >  * Fix coding style from checkpatch.pl as pointed out by aik.
> > > >  * Code cleanup and misc fixup according to aik's input.
> > > > v4:
> > > >  * Rebased to 4.1.RC1
> > > >  * Added API to unflatten FDT blob to device node sub-tree, which is
> > > >    attached to the indicated parent device node. The original
> > > >    mechanism based on formatted string stream has been dropped.
> > > >  * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during
> > > >    hotplug") was picked up and sent to linux-ppc@ separately for
> > > >    review as Richard's "VF EEH Support" depends on that.
> > > > v3:
> > > >  * Rebased to 4.1.RC0
> > > >  * PowerNV PCI infrastructure is totally refactored in order to
> > > >    support PCI hotplug. The PowerNV hotplug driver is also reworked a
> > > >    lot because of the changes in skiboot in order to support PCI
> > > >    hotplug.
> > > > 
> > > > Gavin Shan (50):
> > > > PCI: Add pcibios_setup_bridge()
> > > > powerpc/pci: Override pcibios_setup_bridge()
> > > > powerpc/pci: Cleanup on struct pci_controller_ops
> > > > powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops
> > > > powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
> > > > powerpc/powernv: Drop phb->bdfn_to_pe()
> > > > powerpc/powernv: Reorder fields in struct pnv_phb
> > > > powerpc/powernv: Rename PE# fields in struct pnv_phb
> > > > powerpc/powernv: Fix initial IO and M32 segmap
> > > > powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
> > > > powerpc/powernv: IO and M32 mapping based on PCI device resources
> > > > powerpc/powernv: Track M64 segment consumption
> > > > powerpc/powernv: Rename M64 related functions
> > > > powerpc/powernv: M64 support on P7IOC
> > > > powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe()
> > > > powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE
> > > > powerpc/powernv: Avoid calculating DMA32 segments on PHB3
> > > > powerpc/powernv: Remove DMA32 PE list
> > > > powerpc/powernv: Track DMA32 segment consumption
> > > > powerpc/powernv: Improve DMA32 segment calculation
> > > > powerpc/powernv: Increase PE# capacity
> > > > powerpc/powernv: Introduce pnv_ioda_init_pe()
> > > > powerpc/powernv: Use PE instead of number during setup and release
> > > > powerpc/powernv: Allocate PE# in reverse order
> > > > powerpc/powernv: Reserve PE for root bus
> > > > powerpc/powernv: Create PEs at PCI hot plugging time
> > > > powerpc/powernv: Dynamically release PEs
> > > > powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
> > > > powerpc/pci: Rename pcibios_find_pci_bus()
> > > > powerpc/pci: Move pci_find_bus_by_node() around
> > > > powerpc/pci: Export pci_add_device_node_info()
> > > > powerpc/pci: Introduce pci_remove_device_node_info()
> > > > powerpc/pci: Export pci_traverse_device_nodes()
> > > > powerpc/pci: Delay populating pdn
> > > > powerpc/pci: Don't scan empty slot
> > > > powerpc/pci: Update bridge windows on PCI plug
> > > > powerpc/powernv: Simplify pnv_eeh_reset()
> > > > powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
> > > > powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
> > > > powerpc/powernv: Support PCI slot ID
> > > > powerpc/powernv: Use firmware PCI slot reset infrastructure
> > > > powerpc/powernv: Functions to get/set PCI slot status
> > > > powerpc/powernv: Select OF_DYNAMIC
> > > > drivers/of: Split unflatten_dt_node()
> > > > drivers/of: Avoid recursively calling unflatten_dt_node()
> > > > drivers/of: Rename unflatten_dt_node()
> > > > drivers/of: Specify parent node in of_fdt_unflatten_tree()
> > > > drivers/of: Return allocated memory from of_fdt_unflatten_tree()
> > > > drivers/of: Export OF changeset functions
> > > > PCI/hotplug: PowerPC PowerNV PCI hotplug driver
> > > > 
> > > > MAINTAINERS                                    |    6 +
> > > > arch/powerpc/include/asm/eeh.h                 |    2 +-
> > > > arch/powerpc/include/asm/opal-api.h            |   17 +-
> > > > arch/powerpc/include/asm/opal.h                |    8 +-
> > > > arch/powerpc/include/asm/pci-bridge.h          |   25 +-
> > > > arch/powerpc/include/asm/pnv-pci.h             |    7 +
> > > > arch/powerpc/include/asm/ppc-pci.h             |    8 +-
> > > > arch/powerpc/kernel/eeh_dev.c                  |   19 +-
> > > > arch/powerpc/kernel/eeh_driver.c               |   12 +-
> > > > arch/powerpc/kernel/pci-common.c               |   16 +-
> > > > arch/powerpc/kernel/pci-hotplug.c              |   47 +-
> > > > arch/powerpc/kernel/pci_dn.c                   |   85 +-
> > > > arch/powerpc/platforms/maple/pci.c             |   34 +-
> > > > arch/powerpc/platforms/pasemi/pci.c            |    3 -
> > > > arch/powerpc/platforms/powermac/pci.c          |   38 +-
> > > > arch/powerpc/platforms/powernv/Kconfig         |    1 +
> > > > arch/powerpc/platforms/powernv/eeh-powernv.c   |  173 ++--
> > > > arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
> > > > arch/powerpc/platforms/powernv/pci-ioda.c      | 1251 +++++++++++++++---------
> > > > arch/powerpc/platforms/powernv/pci.c           |   92 +-
> > > > arch/powerpc/platforms/powernv/pci.h           |   62 +-
> > > > arch/powerpc/platforms/pseries/msi.c           |    4 +-
> > > > arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
> > > > arch/powerpc/platforms/pseries/setup.c         |    8 +-
> > > > drivers/of/dynamic.c                           |   65 +-
> > > > drivers/of/fdt.c                               |  378 ++++---
> > > > drivers/of/of_private.h                        |    2 +
> > > > drivers/of/overlay.c                           |    8 +-
> > > > drivers/of/unittest.c                          |    6 +-
> > > > drivers/pci/hotplug/Kconfig                    |   12 +
> > > > drivers/pci/hotplug/Makefile                   |    3 +
> > > > drivers/pci/hotplug/pnv_php.c                  |  866 ++++++++++++++++
> > > > drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
> > > > drivers/pci/hotplug/rpaphp_core.c              |    4 +-
> > > > drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
> > > > drivers/pci/setup-bus.c                        |    5 +
> > > > include/linux/of_fdt.h                         |    5 +-
> > > > include/linux/pci.h                            |    1 +
> > > > 38 files changed, 2389 insertions(+), 932 deletions(-)
> > > > create mode 100644 drivers/pci/hotplug/pnv_php.c
> > > > 
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources
  2015-11-04 13:12 ` [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
@ 2015-11-12  3:30   ` Daniel Axtens
  2015-11-12  4:55     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-12  3:30 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Hi Gavin,
Sorry to have taken so long to resume these reviews!
> Currently, the IO and M32 segments are mapped to the corresponding
> PE based on the windows of the parent bridge of PE's primary bus.
> It's not going to work when the windows of root port or upstream
> port of the PCIe switch behind root port are extended to PHB's
> aperatuses in order to support hotplug in subsequent patch.
I'm not _entirely_ sure I understand this.
I *think* you mean PHB's apertures (i.e. s/aperatuses/apertures/)?
> This fixes the issue by mapping IO and M32 segments based on the
> resources of the PCI devices included in the PE, instead of the
> windows of the parent bridge of the PE's primary bus.
This solution seems to make a lot of sense, but I don't have a very good
understanding of PCI yet: why was it done that way and not this way
originally? Looking at the code, it looks like the old way was simple
but didn't support SR-IOV?
There are a few comments inline as well.
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 553d3f3..4ab93f8 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2741,71 +2741,90 @@ truncate_iov:
>  }
>  #endif /* CONFIG_PCI_IOV */
>  
> -/*
> - * This function is supposed to be called on basis of PE from top
> - * to bottom style. So the the I/O or MMIO segment assigned to
> - * parent PE could be overrided by its child PEs if necessary.
> - */
> -static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
> -				  struct pnv_ioda_pe *pe)
> +static int pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
> +				  struct resource *res)
>  {
> -	struct pnv_phb *phb = hose->private_data;
> +	struct pnv_phb *phb = pe->phb;
>  	struct pci_bus_region region;
> -	struct resource *res;
> -	unsigned int segsize;
> -	int *segmap, index, i;
> +	unsigned int index, segsize;
> +	int *segmap;
>  	uint16_t win;
>  	int64_t rc;
s/int64_t/s64/;
I think we might also want to change the uint16_t as well.
> -	/*
> -	 * NOTE: We only care PCI bus based PE for now. For PCI
> -	 * device based PE, for example SRIOV sensitive VF should
> -	 * be figured out later.
> -	 */
> -	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
> +	if (!res->parent || !res->flags || res->start > res->end)
> +		return 0;
>  
> -	pci_bus_for_each_resource(pe->pbus, res, i) {
> -		if (!res || !res->flags ||
> -		    res->start > res->end)
> -			continue;
> +	if (res->flags & IORESOURCE_IO) {
> +		region.start = res->start - phb->ioda.io_pci_base;
> +		region.end   = res->end - phb->ioda.io_pci_base;
> +		segsize      = phb->ioda.io_segsize;
> +		segmap       = phb->ioda.io_segmap;
> +		win          = OPAL_IO_WINDOW_TYPE;
> +	} else if ((res->flags & IORESOURCE_MEM) &&
> +		   !pnv_pci_is_mem_pref_64(res->flags)) {
> +		region.start = res->start -
> +			       phb->hose->mem_offset[0] -
> +			       phb->ioda.m32_pci_base;
> +		region.end   = res->end -
> +			       phb->hose->mem_offset[0] -
> +			       phb->ioda.m32_pci_base;
> +		segsize      = phb->ioda.m32_segsize;
> +		segmap       = phb->ioda.m32_segmap;
> +		win          = OPAL_M32_WINDOW_TYPE;
> +	} else {
> +		return 0;
The return codes are currently unused, but should this get a more
informative return code? Are there any invalid ones that should be
flagged, or is it just safe to ignore stuff we don't recognise?
> +	}
> +static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
> +{
> +	struct pci_dev *pdev;
> +	struct resource *res;
> +	int i;
> +
> +	/* This function only works for bus dependent PE */
> +	WARN_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
> +
> +	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
> +		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
> +			res = &pdev->resource[i];
> +			if (pnv_ioda_setup_one_res(pe, res))
> +				return;
As I mentioned earlier, setup_one_res can potentially return -EIO:
should we be trying to propagate that up?
> +		}
> +
> +		/*
> +		 * If the PE contains all subordinate PCI buses, the
> +		 * windows of the child bridges should be mapped to
> +		 * the PE as well.
> +		 */
> +		if (!(pe->flags & PNV_IODA_PE_BUS_ALL && pci_is_bridge(pdev)))
> +			continue;
>  
> -			region.start += segsize;
> -			index++;
> +		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
> +			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
> +			if (pnv_ioda_setup_one_res(pe, res))
> +				return;
>  		}
>  	}
>  }
Regards,
Daniel
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption
  2015-11-04 13:12 ` [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption Gavin Shan
@ 2015-11-12  4:18   ` Daniel Axtens
  2015-11-16  8:01   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 157+ messages in thread
From: Daniel Axtens @ 2015-11-12  4:18 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Looks good.
Will hold off on an official review until I can test the series.
Regards,
Daniel
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> As we track M32 segment consumption, this introduces an array to
> the PHB to track the mapping between M64 segment and PE number.
> The information is going to be used to find M64 segment from the
> PE number during PCI unplugging time in subsequent patches.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++--
>  arch/powerpc/platforms/powernv/pci.h      |  3 ++-
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 4ab93f8..76ce694 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -315,6 +315,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>  		phb->ioda.total_pe_num) {
>  		pe = &phb->ioda.pe_array[i];
>  
> +		phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
>  		if (!master_pe) {
>  			pe->flags |= PNV_IODA_PE_MASTER;
>  			INIT_LIST_HEAD(&pe->slaves);
> @@ -3018,7 +3019,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  {
>  	struct pci_controller *hose;
>  	struct pnv_phb *phb;
> -	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
> +	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>  	const __be64 *prop64;
>  	const __be32 *prop32;
>  	int i, len;
> @@ -3103,6 +3104,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  
>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
> +	m64map_off = size;
> +	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
>  	m32map_off = size;
>  	size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
>  	if (phb->type == PNV_PHB_IODA1) {
> @@ -3113,9 +3116,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>  	aux = memblock_virt_alloc(size, 0);
>  	phb->ioda.pe_alloc = aux;
> +	phb->ioda.m64_segmap = aux + m64map_off;
>  	phb->ioda.m32_segmap = aux + m32map_off;
> -	for (i = 0; i < phb->ioda.total_pe_num; i++)
> +	for (i = 0; i < phb->ioda.total_pe_num; i++) {
> +		phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
>  		phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
> +	}
>  	if (phb->type == PNV_PHB_IODA1) {
>  		phb->ioda.io_segmap = aux + iomap_off;
>  		for (i = 0; i < phb->ioda.total_pe_num; i++)
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 2e01edd..671fd13 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -147,7 +147,8 @@ struct pnv_phb {
>  			unsigned long		*pe_alloc;
>  			struct pnv_ioda_pe	*pe_array;
>  
> -			/* M32 & IO segment maps */
> +			/* M64/M32/IO segment maps */
> +			int			*m64_segmap;
>  			int			*m32_segmap;
>  			int			*io_segmap;
>  
> -- 
> 2.1.0
>
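For readers following the allocation change in the quoted patch, here is a minimal user-space sketch of the same layout idea: one backing buffer holding the PE bitmap followed by the M64 and M32 segment maps, each located by a byte offset. The names `pe_maps`, `pe_maps_init()`, `pe_maps_free()`, `align_up()` and `INVALID_PE` are hypothetical, `calloc()` stands in for `memblock_virt_alloc()`, and the bitmap size is rounded up to whole bytes here, which is an assumption of this sketch rather than what the patch does.

```c
#include <stddef.h>
#include <stdlib.h>

#define INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */

/* Hypothetical, simplified stand-in for the phb->ioda bookkeeping. */
struct pe_maps {
	void *aux;			/* single backing allocation */
	unsigned long *pe_alloc;	/* PE allocation bitmap */
	int *m64_segmap;		/* M64 segment -> PE number */
	int *m32_segmap;		/* M32 segment -> PE number */
};

static size_t align_up(size_t value, size_t align)
{
	return (value + align - 1) / align * align;
}

/* Carve the bitmap and both segment maps out of one allocation. */
static int pe_maps_init(struct pe_maps *m, size_t total_pe)
{
	size_t size, m64_off, m32_off, i;

	if (total_pe == 0)
		return -1;

	/* Bitmap first, padded so the int arrays that follow stay aligned. */
	size = align_up((total_pe + 7) / 8, sizeof(unsigned long));
	m64_off = size;
	size += total_pe * sizeof(int);
	m32_off = size;
	size += total_pe * sizeof(int);

	m->aux = calloc(1, size);
	if (!m->aux)
		return -1;

	m->pe_alloc = m->aux;
	m->m64_segmap = (int *)((char *)m->aux + m64_off);
	m->m32_segmap = (int *)((char *)m->aux + m32_off);

	/* No segment is owned by any PE until it is explicitly mapped. */
	for (i = 0; i < total_pe; i++) {
		m->m64_segmap[i] = INVALID_PE;
		m->m32_segmap[i] = INVALID_PE;
	}
	return 0;
}

static void pe_maps_free(struct pe_maps *m)
{
	free(m->aux);
	m->aux = NULL;
}
```

Recording ownership is then a plain array store, for example `m.m64_segmap[pe] = pe`, which mirrors the assignment the patch adds in pnv_ioda2_pick_m64_pe().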
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources
  2015-11-12  3:30   ` Daniel Axtens
@ 2015-11-12  4:55     ` Gavin Shan
  2015-11-16  8:01       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-12  4:55 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Thu, Nov 12, 2015 at 02:30:27PM +1100, Daniel Axtens wrote:
>Hi Gavin,
>
>Sorry to have taken so long to resume these reviews!
>
Thanks for your review, Daniel!
>> Currently, the IO and M32 segments are mapped to the corresponding
>> PE based on the windows of the parent bridge of PE's primary bus.
>> It's not going to work when the windows of root port or upstream
>> port of the PCIe switch behind root port are extended to PHB's
>> aperatuses in order to support hotplug in subsequent patch.
>I'm not _entirely_ sure I understand this.
>
>I *think* you mean PHB's apertures (i.e. s/aperatuses/apertures/)?
>
I'll fix the typo in next revision.
>> This fixes the issue by mapping IO and M32 segments based on the
>> resources of the PCI devices included in the PE, instead of the
>> windows of the parent bridge of the PE's primary bus.
>
>This solution seems to make a lot of sense, but I don't have a very good
>understanding of PCI yet: why was it done that way and not this way
>originally? Looking at the code, it looks like the old way was simple
>but didn't support SR-IOV?
>
It's not related to SRIOV. Originally, the IO and M32 segments were mapped
according to the bridge's windows. To support hotplug, the bridge windows on
the root port or the upstream port of the switch behind it will be extended to
the PHB's apertures. If we still used the bridge's windows, all IO and M32
resources would be mapped/assigned to the PE corresponding to PCI bus#1 or PCI
bus#2, which is not correct any more. So the correct way is to do the mapping
based on the IO or M32 BARs of the devices included in the PE.
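
To make the per-BAR mapping concrete, here is a minimal user-space sketch of the idea, not the kernel code itself. The names `struct bar`, `map_bar_to_pe()` and `INVALID_PE` are hypothetical; the real function additionally calls into OPAL for every segment and handles the IO and M32 windows separately.

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */

/* Hypothetical, simplified device BAR: an inclusive address range. */
struct bar {
	uint64_t start;
	uint64_t end;
};

/*
 * Mark every segment touched by [bar->start, bar->end] as owned by
 * pe_number. 'base' is the start of the aperture and 'segsize' the size
 * of one segment. Returns the number of segments mapped, or -1 if the
 * BAR does not fit inside the aperture.
 */
static int map_bar_to_pe(int *segmap, size_t nsegs, uint64_t base,
			 uint64_t segsize, const struct bar *bar,
			 int pe_number)
{
	uint64_t first, last, i;

	if (segsize == 0 || bar->start > bar->end || bar->start < base)
		return -1;

	first = (bar->start - base) / segsize;
	last = (bar->end - base) / segsize;
	if (last >= nsegs)
		return -1;

	for (i = first; i <= last; i++)
		segmap[i] = pe_number;

	return (int)(last - first + 1);
}
```

Because only the segments covered by a device's own BARs are touched, a bridge window that spans the whole aperture no longer pulls every segment into a single PE.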
>There are a few comments inline as well.
>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 553d3f3..4ab93f8 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -2741,71 +2741,90 @@ truncate_iov:
>>  }
>>  #endif /* CONFIG_PCI_IOV */
>>  
>> -/*
>> - * This function is supposed to be called on basis of PE from top
>> - * to bottom style. So the the I/O or MMIO segment assigned to
>> - * parent PE could be overrided by its child PEs if necessary.
>> - */
>> -static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>> -				  struct pnv_ioda_pe *pe)
>> +static int pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
>> +				  struct resource *res)
>>  {
>> -	struct pnv_phb *phb = hose->private_data;
>> +	struct pnv_phb *phb = pe->phb;
>>  	struct pci_bus_region region;
>> -	struct resource *res;
>> -	unsigned int segsize;
>> -	int *segmap, index, i;
>> +	unsigned int index, segsize;
>> +	int *segmap;
>>  	uint16_t win;
>>  	int64_t rc;
>
>s/int64_t/s64/;
>I think we might also want to change the uint16_t as well.
>
As I explained before, I changed it from s64 to int64_t and I won't change it
back since both of them are fine. The same applies to uint16_t here. If we really
want to clean it all up at once, I can do that later, but not in this patchset.
>> -	/*
>> -	 * NOTE: We only care PCI bus based PE for now. For PCI
>> -	 * device based PE, for example SRIOV sensitive VF should
>> -	 * be figured out later.
>> -	 */
>> -	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>> +	if (!res->parent || !res->flags || res->start > res->end)
>> +		return 0;
>>  
>> -	pci_bus_for_each_resource(pe->pbus, res, i) {
>> -		if (!res || !res->flags ||
>> -		    res->start > res->end)
>> -			continue;
>> +	if (res->flags & IORESOURCE_IO) {
>> +		region.start = res->start - phb->ioda.io_pci_base;
>> +		region.end   = res->end - phb->ioda.io_pci_base;
>> +		segsize      = phb->ioda.io_segsize;
>> +		segmap       = phb->ioda.io_segmap;
>> +		win          = OPAL_IO_WINDOW_TYPE;
>> +	} else if ((res->flags & IORESOURCE_MEM) &&
>> +		   !pnv_pci_is_mem_pref_64(res->flags)) {
>> +		region.start = res->start -
>> +			       phb->hose->mem_offset[0] -
>> +			       phb->ioda.m32_pci_base;
>> +		region.end   = res->end -
>> +			       phb->hose->mem_offset[0] -
>> +			       phb->ioda.m32_pci_base;
>> +		segsize      = phb->ioda.m32_segsize;
>> +		segmap       = phb->ioda.m32_segmap;
>> +		win          = OPAL_M32_WINDOW_TYPE;
>> +	} else {
>> +		return 0;
>The return codes are currently unused, but should this get a more
>informative return code? Are there any invalid ones that should be
>flagged, or is it just safe to ignore stuff we don't recognise?
>
It's safe to ignore M64 (64-bit prefetchable) BARs, whose mapping is
done in a different path.
>> +	}
>
>
>> +static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
>> +{
>> +	struct pci_dev *pdev;
>> +	struct resource *res;
>> +	int i;
>> +
>> +	/* This function only works for bus dependent PE */
>> +	WARN_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>> +
>> +	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
>> +		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>> +			res = &pdev->resource[i];
>> +			if (pnv_ioda_setup_one_res(pe, res))
>> +				return;
>As I mentioned earlier, setup_one_res can potentially return -EIO:
>should we be trying to propagate that up?
I think it's a good idea. I'll do that in the next revision.
>> +		}
>> +
>> +		/*
>> +		 * If the PE contains all subordinate PCI buses, the
>> +		 * windows of the child bridges should be mapped to
>> +		 * the PE as well.
>> +		 */
>> +		if (!(pe->flags & PNV_IODA_PE_BUS_ALL && pci_is_bridge(pdev)))
>> +			continue;
>>  
>> -			region.start += segsize;
>> -			index++;
>> +		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
>> +			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
>> +			if (pnv_ioda_setup_one_res(pe, res))
>> +				return;
>>  		}
>>  	}
>>  }
>
Thanks,
Gavin
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset()
  2015-11-04 13:12 ` [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
@ 2015-11-12  5:11   ` Daniel Axtens
  2015-11-12  6:11     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-12  5:11 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
> -			rc = opal_pci_reset(phb->opal_id,
> -					    OPAL_RESET_PHB_ERROR,
> -					    OPAL_ASSERT_RESET);
> -			if (rc != OPAL_SUCCESS) {
> -				pr_warn("%s: Failure %lld clearing "
> -					"error injection registers\n",
> -					__func__, rc);
This is very minor, but is there a good reason to change the error
message from the one above to the one below? I just hesitate to change
error messages that people might be grepping the source for without a
good reason.
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("%s: Error %lld clearing error injection\n",
> +				__func__, rc);
Apart from that this looks good, pending me actually testing it :)
Regards,
Daniel
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset()
  2015-11-12  5:11   ` Daniel Axtens
@ 2015-11-12  6:11     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-12  6:11 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Thu, Nov 12, 2015 at 04:11:12PM +1100, Daniel Axtens wrote:
>> -			rc = opal_pci_reset(phb->opal_id,
>> -					    OPAL_RESET_PHB_ERROR,
>> -					    OPAL_ASSERT_RESET);
>> -			if (rc != OPAL_SUCCESS) {
>> -				pr_warn("%s: Failure %lld clearing "
>> -					"error injection registers\n",
>> -					__func__, rc);
>
>This is very minor, but is there a good reason to change the error
>message from the one above to the one below? I just hesitate to change
>error messages that people might be grepping the source for without a
>good reason.
>
About 3 years ago, I thought the error message printed by pr_warn() couldn't
exceed 80 characters per line; otherwise, scripts/checkpatch.pl would report
warnings. The error message was split across multiple lines to avoid that.
However, that turned out to be wrong later. If people search the code for an
error or warning message, it's better to keep it on one line, not across
multiple lines. That's the reason I merged them into one line, since I have to
refactor the function anyway. At the same time, the message is shortened, as
"Error" is shorter than "Failure" and "registers" in the original message is
meaningless.
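
As a small illustration of the grep argument, here is a user-space sketch with snprintf() standing in for pr_warn(). The helper names `format_split()` and `format_single()` are hypothetical. Both compile to ordinary format strings, but only the second keeps the full message text on a single source line.

```c
#include <stdio.h>

/*
 * Old style: the literal is split across two source lines, so grepping
 * the tree for "clearing error injection registers" matches no line.
 */
static int format_split(char *buf, size_t len, const char *func, long long rc)
{
	return snprintf(buf, len, "%s: Failure %lld clearing "
				  "error injection registers\n",
			func, rc);
}

/* New style: the whole message is greppable on one source line. */
static int format_single(char *buf, size_t len, const char *func, long long rc)
{
	return snprintf(buf, len, "%s: Error %lld clearing error injection\n",
			func, rc);
}
```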
>> +		if (rc != OPAL_SUCCESS) {
>> +			pr_warn("%s: Error %lld clearing error injection\n",
>> +				__func__, rc);
>
>Apart from that this looks good, pending me actually testing it :)
>
Thanks :-)
Gavin
>Regards,
>Daniel
>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 39/50] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2015-11-04 13:12 ` [PATCH v7 39/50] powerpc/powernv: Fundamental reset " Gavin Shan
@ 2015-11-12  6:15   ` Gavin Shan
  2015-11-13  0:08   ` Daniel Axtens
  2015-11-13  0:23   ` Daniel Axtens
  2 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-12  6:15 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, aik, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On Thu, Nov 05, 2015 at 12:12:39AM +1100, Gavin Shan wrote:
>In pnv_pci_reset_secondary_bus(), we should issue fundamental
>reset if any one subordinate device of the specified is requesting
                                        ^^^^^^^^^^^^^^
                                        the specified bus
I put the note here to remind myself to amend the changelog in the next revision.
>that. Otherwise, the device might not come up after the reset.
>
>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>---
> arch/powerpc/platforms/powernv/eeh-powernv.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>index c69b6a1..ab8b93e 100644
>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>@@ -878,9 +878,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
> 	return 0;
> }
>
>+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>+{
>+	int *freset = data;
>+
>+	/*
>+	 * Stop the iteration immediately if there has any one
>+	 * PCI device requesting fundamental reset.
>+	 */
>+	*freset |= pdev->needs_freset;
>+	return *freset;
>+}
>+
> void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
> {
>-	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>+	int option, freset = 0;
>+
>+	if (dev->subordinate)
>+		pci_walk_bus(dev->subordinate,
>+			     pnv_pci_dev_reset_type, &freset);
>+
>+	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
>+	pnv_eeh_bridge_reset(dev, option);
> 	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
> }
>
>-- 
>2.1.0
>
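The decision logic in the patch above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct dev`, `walk_devs()` and `pick_reset()` are hypothetical stand-ins for struct pci_dev, pci_walk_bus() and the body of pnv_pci_reset_secondary_bus(), and the walk is over a flat array rather than a bus hierarchy.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct dev {
	int needs_freset;
};

enum reset_kind { RESET_HOT, RESET_FUNDAMENTAL };

/* Callback: accumulate the flag; a nonzero return stops the walk. */
static int dev_reset_type(const struct dev *d, void *data)
{
	int *freset = data;

	*freset |= d->needs_freset;
	return *freset;
}

/* Visit devices in order until the callback asks to stop. */
static size_t walk_devs(const struct dev *devs, size_t n,
			int (*cb)(const struct dev *, void *), void *data)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cb(&devs[i], data))
			return i + 1;	/* devices visited so far */
	return n;
}

/* Fundamental reset if any device asks for it, hot reset otherwise. */
static enum reset_kind pick_reset(const struct dev *devs, size_t n,
				  size_t *visited)
{
	int freset = 0;
	size_t seen = walk_devs(devs, n, dev_reset_type, &freset);

	if (visited)
		*visited = seen;
	return freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```

Returning the accumulated flag from the callback is what ends the walk early: once one device needs a fundamental reset, the remaining devices cannot change the outcome.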
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  2015-11-04 13:12 ` [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
@ 2015-11-12 22:59   ` Daniel Axtens
  2015-11-12 23:25     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-12 22:59 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> When pnv_pci_reset_secondary_bus() is called to issue reset on
> the indicated secondary bus, the bus can't be root bus. So we
> needn't consider root bus in the function.
It took me a while to convince myself that this is correct. For the
record, this is why it's correct:
pnv_pci_reset_secondary_bus fills the reset_secondary_bus callback in
the pci_controller_ops structure, and isn't used elsewhere.
In arch/powerpc/kernel/pci.c, that callback is called (if it exists) in 
pcibios_reset_secondary_bus(). It's not called anywhere else.
The PPC pcibios_reset_secondary_bus overrides the weak version in
drivers/pci/pci.c. It's called from the same file by
pci_reset_bridge_secondary_bus() (and nowhere else).
pci_reset_bridge_secondary_bus() is nicely documented:
/**
 * pci_reset_bridge_secondary_bus - Reset the secondary bus on a PCI bridge.
 * @dev: Bridge device
 *
 * Use the bridge control register to assert reset on the secondary bus.
 * Devices on the secondary bus are left in power-on state.
 */
Therefore, by the definition of pci_reset_bridge_secondary_bus,
pnv_pci_reset_secondary_bus() can only be called with a bridge
device. As such, a bridge reset only is appropriate. If this breaks
anything, the caller is broken.
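
The call chain described above can be sketched in user space like this. It is an illustration only: `struct bridge`, `generic_reset()`, `platform_reset()` and `arch_reset_secondary_bus()` are hypothetical stand-ins for struct pci_dev, the generic fallback, pnv_pci_reset_secondary_bus() and pcibios_reset_secondary_bus() respectively, and the counter exists purely so the dispatch is observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bridge's struct pci_dev. */
struct bridge {
	int resets;	/* records which reset path ran */
};

/* Mirrors the single optional callback in the controller ops. */
struct controller_ops {
	void (*reset_secondary_bus)(struct bridge *dev);
};

/* Generic fallback used when the platform supplies no callback. */
static void generic_reset(struct bridge *dev)
{
	dev->resets += 1;
}

/* Platform callback: only ever reached with a bridge device. */
static void platform_reset(struct bridge *dev)
{
	dev->resets += 100;
}

/* Arch-level override: prefer the platform callback if it exists. */
static void arch_reset_secondary_bus(const struct controller_ops *ops,
				     struct bridge *dev)
{
	if (ops && ops->reset_secondary_bus) {
		ops->reset_secondary_bus(dev);
		return;
	}
	generic_reset(dev);
}
```

Since the only entry point hands the callback a bridge device, the platform function has no root-bus case to handle.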
It might be worth including a condensed version of this in the commit
message.
Reviewed-by: Daniel Axtens <dja@axtens.net>
Regards,
Daniel Axtens
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index a7d84a4..c69b6a1 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -880,16 +880,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>  
>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>  {
> -	struct pci_controller *hose;
> -
> -	if (pci_is_root_bus(dev->bus)) {
> -		hose = pci_bus_to_host(dev->bus);
> -		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
> -		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
> -	} else {
> -		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> -		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
> -	}
> +	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> +	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>  }
>  
>  /**
> -- 
> 2.1.0
>
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
  2015-11-12 22:59   ` Daniel Axtens
@ 2015-11-12 23:25     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-12 23:25 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Fri, Nov 13, 2015 at 09:59:27AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> When pnv_pci_reset_secondary_bus() is called to issue reset on
>> the indicated secondary bus, the bus can't be root bus. So we
>> needn't consider root bus in the function.
>
>It took me a while to convince myself that this is correct. For the
>record, this is why it's correct:
>
>pnv_pci_reset_secondary_bus fills the reset_secondary_bus callback in
>the pci_controller_ops structure, and isn't used elsewhere.
>
>In arch/powerpc/kernel/pci.c, that callback is called (if it exists) in 
>pcibios_reset_secondary_bus(). It's not called anywhere else.
>
>The PPC pcibios_reset_secondary_bus overrides the weak version in
>drivers/pci/pci.c. It's called from the same file by
>pci_reset_bridge_secondary_bus() (and nowhere else).
>
>pci_reset_bridge_secondary_bus() is nicely documented:
>
>/**
> * pci_reset_bridge_secondary_bus - Reset the secondary bus on a PCI bridge.
> * @dev: Bridge device
> *
> * Use the bridge control register to assert reset on the secondary bus.
> * Devices on the secondary bus are left in power-on state.
> */
>
>Therefore, by the definition of pci_reset_bridge_secondary_bus,
>pnv_pci_reset_secondary_bus() can only be called with a bridge
>device. As such, a bridge reset only is appropriate. If this breaks
>anything, the caller is broken.
>
>It might be worth including a condensed version of this in the commit
>message.
>
Right. I'll add more description to the changelog in the next revision.
>Reviewed-by: Daniel Axtens <dja@axtens.net>
>
Thanks,
Gavin
>Regards,
>Daniel Axtens
>
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++----------
>>  1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> index a7d84a4..c69b6a1 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> @@ -880,16 +880,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>  
>>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>  {
>> -	struct pci_controller *hose;
>> -
>> -	if (pci_is_root_bus(dev->bus)) {
>> -		hose = pci_bus_to_host(dev->bus);
>> -		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
>> -		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
>> -	} else {
>> -		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>> -		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>> -	}
>> +	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>> +	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>>  }
>>  
>>  /**
>> -- 
>> 2.1.0
>>
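The call chain traced in the review above can be modelled in plain C. This is only a userspace sketch under stated assumptions, not the kernel code: the struct layouts, the `is_bridge` flag, the `int` return value, `generic_reset_secondary_bus()` and `last_reset` are hypothetical stand-ins. Only the names pci_reset_bridge_secondary_bus(), pcibios_reset_secondary_bus() and pnv_pci_reset_secondary_bus() come from the thread.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the thread. */
struct pci_dev;

struct pci_controller_ops {
	/* Optional platform hook; NULL means "use the generic reset". */
	void (*reset_secondary_bus)(struct pci_dev *dev);
};

struct pci_dev {
	int is_bridge;                         /* non-zero for a PCI bridge */
	const struct pci_controller_ops *ops;  /* ops of the owning controller */
};

enum reset_path { RESET_NONE, RESET_GENERIC, RESET_PLATFORM };
static enum reset_path last_reset = RESET_NONE;

/* Models the generic (weak) fallback that uses the bridge control register. */
static void generic_reset_secondary_bus(struct pci_dev *dev)
{
	(void)dev;
	last_reset = RESET_GENERIC;
}

/* Models pnv_pci_reset_secondary_bus(): bridge reset only, no root bus case. */
static void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
{
	(void)dev;
	last_reset = RESET_PLATFORM;
}

/* Models the powerpc override: call the platform hook if one is set. */
static void pcibios_reset_secondary_bus(struct pci_dev *dev)
{
	if (dev->ops && dev->ops->reset_secondary_bus)
		dev->ops->reset_secondary_bus(dev);
	else
		generic_reset_secondary_bus(dev);
}

/* Models pci_reset_bridge_secondary_bus(): by contract @dev is a bridge. */
static int pci_reset_bridge_secondary_bus(struct pci_dev *dev)
{
	if (!dev->is_bridge)
		return -1;	/* the model rejects what the contract forbids */
	pcibios_reset_secondary_bus(dev);
	return 0;
}

static const struct pci_controller_ops pnv_ops = {
	.reset_secondary_bus = pnv_pci_reset_secondary_bus,
};
```

In this model the platform hook is reachable only through the bridge entry point, which is the property the review relies on when it concludes that the root bus branch is dead code.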
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 39/50] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2015-11-04 13:12 ` [PATCH v7 39/50] powerpc/powernv: Fundamental reset " Gavin Shan
  2015-11-12  6:15   ` Gavin Shan
@ 2015-11-13  0:08   ` Daniel Axtens
  2015-11-13  0:20     ` Gavin Shan
  2015-11-13  0:23     ` Benjamin Herrenschmidt
  2015-11-13  0:23   ` Daniel Axtens
  2 siblings, 2 replies; 157+ messages in thread
From: Daniel Axtens @ 2015-11-13  0:08 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>  {
> -	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> +	int option, freset = 0;
> +
> +	if (dev->subordinate)
> +		pci_walk_bus(dev->subordinate,
> +			     pnv_pci_dev_reset_type, &freset);
> +
> +	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
> +	pnv_eeh_bridge_reset(dev, option);
According to the skiboot sources, fundamental reset isn't supported on
p5ioc2. As far as I can tell from your corresponding skiboot patches,
this is still the case after they are applied. Do we need a fallback to
EEH_RESET_HOT in this case? Otherwise there will be no reset performed
at all.
Likewise, if the FUNDAMENTAL reset fails for any reason, should we fall
back to a HOT reset?
Regards,
Daniel
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 39/50] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2015-11-13  0:08   ` Daniel Axtens
@ 2015-11-13  0:20     ` Gavin Shan
  2015-11-13  0:23     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-13  0:20 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Fri, Nov 13, 2015 at 11:08:29AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>  {
>> -	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>> +	int option, freset = 0;
>> +
>> +	if (dev->subordinate)
>> +		pci_walk_bus(dev->subordinate,
>> +			     pnv_pci_dev_reset_type, &freset);
>> +
>> +	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
>> +	pnv_eeh_bridge_reset(dev, option);
>
>According to the skiboot sources, fundamental reset isn't supported on
>p5ioc2. As far as I can tell from your corresponding skiboot patches,
>this is still the case after they are applied. Do we need a fallback to
>EEH_RESET_HOT in this case? Otherwise there will be no reset performed
>at all.
>
>Likewise, if the FUNDAMENTAL reset fails for any reason, should we fall
>back to a HOT reset?
>
P5IOC2 won't export any PCI slots, so the kernel won't issue a fundamental
reset to PCI buses on P5IOC2.
We do have a fallback: hot reset is picked if fundamental reset isn't
supported on the target PCI bus. If the fundamental reset itself fails,
we shouldn't go ahead and try a hot reset.
Thanks,
Gavin
>Regards,
>Daniel
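The selection step in the quoted hunk (walk the secondary bus, then choose between fundamental and hot reset) can be sketched as follows. This is a simplified userspace model: `struct model_dev`, `walk_bus()` and `pick_reset_option()` are hypothetical, and the body of the callback is an assumption, since the thread quotes only its name (pnv_pci_dev_reset_type).

```c
#include <stddef.h>

/* Hypothetical model of a device sitting on the secondary bus. */
struct model_dev {
	int needs_freset;        /* device asks for a fundamental reset */
	struct model_dev *next;  /* next device on the same bus */
};

enum reset_option { RESET_HOT, RESET_FUNDAMENTAL };

/* Callback in the style of pci_walk_bus(): a non-zero return stops the walk. */
static int dev_reset_type(struct model_dev *dev, void *data)
{
	int *freset = data;

	*freset = dev->needs_freset;
	return *freset;
}

/* Flat stand-in for pci_walk_bus(); child buses are not modelled. */
static void walk_bus(struct model_dev *first,
		     int (*cb)(struct model_dev *, void *), void *data)
{
	struct model_dev *dev;

	for (dev = first; dev; dev = dev->next)
		if (cb(dev, data))
			break;
}

/* Pick fundamental reset if any device below the bridge wants it. */
static enum reset_option pick_reset_option(struct model_dev *first)
{
	int freset = 0;

	if (first)
		walk_bus(first, dev_reset_type, &freset);

	return freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```

The early stop matters in this sketch: the callback overwrites the flag on every visit, so without the break a later device that does not need a fundamental reset would clear it again.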
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 39/50] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2015-11-13  0:08   ` Daniel Axtens
  2015-11-13  0:20     ` Gavin Shan
@ 2015-11-13  0:23     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 157+ messages in thread
From: Benjamin Herrenschmidt @ 2015-11-13  0:23 UTC (permalink / raw)
  To: Daniel Axtens, Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On Fri, 2015-11-13 at 11:08 +1100, Daniel Axtens wrote:
> Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> 
> >  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
> >  {
> > -	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> > +	int option, freset = 0;
> > +
> > +	if (dev->subordinate)
> > +		pci_walk_bus(dev->subordinate,
> > +			     pnv_pci_dev_reset_type, &freset);
> > +
> > +	option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
> > +	pnv_eeh_bridge_reset(dev, option);
> 
> According to the skiboot sources, fundamental reset isn't supported on
> p5ioc2. As far as I can tell from your corresponding skiboot patches,
> this is still the case after they are applied. Do we need a fallback to
> EEH_RESET_HOT in this case? Otherwise there will be no reset performed
> at all.
We don't really care that much about what happens on p5ioc2 :-)
> Likewise, if the FUNDAMENTAL reset fails for any reason, should we fall
> back to a HOT reset?
Probably.
Cheers,
Ben.
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 39/50] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  2015-11-04 13:12 ` [PATCH v7 39/50] powerpc/powernv: Fundamental reset " Gavin Shan
  2015-11-12  6:15   ` Gavin Shan
  2015-11-13  0:08   ` Daniel Axtens
@ 2015-11-13  0:23   ` Daniel Axtens
  2 siblings, 0 replies; 157+ messages in thread
From: Daniel Axtens @ 2015-11-13  0:23 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Following some discussion on IRC, it looks like there are roughly 2
machines on the planet with skiboot and p5ioc2, so I'm not worried about
that any more.
I am still vaguely concerned about a failing fundamental reset.
Regards,
Daniel
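The open question in this sub-thread (whether a failed fundamental reset should be retried as a hot reset) comes down to a single policy decision. The sketch below makes that decision an explicit parameter so both positions can be compared. Everything in it is hypothetical: `reset_with_policy()`, the `reset_fn` signature and the `flaky_reset()` test double are not part of the patch, and the thread does not settle which policy is right.

```c
enum reset_option { RESET_HOT, RESET_FUNDAMENTAL };

/* Hypothetical reset primitive: returns 0 on success, negative on failure. */
typedef int (*reset_fn)(enum reset_option option, void *ctx);

/*
 * Issue the preferred reset. If it was a fundamental reset, it failed,
 * and @allow_fallback is set, retry once with a hot reset.
 * Returns the result of the last attempt.
 */
static int reset_with_policy(reset_fn do_reset, void *ctx,
			     enum reset_option preferred, int allow_fallback)
{
	int rc = do_reset(preferred, ctx);

	if (rc != 0 && preferred == RESET_FUNDAMENTAL && allow_fallback)
		rc = do_reset(RESET_HOT, ctx);

	return rc;
}

/* Test double: fundamental reset always fails, hot reset always succeeds. */
struct attempt_log {
	int fundamental_attempts;
	int hot_attempts;
};

static int flaky_reset(enum reset_option option, void *ctx)
{
	struct attempt_log *log = ctx;

	if (option == RESET_FUNDAMENTAL) {
		log->fundamental_attempts++;
		return -1;
	}
	log->hot_attempts++;
	return 0;
}
```

With `allow_fallback` cleared the failure is reported to the caller unchanged, which matches the position that a failed fundamental reset should not be papered over. With it set, the device still gets some reset, which is the behaviour the reviewer was asking about.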
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 08/50] powerpc/powernv: Rename PE# fields in struct pnv_phb
  2015-11-04 13:12 ` [PATCH v7 08/50] powerpc/powernv: Rename PE# " Gavin Shan
@ 2015-11-16  8:01   ` Alexey Kardashevskiy
  2015-11-17  1:22     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:01 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This renames the fields related to PE number in "struct pnv_phb"
> for better reflecting of their usages as Alexey suggested. No
> logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c    | 56 ++++++++++++++--------------
>   arch/powerpc/platforms/powernv/pci.c         |  2 +-
>   arch/powerpc/platforms/powernv/pci.h         |  4 +-
>   4 files changed, 32 insertions(+), 32 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index e1c9072..861a7d2 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
>   		 * and P7IOC separately. So we should regard
>   		 * PE#0 as valid for PHB3 and P7IOC.
>   		 */
> -		if (phb->ioda.reserved_pe != 0)
> +		if (phb->ioda.reserved_pe_idx != 0)
>   			eeh_add_flag(EEH_VALID_PE_ZERO);
>
>   		break;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 968da91..b4932c3 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -134,7 +134,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>
>   static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   {
> -	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
> +	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
>   		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>   			__func__, pe_no, phb->hose->global_number);
>   		return;
> @@ -154,8 +154,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>
>   	do {
>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
> -					phb->ioda.total_pe, 0);
> -		if (pe >= phb->ioda.total_pe)
> +					phb->ioda.total_pe_num, 0);
> +		if (pe >= phb->ioda.total_pe_num)
>   			return IODA_INVALID_PE;
>   	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>
> @@ -209,13 +209,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>   	 * expected to be 0 or last one of PE capabicity.
>   	 */
>   	r = &phb->hose->mem_resources[1];
> -	if (phb->ioda.reserved_pe == 0)
> +	if (phb->ioda.reserved_pe_idx == 0)
>   		r->start += phb->ioda.m64_segsize;
> -	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>   		r->end -= phb->ioda.m64_segsize;
>   	else
>   		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
> -			phb->ioda.reserved_pe);
> +			phb->ioda.reserved_pe_idx);
>
>   	return 0;
>
> @@ -284,7 +284,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   		return IODA_INVALID_PE;
>
>   	/* Allocate bitmap */
> -	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
> +	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>   	pe_alloc = kzalloc(size, GFP_KERNEL);
>   	if (!pe_alloc) {
>   		pr_warn("%s: Out of memory !\n",
> @@ -300,7 +300,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 * contributed by its child buses. For the case, we needn't
>   	 * pick M64 dependent PE#.
>   	 */
> -	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
> +	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>   		kfree(pe_alloc);
>   		return IODA_INVALID_PE;
>   	}
> @@ -311,8 +311,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 */
>   	master_pe = NULL;
>   	i = -1;
> -	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
> -		phb->ioda.total_pe) {
> +	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
> +		phb->ioda.total_pe_num) {
>   		pe = &phb->ioda.pe_array[i];
>
>   		if (!master_pe) {
> @@ -364,7 +364,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>   	hose->mem_offset[1] = res->start - pci_addr;
>
>   	phb->ioda.m64_size = resource_size(res);
> -	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
> +	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
>   	phb->ioda.m64_base = pci_addr;
>
>   	pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
> @@ -465,7 +465,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>   	s64 rc;
>
>   	/* Sanity check on PE number */
> -	if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
> +	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
>   		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
>
>   	/*
> @@ -1394,9 +1394,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   		} else {
>   			mutex_lock(&phb->ioda.pe_alloc_mutex);
>   			*pdn->pe_num_map = bitmap_find_next_zero_area(
> -				phb->ioda.pe_alloc, phb->ioda.total_pe,
> +				phb->ioda.pe_alloc, phb->ioda.total_pe_num,
>   				0, num_vfs, 0);
> -			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
> +			if (*pdn->pe_num_map >= phb->ioda.total_pe_num) {
>   				mutex_unlock(&phb->ioda.pe_alloc_mutex);
>   				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>   				kfree(pdn->pe_num_map);
> @@ -2670,7 +2670,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   	pdn->m64_single_mode = false;
>
>   	total_vfs = pci_sriov_get_totalvfs(pdev);
> -	mul = phb->ioda.total_pe;
> +	mul = phb->ioda.total_pe_num;
>   	total_vf_bar_sz = 0;
>
>   	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> @@ -2772,7 +2772,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   			region.end   = res->end - phb->ioda.io_pci_base;
>   			index = region.start / phb->ioda.io_segsize;
>
> -			while (index < phb->ioda.total_pe &&
> +			while (index < phb->ioda.total_pe_num &&
>   			       region.start <= region.end) {
>   				phb->ioda.io_segmap[index] = pe->pe_number;
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> @@ -2797,7 +2797,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   				       phb->ioda.m32_pci_base;
>   			index = region.start / phb->ioda.m32_segsize;
>
> -			while (index < phb->ioda.total_pe &&
> +			while (index < phb->ioda.total_pe_num &&
>   			       region.start <= region.end) {
>   				phb->ioda.m32_segmap[index] = pe->pe_number;
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> @@ -3067,13 +3067,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   		pr_err("  Failed to map registers !\n");
>
>   	/* Initialize more IODA stuff */
> -	phb->ioda.total_pe = 1;
> +	phb->ioda.total_pe_num = 1;
>   	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
>   	if (prop32)
> -		phb->ioda.total_pe = be32_to_cpup(prop32);
> +		phb->ioda.total_pe_num = be32_to_cpup(prop32);
>   	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
>   	if (prop32)
> -		phb->ioda.reserved_pe = be32_to_cpup(prop32);
> +		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
It is not related to the patch but you initialize total_pe to 1 before 
checking the device tree (which is ok) but you do not initialize 
reserved_pe and I cannot find where @phb would be zeroed - it is allocated 
by memblock_virt_alloc() which does not do that.
-- 
Alexey
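One way to make the concern above moot, whatever the allocator does, is to give both fields an explicit default before the optional device tree properties are consulted. The following is a userspace sketch only: `struct model_ioda`, `prop_u32_or()` and `init_pe_fields()` are hypothetical, and the default of 0 for the reserved PE is an assumption, not something the thread states.

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one 32-bit big-endian cell, as be32_to_cpup() does in the kernel. */
static uint32_t be32_decode(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return the property value, or @def when the property is absent. */
static uint32_t prop_u32_or(const unsigned char *prop, uint32_t def)
{
	return prop ? be32_decode(prop) : def;
}

/* Hypothetical subset of the PHB fields discussed in the thread. */
struct model_ioda {
	uint32_t total_pe_num;
	uint32_t reserved_pe_idx;
};

/* Both fields end up defined whether or not the properties exist. */
static void init_pe_fields(struct model_ioda *ioda,
			   const unsigned char *num_pes_prop,
			   const unsigned char *reserved_pe_prop)
{
	ioda->total_pe_num    = prop_u32_or(num_pes_prop, 1);
	ioda->reserved_pe_idx = prop_u32_or(reserved_pe_prop, 0);
}
```

Passing NULL for a property models of_get_property() not finding it; the structure then carries the defaults instead of whatever the memory held before.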
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2015-11-05 23:52     ` Gavin Shan
@ 2015-11-16  8:01       ` Alexey Kardashevskiy
  2015-11-17  0:54         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:01 UTC (permalink / raw)
  To: Gavin Shan, Daniel Axtens
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/06/2015 10:52 AM, Gavin Shan wrote:
> On Fri, Nov 06, 2015 at 09:56:06AM +1100, Daniel Axtens wrote:
>> Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>>
>>> The original implementation of pnv_ioda_setup_pe_seg() configures
>>> IO and M32 segments by separate logics, which can be merged
>>> by caching @segmap, @seg_size, @win in advance. This shouldn't
>>> cause any behavioural changes.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
>>>   1 file changed, 28 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 7ee7cfe..553d3f3 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -2752,8 +2752,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>   	struct pnv_phb *phb = hose->private_data;
>>>   	struct pci_bus_region region;
>>>   	struct resource *res;
>>> -	int i, index;
>>> -	int rc;
>>> +	unsigned int segsize;
>>> +	int *segmap, index, i;
>>> +	uint16_t win;
>>> +	int64_t rc;
>>
>> Good catch! Opal return codes are 64 bit and that should be explicit
>> in the type. However, I seem to remember that we preferred a different
>> type for 64 bit ints in the kernel. I think it's s64, and there are some
>> other uses of that in pci_ioda.c for return codes.
>>
>
> Both int64_t and s64 are fine. I used s64 for the OPAL return value, but
> Alexey likes "int64_t", which is ok to me as well. I won't change it back
> to s64 :-)
>
>> (I'm actually surprised that's not picked up as a compiler
>> warning. Maybe that's something to look at in future.)
>>
>
> Indeed, I didn't see a warning from gcc.
>
>> The rest of the patch looks good on casual inspection - to be sure I'll
>> test the entire series on a machine. (hopefully, time permitting!)
>>
>
> I ran scripts/checkpatch.pl on the patchset. Only one warning came from
> [PATCH 44/50], but I won't bother to change that as the warning was
> brought by original code.
None of these patches failed checkpatch.pl check, what was the error in 44/50?
-- 
Alexey
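On the missing compiler warning mentioned in the quoted exchange: storing a 64-bit return value in a plain int is an implicit conversion that gcc accepts silently by default; -Wconversion, which is not part of -Wall, is the option that reports it. The helper below is a hypothetical illustration (not from the patch) of when such narrowing loses information, assuming a 32-bit int.

```c
#include <stdint.h>

/*
 * Returns non-zero if @rc keeps its value when stored in a plain int.
 * The inner cast mirrors what "int rc = some_64bit_call(...)" does
 * implicitly; the outer cast widens the result again for comparison.
 */
static int rc_survives_int(int64_t rc)
{
	return (int64_t)(int)rc == rc;
}
```

Small status codes such as 0 or -1 survive the round trip, so a narrowed return code only misbehaves once a value outside the int range appears, which is part of why this kind of mismatch can go unnoticed for a long time.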
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources
  2015-11-12  4:55     ` Gavin Shan
@ 2015-11-16  8:01       ` Alexey Kardashevskiy
  2015-11-17  1:33         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:01 UTC (permalink / raw)
  To: Gavin Shan, Daniel Axtens
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/12/2015 03:55 PM, Gavin Shan wrote:
> On Thu, Nov 12, 2015 at 02:30:27PM +1100, Daniel Axtens wrote:
>> Hi Gavin,
>>
>> Sorry to have taken so long to resume these reviews!
>>
>
> Thanks for your review, Daniel!
>
>>> Currently, the IO and M32 segments are mapped to the corresponding
>>> PE based on the windows of the parent bridge of PE's primary bus.
>>> It's not going to work when the windows of root port or upstream
>>> port of the PCIe switch behind root port are extended to PHB's
>>> aperatuses in order to support hotplug in subsequent patch.
>> I'm not _entirely_ sure I understand this.
>>
>> I *think* you mean PHB's apertures (i.e. s/aperatuses/apertures/)?
>>
>
> I'll fix the typo in next revision.
>
>>> This fixes the issue by mapping IO and M32 segments based on the
>>> resources of the PCI devices included in the PE, instead of the
>>> windows of the parent bridge of the PE's primary bus.
>>
>> This solution seems to make a lot of sense, but I don't have a very good
>> understanding of PCI yet: why was it done that way and not this way
>> originally? Looking at the code, it looks like the old way was simple
>> but didn't support SR-IOV?
>>
>
> It's not related to SRIOV. Originally, the IO or M32 segments are mapped
> according to the bridge's windows.
Sorry, I do not understand what this means...
> The bridge windows on root port or the
> upstream port of the switch behind that will be extended to PHB's apertures.
What does "extended" mean here and why would the windows be extended anyway?
> If we still use bridge's windows, all IO and M32 resources are mapped/assigned
> to the PE corresponding to PCI bus#1 or PCI bus#2. That's not correct any more.
> So the correct way is to do the mapping based on IO or M32 BARs of the devices
> included in the PE.
In this patch I see quite a lot of code movements and I fail to spot the 
actual change here...
It used to be a single loop:
pci_bus_for_each_resource(pe->pbus, res, i) {
	/* do stuff for each @res */
}
and now it is 2 loops inside another loop:
list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
		res = &pdev->resource[i];
		/* do stuff for each @res */
	}
	for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
		res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
		/* do stuff for each @res */
	}
}
Is that correct? If yes, why isn't the patch as simple as this? If no,
what did I miss?
>
>> There are a few comments inline as well.
>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 553d3f3..4ab93f8 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -2741,71 +2741,90 @@ truncate_iov:
>>>   }
>>>   #endif /* CONFIG_PCI_IOV */
>>>
>>> -/*
>>> - * This function is supposed to be called on basis of PE from top
>>> - * to bottom style. So the the I/O or MMIO segment assigned to
>>> - * parent PE could be overrided by its child PEs if necessary.
>>> - */
>>> -static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>> -				  struct pnv_ioda_pe *pe)
>>> +static int pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
>>> +				  struct resource *res)
>>>   {
>>> -	struct pnv_phb *phb = hose->private_data;
>>> +	struct pnv_phb *phb = pe->phb;
>>>   	struct pci_bus_region region;
>>> -	struct resource *res;
>>> -	unsigned int segsize;
>>> -	int *segmap, index, i;
>>> +	unsigned int index, segsize;
>>> +	int *segmap;
>>>   	uint16_t win;
>>>   	int64_t rc;
>>
>> s/int64_t/s64/;
>> I think we might also want to change the uint16_t as well.
>>
>
> As I explained before, I changed it from s64 to int64_t and I won't change it
> back since both of them are fine. Same situation to uint16 here. If we really
> want to clean it all at once, I can do that later, but not in this patchset.
>
>>> -	/*
>>> -	 * NOTE: We only care PCI bus based PE for now. For PCI
>>> -	 * device based PE, for example SRIOV sensitive VF should
>>> -	 * be figured out later.
>>> -	 */
>>> -	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>> +	if (!res->parent || !res->flags || res->start > res->end)
>>> +		return 0;
>>>
>>> -	pci_bus_for_each_resource(pe->pbus, res, i) {
>>> -		if (!res || !res->flags ||
>>> -		    res->start > res->end)
>>> -			continue;
>>> +	if (res->flags & IORESOURCE_IO) {
>>> +		region.start = res->start - phb->ioda.io_pci_base;
>>> +		region.end   = res->end - phb->ioda.io_pci_base;
>>> +		segsize      = phb->ioda.io_segsize;
>>> +		segmap       = phb->ioda.io_segmap;
>>> +		win          = OPAL_IO_WINDOW_TYPE;
>>> +	} else if ((res->flags & IORESOURCE_MEM) &&
>>> +		   !pnv_pci_is_mem_pref_64(res->flags)) {
>>> +		region.start = res->start -
>>> +			       phb->hose->mem_offset[0] -
>>> +			       phb->ioda.m32_pci_base;
>>> +		region.end   = res->end -
>>> +			       phb->hose->mem_offset[0] -
>>> +			       phb->ioda.m32_pci_base;
>>> +		segsize      = phb->ioda.m32_segsize;
>>> +		segmap       = phb->ioda.m32_segmap;
>>> +		win          = OPAL_M32_WINDOW_TYPE;
This code asks for a helper function
pnv_ioda_do_setup_one_res(start, end, segsize, segmap, win)
and then you won't need many local variables (region, segsize, segmap, win) ;)
-- 
Alexey
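A helper along the lines suggested above might look like the sketch below. It is a userspace model under stated assumptions: the name `map_region_to_pe()`, the `nsegs` and `pe_number` parameters and the return value are hypothetical, the window type argument is dropped, and the firmware call made for each segment in the real code is omitted. It also walks by segment index rather than advancing the start address, so a region that is not segment-aligned still covers its last segment.

```c
#include <stdint.h>

/*
 * Assign every segment overlapped by [start, end] to @pe_number.
 * @start and @end are offsets from the window base; @segmap has @nsegs
 * entries. Returns the number of segments assigned. A real implementation
 * would also issue the firmware mapping call for each segment.
 */
static unsigned int map_region_to_pe(uint64_t start, uint64_t end,
				     uint64_t segsize, int *segmap,
				     unsigned int nsegs, int pe_number)
{
	uint64_t index;
	unsigned int count = 0;

	if (segsize == 0 || start > end)
		return 0;

	for (index = start / segsize;
	     index < nsegs && index * segsize <= end; index++) {
		segmap[index] = pe_number;
		count++;
	}

	return count;
}
```

The caller would compute the window-relative offsets and pick the segment size and segment map for IO or M32, then call the helper once per resource, which keeps those per-window choices out of the mapping loop itself.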
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-04 13:12 ` [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC Gavin Shan
@ 2015-11-16  8:01   ` Alexey Kardashevskiy
  2015-11-17  1:37     ` Gavin Shan
  2015-11-16  8:02   ` Alexey Kardashevskiy
  2015-11-16  8:02   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:01 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This enables M64 window on P7IOC, which has been enabled on PHB3.
> Different from PHB3 where 16 M64 BARs are supported and each of
> them can be owned by one particular PE# exclusively or divided
> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
> of them are divided to 8 segments. So every P7IOC PHB supports
> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
> M64DT, indicating that one M64 segment can only be pinned to the
> fixed PE#. In order to have same code to support M64 on P7IOC and
> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
> of them is pinned to the fixed PE# by bypassing the function of
> M64DT. In turn, we just need different phb->init_m64() for P7IOC
> and PHB3 to support M64.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>   arch/powerpc/platforms/powernv/pci.h      |  3 ++
>   2 files changed, 86 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1f7d985..bfe69f1 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>   	}
>   }
>
> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
> +{
> +	struct resource *r;
> +	int index;
> +
> +	/*
> +	 * There are 16 M64 BARs, each of which has 8 segments. So
> +	 * there are as many M64 segments as the maximum number of
> +	 * PEs, which is 128.
> +	 */
> +	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
> +		unsigned long base, segsz = phb->ioda.m64_segsize;
> +		int64_t rc;
> +
> +		base = phb->ioda.m64_base +
> +		       index * PNV_IODA1_M64_SEGS * segsz;
> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
> +				OPAL_M64_WINDOW_TYPE, index, base, 0,
> +				PNV_IODA1_M64_SEGS * segsz);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, index);
> +			goto fail;
> +		}
> +
> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
> +				OPAL_M64_WINDOW_TYPE, index,
> +				OPAL_ENABLE_M64_SPLIT);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, index);
> +			goto fail;
> +		}
> +	}
> +
> +	/*
> +	 * Exclude the segment used by the reserved PE, which
> +	 * is expected to be 0 or last supported PE#.
> +	 */
> +	r = &phb->hose->mem_resources[1];
> +	if (phb->ioda.reserved_pe_idx == 0)
> +		r->start += phb->ioda.m64_segsize;
> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
> +		r->end -= phb->ioda.m64_segsize;
> +	else
> +		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
> +			phb->ioda.reserved_pe_idx);
> +
> +	return 0;
> +
> +fail:
> +	for ( ; index >= 0; index--)
> +		opal_pci_phb_mmio_enable(phb->opal_id,
> +			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
> +
> +	return -EIO;
> +}
> +
>   static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>   				    unsigned long *pe_bitmap,
>   				    bool all)
> @@ -325,6 +383,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   			pe->master = master_pe;
>   			list_add_tail(&pe->list, &master_pe->slaves);
>   		}
> +
> +		/*
> +		 * P7IOC supports M64DT, which helps mapping M64 segment
> +		 * to one particular PE#. However, PHB3 has fixed mapping
> +		 * between M64 segment and PE#. In order to have same logic
> +		 * for P7IOC and PHB3, we enforce fixed mapping between M64
> +		 * segment and PE# on P7IOC.
> +		 */
> +		if (phb->type == PNV_PHB_IODA1) {
> +			int64_t rc;
> +
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +					pe->pe_number, OPAL_M64_WINDOW_TYPE,
> +					pe->pe_number / PNV_IODA1_M64_SEGS,
> +					pe->pe_number % PNV_IODA1_M64_SEGS);
> +			if (rc != OPAL_SUCCESS)
> +				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
> +					__func__, rc, phb->hose->global_number,
> +					pe->pe_number);
> +		}
>   	}
>
>   	kfree(pe_alloc);
> @@ -339,8 +417,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>   	const u32 *r;
>   	u64 pci_addr;
>
> -	/* FIXME: Support M64 for P7IOC */
> -	if (phb->type != PNV_PHB_IODA2) {
> +	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>   		pr_info("  Not support M64 window\n");
>   		return;
>   	}
> @@ -373,7 +450,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>
>   	/* Use last M64 BAR to cover M64 window */
>   	phb->ioda.m64_bar_idx = 15;
> -	phb->init_m64 = pnv_ioda2_init_m64;
> +	if (phb->type == PNV_PHB_IODA1)
> +		phb->init_m64 = pnv_ioda1_init_m64;
> +	else
> +		phb->init_m64 = pnv_ioda2_init_m64;
>   	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>   	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
Nit: the callbacks initialization does not seem to relate to parsing any 
window :) They could all go to where pnv_ioda_parse_m64_window() is called, 
no separate patch is needed.
>   }
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 671fd13..c4019ac 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -78,6 +78,9 @@ struct pnv_ioda_pe {
>   	struct list_head	list;
>   };
>
> +#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
> +#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
> +
>   #define PNV_PHB_FLAG_EEH	(1 << 0)
>
>   struct pnv_phb {
>
-- 
Alexey
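The arithmetic behind the P7IOC layout in the quoted patch (16 M64 BARs of 8 segments each, with a fixed PE# to segment mapping) can be checked with a few lines of C. The constants mirror PNV_IODA1_M64_NUM and PNV_IODA1_M64_SEGS from the patch; the function names are hypothetical and no firmware calls are modelled.

```c
#include <stdint.h>

#define M64_NUM_BARS      16  /* PNV_IODA1_M64_NUM in the patch  */
#define M64_SEGS_PER_BAR  8   /* PNV_IODA1_M64_SEGS in the patch */

/* Total M64 segments per P7IOC PHB: 16 BARs * 8 segments = 128. */
static unsigned int m64_total_segments(void)
{
	return M64_NUM_BARS * M64_SEGS_PER_BAR;
}

/* Base address of BAR @bar; each BAR covers 8 consecutive segments. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	return m64_base + (uint64_t)bar * M64_SEGS_PER_BAR * segsz;
}

/* Fixed PE# -> (BAR, segment) mapping, as in the quoted hunk. */
static void m64_pe_to_bar_seg(unsigned int pe, unsigned int *bar,
			      unsigned int *seg)
{
	*bar = pe / M64_SEGS_PER_BAR;
	*seg = pe % M64_SEGS_PER_BAR;
}
```

For example, PE#9 lands in BAR 1, segment 1, and the highest PE#127 lands in BAR 15, segment 7, so the 128 segments line up one-to-one with the 128 PE numbers.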
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption
  2015-11-04 13:12 ` [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption Gavin Shan
  2015-11-12  4:18   ` Daniel Axtens
@ 2015-11-16  8:01   ` Alexey Kardashevskiy
  2015-11-17  1:04     ` Gavin Shan
  1 sibling, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:01 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> As we track M32 segment consumption, this introduces an array to
> the PHB to track the mapping between M64 segment and PE number.
> The information is going to be used to find M64 segment from the
> PE number during PCI unplugging time in subsequent patches.
It would not hurt to put a few words about how we managed to live without 
such a mapping for M64 before but we needed mapping for M32.
-- 
Alexey
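The kind of tracking the commit message describes (an array from M64 segment to PE number, consulted at unplug time) can be sketched as below. This is a hypothetical userspace model: `struct m64_tracker`, the segment count of 128, `INVALID_PE` and the function names are assumptions for illustration, not the patch's actual fields.

```c
#define M64_SEGS    128
#define INVALID_PE  (-1)

/* Hypothetical per-PHB table: m64_segmap[segment] = owning PE number. */
struct m64_tracker {
	int m64_segmap[M64_SEGS];
};

static void m64_tracker_init(struct m64_tracker *t)
{
	int i;

	for (i = 0; i < M64_SEGS; i++)
		t->m64_segmap[i] = INVALID_PE;
}

/* Record ownership at plug time. Returns 0, or -1 on a bad or taken segment. */
static int m64_assign(struct m64_tracker *t, int seg, int pe)
{
	if (seg < 0 || seg >= M64_SEGS || t->m64_segmap[seg] != INVALID_PE)
		return -1;

	t->m64_segmap[seg] = pe;
	return 0;
}

/* At unplug time, release every segment owned by @pe; returns the count. */
static int m64_release_pe(struct m64_tracker *t, int pe)
{
	int i, released = 0;

	for (i = 0; i < M64_SEGS; i++) {
		if (t->m64_segmap[i] == pe) {
			t->m64_segmap[i] = INVALID_PE;
			released++;
		}
	}

	return released;
}
```

Without such a table the only way to find a PE's segments at unplug time would be to recompute them from the device resources, which may already be gone by then.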
^ permalink raw reply	[flat|nested] 157+ messages in thread
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-04 13:12 ` [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC Gavin Shan
  2015-11-16  8:01   ` Alexey Kardashevskiy
@ 2015-11-16  8:02   ` Alexey Kardashevskiy
  2015-11-17  1:38     ` Gavin Shan
  2015-11-16  8:02   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:02 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This enables M64 window on P7IOC, which has been enabled on PHB3.
> Different from PHB3 where 16 M64 BARs are supported and each of
> them can be owned by one particular PE# exclusively or divided
> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
> of them are divided to 8 segments. So every P7IOC PHB supports
> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
> M64DT, indicating that one M64 segment can only be pinned to the
> fixed PE#. In order to have same code to support M64 on P7IOC and
> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
> of them is pinned to the fixed PE# by bypassing the function of
> M64DT. In turn, we just need different phb->init_m64() for P7IOC
> and PHB3 to support M64.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>   arch/powerpc/platforms/powernv/pci.h      |  3 ++
>   2 files changed, 86 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1f7d985..bfe69f1 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>   	}
>   }
>
> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
> +{
> +	struct resource *r;
> +	int index;
> +
> +	/*
> +	 * There are 16 M64 BARs, each of which has 8 segments. So
> +	 * there are as many M64 segments as the maximum number of
> +	 * PEs, which is 128.
> +	 */
> +	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
> +		unsigned long base, segsz = phb->ioda.m64_segsize;
> +		int64_t rc;
> +
> +		base = phb->ioda.m64_base +
> +		       index * PNV_IODA1_M64_SEGS * segsz;
> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
> +				OPAL_M64_WINDOW_TYPE, index, base, 0,
> +				PNV_IODA1_M64_SEGS * segsz);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, index);
> +			goto fail;
> +		}
> +
> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
> +				OPAL_M64_WINDOW_TYPE, index,
> +				OPAL_ENABLE_M64_SPLIT);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, index);
> +			goto fail;
> +		}
> +	}
> +
> +	/*
> +	 * Exclude the segment used by the reserved PE, which
> +	 * is expected to be 0 or last supported PE#.
> +	 */
> +	r = &phb->hose->mem_resources[1];
What does "1" mean here? A bridge's 64bit prefetchable window?
> +	if (phb->ioda.reserved_pe_idx == 0)
> +		r->start += phb->ioda.m64_segsize;
> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
> +		r->end -= phb->ioda.m64_segsize;
> +	else
> +		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
> +			phb->ioda.reserved_pe_idx);
> +
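For readers following the hunk above, the BAR base arithmetic and the reserved-PE trimming can be sketched in isolation. This is a standalone illustration, not the kernel code: `struct m64_window`, `m64_bar_base()` and `m64_trim_reserved()` are hypothetical names, and the constants assume 16 BARs of 8 segments each, as the commit message states.

```c
#include <assert.h>
#include <stdint.h>

#define M64_BARS    16                      /* M64 BARs per P7IOC PHB (per the commit message) */
#define M64_SEGS     8                      /* segments per BAR (per the commit message) */
#define TOTAL_PES   (M64_BARS * M64_SEGS)   /* 128 segments, one per PE# */

/* Hypothetical stand-in for the window a struct resource describes. */
struct m64_window {
	uint64_t start;   /* first byte of the window */
	uint64_t end;     /* last byte of the window (inclusive) */
};

/* Base address of BAR 'index': each BAR spans M64_SEGS segments. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz, int index)
{
	assert(index >= 0 && index < M64_BARS);
	return m64_base + (uint64_t)index * M64_SEGS * segsz;
}

/*
 * Cut the reserved PE's segment off the window. Only the first or the
 * last segment can be removed without splitting the window in two.
 * Returns 0 on success, -1 if the reserved PE sits in the middle.
 */
static int m64_trim_reserved(struct m64_window *w, uint64_t segsz,
			     int reserved_pe)
{
	if (reserved_pe == 0)
		w->start += segsz;
	else if (reserved_pe == TOTAL_PES - 1)
		w->end -= segsz;
	else
		return -1;
	return 0;
}
```

The -1 return corresponds to the pr_warn() branch in the hunk: a reserved PE in the middle of the range cannot be excluded by moving either end of the window.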
-- 
Alexey
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-04 13:12 ` [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC Gavin Shan
  2015-11-16  8:01   ` Alexey Kardashevskiy
  2015-11-16  8:02   ` Alexey Kardashevskiy
@ 2015-11-16  8:02   ` Alexey Kardashevskiy
  2015-11-17  1:42     ` Gavin Shan
  2 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-16  8:02 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This enables M64 window on P7IOC, which has been enabled on PHB3.
> Different from PHB3 where 16 M64 BARs are supported and each of
> them can be owned by one particular PE# exclusively or divided
> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
> of them are divided to 8 segments. So every P7IOC PHB supports
> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
> M64DT, indicating that one M64 segment can only be pinned to the
> fixed PE#. In order to have same code to support M64 on P7IOC and
> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
> of them is pinned to the fixed PE# by bypassing the function of
> M64DT. In turn, we just need different phb->init_m64() for P7IOC
> and PHB3 to support M64.
I thought we decided (Ben suggested?) not to push P7IOC code now (or ever),
as there is no user for it. Has this changed?
btw, please put ioda1/ioda2/p7ioc/etc. in the subject line to make it easier
to see how much work there is for a particular PHB type. You rename quite
a few functions, and I would generally ask you to group all renaming
patches first, but it would also make sense to keep them close to (for
example) the p7ioc-related patches, so more descriptive subject lines may
help. Thanks.
-- 
Alexey
* Re: [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption
  2015-11-04 13:12 ` [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption Gavin Shan
@ 2015-11-17  0:28   ` Daniel Axtens
  2015-11-17  1:55     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-17  0:28 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> Similar to the mechanism tracking consumed IO/M32/M64 segments,
> this introduces an array for each PHB to track the consumed DMA32
> segments, which are going to be released on PCI unplugging time.
> The index of the array is the DMA32 segment number while the value
> stored in the element is the assigned PE number.
>
> +	/* Setup TCE32 segment mapping */
Do you mean DMA32 rather than TCE32?
> +	for (i = base; i < base + segs; i++)
> +		phb->ioda.dma32_segmap[i] = pe->pe_number;
> +
I'm pretty sure this is right, but can you just confirm that you
intended to index into the array starting at base and going to base + segs,
and not going from 0 to segs? (i.e. not dma32_segmap[i - base]).
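If the map is a single per-PHB array shared by all PEs, indexing by absolute segment number is what one would expect. A minimal sketch under that assumption (`segmap_assign()` and `INVALID_PE` are illustrative names, not the patch's code):

```c
#include <assert.h>

#define INVALID_PE  (-1)   /* assumed marker for an unowned segment */

/*
 * Record that segments [base, base + segs) belong to 'pe_number'.
 * The map is shared by the whole PHB, so it is indexed by absolute
 * segment number rather than by an offset relative to 'base'.
 */
static void segmap_assign(int *segmap, unsigned int count,
			  unsigned int base, unsigned int segs,
			  int pe_number)
{
	unsigned int i;

	assert(base + segs <= count);   /* stay inside the map */
	for (i = base; i < base + segs; i++)
		segmap[i] = pe_number;
}
```

With segmap[i - base], every PE would start writing at slot 0 and overwrite the ownership recorded for the PEs set up before it.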
Otherwise looks good.
Regards,
Daniel
>  	/* Setup linux iommu table */
>  	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>  				  base * PNV_IODA1_DMA32_SEGSIZE,
> @@ -2378,13 +2382,13 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>  	 * then we assign at least one segment per PE, plus more based
>  	 * on the amount of devices under that PE
>  	 */
> -	if (dma_pe_count > phb->ioda.tce32_count)
> +	if (dma_pe_count > phb->ioda.dma32_count)
>  		residual = 0;
>  	else
> -		residual = phb->ioda.tce32_count - dma_pe_count;
> +		residual = phb->ioda.dma32_count - dma_pe_count;
>  
>  	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
> -		hose->global_number, phb->ioda.tce32_count);
> +		hose->global_number, phb->ioda.dma32_count);
>  	pr_info("PCI: %d PE# for a total weight of %d\n",
>  		dma_pe_count, total_weight);
>  
> @@ -2394,7 +2398,7 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>  	 * out one base segment plus any residual segments based on
>  	 * weight
>  	 */
> -	remaining = phb->ioda.tce32_count;
> +	remaining = phb->ioda.dma32_count;
>  	base = 0;
>  	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>  		weight = pnv_pci_ioda_pe_dma_weight(pe);
> @@ -3094,7 +3098,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  {
>  	struct pci_controller *hose;
>  	struct pnv_phb *phb;
> -	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
> +	unsigned long size, m64map_off, m32map_off, pemap_off;
> +	unsigned long iomap_off = 0, dma32map_off = 0;
>  	const __be64 *prop64;
>  	const __be32 *prop32;
>  	int i, len;
> @@ -3177,6 +3182,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
>  	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>  
> +	/* Calculate how many 32-bit TCE segments we have */
> +	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
> +				PNV_IODA1_DMA32_SEGSIZE;
> +
>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>  	m64map_off = size;
> @@ -3186,6 +3195,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  	if (phb->type == PNV_PHB_IODA1) {
>  		iomap_off = size;
>  		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
> +		dma32map_off = size;
> +		size += phb->ioda.dma32_count *
> +			sizeof(phb->ioda.dma32_segmap[0]);
>  	}
>  	pemap_off = size;
>  	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
> @@ -3201,6 +3213,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  		phb->ioda.io_segmap = aux + iomap_off;
>  		for (i = 0; i < phb->ioda.total_pe_num; i++)
>  			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
> +
> +		phb->ioda.dma32_segmap = aux + dma32map_off;
> +		for (i = 0; i < phb->ioda.dma32_count; i++)
> +			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>  	}
>  	phb->ioda.pe_array = aux + pemap_off;
>  	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
> @@ -3208,10 +3224,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  	INIT_LIST_HEAD(&phb->ioda.pe_list);
>  	mutex_init(&phb->ioda.pe_list_mutex);
>  
> -	/* Calculate how many 32-bit TCE segments we have */
> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
> -				PNV_IODA1_DMA32_SEGSIZE;
> -
>  #if 0 /* We should really do that ... */
>  	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>  					 window_type,
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 2038ef2..0802fcd 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -148,6 +148,10 @@ struct pnv_phb {
>  			int			*m32_segmap;
>  			int			*io_segmap;
>  
> +			/* DMA32 segment maps - IODA1 only */
> +			unsigned long		dma32_count;
> +			int			*dma32_segmap;
> +
>  			/* IRQ chip */
>  			int			irq_chip_init;
>  			struct irq_chip		irq_chip;
> @@ -164,9 +168,6 @@ struct pnv_phb {
>  			 */
>  			unsigned char		pe_rmap[0x10000];
>  
> -			/* 32-bit TCE tables allocation */
> -			unsigned long		tce32_count;
> -
>  			/* TCE cache invalidate registers (physical and
>  			 * remapped)
>  			 */
> -- 
> 2.1.0
>
* Re: [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity
  2015-11-04 13:12 ` [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity Gavin Shan
@ 2015-11-17  0:29   ` Daniel Axtens
  2015-11-17  1:56     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-17  0:29 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> Each PHB maintains an array helping to translate 2-bytes Request
> ID (RID) to PE# with the assumption that PE# takes one byte, meaning
> that we can't have more than 256 PEs. However, pci_dn->pe_number
> already had 4-bytes for the PE#.
>
> This extends the PE# capacity so that each of them will be 4-bytes
> long. Then we can reuse IODA_INVALID_PE to check the PE# stored in
> phb->pe_rmap[] is valid or not.
Just for clarity, could you make it clear in the commit message that
you're increasing the PE# capacity _in the PHB_? I just found it a bit
confusing the first time I read it.
With that clarified I'll be happy to add my reviewed-by tag.
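The reason a one-byte slot cannot hold the invalid marker can be shown with a small standalone sketch. It assumes the marker is -1 and 8-bit bytes; `roundtrip_u8()` and `roundtrip_int()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

#define INVALID_PE  (-1)   /* assumed value of the invalid-PE marker */

/* Store a PE number into a one-byte slot and read it back as an int. */
static int roundtrip_u8(int pe)
{
	unsigned char slot = (unsigned char)pe;   /* keeps only the low 8 bits */

	return slot;
}

/* Store a PE number into an int slot and read it back. */
static int roundtrip_int(int pe)
{
	int slot = pe;

	return slot;
}
```

With the byte-wide slot, the marker reads back as 255 and PE#256 reads back as 0, so neither can be told apart from a real PE number; the int-wide slot preserves both.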
Regards,
Daniel
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
>  arch/powerpc/platforms/powernv/pci.h      | 7 ++-----
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 0e66c4d..ef93a01 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -766,7 +766,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>  
>  	/* Clear the reverse map */
>  	for (rid = pe->rid; rid < rid_end; rid++)
> -		phb->ioda.pe_rmap[rid] = 0;
> +		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>  
>  	/* Release from all parents PELT-V */
>  	while (parent) {
> @@ -3164,6 +3164,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  	if (prop32)
>  		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>  
> +	/* Invalidate RID to PE# mapping */
> +	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
> +		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
> +
>  	/* Parse 64-bit MMIO range */
>  	pnv_ioda_parse_m64_window(phb);
>  
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 0802fcd..5df945f 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -162,11 +162,8 @@ struct pnv_phb {
>  			struct list_head	pe_list;
>  			struct mutex            pe_list_mutex;
>  
> -			/* Reverse map of PEs, will have to extend if
> -			 * we are to support more than 256 PEs, indexed
> -			 * bus { bus, devfn }
> -			 */
> -			unsigned char		pe_rmap[0x10000];
> +			/* Reverse map of PEs, indexed by {bus, devfn} */
> +			int			pe_rmap[0x10000];
>  
>  			/* TCE cache invalidate registers (physical and
>  			 * remapped)
> -- 
> 2.1.0
>
* Re: [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe()
  2015-11-04 13:12 ` [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe() Gavin Shan
@ 2015-11-17  0:30   ` Daniel Axtens
  2015-11-17  1:58     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Daniel Axtens @ 2015-11-17  0:30 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, aik, bhelgaas, grant.likely,
	robherring2, panto, frowand.list, Gavin Shan
Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
> This introduces pnv_ioda_init_pe() to initialize the specified PE
> instance (phb->ioda.pe_array[x]). It's used by pnv_ioda_alloc_pe()
> and pnv_ioda_reserve_pe(). No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index ef93a01..488e0f8 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -129,6 +129,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>  		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>  }
>  
> +static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
> +{
> +	phb->ioda.pe_array[pe_no].phb = phb;
> +	phb->ioda.pe_array[pe_no].pe_number = pe_no;
> +
> +	return &phb->ioda.pe_array[pe_no];
You have the function returning the newly initialized PE here...
> +}
> +
>  static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>  {
>  	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
> @@ -141,8 +149,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>  		pr_debug("%s: PE %d was reserved on PHB#%x\n",
>  			 __func__, pe_no, phb->hose->global_number);
>  
> -	phb->ioda.pe_array[pe_no].phb = phb;
> -	phb->ioda.pe_array[pe_no].pe_number = pe_no;
> +	pnv_ioda_init_pe(phb, pe_no);
... but then you ignore the result here and in the other function you've
modified.
It looks like you're using the result in the next patch though, so I
wonder whether it would be better to merge this patch with the next
one. However, as I said before, I'll defer to Alexey on decisions about
how to split the patch series if he has a different opinion.
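To make the point concrete, here is a rough sketch of a caller that does consume the returned pointer. The structures are trimmed-down, hypothetical stand-ins (`struct pe`, `struct phb`, a made-up `flags` field), not the real struct pnv_ioda_pe or struct pnv_phb.

```c
#include <assert.h>
#include <stddef.h>

struct phb;

/* Hypothetical, trimmed-down PE record. */
struct pe {
	struct phb *phb;
	int pe_number;
	int flags;          /* illustrative field only */
};

struct phb {
	int total_pe_num;
	struct pe pe_array[4];
};

/* Initialize slot 'pe_no' and hand it back so callers need not re-index. */
static struct pe *init_pe(struct phb *phb, int pe_no)
{
	struct pe *pe = &phb->pe_array[pe_no];

	pe->phb = phb;
	pe->pe_number = pe_no;
	return pe;
}

/* A caller that uses the return value; returns NULL for a bad PE number. */
static struct pe *reserve_pe(struct phb *phb, int pe_no, int flags)
{
	struct pe *pe;

	if (pe_no < 0 || pe_no >= phb->total_pe_num)
		return NULL;

	pe = init_pe(phb, pe_no);
	pe->flags = flags;
	return pe;
}
```

If no caller in a given patch uses the pointer, a void helper would do; the return value only pays off once a caller like reserve_pe() above exists.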
Regards,
Daniel
* Re: [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2015-11-16  8:01       ` Alexey Kardashevskiy
@ 2015-11-17  0:54         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  0:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, Daniel Axtens, linuxppc-dev, linux-pci, devicetree,
	benh, mpe, bhelgaas, grant.likely, robherring2, panto,
	frowand.list
On Mon, Nov 16, 2015 at 07:01:18PM +1100, Alexey Kardashevskiy wrote:
>On 11/06/2015 10:52 AM, Gavin Shan wrote:
>>On Fri, Nov 06, 2015 at 09:56:06AM +1100, Daniel Axtens wrote:
>>>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>>>
>>>>The original implementation of pnv_ioda_setup_pe_seg() configures
>>>>IO and M32 segments by separate logics, which can be merged by
>>>>by caching @segmap, @seg_size, @win in advance. This shouldn't
>>>>cause any behavioural changes.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++++++++++++++-----------------
>>>>  1 file changed, 28 insertions(+), 34 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 7ee7cfe..553d3f3 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -2752,8 +2752,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>  	struct pci_bus_region region;
>>>>  	struct resource *res;
>>>>-	int i, index;
>>>>-	int rc;
>>>>+	unsigned int segsize;
>>>>+	int *segmap, index, i;
>>>>+	uint16_t win;
>>>>+	int64_t rc;
>>>
>>>Good catch! Opal return codes are 64 bit and that should be explicit
>>>in the type. However, I seem to remember that we preferred a different
>>>type for 64 bit ints in the kernel. I think it's s64, and there are some
>>>other uses of that in pci_ioda.c for return codes.
>>>
>>
>>Both int64_t and s64 are fine. I used s64 for the OPAL return value, but
>>Alexey likes "int64_t", which is ok to me as well. I won't change it back
>>to s64 :-)
>>
>>>(I'm actually surprised that's not picked up as a compiler
>>>warning. Maybe that's something to look at in future.)
>>>
>>
>>Indeed, I didn't see a warning from gcc.
>>
>>>The rest of the patch looks good on casual inspection - to be sure I'll
>>>test the entire series on a machine. (hopefully, time permitting!)
>>>
>>
>>I run scripts/checkpatch.pl on the patchset. Only one warning came from
>>[PATCH 44/50], but I won't bother to change that as the warning was
>>brought by original code.
>
>None of these patches failed checkpatch.pl check, what was the error in 44/50?
>
You're right that none of those patches failed checkpatch.pl. I had revisions
6.1, 6.2 and 6.3 before this revision (v7) was posted. There was a warning for
44/50 in 6.3 (perhaps).
I ran checkpatch.pl on all (v7) patches; no warning was reported.
Thanks,
Gavin
* Re: [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption
  2015-11-16  8:01   ` Alexey Kardashevskiy
@ 2015-11-17  1:04     ` Gavin Shan
  2015-11-19  0:10       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Mon, Nov 16, 2015 at 07:01:59PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>As we track M32 segment consumption, this introduces an array to
>>the PHB to track the mapping between M64 segment and PE number.
>>The information is going to be used to find M64 segment from the
>>PE number during PCI unplugging time in subsequent patches.
>
>
>It would not hurt to put a few words about how we managed to live without
>such a mapping for M64 before but we needed mapping for M32.
>
The M32 mapping (phb->ioda.m32_segmap[]) isn't used for anything before
this patchset. Similarly, there was no need for an M64 mapping before this
patchset, so there is no need to add the words.
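For context, the kind of lookup such a segment map enables at unplug time might look like the following. This is only a sketch under assumptions: `segmap_release_pe()` and `INVALID_PE` are illustrative names, and the real release path would also have to unmap the segment in hardware.

```c
#include <assert.h>

#define INVALID_PE  (-1)   /* assumed marker for an unowned segment */

/*
 * Walk the segment map, release every segment owned by 'pe_number',
 * and return how many segments were freed.
 */
static unsigned int segmap_release_pe(int *segmap, unsigned int count,
				      int pe_number)
{
	unsigned int i, freed = 0;

	for (i = 0; i < count; i++) {
		if (segmap[i] != pe_number)
			continue;
		segmap[i] = INVALID_PE;   /* segment returns to the free pool */
		freed++;
	}
	return freed;
}
```

Without the map there is no record of which segments a departing PE owned, which is why the array only becomes necessary once PEs can be released.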
Thanks,
Gavin
* Re: [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3
  2015-11-04 13:12 ` [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3 Gavin Shan
@ 2015-11-17  1:07   ` Alexey Kardashevskiy
  2015-11-17  8:48     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  1:07 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> In pnv_ioda_setup_dma(), it's unnecessary to calculate the DMA32
> segments for PEs on PHB3 as the whole available DMA32 space can
> be assigned to one specific PE on PHB3.
>
> This splits pnv_ioda_setup_dma() to pnv_pci_ioda1_setup_dma() and
> pnv_pci_ioda2_setup_dma() in order to avoid calculating DMA32
> segments for PEs on PHB3. No logical changes introduced.
This patch is not needed, as
[PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation
moves this calculation to another place (which already makes this patch
unnecessary) and
[PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
removes the just-introduced pnv_pci_ioda1_setup_dma(). If you remove it, then
there is no point in fixing it in the first place.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 41 ++++++++++++++++++-------------
>   1 file changed, 24 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 5a08e20..4c2e023 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2383,7 +2383,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>   }
>
> -static void pnv_ioda_setup_dma(struct pnv_phb *phb)
> +static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>   {
>   	struct pci_controller *hose = phb->hose;
>   	unsigned int residual, remaining, segs, tw, base;
> @@ -2428,26 +2428,30 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   				segs = remaining;
>   		}
>
> -		/*
> -		 * For IODA2 compliant PHB3, we needn't care about the weight.
> -		 * The all available 32-bits DMA space will be assigned to
> -		 * the specific PE.
> -		 */
> -		if (phb->type == PNV_PHB_IODA1) {
> -			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
> -				pe->dma_weight, segs);
> -			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
> -		} else {
> -			pe_info(pe, "Assign DMA32 space\n");
> -			segs = 0;
> -			pnv_pci_ioda2_setup_dma_pe(phb, pe);
> -		}
> +		pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
> +			pe->dma_weight, segs);
> +		pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>
>   		remaining -= segs;
>   		base += segs;
>   	}
>   }
>
> +static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
> +{
> +	struct pnv_ioda_pe *pe;
> +
> +	pnv_pci_ioda_setup_opal_tce_kill(phb);
> +
> +	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> +		if (!pe->dma_weight)
> +			continue;
> +
> +		pe_info(pe, "Assign DMA32 space\n");
> +		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> +	}
> +}
> +
>   #ifdef CONFIG_PCI_MSI
>   static void pnv_ioda2_msi_eoi(struct irq_data *d)
>   {
> @@ -2931,10 +2935,13 @@ static void pnv_pci_ioda_setup_DMA(void)
>   	struct pnv_phb *phb;
>
>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		pnv_ioda_setup_dma(hose->private_data);
> +		phb = hose->private_data;
> +		if (phb->type == PNV_PHB_IODA1)
> +			pnv_pci_ioda1_setup_dma(phb);
> +		else
> +			pnv_pci_ioda2_setup_dma(phb);
>
>   		/* Mark the PHB initialization done */
> -		phb = hose->private_data;
>   		phb->initialized = 1;
>   	}
>   }
>
-- 
Alexey
* Re: [PATCH v7 08/50] powerpc/powernv: Rename PE# fields in struct pnv_phb
  2015-11-16  8:01   ` Alexey Kardashevskiy
@ 2015-11-17  1:22     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:22 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Mon, Nov 16, 2015 at 07:01:06PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This renames the fields related to PE number in "struct pnv_phb"
>>for better reflecting of their usages as Alexey suggested. No
>>logical changes introduced.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c    | 56 ++++++++++++++--------------
>>  arch/powerpc/platforms/powernv/pci.c         |  2 +-
>>  arch/powerpc/platforms/powernv/pci.h         |  4 +-
>>  4 files changed, 32 insertions(+), 32 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index e1c9072..861a7d2 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
>>  		 * and P7IOC separately. So we should regard
>>  		 * PE#0 as valid for PHB3 and P7IOC.
>>  		 */
>>-		if (phb->ioda.reserved_pe != 0)
>>+		if (phb->ioda.reserved_pe_idx != 0)
>>  			eeh_add_flag(EEH_VALID_PE_ZERO);
>>
>>  		break;
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 968da91..b4932c3 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -134,7 +134,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>
>>  static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>  {
>>-	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
>>+	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
>>  		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>>  			__func__, pe_no, phb->hose->global_number);
>>  		return;
>>@@ -154,8 +154,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>
>>  	do {
>>  		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>-					phb->ioda.total_pe, 0);
>>-		if (pe >= phb->ioda.total_pe)
>>+					phb->ioda.total_pe_num, 0);
>>+		if (pe >= phb->ioda.total_pe_num)
>>  			return IODA_INVALID_PE;
>>  	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>
>>@@ -209,13 +209,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>>  	 * expected to be 0 or last one of PE capabicity.
>>  	 */
>>  	r = &phb->hose->mem_resources[1];
>>-	if (phb->ioda.reserved_pe == 0)
>>+	if (phb->ioda.reserved_pe_idx == 0)
>>  		r->start += phb->ioda.m64_segsize;
>>-	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
>>+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>  		r->end -= phb->ioda.m64_segsize;
>>  	else
>>  		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
>>-			phb->ioda.reserved_pe);
>>+			phb->ioda.reserved_pe_idx);
>>
>>  	return 0;
>>
>>@@ -284,7 +284,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>  		return IODA_INVALID_PE;
>>
>>  	/* Allocate bitmap */
>>-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>  	pe_alloc = kzalloc(size, GFP_KERNEL);
>>  	if (!pe_alloc) {
>>  		pr_warn("%s: Out of memory !\n",
>>@@ -300,7 +300,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	 * contributed by its child buses. For the case, we needn't
>>  	 * pick M64 dependent PE#.
>>  	 */
>>-	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
>>+	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>>  		kfree(pe_alloc);
>>  		return IODA_INVALID_PE;
>>  	}
>>@@ -311,8 +311,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	 */
>>  	master_pe = NULL;
>>  	i = -1;
>>-	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
>>-		phb->ioda.total_pe) {
>>+	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
>>+		phb->ioda.total_pe_num) {
>>  		pe = &phb->ioda.pe_array[i];
>>
>>  		if (!master_pe) {
>>@@ -364,7 +364,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>  	hose->mem_offset[1] = res->start - pci_addr;
>>
>>  	phb->ioda.m64_size = resource_size(res);
>>-	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
>>+	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
>>  	phb->ioda.m64_base = pci_addr;
>>
>>  	pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
>>@@ -465,7 +465,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>>  	s64 rc;
>>
>>  	/* Sanity check on PE number */
>>-	if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
>>+	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
>>  		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
>>
>>  	/*
>>@@ -1394,9 +1394,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  		} else {
>>  			mutex_lock(&phb->ioda.pe_alloc_mutex);
>>  			*pdn->pe_num_map = bitmap_find_next_zero_area(
>>-				phb->ioda.pe_alloc, phb->ioda.total_pe,
>>+				phb->ioda.pe_alloc, phb->ioda.total_pe_num,
>>  				0, num_vfs, 0);
>>-			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
>>+			if (*pdn->pe_num_map >= phb->ioda.total_pe_num) {
>>  				mutex_unlock(&phb->ioda.pe_alloc_mutex);
>>  				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>>  				kfree(pdn->pe_num_map);
>>@@ -2670,7 +2670,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  	pdn->m64_single_mode = false;
>>
>>  	total_vfs = pci_sriov_get_totalvfs(pdev);
>>-	mul = phb->ioda.total_pe;
>>+	mul = phb->ioda.total_pe_num;
>>  	total_vf_bar_sz = 0;
>>
>>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>@@ -2772,7 +2772,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  			region.end   = res->end - phb->ioda.io_pci_base;
>>  			index = region.start / phb->ioda.io_segsize;
>>
>>-			while (index < phb->ioda.total_pe &&
>>+			while (index < phb->ioda.total_pe_num &&
>>  			       region.start <= region.end) {
>>  				phb->ioda.io_segmap[index] = pe->pe_number;
>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>@@ -2797,7 +2797,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  				       phb->ioda.m32_pci_base;
>>  			index = region.start / phb->ioda.m32_segsize;
>>
>>-			while (index < phb->ioda.total_pe &&
>>+			while (index < phb->ioda.total_pe_num &&
>>  			       region.start <= region.end) {
>>  				phb->ioda.m32_segmap[index] = pe->pe_number;
>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>@@ -3067,13 +3067,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  		pr_err("  Failed to map registers !\n");
>>
>>  	/* Initialize more IODA stuff */
>>-	phb->ioda.total_pe = 1;
>>+	phb->ioda.total_pe_num = 1;
>>  	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
>>  	if (prop32)
>>-		phb->ioda.total_pe = be32_to_cpup(prop32);
>>+		phb->ioda.total_pe_num = be32_to_cpup(prop32);
>>  	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
>>  	if (prop32)
>>-		phb->ioda.reserved_pe = be32_to_cpup(prop32);
>>+		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>
>
>It is not related to the patch but you initialize total_pe to 1 before
>checking the device tree (which is ok) but you do not initialize reserved_pe
>and I cannot find where @phb would be zeroed - it is allocated by
>memblock_virt_alloc() which does not do that.
>
There is a call "memset(ptr, 0, size)" in memblock_virt_alloc_internal().
Thanks,
Gavin
* Re: [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources
  2015-11-16  8:01       ` Alexey Kardashevskiy
@ 2015-11-17  1:33         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, Daniel Axtens, linuxppc-dev, linux-pci, devicetree,
	benh, mpe, bhelgaas, grant.likely, robherring2, panto,
	frowand.list
On Mon, Nov 16, 2015 at 07:01:43PM +1100, Alexey Kardashevskiy wrote:
>On 11/12/2015 03:55 PM, Gavin Shan wrote:
>>On Thu, Nov 12, 2015 at 02:30:27PM +1100, Daniel Axtens wrote:
>>>Hi Gavin,
>>>
>>>Sorry to have taken so long to resume these reviews!
>>>
>>
>>Thanks for your review, Daniel!
>>
>>>>Currently, the IO and M32 segments are mapped to the corresponding
>>>>PE based on the windows of the parent bridge of PE's primary bus.
>>>>It's not going to work when the windows of root port or upstream
>>>>port of the PCIe switch behind root port are extended to PHB's
>>>>aperatuses in order to support hotplug in subsequent patch.
>>>I'm not _entirely_ sure I understand this.
>>>
>>>I *think* you mean PHB's apertures (i.e. s/aperatuses/apertures/)?
>>>
>>
>>I'll fix the typo in next revision.
>>
>>>>This fixes the issue by mapping IO and M32 segments based on the
>>>>resources of the PCI devices included in the PE, instead of the
>>>>windows of the parent bridge of the PE's primary bus.
>>>
>>>This solution seems to make a lot of sense, but I don't have a very good
>>>understanding of PCI yet: why was it done that way and not this way
>>>originally? Looking at the code, it looks like the old way was simple
>>>but didn't support SR-IOV?
>>>
>>
>>It's not related to SRIOV. Originally, the IO or M32 segments are mapped
>>according to the bridge's windows.
>
>
>Sorry, I do not understand what this means...
>
Before this patchset, the IO or M32 segments consumed by one PE were mapped
according to the windows of the PCI bridge of the PE's primary bus if the PE
didn't include all subordinate PCI buses originating from the bridge. Otherwise,
the bridge's windows should be taken into account as well.
>
>>The bridge windows on root port or the
>>upstream port of the switch behind that will be extended to PHB's apertures.
>
>What does "extended" mean here and why would the windows be extended anyway?
>
It reserves IO or memory resources for adapters that might be plugged in the future.
>>If we still use bridge's windows, all IO and M32 resources are mapped/assigned
>>to the PE corresponding to PCI bus#1 or PCI bus#2. That's not correct any more.
>>So the correct way is to do the mapping based on IO or M32 BARs of the devices
>>included in the PE.
>
>In this patch I see quite a lot of code movements and I fail to spot the
>actual change here...
>
>
>It used to be a single loop:
>
>pci_bus_for_each_resource(pe->pbus, res, i) {
>	/* do stuff for each @res */
>}
>
>and now it is 2 loops inside another loop:
>
>list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
>	for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>		res = &pdev->resource[i];
>		/* do stuff for each @res */
>	}
>
>	for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
>	        res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
>		/* do stuff for each @res */
>	}
>}
>
>
>Is that correct? If yes, why is not the patch as simple as this? If no, what
>did I miss?
>
That's correct. The resources tracked by the PCI bridge windows can belong
to another PE instead of the one tracked by @pe in the code.
>
>
>>
>>>There are a few comments inline as well.
>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 553d3f3..4ab93f8 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -2741,71 +2741,90 @@ truncate_iov:
>>>>  }
>>>>  #endif /* CONFIG_PCI_IOV */
>>>>
>>>>-/*
>>>>- * This function is supposed to be called on basis of PE from top
>>>>- * to bottom style. So the the I/O or MMIO segment assigned to
>>>>- * parent PE could be overrided by its child PEs if necessary.
>>>>- */
>>>>-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>-				  struct pnv_ioda_pe *pe)
>>>>+static int pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
>>>>+				  struct resource *res)
>>>>  {
>>>>-	struct pnv_phb *phb = hose->private_data;
>>>>+	struct pnv_phb *phb = pe->phb;
>>>>  	struct pci_bus_region region;
>>>>-	struct resource *res;
>>>>-	unsigned int segsize;
>>>>-	int *segmap, index, i;
>>>>+	unsigned int index, segsize;
>>>>+	int *segmap;
>>>>  	uint16_t win;
>>>>  	int64_t rc;
>>>
>>>s/int64_t/s64/;
>>>I think we might also want to change the uint16_t as well.
>>>
>>
>>As I explained before, I changed it from s64 to int64_t and I won't change it
>>back since both of them are fine. Same situation to uint16 here. If we really
>>want to clean it all at once, I can do that later, but not in this patchset.
>>
>>>>-	/*
>>>>-	 * NOTE: We only care PCI bus based PE for now. For PCI
>>>>-	 * device based PE, for example SRIOV sensitive VF should
>>>>-	 * be figured out later.
>>>>-	 */
>>>>-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>>>+	if (!res->parent || !res->flags || res->start > res->end)
>>>>+		return 0;
>>>>
>>>>-	pci_bus_for_each_resource(pe->pbus, res, i) {
>>>>-		if (!res || !res->flags ||
>>>>-		    res->start > res->end)
>>>>-			continue;
>>>>+	if (res->flags & IORESOURCE_IO) {
>>>>+		region.start = res->start - phb->ioda.io_pci_base;
>>>>+		region.end   = res->end - phb->ioda.io_pci_base;
>>>>+		segsize      = phb->ioda.io_segsize;
>>>>+		segmap       = phb->ioda.io_segmap;
>>>>+		win          = OPAL_IO_WINDOW_TYPE;
>>>>+	} else if ((res->flags & IORESOURCE_MEM) &&
>>>>+		   !pnv_pci_is_mem_pref_64(res->flags)) {
>>>>+		region.start = res->start -
>>>>+			       phb->hose->mem_offset[0] -
>>>>+			       phb->ioda.m32_pci_base;
>>>>+		region.end   = res->end -
>>>>+			       phb->hose->mem_offset[0] -
>>>>+			       phb->ioda.m32_pci_base;
>>>>+		segsize      = phb->ioda.m32_segsize;
>>>>+		segmap       = phb->ioda.m32_segmap;
>>>>+		win          = OPAL_M32_WINDOW_TYPE;
>
>
>
>This code asks for a helper function
>
>pnv_ioda_do_setup_one_res(start, end, segsize, segmap, win)
>
>and then you won't need many local variables (region, segsize, segmap, win) ;)
>
Sounds like a good idea. I'll change accordingly in next revision.
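To make sure we mean the same thing, here is a rough, standalone sketch of what
such a helper could look like. It is not the kernel code: map_range_to_pe() and
its parameters are hypothetical names, the OPAL call is left out, and it walks
segment boundaries instead of advancing region.start, so it only shows the
bookkeeping on the segment map.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel implementation.
 *
 * Assign every segment overlapped by the inclusive range [start, end]
 * to pe_number. start and end are offsets from the window base, segsize
 * is the size of one segment and nsegs is the number of entries in segmap.
 * Returns the number of segments that were assigned.
 */
static unsigned int map_range_to_pe(uint64_t start, uint64_t end,
				    uint64_t segsize, int *segmap,
				    uint64_t nsegs, int pe_number)
{
	unsigned int count = 0;
	uint64_t index;

	/* Nothing to do for an empty range or a bogus segment size */
	if (segsize == 0 || start > end)
		return 0;

	/* Walk from the segment holding start to the one holding end */
	for (index = start / segsize;
	     index < nsegs && index * segsize <= end;
	     index++) {
		/* The real code would also issue the OPAL mapping call here */
		segmap[index] = pe_number;
		count++;
	}

	return count;
}
```

With something of this shape, the IO and M32 branches would only differ in the
arguments they pass (offsets, segment size, segment map and window type).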
Thanks,
Gavin
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-16  8:01   ` Alexey Kardashevskiy
@ 2015-11-17  1:37     ` Gavin Shan
  2015-11-19  0:18       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Mon, Nov 16, 2015 at 07:01:46PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>Different from PHB3 where 16 M64 BARs are supported and each of
>>them can be owned by one particular PE# exclusively or divided
>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>of them are divided to 8 segments. So every P7IOC PHB supports
>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>M64DT, indicating that one M64 segment can only be pinned to the
>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>of them is pinned to the fixed PE# by bypassing the function of
>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>and PHB3 to support M64.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>  arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>  2 files changed, 86 insertions(+), 3 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 1f7d985..bfe69f1 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>  	}
>>  }
>>
>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>+{
>>+	struct resource *r;
>>+	int index;
>>+
>>+	/*
>>+	 * There are 16 M64 BARs, each of which has 8 segments. So
>>+	 * there are as many M64 segments as the maximum number of
>>+	 * PEs, which is 128.
>>+	 */
>>+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>+		unsigned long base, segsz = phb->ioda.m64_segsize;
>>+		int64_t rc;
>>+
>>+		base = phb->ioda.m64_base +
>>+		       index * PNV_IODA1_M64_SEGS * segsz;
>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>+				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>+				PNV_IODA1_M64_SEGS * segsz);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, index);
>>+			goto fail;
>>+		}
>>+
>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>+				OPAL_M64_WINDOW_TYPE, index,
>>+				OPAL_ENABLE_M64_SPLIT);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, index);
>>+			goto fail;
>>+		}
>>+	}
>>+
>>+	/*
>>+	 * Exclude the segment used by the reserved PE, which
>>+	 * is expected to be 0 or last supported PE#.
>>+	 */
>>+	r = &phb->hose->mem_resources[1];
>>+	if (phb->ioda.reserved_pe_idx == 0)
>>+		r->start += phb->ioda.m64_segsize;
>>+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>+		r->end -= phb->ioda.m64_segsize;
>>+	else
>>+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>+			phb->ioda.reserved_pe_idx);
>>+
>>+	return 0;
>>+
>>+fail:
>>+	for ( ; index >= 0; index--)
>>+		opal_pci_phb_mmio_enable(phb->opal_id,
>>+			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
>>+
>>+	return -EIO;
>>+}
>>+
>>  static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>  				    unsigned long *pe_bitmap,
>>  				    bool all)
>>@@ -325,6 +383,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  			pe->master = master_pe;
>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>  		}
>>+
>>+		/*
>>+		 * P7IOC supports M64DT, which helps mapping M64 segment
>>+		 * to one particular PE#. However, PHB3 has fixed mapping
>>+		 * between M64 segment and PE#. In order to have same logic
>>+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>+		 * segment and PE# on P7IOC.
>>+		 */
>>+		if (phb->type == PNV_PHB_IODA1) {
>>+			int64_t rc;
>>+
>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+					pe->pe_number, OPAL_M64_WINDOW_TYPE,
>>+					pe->pe_number / PNV_IODA1_M64_SEGS,
>>+					pe->pe_number % PNV_IODA1_M64_SEGS);
>>+			if (rc != OPAL_SUCCESS)
>>+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>+					__func__, rc, phb->hose->global_number,
>>+					pe->pe_number);
>>+		}
>>  	}
>>
>>  	kfree(pe_alloc);
>>@@ -339,8 +417,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>  	const u32 *r;
>>  	u64 pci_addr;
>>
>>-	/* FIXME: Support M64 for P7IOC */
>>-	if (phb->type != PNV_PHB_IODA2) {
>>+	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>>  		pr_info("  Not support M64 window\n");
>>  		return;
>>  	}
>>@@ -373,7 +450,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>
>>  	/* Use last M64 BAR to cover M64 window */
>>  	phb->ioda.m64_bar_idx = 15;
>>-	phb->init_m64 = pnv_ioda2_init_m64;
>>+	if (phb->type == PNV_PHB_IODA1)
>>+		phb->init_m64 = pnv_ioda1_init_m64;
>>+	else
>>+		phb->init_m64 = pnv_ioda2_init_m64;
>>  	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>  	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>
>
>Nit: the callbacks initialization does not seem to relate to parsing any
>window :) They could all go to where pnv_ioda_parse_m64_window() is called,
>no separate patch is needed.
>
Well, what's the benefit of that? I personally prefer the way I had: initialize
all callbacks in one place, not in separate places. However, if you have a good
reason to support your suggestion, I'll change accordingly for sure.
>
>>  }
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 671fd13..c4019ac 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -78,6 +78,9 @@ struct pnv_ioda_pe {
>>  	struct list_head	list;
>>  };
>>
>>+#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>+#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>+
>>  #define PNV_PHB_FLAG_EEH	(1 << 0)
>>
>>  struct pnv_phb {
>>
Thanks,
Gavin
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-16  8:02   ` Alexey Kardashevskiy
@ 2015-11-17  1:38     ` Gavin Shan
  2015-11-17  2:11       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Mon, Nov 16, 2015 at 07:02:03PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>Different from PHB3 where 16 M64 BARs are supported and each of
>>them can be owned by one particular PE# exclusively or divided
>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>of them are divided to 8 segments. So every P7IOC PHB supports
>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>M64DT, indicating that one M64 segment can only be pinned to the
>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>of them is pinned to the fixed PE# by bypassing the function of
>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>and PHB3 to support M64.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>  arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>  2 files changed, 86 insertions(+), 3 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 1f7d985..bfe69f1 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>  	}
>>  }
>>
>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>+{
>>+	struct resource *r;
>>+	int index;
>>+
>>+	/*
>>+	 * There are 16 M64 BARs, each of which has 8 segments. So
>>+	 * there are as many M64 segments as the maximum number of
>>+	 * PEs, which is 128.
>>+	 */
>>+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>+		unsigned long base, segsz = phb->ioda.m64_segsize;
>>+		int64_t rc;
>>+
>>+		base = phb->ioda.m64_base +
>>+		       index * PNV_IODA1_M64_SEGS * segsz;
>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>+				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>+				PNV_IODA1_M64_SEGS * segsz);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, index);
>>+			goto fail;
>>+		}
>>+
>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>+				OPAL_M64_WINDOW_TYPE, index,
>>+				OPAL_ENABLE_M64_SPLIT);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, index);
>>+			goto fail;
>>+		}
>>+	}
>>+
>>+	/*
>>+	 * Exclude the segment used by the reserved PE, which
>>+	 * is expected to be 0 or last supported PE#.
>>+	 */
>>+	r = &phb->hose->mem_resources[1];
>
>
>What does "1" mean here? A bridge's 64bit prefetchable window?
>
It's PHB's M64 window.
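In other words, the code quoted below shrinks that window by one segment at
whichever end the reserved PE sits. A minimal standalone sketch of the rule,
assuming an inclusive [start, end] window (struct m64_window and
trim_reserved_segment() are made-up names for illustration only):

```c
#include <stdint.h>

/* Hypothetical stand-in for the PHB's M64 window; end is inclusive */
struct m64_window {
	uint64_t start;
	uint64_t end;
};

/*
 * Cut the segment owned by the reserved PE out of the window.
 * Only the first or the last PE# can be cut this way.
 * Returns 0 on success, -1 if the window was left untouched.
 */
static int trim_reserved_segment(struct m64_window *w, uint64_t segsize,
				 unsigned int reserved_pe,
				 unsigned int total_pe)
{
	if (reserved_pe == 0)
		w->start += segsize;	/* drop the first segment */
	else if (total_pe > 0 && reserved_pe == total_pe - 1)
		w->end -= segsize;	/* drop the last segment */
	else
		return -1;		/* a middle segment cannot be cut */

	return 0;
}
```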
>
>>+	if (phb->ioda.reserved_pe_idx == 0)
>>+		r->start += phb->ioda.m64_segsize;
>>+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>+		r->end -= phb->ioda.m64_segsize;
>>+	else
>>+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>+			phb->ioda.reserved_pe_idx);
>>+
Thanks,
Gavin
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-16  8:02   ` Alexey Kardashevskiy
@ 2015-11-17  1:42     ` Gavin Shan
  2015-11-17  2:37       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Mon, Nov 16, 2015 at 07:02:18PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>Different from PHB3 where 16 M64 BARs are supported and each of
>>them can be owned by one particular PE# exclusively or divided
>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>of them are divided to 8 segments. So every P7IOC PHB supports
>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>M64DT, indicating that one M64 segment can only be pinned to the
>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>of them is pinned to the fixed PE# by bypassing the function of
>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>and PHB3 to support M64.
>
>I thought we decided (Ben suggested?) not to push P7IOC code now (or ever) as
>there is no user for it, has this changed?
>
Remember that the code is mixed for P7IOC/PHB3. It's not harmful to support
the M64 window on P7IOC, which is much larger than M32.
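For reference, the fixed mapping on P7IOC is plain arithmetic: 16 BARs with 8
segments each give 128 segments, and PE#n is pinned to segment n. A small
standalone sketch (the macro and function names are illustrative and only
mirror PNV_IODA1_M64_NUM / PNV_IODA1_M64_SEGS from the patch):

```c
#include <stdint.h>

#define M64_NUM_BARS		16	/* M64 BARs per P7IOC PHB */
#define M64_SEGS_PER_BAR	8	/* segments per M64 BAR   */
#define M64_TOTAL_SEGS		(M64_NUM_BARS * M64_SEGS_PER_BAR)

/* BAR that holds the segment pinned to this PE# */
static unsigned int m64_bar_of_pe(unsigned int pe)
{
	return pe / M64_SEGS_PER_BAR;
}

/* Segment index inside that BAR */
static unsigned int m64_seg_of_pe(unsigned int pe)
{
	return pe % M64_SEGS_PER_BAR;
}

/* Start address covered by a BAR, given the M64 base and segment size */
static uint64_t m64_bar_base(uint64_t m64_base, unsigned int bar,
			     uint64_t segsz)
{
	return m64_base + (uint64_t)bar * M64_SEGS_PER_BAR * segsz;
}
```

So the number of M64 segments matches the maximum number of PEs, which is why
the same pick/reserve logic can serve both P7IOC and PHB3.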
>btw please put ioda1/ioda2/p7ioc/etc to the subject line to make it easier to
>see how much work is there about particular PHB type. You rename quite many
>functions and I generally want to ask you to group all renaming patches first
>but it would also make sense to keep them close to (for example)
>p7ioc-related patches so having more descriptive subject lines may help.
>Thanks.
>
As the code is mixed for P7IOC/PHB3, I'm not following the line (IODA1/IODA2/p7ioc/phb3)
in this patchset. Instead, the patches are ordered as: code refactoring,
IO/M32/M64, DMA, PE allocation/releasing.
Thanks,
Gavin
* Re: [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list
  2015-11-04 13:12 ` [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list Gavin Shan
@ 2015-11-17  1:54   ` Alexey Kardashevskiy
  2015-11-17  2:01     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  1:54 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
> to their DMA32 weight. The PEs on the list are iterated to setup
> their TCE32 tables at system booting time. The list is used for
> once and there is no good reason for it to survive.
From the above I concluded that you need a list but just do not need to keep
it after the configuration is done; in fact you remove the list completely,
so just remove "to survive" (s/for it to survive/for keep having it/) :)
>
> This moves the logic calculating DMA32 weight of PHB and PE to
> pnv_pci_ioda1_setup_dma() to drop PHB's DMA32 list.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 150 ++++++++++++++----------------
>   arch/powerpc/platforms/powernv/pci.h      |  19 ----
>   2 files changed, 68 insertions(+), 101 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 4c2e023..20ebe6e 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -891,44 +891,6 @@ out:
>   	return 0;
>   }
>
> -static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
> -				       struct pnv_ioda_pe *pe)
> -{
> -	struct pnv_ioda_pe *lpe;
> -
> -	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (lpe->dma_weight < pe->dma_weight) {
> -			list_add_tail(&pe->dma_link, &lpe->dma_link);
> -			return;
> -		}
> -	}
> -	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
> -}
> -
> -static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
> -{
> -	/* This is quite simplistic. The "base" weight of a device
> -	 * is 10. 0 means no DMA is to be accounted for it.
> -	 */
> -
> -	/* If it's a bridge, no DMA */
> -	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> -		return 0;
> -
> -	/* Reduce the weight of slow USB controllers */
> -	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
> -	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
> -	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		return 3;
> -
> -	/* Increase the weight of RAID (includes Obsidian) */
> -	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> -		return 15;
> -
> -	/* Default */
> -	return 10;
> -}
> -
>   #ifdef CONFIG_PCI_IOV
>   static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   {
> @@ -1009,7 +971,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   			continue;
>   		}
>   		pdn->pe_number = pe->pe_number;
> -		pe->dma_weight += pnv_ioda_dma_weight(dev);
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>   			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>   	}
> @@ -1046,10 +1007,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> -	pe->tce32_seg = -1;
>   	pe->mve_number = -1;
>   	pe->rid = bus->busn_res.start << 8;
> -	pe->dma_weight = 0;
>
>   	if (all)
>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
> @@ -1071,17 +1030,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	/* Put PE to the list */
>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
> -
> -	/* Account for one DMA PE if at least one DMA capable device exist
> -	 * below the bridge
> -	 */
> -	if (pe->dma_weight != 0) {
> -		phb->ioda.dma_weight += pe->dma_weight;
> -		phb->ioda.dma_pe_count++;
> -	}
> -
> -	/* Link the PE */
> -	pnv_ioda_link_pe_by_weight(phb, pe);
>   }
>
>   static void pnv_ioda_setup_PEs(struct pci_bus *bus)
> @@ -1389,7 +1337,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   		pe->flags = PNV_IODA_PE_VF;
>   		pe->pbus = NULL;
>   		pe->parent_dev = pdev;
> -		pe->tce32_seg = -1;
>   		pe->mve_number = -1;
>   		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>   			   pci_iov_virtfn_devfn(pdev, vf_index);
> @@ -1842,6 +1789,47 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>   	.free = pnv_ioda2_table_free,
>   };
>
> +static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
> +{
> +	unsigned int *weight = (unsigned int *)data;
> +
> +	/* This is quite simplistic. The "base" weight of a device
> +	 * is 10. 0 means no DMA is to be accounted for it.
> +	 */
> +
> +	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> +		return 0;
> +
> +	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
> +	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
> +	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> +		*weight += 3;
> +	else if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> +		*weight += 15;
> +	else
> +		*weight += 10;
> +
> +	return 0;
> +}
> +
> +static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
> +{
> +	unsigned int weight = 0;
> +
> +	if ((pe->flags & PNV_IODA_PE_DEV) && pe->pdev) {
> +		pnv_pci_ioda_dev_dma_weight(pe->pdev, &weight);
> +	} else if ((pe->flags & PNV_IODA_PE_BUS) && pe->pbus) {
> +		struct pci_dev *pdev;
> +
> +		list_for_each_entry(pdev, &pe->pbus->devices, bus_list)
> +			pnv_pci_ioda_dev_dma_weight(pdev, &weight);
> +	} else if ((pe->flags & PNV_IODA_PE_BUS_ALL) && pe->pbus) {
> +		pci_walk_bus(pe->pbus, pnv_pci_ioda_dev_dma_weight, &weight);
> +	}
> +
> +	return weight;
> +}
> +
>   static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   				       struct pnv_ioda_pe *pe,
>   				       unsigned int base,
> @@ -1858,17 +1846,12 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>   	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>
> -	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->tce32_seg >= 0))
> -		return;
> -
>   	tbl = pnv_pci_table_alloc(phb->hose->node);
>   	iommu_register_group(&pe->table_group, phb->hose->global_number,
>   			pe->pe_number);
>   	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>
>   	/* Grab a 32-bit TCE table */
> -	pe->tce32_seg = base;
>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>   		base * PNV_IODA1_DMA32_SEGSIZE,
>   		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
> @@ -1932,8 +1915,6 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   	return;
>    fail:
>   	/* XXX Failure: Try to fallback to 64-bit only ? */
> -	if (pe->tce32_seg >= 0)
> -		pe->tce32_seg = -1;
>   	if (tce_mem)
>   		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>   	if (tbl) {
> @@ -2344,10 +2325,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   {
>   	int64_t rc;
>
> -	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->tce32_seg >= 0))
> -		return;
> -
>   	/* TVE #1 is selected by PCI address bit 59 */
>   	pe->tce_bypass_base = 1ull << 59;
>
> @@ -2355,7 +2332,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   			pe->pe_number);
>
>   	/* The PE will reserve all possible 32-bits space */
> -	pe->tce32_seg = 0;
>   	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>   		phb->ioda.m32_pci_base);
>
> @@ -2371,11 +2347,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   #endif
>
>   	rc = pnv_pci_ioda2_setup_default_config(pe);
> -	if (rc) {
> -		if (pe->tce32_seg >= 0)
> -			pe->tce32_seg = -1;
> +	if (rc)
>   		return;
> -	}
>
>   	if (pe->flags & PNV_IODA_PE_DEV)
>   		iommu_add_device(&pe->pdev->dev);
> @@ -2386,24 +2359,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	unsigned int residual, remaining, segs, tw, base;
> +	unsigned int weight, total_weight, dma_pe_count;
> +	unsigned int residual, remaining, segs, base;
>   	struct pnv_ioda_pe *pe;
>
> +	total_weight = 0;
> +	dma_pe_count = 0;
> +	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +		weight = pnv_pci_ioda_pe_dma_weight(pe);
> +		if (weight > 0)
> +			dma_pe_count++;
> +
> +		total_weight += weight;
> +	}
> +
>   	/* If we have more PE# than segments available, hand out one
>   	 * per PE until we run out and let the rest fail. If not,
>   	 * then we assign at least one segment per PE, plus more based
>   	 * on the amount of devices under that PE
>   	 */
> -	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
> +	if (dma_pe_count > phb->ioda.tce32_count)
>   		residual = 0;
>   	else
> -		residual = phb->ioda.tce32_count -
> -			phb->ioda.dma_pe_count;
> +		residual = phb->ioda.tce32_count - dma_pe_count;
>
>   	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>   		hose->global_number, phb->ioda.tce32_count);
>   	pr_info("PCI: %d PE# for a total weight of %d\n",
> -		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
> +		dma_pe_count, total_weight);
>
>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> @@ -2412,24 +2395,26 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>   	 * weight
>   	 */
>   	remaining = phb->ioda.tce32_count;
> -	tw = phb->ioda.dma_weight;
>   	base = 0;
> -	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (!pe->dma_weight)
> +	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +		weight = pnv_pci_ioda_pe_dma_weight(pe);
> +		if (!weight)
>   			continue;
> +
Unrelated new line.
-- 
Alexey
* Re: [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption
  2015-11-17  0:28   ` Daniel Axtens
@ 2015-11-17  1:55     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:55 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 11:28:20AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> Similar to the mechanism tracking consumed IO/M32/M64 segments,
>> this introduces an array for each PHB to track the consumed DMA32
>> segments, which are going to be released on PCI unplugging time.
>> The index of the array is the DMA32 segment number while the value
>> stored in the element is the assigned PE number.
>>
>
>
>> +	/* Setup TCE32 segment mapping */
>Do you mean DMA32 rather than TCE32?
Right, will change it to "DMA32" in next revision.
>> +	for (i = base; i < base + segs; i++)
>> +		phb->ioda.dma32_segmap[i] = pe->pe_number;
>> +
>I'm pretty sure this is right, but can you just confirm that you
>intended to index into the array starting at base and going to base + segs,
>and not going from 0 to segs? (i.e. not dma32_segmap[i - base]).
>
I think you're saying "I'm not pretty sure this is right". The code doesn't
have a problem here: it starts from @base.
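The array is indexed by the absolute DMA32 segment number, so a PE that owns
segments base..base+segs-1 writes exactly those slots. A standalone sketch of
the idea, with hypothetical names (claim_dma32_segments(),
release_dma32_segments(), INVALID_PE); the release side is only an assumption
of how unplug could use the map:

```c
#include <stddef.h>

#define INVALID_PE	(-1)	/* hypothetical marker for a free segment */

/*
 * Record that segments [base, base + segs) belong to pe.
 * The index is the absolute segment number, not an offset from base.
 * Returns 0 on success, -1 if the range does not fit in the map.
 */
static int claim_dma32_segments(int *segmap, size_t count,
				size_t base, size_t segs, int pe)
{
	size_t i;

	if (base > count || segs > count - base)
		return -1;

	for (i = base; i < base + segs; i++)
		segmap[i] = pe;

	return 0;
}

/* Free every segment owned by pe; returns how many were released */
static size_t release_dma32_segments(int *segmap, size_t count, int pe)
{
	size_t i, freed = 0;

	for (i = 0; i < count; i++) {
		if (segmap[i] == pe) {
			segmap[i] = INVALID_PE;
			freed++;
		}
	}

	return freed;
}
```

Indexing with [i - base] would make every PE overwrite the slots starting at 0,
and the map could no longer tell which PE owns which segment.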
>Otherwise looks good.
>
>
>>  	/* Setup linux iommu table */
>>  	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>  				  base * PNV_IODA1_DMA32_SEGSIZE,
>> @@ -2378,13 +2382,13 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>  	 * then we assign at least one segment per PE, plus more based
>>  	 * on the amount of devices under that PE
>>  	 */
>> -	if (dma_pe_count > phb->ioda.tce32_count)
>> +	if (dma_pe_count > phb->ioda.dma32_count)
>>  		residual = 0;
>>  	else
>> -		residual = phb->ioda.tce32_count - dma_pe_count;
>> +		residual = phb->ioda.dma32_count - dma_pe_count;
>>  
>>  	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>> -		hose->global_number, phb->ioda.tce32_count);
>> +		hose->global_number, phb->ioda.dma32_count);
>>  	pr_info("PCI: %d PE# for a total weight of %d\n",
>>  		dma_pe_count, total_weight);
>>  
>> @@ -2394,7 +2398,7 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>  	 * out one base segment plus any residual segments based on
>>  	 * weight
>>  	 */
>> -	remaining = phb->ioda.tce32_count;
>> +	remaining = phb->ioda.dma32_count;
>>  	base = 0;
>>  	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>  		weight = pnv_pci_ioda_pe_dma_weight(pe);
>> @@ -3094,7 +3098,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  {
>>  	struct pci_controller *hose;
>>  	struct pnv_phb *phb;
>> -	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>> +	unsigned long size, m64map_off, m32map_off, pemap_off;
>> +	unsigned long iomap_off = 0, dma32map_off = 0;
>>  	const __be64 *prop64;
>>  	const __be32 *prop32;
>>  	int i, len;
>> @@ -3177,6 +3182,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
>>  	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>>  
>> +	/* Calculate how many 32-bit TCE segments we have */
>> +	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>> +				PNV_IODA1_DMA32_SEGSIZE;
>> +
>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>  	m64map_off = size;
>> @@ -3186,6 +3195,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (phb->type == PNV_PHB_IODA1) {
>>  		iomap_off = size;
>>  		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
>> +		dma32map_off = size;
>> +		size += phb->ioda.dma32_count *
>> +			sizeof(phb->ioda.dma32_segmap[0]);
>>  	}
>>  	pemap_off = size;
>>  	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>> @@ -3201,6 +3213,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  		phb->ioda.io_segmap = aux + iomap_off;
>>  		for (i = 0; i < phb->ioda.total_pe_num; i++)
>>  			phb->ioda.io_segmap[i] = IODA_INVALID_PE;
>> +
>> +		phb->ioda.dma32_segmap = aux + dma32map_off;
>> +		for (i = 0; i < phb->ioda.dma32_count; i++)
>> +			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>>  	}
>>  	phb->ioda.pe_array = aux + pemap_off;
>>  	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>> @@ -3208,10 +3224,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	INIT_LIST_HEAD(&phb->ioda.pe_list);
>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>  
>> -	/* Calculate how many 32-bit TCE segments we have */
>> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>> -				PNV_IODA1_DMA32_SEGSIZE;
>> -
>>  #if 0 /* We should really do that ... */
>>  	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>  					 window_type,
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 2038ef2..0802fcd 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -148,6 +148,10 @@ struct pnv_phb {
>>  			int			*m32_segmap;
>>  			int			*io_segmap;
>>  
>> +			/* DMA32 segment maps - IODA1 only */
>> +			unsigned long		dma32_count;
>> +			int			*dma32_segmap;
>> +
>>  			/* IRQ chip */
>>  			int			irq_chip_init;
>>  			struct irq_chip		irq_chip;
>> @@ -164,9 +168,6 @@ struct pnv_phb {
>>  			 */
>>  			unsigned char		pe_rmap[0x10000];
>>  
>> -			/* 32-bit TCE tables allocation */
>> -			unsigned long		tce32_count;
>> -
>>  			/* TCE cache invalidate registers (physical and
>>  			 * remapped)
>>  			 */
Thanks,
Gavin
* Re: [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity
  2015-11-17  0:29   ` Daniel Axtens
@ 2015-11-17  1:56     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:56 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 11:29:26AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> Each PHB maintains an array helping to translate 2-bytes Request
>> ID (RID) to PE# with the assumption that PE# takes one byte, meaning
>> that we can't have more than 256 PEs. However, pci_dn->pe_number
>> already had 4-bytes for the PE#.
>>
>> This extends the PE# capacity so that each of them will be 4-bytes
>> long. Then we can reuse IODA_INVALID_PE to check the PE# stored in
>> phb->pe_rmap[] is valid or not.
>
>Just for clarity, could you make it clear in the commit message that
>you're increasing the PE# capacity _in the PHB_? I just found it a bit
>confusing the first time I read it.
>
>With that clarified I'll be happy to add my reviewed-by tag.
>
Sure, will add it and thanks for your happiness :-)
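For anyone reading along, a small standalone sketch of why the one-byte slot
is too narrow. It is not the kernel code: the demo_* names are hypothetical and
the value of DEMO_INVALID_PE is an assumption. The RID is taken to be the bus
number in the high byte and devfn in the low byte, matching the "indexed by
{bus, devfn}" comment in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INVALID_PE	(-1)	/* stand-in for IODA_INVALID_PE (value assumed) */
#define DEMO_RMAP_SIZE	0x10000	/* one slot per 16-bit RID, as in pe_rmap[] */

static int demo_pe_rmap[DEMO_RMAP_SIZE];

/* RID = bus number in the high byte, devfn in the low byte. */
static uint16_t demo_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((unsigned int)bus << 8) | devfn);
}

/* Mark every RID as unmapped, as the patch does at PHB init time. */
static void demo_rmap_invalidate_all(void)
{
	for (unsigned int i = 0; i < DEMO_RMAP_SIZE; i++)
		demo_pe_rmap[i] = DEMO_INVALID_PE;
}

static void demo_rmap_set(uint8_t bus, uint8_t devfn, int pe_number)
{
	assert(pe_number >= 0);
	demo_pe_rmap[demo_rid(bus, devfn)] = pe_number;
}

static int demo_rmap_get(uint8_t bus, uint8_t devfn)
{
	return demo_pe_rmap[demo_rid(bus, devfn)];
}

/* What a one-byte slot would have kept of a PE#. */
static int demo_truncate_to_u8(int pe_number)
{
	return (unsigned char)pe_number;
}
```

With an int slot, PE# 300 round-trips and the invalid marker cannot collide
with a real PE#. With a one-byte slot, 300 would be stored as 44 and the
marker would be indistinguishable from PE# 255.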
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++++-
>>  arch/powerpc/platforms/powernv/pci.h      | 7 ++-----
>>  2 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 0e66c4d..ef93a01 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -766,7 +766,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>  
>>  	/* Clear the reverse map */
>>  	for (rid = pe->rid; rid < rid_end; rid++)
>> -		phb->ioda.pe_rmap[rid] = 0;
>> +		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>  
>>  	/* Release from all parents PELT-V */
>>  	while (parent) {
>> @@ -3164,6 +3164,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (prop32)
>>  		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>>  
>> +	/* Invalidate RID to PE# mapping */
>> +	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
>> +		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
>> +
>>  	/* Parse 64-bit MMIO range */
>>  	pnv_ioda_parse_m64_window(phb);
>>  
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 0802fcd..5df945f 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -162,11 +162,8 @@ struct pnv_phb {
>>  			struct list_head	pe_list;
>>  			struct mutex            pe_list_mutex;
>>  
>> -			/* Reverse map of PEs, will have to extend if
>> -			 * we are to support more than 256 PEs, indexed
>> -			 * bus { bus, devfn }
>> -			 */
>> -			unsigned char		pe_rmap[0x10000];
>> +			/* Reverse map of PEs, indexed by {bus, devfn} */
>> +			int			pe_rmap[0x10000];
>>  
>>  			/* TCE cache invalidate registers (physical and
>>  			 * remapped)
Thanks,
Gavin
* Re: [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe()
  2015-11-17  0:30   ` Daniel Axtens
@ 2015-11-17  1:58     ` Gavin Shan
  2015-11-17  2:37       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  1:58 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe, aik,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 11:30:49AM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>
>> This introduces pnv_ioda_init_pe() to initialize the specified PE
>> instance (phb->ioda.pe_array[x]). It's used by pnv_ioda_alloc_pe()
>> and pnv_ioda_reserve_pe(). No logical changes introduced.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index ef93a01..488e0f8 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -129,6 +129,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>  		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>  }
>>  
>> +static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>> +{
>> +	phb->ioda.pe_array[pe_no].phb = phb;
>> +	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>> +
>> +	return &phb->ioda.pe_array[pe_no];
>You have the function returning the newly initialized PE here...
>
>> +}
>> +
>>  static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>  {
>>  	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
>> @@ -141,8 +149,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>  		pr_debug("%s: PE %d was reserved on PHB#%x\n",
>>  			 __func__, pe_no, phb->hose->global_number);
>>  
>> -	phb->ioda.pe_array[pe_no].phb = phb;
>> -	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>> +	pnv_ioda_init_pe(phb, pe_no);
>... but then you ignore the result here and in the other function you've
>modified.
>
>It looks like you're using the result in the next patch though, so I
>wonder if you would be better to merge this patch with the next
>one. However, as I said before I'll defer to Alexey on decisions about
>how to split the patch series if he has a different opinion.
>
I'd like to keep this separate, following the rule I was told before: one patch
does one thing where it can. Also, merging it into the next one would make the
next one harder to review.
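To show what the return value is for, here is a standalone sketch of the
pattern under discussion. It is not the kernel code: the demo_* types, the
byte-per-PE allocation map and the allocator's search order are assumptions
made only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TOTAL_PE	4	/* hypothetical PE count */

struct demo_phb;

struct demo_pe {
	struct demo_phb	*phb;
	int		pe_number;
};

struct demo_phb {
	struct demo_pe	pe_array[DEMO_TOTAL_PE];
	unsigned char	pe_alloc[DEMO_TOTAL_PE];	/* 1 = in use */
};

/* Initialize one PE slot and hand it back to the caller. */
static struct demo_pe *demo_init_pe(struct demo_phb *phb, int pe_no)
{
	assert(pe_no >= 0 && pe_no < DEMO_TOTAL_PE);
	phb->pe_array[pe_no].phb = phb;
	phb->pe_array[pe_no].pe_number = pe_no;

	return &phb->pe_array[pe_no];
}

/* A caller that only needs the side effect may ignore the result. */
static void demo_reserve_pe(struct demo_phb *phb, int pe_no)
{
	phb->pe_alloc[pe_no] = 1;
	(void)demo_init_pe(phb, pe_no);
}

/* A caller that hands the PE on can return the result directly. */
static struct demo_pe *demo_alloc_pe(struct demo_phb *phb)
{
	for (int pe_no = DEMO_TOTAL_PE - 1; pe_no >= 0; pe_no--) {
		if (!phb->pe_alloc[pe_no]) {
			phb->pe_alloc[pe_no] = 1;
			return demo_init_pe(phb, pe_no);
		}
	}
	return NULL;	/* no free PE */
}
```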
Thanks,
Gavin
* Re: [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list
  2015-11-17  1:54   ` Alexey Kardashevskiy
@ 2015-11-17  2:01     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  2:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 12:54:04PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
>>to their DMA32 weight. The PEs on the list are iterated to setup
>>their TCE32 tables at system booting time. The list is used for
>>once and there is no good reason for it to survive.
>
>From the above I concluded that you need a list but just do not need to keep it
>after the configuration is done, whereas in fact you remove the list completely, so
>just remove "to survive" (s/for it to survive/for keep having it/) :)
>
Thanks & will change it accordingly in next revision :)
>>
>>This moves the logic calculating DMA32 weight of PHB and PE to
>>pnv_pci_ioda1_setup_dma() to drop PHB's DMA32 list.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 150 ++++++++++++++----------------
>>  arch/powerpc/platforms/powernv/pci.h      |  19 ----
>>  2 files changed, 68 insertions(+), 101 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 4c2e023..20ebe6e 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -891,44 +891,6 @@ out:
>>  	return 0;
>>  }
>>
>>-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>>-				       struct pnv_ioda_pe *pe)
>>-{
>>-	struct pnv_ioda_pe *lpe;
>>-
>>-	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (lpe->dma_weight < pe->dma_weight) {
>>-			list_add_tail(&pe->dma_link, &lpe->dma_link);
>>-			return;
>>-		}
>>-	}
>>-	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
>>-}
>>-
>>-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>>-{
>>-	/* This is quite simplistic. The "base" weight of a device
>>-	 * is 10. 0 means no DMA is to be accounted for it.
>>-	 */
>>-
>>-	/* If it's a bridge, no DMA */
>>-	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>>-		return 0;
>>-
>>-	/* Reduce the weight of slow USB controllers */
>>-	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>-	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>-	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>-		return 3;
>>-
>>-	/* Increase the weight of RAID (includes Obsidian) */
>>-	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>-		return 15;
>>-
>>-	/* Default */
>>-	return 10;
>>-}
>>-
>>  #ifdef CONFIG_PCI_IOV
>>  static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  {
>>@@ -1009,7 +971,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  			continue;
>>  		}
>>  		pdn->pe_number = pe->pe_number;
>>-		pe->dma_weight += pnv_ioda_dma_weight(dev);
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>  			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>  	}
>>@@ -1046,10 +1007,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  	pe->pbus = bus;
>>  	pe->pdev = NULL;
>>-	pe->tce32_seg = -1;
>>  	pe->mve_number = -1;
>>  	pe->rid = bus->busn_res.start << 8;
>>-	pe->dma_weight = 0;
>>
>>  	if (all)
>>  		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>@@ -1071,17 +1030,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>
>>  	/* Put PE to the list */
>>  	list_add_tail(&pe->list, &phb->ioda.pe_list);
>>-
>>-	/* Account for one DMA PE if at least one DMA capable device exist
>>-	 * below the bridge
>>-	 */
>>-	if (pe->dma_weight != 0) {
>>-		phb->ioda.dma_weight += pe->dma_weight;
>>-		phb->ioda.dma_pe_count++;
>>-	}
>>-
>>-	/* Link the PE */
>>-	pnv_ioda_link_pe_by_weight(phb, pe);
>>  }
>>
>>  static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>@@ -1389,7 +1337,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  		pe->flags = PNV_IODA_PE_VF;
>>  		pe->pbus = NULL;
>>  		pe->parent_dev = pdev;
>>-		pe->tce32_seg = -1;
>>  		pe->mve_number = -1;
>>  		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>>  			   pci_iov_virtfn_devfn(pdev, vf_index);
>>@@ -1842,6 +1789,47 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
>>  	.free = pnv_ioda2_table_free,
>>  };
>>
>>+static int pnv_pci_ioda_dev_dma_weight(struct pci_dev *dev, void *data)
>>+{
>>+	unsigned int *weight = (unsigned int *)data;
>>+
>>+	/* This is quite simplistic. The "base" weight of a device
>>+	 * is 10. 0 means no DMA is to be accounted for it.
>>+	 */
>>+
>>+	if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>>+		return 0;
>>+
>>+	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>+	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>+	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>+		*weight += 3;
>>+	else if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>+		*weight += 15;
>>+	else
>>+		*weight += 10;
>>+
>>+	return 0;
>>+}
>>+
>>+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
>>+{
>>+	unsigned int weight = 0;
>>+
>>+	if ((pe->flags & PNV_IODA_PE_DEV) && pe->pdev) {
>>+		pnv_pci_ioda_dev_dma_weight(pe->pdev, &weight);
>>+	} else if ((pe->flags & PNV_IODA_PE_BUS) && pe->pbus) {
>>+		struct pci_dev *pdev;
>>+
>>+		list_for_each_entry(pdev, &pe->pbus->devices, bus_list)
>>+			pnv_pci_ioda_dev_dma_weight(pdev, &weight);
>>+	} else if ((pe->flags & PNV_IODA_PE_BUS_ALL) && pe->pbus) {
>>+		pci_walk_bus(pe->pbus, pnv_pci_ioda_dev_dma_weight, &weight);
>>+	}
>>+
>>+	return weight;
>>+}
>>+
>>  static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  				       struct pnv_ioda_pe *pe,
>>  				       unsigned int base,
>>@@ -1858,17 +1846,12 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>>  	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>
>>-	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->tce32_seg >= 0))
>>-		return;
>>-
>>  	tbl = pnv_pci_table_alloc(phb->hose->node);
>>  	iommu_register_group(&pe->table_group, phb->hose->global_number,
>>  			pe->pe_number);
>>  	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>>
>>  	/* Grab a 32-bit TCE table */
>>-	pe->tce32_seg = base;
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>  		base * PNV_IODA1_DMA32_SEGSIZE,
>>  		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>@@ -1932,8 +1915,6 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>  	return;
>>   fail:
>>  	/* XXX Failure: Try to fallback to 64-bit only ? */
>>-	if (pe->tce32_seg >= 0)
>>-		pe->tce32_seg = -1;
>>  	if (tce_mem)
>>  		__free_pages(tce_mem, get_order(tce32_segsz * segs));
>>  	if (tbl) {
>>@@ -2344,10 +2325,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  {
>>  	int64_t rc;
>>
>>-	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->tce32_seg >= 0))
>>-		return;
>>-
>>  	/* TVE #1 is selected by PCI address bit 59 */
>>  	pe->tce_bypass_base = 1ull << 59;
>>
>>@@ -2355,7 +2332,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  			pe->pe_number);
>>
>>  	/* The PE will reserve all possible 32-bits space */
>>-	pe->tce32_seg = 0;
>>  	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>>  		phb->ioda.m32_pci_base);
>>
>>@@ -2371,11 +2347,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  #endif
>>
>>  	rc = pnv_pci_ioda2_setup_default_config(pe);
>>-	if (rc) {
>>-		if (pe->tce32_seg >= 0)
>>-			pe->tce32_seg = -1;
>>+	if (rc)
>>  		return;
>>-	}
>>
>>  	if (pe->flags & PNV_IODA_PE_DEV)
>>  		iommu_add_device(&pe->pdev->dev);
>>@@ -2386,24 +2359,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	unsigned int residual, remaining, segs, tw, base;
>>+	unsigned int weight, total_weight, dma_pe_count;
>>+	unsigned int residual, remaining, segs, base;
>>  	struct pnv_ioda_pe *pe;
>>
>>+	total_weight = 0;
>>+	dma_pe_count = 0;
>>+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>+		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+		if (weight > 0)
>>+			dma_pe_count++;
>>+
>>+		total_weight += weight;
>>+	}
>>+
>>  	/* If we have more PE# than segments available, hand out one
>>  	 * per PE until we run out and let the rest fail. If not,
>>  	 * then we assign at least one segment per PE, plus more based
>>  	 * on the amount of devices under that PE
>>  	 */
>>-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
>>+	if (dma_pe_count > phb->ioda.tce32_count)
>>  		residual = 0;
>>  	else
>>-		residual = phb->ioda.tce32_count -
>>-			phb->ioda.dma_pe_count;
>>+		residual = phb->ioda.tce32_count - dma_pe_count;
>>
>>  	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>>  		hose->global_number, phb->ioda.tce32_count);
>>  	pr_info("PCI: %d PE# for a total weight of %d\n",
>>-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
>>+		dma_pe_count, total_weight);
>>
>>  	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>@@ -2412,24 +2395,26 @@ static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>  	 * weight
>>  	 */
>>  	remaining = phb->ioda.tce32_count;
>>-	tw = phb->ioda.dma_weight;
>>  	base = 0;
>>-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (!pe->dma_weight)
>>+	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>+		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+		if (!weight)
>>  			continue;
>>+
>
>
>Unrelated new line.
>
Will remove it in next revision.
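As a side note on the reworked pnv_pci_ioda1_setup_dma(), the "one base
segment plus residual segments based on weight" rule from the comment can be
sketched standalone as follows. The residual calculation follows the quoted
hunk; the rounded proportional split in demo_segs_for_pe() is an assumption,
since that part of the function is not quoted here, and the demo_* names are
hypothetical.

```c
#include <assert.h>

/* Segments left over after every DMA-capable PE has received one. */
static unsigned int demo_residual(unsigned int dma32_count,
				  unsigned int dma_pe_count)
{
	if (dma_pe_count > dma32_count)
		return 0;
	return dma32_count - dma_pe_count;
}

/*
 * Segments for one PE: one base segment plus a share of the residual
 * proportional to weight / total_weight (rounded to nearest), capped by
 * what is still unassigned. A PE with zero weight gets nothing.
 */
static unsigned int demo_segs_for_pe(unsigned int weight,
				     unsigned int total_weight,
				     unsigned int residual,
				     unsigned int remaining)
{
	unsigned int segs;

	if (!weight || !remaining)
		return 0;
	assert(total_weight >= weight);

	segs = 1;
	if (residual)
		segs += (weight * residual + total_weight / 2) / total_weight;
	if (segs > remaining)
		segs = remaining;

	return segs;
}
```

With 16 segments and two PEs of weight 10 and 30, the residual is 14; the
first PE gets 5 segments and the second is capped at the 11 that remain.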
Thanks,
Gavin
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  1:38     ` Gavin Shan
@ 2015-11-17  2:11       ` Alexey Kardashevskiy
  2015-11-17  2:44         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  2:11 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 12:38 PM, Gavin Shan wrote:
> On Mon, Nov 16, 2015 at 07:02:03PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> This enables M64 window on P7IOC, which has been enabled on PHB3.
>>> Different from PHB3 where 16 M64 BARs are supported and each of
>>> them can be owned by one particular PE# exclusively or divided
>>> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>> of them are divided to 8 segments. So every P7IOC PHB supports
>>> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>> M64DT, indicating that one M64 segment can only be pinned to the
>>> fixed PE#. In order to have same code to support M64 on P7IOC and
>>> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>> of them is pinned to the fixed PE# by bypassing the function of
>>> M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>> and PHB3 to support M64.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>>   arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>>   2 files changed, 86 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 1f7d985..bfe69f1 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>   	}
>>>   }
>>>
>>> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>> +{
>>> +	struct resource *r;
>>> +	int index;
>>> +
>>> +	/*
>>> +	 * There are 16 M64 BARs, each of which has 8 segments. So
>>> +	 * there are as many M64 segments as the maximum number of
>>> +	 * PEs, which is 128.
>>> +	 */
>>> +	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>> +		unsigned long base, segsz = phb->ioda.m64_segsize;
>>> +		int64_t rc;
>>> +
>>> +		base = phb->ioda.m64_base +
>>> +		       index * PNV_IODA1_M64_SEGS * segsz;
>>> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>> +				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>> +				PNV_IODA1_M64_SEGS * segsz);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, index);
>>> +			goto fail;
>>> +		}
>>> +
>>> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>> +				OPAL_M64_WINDOW_TYPE, index,
>>> +				OPAL_ENABLE_M64_SPLIT);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, index);
>>> +			goto fail;
>>> +		}
>>> +	}
>>> +
>>> +	/*
>>> +	 * Exclude the segment used by the reserved PE, which
>>> +	 * is expected to be 0 or last supported PE#.
>>> +	 */
>>> +	r = &phb->hose->mem_resources[1];
>>
>>
>> What does "1" mean here? A bridge's 64bit prefetchable window?
>>
>
> It's PHB's M64 window.
Aren't the mem_resources[] of a hose the windows of the root PCI bridge?
>
>>
>>> +	if (phb->ioda.reserved_pe_idx == 0)
>>> +		r->start += phb->ioda.m64_segsize;
>>> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>> +		r->end -= phb->ioda.m64_segsize;
>>> +	else
>>> +		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>> +			phb->ioda.reserved_pe_idx);
>>> +
>
> Thanks,
> Gavin
>
-- 
Alexey
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  1:42     ` Gavin Shan
@ 2015-11-17  2:37       ` Alexey Kardashevskiy
  2015-11-17  3:04         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  2:37 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 12:42 PM, Gavin Shan wrote:
> On Mon, Nov 16, 2015 at 07:02:18PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> This enables M64 window on P7IOC, which has been enabled on PHB3.
>>> Different from PHB3 where 16 M64 BARs are supported and each of
>>> them can be owned by one particular PE# exclusively or divided
>>> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>> of them are divided to 8 segments. So every P7IOC PHB supports
>>> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>> M64DT, indicating that one M64 segment can only be pinned to the
>>> fixed PE#. In order to have same code to support M64 on P7IOC and
>>> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>> of them is pinned to the fixed PE# by bypassing the function of
>>> M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>> and PHB3 to support M64.
>>
>> I thought we decided (Ben suggested?) not to push P7IOC code now (or ever) as
>> there is no user for it, has this changed?
>>
>
> Remember that the code is mixed for P7IOC/PHB3. It's not harmful to support
> M64 window on P7IOC, which is much larger than M32.
The patchset starts with removing dead code and then adds more dead code. 
This is not right...
>> btw please put ioda1/ioda2/p7ioc/etc to the subject line to make it easier to
>> see how much work is there about particular PHB type. You rename quite many
>> functions and I generally want to ask you to group all renaming patches first
>> but it would also make sense to keep them close to (for example)
>> p7ioc-related patches so having more descriptive subject lines may help.
>> Thanks.
>>
>
> As the code is mixed for P7IOC/PHB3, I'm not following the line (IODA1/IODA2/p7ioc/phb3)
> in this patchset.
But you should draw the bold line between PHB types imho.
> Instead, the patches are ordered by topic: code refactoring,
> IO/M32/M64, DMA, PE allocation/releasing.
Some patches from this patchset are about P7IOC only. All I am asking is to
say specifically in the subject line what the patch touches:
IODA1/IODA2/p7ioc/phb3/all_of_them. Otherwise I can walk through all of them,
pick the P7IOC ones, evaluate the amount of code and entropy they actually add
and then ask Ben what we do about it; that will just take longer than if
you did it.
-- 
Alexey
* Re: [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe()
  2015-11-17  1:58     ` Gavin Shan
@ 2015-11-17  2:37       ` Alexey Kardashevskiy
  2015-11-17  2:53         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  2:37 UTC (permalink / raw)
  To: Gavin Shan, Daniel Axtens
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 12:58 PM, Gavin Shan wrote:
> On Tue, Nov 17, 2015 at 11:30:49AM +1100, Daniel Axtens wrote:
>> Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>>
>>> This introduces pnv_ioda_init_pe() to initialize the specified PE
>>> instance (phb->ioda.pe_array[x]). It's used by pnv_ioda_alloc_pe()
>>> and pnv_ioda_reserve_pe(). No logical changes introduced.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++----
>>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index ef93a01..488e0f8 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -129,6 +129,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>>   }
>>>
>>> +static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>>> +{
>>> +	phb->ioda.pe_array[pe_no].phb = phb;
>>> +	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>>> +
>>> +	return &phb->ioda.pe_array[pe_no];
>> You have the function returning the newly initialized PE here...
>>
>>> +}
>>> +
>>>   static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>   {
>>>   	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
>>> @@ -141,8 +149,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>   		pr_debug("%s: PE %d was reserved on PHB#%x\n",
>>>   			 __func__, pe_no, phb->hose->global_number);
>>>
>>> -	phb->ioda.pe_array[pe_no].phb = phb;
>>> -	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>>> +	pnv_ioda_init_pe(phb, pe_no);
>> ... but then you ignore the result here and in the other function you've
>> modified.
>>
>> It looks like you're using the result in the next patch though, so I
>> wonder if you would be better to merge this patch with the next
>> one. However, as I said before I'll defer to Alexey on decisions about
>> how to split the patch series if he has a different opinion.
>>
>
> I'd like to keep this separate, following the rule I was told before: one patch
> does one thing where it can. Also, merging it into the next one would make the
> next one harder to review.
Merging this patch into the next one will make the next one easier to review,
because there you won't have to change code that you only just added in
this patch (which is always good).
-- 
Alexey
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  2:11       ` Alexey Kardashevskiy
@ 2015-11-17  2:44         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  2:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 01:11:56PM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 12:38 PM, Gavin Shan wrote:
>>On Mon, Nov 16, 2015 at 07:02:03PM +1100, Alexey Kardashevskiy wrote:
>>>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>>>Different from PHB3 where 16 M64 BARs are supported and each of
>>>>them can be owned by one particular PE# exclusively or divided
>>>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>>>of them are divided to 8 segments. So every P7IOC PHB supports
>>>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>>>M64DT, indicating that one M64 segment can only be pinned to the
>>>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>>>of them is pinned to the fixed PE# by bypassing the function of
>>>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>>>and PHB3 to support M64.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>>>  arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>>>  2 files changed, 86 insertions(+), 3 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 1f7d985..bfe69f1 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>>  	}
>>>>  }
>>>>
>>>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>>>+{
>>>>+	struct resource *r;
>>>>+	int index;
>>>>+
>>>>+	/*
>>>>+	 * There are 16 M64 BARs, each of which has 8 segments. So
>>>>+	 * there are as many M64 segments as the maximum number of
>>>>+	 * PEs, which is 128.
>>>>+	 */
>>>>+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>>>+		unsigned long base, segsz = phb->ioda.m64_segsize;
>>>>+		int64_t rc;
>>>>+
>>>>+		base = phb->ioda.m64_base +
>>>>+		       index * PNV_IODA1_M64_SEGS * segsz;
>>>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>>>+				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>>>+				PNV_IODA1_M64_SEGS * segsz);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>>>+				rc, phb->hose->global_number, index);
>>>>+			goto fail;
>>>>+		}
>>>>+
>>>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>+				OPAL_M64_WINDOW_TYPE, index,
>>>>+				OPAL_ENABLE_M64_SPLIT);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>>>+				rc, phb->hose->global_number, index);
>>>>+			goto fail;
>>>>+		}
>>>>+	}
>>>>+
>>>>+	/*
>>>>+	 * Exclude the segment used by the reserved PE, which
>>>>+	 * is expected to be 0 or last supported PE#.
>>>>+	 */
>>>>+	r = &phb->hose->mem_resources[1];
>>>
>>>
>>>What does "1" mean here? A bridge's 64bit prefetchable window?
>>>
>>
>>It's PHB's M64 window.
>
>mem_resources[] of a hose are not windows of the root PCI bridge?
>
No. They're the windows of the root bus, not of the root PCI bridge.
>>
>>>
>>>>+	if (phb->ioda.reserved_pe_idx == 0)
>>>>+		r->start += phb->ioda.m64_segsize;
>>>>+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>>>+		r->end -= phb->ioda.m64_segsize;
>>>>+	else
>>>>+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>>>+			phb->ioda.reserved_pe_idx);
>>>>+
>>
Thanks,
Gavin
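
For readers following the M64 discussion above, the layout arithmetic in
pnv_ioda1_init_m64() can be sketched in user-space C. The figures of 16 BARs
and 8 segments per BAR come from the commit message; the helper names, the
struct and any addresses passed in are assumptions made for illustration, not
kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define M64_NUM   16	/* M64 BARs per P7IOC PHB (from the commit message) */
#define M64_SEGS  8	/* segments per M64 BAR (from the commit message) */

/* Hypothetical stand-in for the PHB's M64 resource; 'end' is inclusive. */
struct window {
	uint64_t start;
	uint64_t end;
};

/* Base address of BAR 'index' when the BARs are laid out back to back. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz, int index)
{
	assert(index >= 0 && index < M64_NUM);
	return m64_base + (uint64_t)index * M64_SEGS * segsz;
}

/*
 * Exclude the segment that belongs to the reserved PE. This only works
 * when the reserved PE is the first or the last one; return -1 otherwise.
 */
static int m64_exclude_reserved(struct window *w, uint64_t segsz,
				unsigned int reserved_pe,
				unsigned int total_pe)
{
	if (reserved_pe == 0)
		w->start += segsz;
	else if (reserved_pe == total_pe - 1)
		w->end -= segsz;
	else
		return -1;

	return 0;
}
```

With 16 BARs of 8 segments each, the window covers 128 segments, which
matches the maximum number of PEs on a P7IOC PHB as stated in the patch.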
* Re: [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe()
  2015-11-17  2:37       ` Alexey Kardashevskiy
@ 2015-11-17  2:53         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  2:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, Daniel Axtens, linuxppc-dev, linux-pci, devicetree,
	benh, mpe, bhelgaas, grant.likely, robherring2, panto,
	frowand.list
On Tue, Nov 17, 2015 at 01:37:33PM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 12:58 PM, Gavin Shan wrote:
>>On Tue, Nov 17, 2015 at 11:30:49AM +1100, Daniel Axtens wrote:
>>>Gavin Shan <gwshan@linux.vnet.ibm.com> writes:
>>>
>>>>This introduces pnv_ioda_init_pe() to initialize the specified PE
>>>>instance (phb->ioda.pe_array[x]). It's used by pnv_ioda_alloc_pe()
>>>>and pnv_ioda_reserve_pe(). No logical changes introduced.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++++----
>>>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index ef93a01..488e0f8 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -129,6 +129,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>>>  		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>>>  }
>>>>
>>>>+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>>>>+{
>>>>+	phb->ioda.pe_array[pe_no].phb = phb;
>>>>+	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>>>>+
>>>>+	return &phb->ioda.pe_array[pe_no];
>>>You have the function returning the newly initalized PE here...
>>>
>>>>+}
>>>>+
>>>>  static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>>  {
>>>>  	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
>>>>@@ -141,8 +149,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>>  		pr_debug("%s: PE %d was reserved on PHB#%x\n",
>>>>  			 __func__, pe_no, phb->hose->global_number);
>>>>
>>>>-	phb->ioda.pe_array[pe_no].phb = phb;
>>>>-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>>>>+	pnv_ioda_init_pe(phb, pe_no);
>>>... but then you ignore the result here and in the other function you've
>>>modified.
>>>
>>>It looks like you're using the result in the next patch though, so I
>>>wonder if you would be better to merge this patch with the next
>>>one. However, as I said before I'll defer to Alexey on decisions about
>>>how to split the patch series if he has a different opinion.
>>>
>>
>>I'd like to keep this separate, following the rule I was told before:
>>one patch does one thing if it can. Also, merging it into the next one will
>>make the next one harder to review.
>
>This patch merged into the next one will make the next one easier to review
>because you won't have to change there the code which you just added in this
>patch (which is always good).
>
OK, will do in the next revision.
Thanks,
Gavin
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  2:37       ` Alexey Kardashevskiy
@ 2015-11-17  3:04         ` Gavin Shan
  2015-11-17  3:40           ` Benjamin Herrenschmidt
  2015-11-17  4:43           ` Alexey Kardashevskiy
  0 siblings, 2 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  3:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 01:37:22PM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 12:42 PM, Gavin Shan wrote:
>>On Mon, Nov 16, 2015 at 07:02:18PM +1100, Alexey Kardashevskiy wrote:
>>>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>>>Different from PHB3 where 16 M64 BARs are supported and each of
>>>>them can be owned by one particular PE# exclusively or divided
>>>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>>>of them are divided to 8 segments. So every P7IOC PHB supports
>>>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>>>M64DT, indicating that one M64 segment can only be pinned to the
>>>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>>>of them is pinned to the fixed PE# by bypassing the function of
>>>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>>>and PHB3 to support M64.
>>>
>>>I thought we decided (Ben suggested?) not to push P7IOC code now (or ever) as
>>>there is no user for it, has this changed?
>>>
>>
>>Remember that the code is mixed for P7IOC/PHB3. It's not harmful to support
>>M64 window on P7IOC, which is much larger than M32.
>
>
>The patchset starts with removing dead code and then adds more dead code.
>This is not right...
>
Sorry, you mean it's fine to break the code on P7IOC as it's going to be dead.
But I'm curious when it's going to happen, any idea about that?
>>>btw please put ioda1/ioda2/p7ioc/etc to the subject line to make it easier to
>>>see how much work is there about particular PHB type. You rename quite many
>>>functions and I generally want to ask you to group all renaming patches first
>>>but it would also make sense to keep them close to (for example)
>>>p7ioc-related patches so having more descriptive subject lines may help.
>>>Thanks.
>>>
>>
>>As the code is mixed for P7IOC/PHB3, I'm not following the line (IODA1/IODA2/p7ioc/phb3)
>>in this patchset.
>
>But you should draw the bold line between PHB types imho.
>
>>Instead, the patches in the series are ordered by topic: code refactoring,
>>IO/M32/M64, DMA, PE allocation/releasing.
>
>
>Some patches from this patchset are about P7IOC only. All I am asking is to
>say specifically in the subject line what the patch touches -
>IODA1/IODA2/p7ioc/phb3/all_of_them. Or I can walk through all of them, pick
>P7IOC's ones, evaluate the amount of code and entropy they actually add and
>then ask Ben what we do about it, it will just take longer rather than if you
>did it.
>
Please tell me clearly which keywords you want in the subject lines in the next revision.
My understanding is that you want to see one of these prefixes:
powerpc/powernv/ioda1:
powerpc/powernv/ioda2:
powerpc/powernv/all:
Thanks,
Gavin
>
>-- 
>Alexey
>
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  3:04         ` Gavin Shan
@ 2015-11-17  3:40           ` Benjamin Herrenschmidt
  2015-11-17  4:43           ` Alexey Kardashevskiy
  1 sibling, 0 replies; 157+ messages in thread
From: Benjamin Herrenschmidt @ 2015-11-17  3:40 UTC (permalink / raw)
  To: Gavin Shan, Alexey Kardashevskiy
  Cc: linuxppc-dev, linux-pci, devicetree, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On Tue, 2015-11-17 at 14:04 +1100, Gavin Shan wrote:
> 
> Sorry, you mean it's fine to break the code on P7IOC as it's going to be dead.
> But I'm curious when it's going to happen, any idea about that?
Is it? I think it's OK not to add support for M64, hotplug, etc. for
IODA1, but we can do that without *breaking* the basic function that we
have today.
It's not completely dead, there are still a number of machines inside
of IBM with P7 that we can support for a little while longer (including
a few inside ozlabs).
Cheers,
Ben.
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  3:04         ` Gavin Shan
  2015-11-17  3:40           ` Benjamin Herrenschmidt
@ 2015-11-17  4:43           ` Alexey Kardashevskiy
  2015-11-17  8:44             ` Gavin Shan
  1 sibling, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  4:43 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 02:04 PM, Gavin Shan wrote:
> On Tue, Nov 17, 2015 at 01:37:22PM +1100, Alexey Kardashevskiy wrote:
>> On 11/17/2015 12:42 PM, Gavin Shan wrote:
>>> On Mon, Nov 16, 2015 at 07:02:18PM +1100, Alexey Kardashevskiy wrote:
>>>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>> This enables M64 window on P7IOC, which has been enabled on PHB3.
>>>>> Different from PHB3 where 16 M64 BARs are supported and each of
>>>>> them can be owned by one particular PE# exclusively or divided
>>>>> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>>>> of them are divided to 8 segments. So every P7IOC PHB supports
>>>>> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>>>> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>>>> M64DT, indicating that one M64 segment can only be pinned to the
>>>>> fixed PE#. In order to have same code to support M64 on P7IOC and
>>>>> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>>>> of them is pinned to the fixed PE# by bypassing the function of
>>>>> M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>>>> and PHB3 to support M64.
>>>>
>>>> I thought we decided (Ben suggested?) not to push P7IOC code now (or ever) as
>>>> there is no user for it, has this changed?
>>>>
>>>
>>> Remember that the code is mixed for P7IOC/PHB3. It's not harmful to support
>>> M64 window on P7IOC, which is much larger than M32.
>>
>>
>> The patchset starts with removing dead code and then adds more dead code.
>> This is not right...
>>
>
> Sorry, you mean it's fine to break the code on P7IOC as it's going to be dead.
I am saying that the _new_ code which implements PCI hotplug on P7IOC is 
dead, not the existing P7IOC support which needs to keep working. Reworks 
you make should keep P7IOC alive but they do not have to add hotplug.
imho it is more likely that we drop P7IOC support in the mainline kernel in
the next 5 years than that someone plugs a PCI device into a running P7IOC
machine anywhere.
> But I'm curious when it's going to happen, any idea about that?
>
>>>> btw please put ioda1/ioda2/p7ioc/etc to the subject line to make it easier to
>>>> see how much work is there about particular PHB type. You rename quite many
>>>> functions and I generally want to ask you to group all renaming patches first
>>>> but it would also make sense to keep them close to (for example)
>>>> p7ioc-related patches so having more descriptive subject lines may help.
>>>> Thanks.
>>>>
>>>
>>> As the code is mixed for P7IOC/PHB3, I'm not following the line (IODA1/IODA2/p7ioc/phb3)
>>> in this patchset.
>>
>> But you should draw the bold line between PHB types imho.
>>
>>> Instead, the patches in the series are ordered by topic: code refactoring,
>>> IO/M32/M64, DMA, PE allocation/releasing.
>>
>>
>> Some patches from this patchset are about P7IOC only. All I am asking is to
>> say specifically in the subject line what the patch touches -
>> IODA1/IODA2/p7ioc/phb3/all_of_them. Or I can walk through all of them, pick
>> P7IOC's ones, evaluate the amount of code and entropy they actually add and
>> then ask Ben what we do about it, it will just take longer rather than if you
>> did it.
>>
>
> Please tell me clearly which keywords you want in the subject lines in the next revision.
> My understanding is that you want to see one of these prefixes:
>
> powerpc/powernv/ioda1:
Yes, looks good.
> powerpc/powernv/ioda2:
Yes.
> powerpc/powernv/all:
Just "powerpc/powernv".
-- 
Alexey
* Re: [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release
  2015-11-04 13:12 ` [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
@ 2015-11-17  5:08   ` Alexey Kardashevskiy
  2015-11-17  9:03     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  5:08 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: devicetree, linux-pci, panto, grant.likely, robherring2, bhelgaas,
	frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> In current implementation, the PEs that are allocated or picked
> from the reserved list are identified by PE number. The PE instance
> has to be picked according to the PE number eventually. We have
> same issue when PE is released.
>
> For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns
> PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
> or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
> returns the reserved/allocated PE instance to be used in subsequent
> patches. On the other hand, pnv_ioda_free_pe() uses PE instance
> (not number) as its argument. No logical changes introduced.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++++++++++++++++--------------
>   arch/powerpc/platforms/powernv/pci.h      |  2 +-
>   2 files changed, 46 insertions(+), 37 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 488e0f8..ae82df1 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -152,7 +152,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   	pnv_ioda_init_pe(phb, pe_no);
>   }
>
> -static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
> +static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   {
>   	unsigned long pe;
>
> @@ -160,19 +160,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>   					phb->ioda.total_pe_num, 0);
>   		if (pe >= phb->ioda.total_pe_num)
> -			return IODA_INVALID_PE;
> +			return NULL;
>   	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>
> -	pnv_ioda_init_pe(phb, pe);
> -	return pe;
> +	return pnv_ioda_init_pe(phb, pe);
>   }
>
> -static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
> +static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>   {
> -	WARN_ON(phb->ioda.pe_array[pe].pdev);
> +	struct pnv_phb *phb = pe->phb;
> +
> +	WARN_ON(pe->pdev);
>
> -	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
> -	clear_bit(pe, phb->ioda.pe_alloc);
> +	memset(pe, 0, sizeof(struct pnv_ioda_pe));
> +	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
>   }
>
>   /* The default M64 BAR is shared by all PEs */
> @@ -332,7 +333,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>   	}
>   }
>
> -static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
> +static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> @@ -342,7 +343,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>
>   	/* Root bus shouldn't use M64 */
>   	if (pci_is_root_bus(bus))
> -		return IODA_INVALID_PE;
> +		return NULL;
>
>   	/* Allocate bitmap */
>   	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
> @@ -350,7 +351,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	if (!pe_alloc) {
>   		pr_warn("%s: Out of memory !\n",
>   			__func__);
> -		return IODA_INVALID_PE;
> +		return NULL;
>   	}
>
>   	/* Figure out reserved PE numbers by the PE */
> @@ -363,7 +364,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 */
>   	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>   		kfree(pe_alloc);
> -		return IODA_INVALID_PE;
> +		return NULL;
>   	}
>
>   	/*
> @@ -409,7 +410,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	}
>
>   	kfree(pe_alloc);
> -	return master_pe->pe_number;
> +	return master_pe;
>   }
>
>   static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
> @@ -988,28 +989,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>    * subordinate PCI devices and buses. The second type of PE is normally
>    * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>    */
> -static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
> +static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> -	struct pnv_ioda_pe *pe;
> -	int pe_num = IODA_INVALID_PE;
> +	struct pnv_ioda_pe *pe = NULL;
>
>   	/* Check if PE is determined by M64 */
>   	if (phb->pick_m64_pe)
> -		pe_num = phb->pick_m64_pe(bus, all);
> +		pe = phb->pick_m64_pe(bus, all);
>
>   	/* The PE number isn't pinned by M64 */
> -	if (pe_num == IODA_INVALID_PE)
> -		pe_num = pnv_ioda_alloc_pe(phb);
> +	if (!pe)
> +		pe = pnv_ioda_alloc_pe(phb);
>
> -	if (pe_num == IODA_INVALID_PE) {
> +	if (!pe) {
>   		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
>   			__func__, pci_domain_nr(bus), bus->number);
> -		return;
> +		return NULL;
>   	}
>
> -	pe = &phb->ioda.pe_array[pe_num];
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> @@ -1018,17 +1017,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	if (all)
>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
> -			bus->busn_res.start, bus->busn_res.end, pe_num);
> +			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>   	else
>   		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
> -			bus->busn_res.start, pe_num);
> +			bus->busn_res.start, pe->pe_number);
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
>   		/* XXX What do we do here ? */
> -		if (pe_num)
> -			pnv_ioda_free_pe(phb, pe_num);
> +		pnv_ioda_free_pe(pe);
>   		pe->pbus = NULL;
> -		return;
> +		return NULL;
>   	}
>
>   	/* Associate it with all child devices */
> @@ -1036,6 +1034,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	/* Put PE to the list */
>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
> +
> +	return pe;
>   }
>
>   static void pnv_ioda_setup_PEs(struct pci_bus *bus)
> @@ -1267,7 +1267,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>
>   		pnv_ioda_deconfigure_pe(phb, pe);
>
> -		pnv_ioda_free_pe(phb, pe->pe_number);
> +		pnv_ioda_free_pe(pe);
>   	}
>   }
>
> @@ -1276,6 +1276,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>   	struct pci_bus        *bus;
>   	struct pci_controller *hose;
>   	struct pnv_phb        *phb;
> +	struct pnv_ioda_pe    *pe;
>   	struct pci_dn         *pdn;
>   	struct pci_sriov      *iov;
>   	u16                    num_vfs, i;
> @@ -1300,8 +1301,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>   		/* Release PE numbers */
>   		if (pdn->m64_single_mode) {
>   			for (i = 0; i < num_vfs; i++) {
> -				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
> -					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
> +				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
> +					continue;
> +
> +				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
> +				pnv_ioda_free_pe(pe);
>   			}
>   		} else
>   			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> @@ -1354,9 +1358,8 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
>   		if (pnv_ioda_configure_pe(phb, pe)) {
>   			/* XXX What do we do here ? */
> -			if (pe_num)
> -				pnv_ioda_free_pe(phb, pe_num);
>   			pe->pdev = NULL;
> +			pnv_ioda_free_pe(pe);
pnv_ioda_free_pe() does WARN_ON(pe->pdev). Before this patch you would free
the PE first and then reset pe->pdev; now you reset it first and then call
pnv_ioda_free_pe(). This change is not just about "Use PE instead of number
during setup and release". Is/was that a bug?
Also, I fail to see how pe->pdev could get initialized in
pnv_ioda_configure_pe(), as pnv_pci_dma_dev_setup() should not be called
while pnv_ioda_setup_vf_PE() is working.
>   			continue;
>   		}
>
> @@ -1374,6 +1377,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   	struct pci_bus        *bus;
>   	struct pci_controller *hose;
>   	struct pnv_phb        *phb;
> +	struct pnv_ioda_pe    *pe;
>   	struct pci_dn         *pdn;
>   	int                    ret;
>   	u16                    i;
> @@ -1416,11 +1420,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   		/* Calculate available PE for required VFs */
>   		if (pdn->m64_single_mode) {
>   			for (i = 0; i < num_vfs; i++) {
> -				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
> -				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
> +				pe = pnv_ioda_alloc_pe(phb);
> +				if (!pe) {
>   					ret = -EBUSY;
>   					goto m64_failed;
>   				}
> +
> +				pdn->pe_num_map[i] = pe->pe_number;
>   			}
>   		} else {
>   			mutex_lock(&phb->ioda.pe_alloc_mutex);
> @@ -1465,8 +1471,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   m64_failed:
>   	if (pdn->m64_single_mode) {
>   		for (i = 0; i < num_vfs; i++) {
> -			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
> -				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
> +			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
> +				continue;
> +
> +			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
> +			pnv_ioda_free_pe(pe);
>   		}
>   	} else
>   		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 5df945f..e55ab0e 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -105,7 +105,7 @@ struct pnv_phb {
>   	int (*init_m64)(struct pnv_phb *phb);
>   	void (*reserve_m64_pe)(struct pci_bus *bus,
>   			       unsigned long *pe_bitmap, bool all);
> -	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
> +	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
>   	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>   	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>   	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>
-- 
Alexey
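
A related ordering detail in the reworked pnv_ioda_free_pe() quoted above: as
posted, memset() wipes the PE before clear_bit() reads pe->pe_number, so the
number read afterwards appears to always be 0. Below is a minimal user-space
sketch of saving the number first. All toy_* names are hypothetical stand-ins
for the kernel structures, and the single-word bitmap is an assumption that
only holds for this small example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TOTAL_PE 8	/* hypothetical; small enough for one bitmap word */

struct toy_phb;

struct toy_pe {
	struct toy_phb *phb;
	int pe_number;
	void *pdev;
};

struct toy_phb {
	unsigned long pe_alloc;			/* one bit per PE */
	struct toy_pe pe_array[TOY_TOTAL_PE];
};

/* Claim the lowest free PE and initialize it; NULL when none is left. */
static struct toy_pe *toy_alloc_pe(struct toy_phb *phb)
{
	int i;

	for (i = 0; i < TOY_TOTAL_PE; i++) {
		if (phb->pe_alloc & (1UL << i))
			continue;

		phb->pe_alloc |= 1UL << i;
		phb->pe_array[i].phb = phb;
		phb->pe_array[i].pe_number = i;
		return &phb->pe_array[i];
	}

	return NULL;
}

/* Release a PE: remember its number and PHB before wiping the instance. */
static void toy_free_pe(struct toy_pe *pe)
{
	struct toy_phb *phb = pe->phb;
	int pe_num = pe->pe_number;	/* must be read before memset() */

	assert(pe->pdev == NULL);

	memset(pe, 0, sizeof(*pe));
	phb->pe_alloc &= ~(1UL << pe_num);
}
```

Reading pe_number after the memset() would clear bit 0 no matter which PE is
being released, leaving the real PE marked as allocated.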
* Re: [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus
  2015-11-04 13:12 ` [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus Gavin Shan
@ 2015-11-17  6:04   ` Alexey Kardashevskiy
  2015-11-17  9:06     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  6:04 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> We're going to reserve/assign PEs when pcibios_setup_bridge() is
> called. The function won't be called for root bus as it doesn't
> have parent bridge. However, the root bus still needs a PE to be
> covered.
>
> This reserves PE numbers that are adjacent to the reserved one
> for root buses.
Somewhere in the patchset you need to describe why you need a separate PE 
for a root bus and why reserved_pe_idx is not enough for this.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++++++++++++++++++++++---------
>   arch/powerpc/platforms/powernv/pci.h      |  1 +
>   2 files changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index eea1c96..5e6745f 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -207,14 +207,14 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>   	set_bit(phb->ioda.m64_bar_idx, &phb->ioda.m64_bar_alloc);
>
>   	/*
> -	 * Strip off the segment used by the reserved PE, which is
> -	 * expected to be 0 or last one of PE capabicity.
> +	 * Exclude the segments for reserved and root bus PE, which
> +	 * are first or last two PEs.
>   	 */
>   	r = &phb->hose->mem_resources[1];
>   	if (phb->ioda.reserved_pe_idx == 0)
> -		r->start += phb->ioda.m64_segsize;
> +		r->start += (2 * phb->ioda.m64_segsize);
>   	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
> -		r->end -= phb->ioda.m64_segsize;
> +		r->end -= (2 * phb->ioda.m64_segsize);
>   	else
>   		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
>   			phb->ioda.reserved_pe_idx);
> @@ -294,14 +294,14 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>   	}
>
>   	/*
> -	 * Exclude the segment used by the reserved PE, which
> -	 * is expected to be 0 or last supported PE#.
> +	 * Exclude the segments for reserved and root bus PE, which
> +	 * are first or last two PEs.
>   	 */
>   	r = &phb->hose->mem_resources[1];
>   	if (phb->ioda.reserved_pe_idx == 0)
> -		r->start += phb->ioda.m64_segsize;
> +		r->start += (2 * phb->ioda.m64_segsize);
>   	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
> -		r->end -= phb->ioda.m64_segsize;
> +		r->end -= (2 * phb->ioda.m64_segsize);
>   	else
>   		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>   			phb->ioda.reserved_pe_idx);
> @@ -3231,7 +3231,22 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>   	}
>   	phb->ioda.pe_array = aux + pemap_off;
> -	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
> +
> +	/*
> +	 * Choose PE number for root bus, which shouldn't have
> +	 * M64 resources consumed by its child devices. To pick
> +	 * the PE number adjacent to the reserved one if possible.
> +	 */
> +	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe_idx);
> +	if (phb->ioda.reserved_pe_idx == 0) {
> +		phb->ioda.root_pe_idx = 1;
> +		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
> +	} else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1)) {
> +		phb->ioda.root_pe_idx = phb->ioda.reserved_pe_idx - 1;
> +		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
> +	} else {
> +		phb->ioda.root_pe_idx = IODA_INVALID_PE;
> +	}
>
>   	INIT_LIST_HEAD(&phb->ioda.pe_list);
>   	mutex_init(&phb->ioda.pe_list_mutex);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index e55ab0e..a8ba97f 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -120,6 +120,7 @@ struct pnv_phb {
>   			/* Global bridge info */
>   			unsigned int		total_pe_num;
>   			unsigned int		reserved_pe_idx;
> +			unsigned int		root_pe_idx;
>
>   			/* 32-bit MMIO window */
>   			unsigned int		m32_size;
>
-- 
Alexey
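
The selection of the root bus PE in the hunk for pnv_pci_init_ioda_phb() above
can be sketched as a small pure function. This is an illustration under
assumptions: toy_pick_root_pe() and TOY_INVALID_PE are hypothetical names, and
the sketch only mirrors the three branches visible in the quoted diff.

```c
#include <assert.h>

#define TOY_INVALID_PE (-1)	/* hypothetical marker for "no root PE" */

/*
 * Pick the PE number adjacent to the reserved PE for the root bus.
 * The reserved PE is expected to be the first or the last PE; in any
 * other case no root PE is chosen.
 */
static int toy_pick_root_pe(unsigned int reserved_pe, unsigned int total_pe)
{
	assert(total_pe >= 2 && reserved_pe < total_pe);

	if (reserved_pe == 0)
		return 1;
	if (reserved_pe == total_pe - 1)
		return (int)reserved_pe - 1;

	return TOY_INVALID_PE;
}
```

Since the reserved PE and the root bus PE then occupy the first two or the
last two PE numbers, the same patch trims two M64 segments instead of one from
the PHB's M64 window.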
* Re: [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
  2015-11-04 13:12 ` [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
@ 2015-11-17  7:57   ` Alexey Kardashevskiy
  2015-11-17  9:12     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17  7:57 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> Currently, the PEs and their associated resources are assigned
> in ppc_md.pcibios_fixup() except those used by SRIOV VFs. The
> function is called for once after PCI probing and resources
> assignment is completed. So it isn't hotplug friendly.
>
> This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
> is called on the event during system bootup and PCI hotplug: updating
> PCI bridge's windows after resource assignment/reassignment are done.
> For partial hotplug case, where not all PCI devices belonging to the
> PE are unplugged and plugged again, we just need unbinding/binding
> the affected PCI devices with the corresponding PE without creating
> new one.
>
> As there is no upstream bridge for root bus that needs to be covered
> by PE,
Does the "that needs" part relate to the root bus or to an upstream bridge?
> we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
> before any other PEs can be created, as PE for root bus is the ancestor
> to anyone else.
>
> On the other hand, the windows of root port or the upstream port
s/On the other hand, /Also/ ?
> of PCIe switch behind root port are extended to be PHB's aperatuses
apertures?
> to accommodate the additonal resources needed by newly plugged devices
s/additonal/additional/
> based on the fact: hotpluggable slot is behind root port or downstream
> port of the PCIe switch behind root port. The extension for those
> PCI brdiges' windows is done in ppc_md.pcibios_setup_bridge() as
> well.
I find it quite difficult to separate "cut-n-paste" changes from functional
changes... Maybe it is just me.
I would suggest splitting this patch into several. First define the
setup_bridge() callback, then rework pnv_pci_ioda_setup_PEs(),
pnv_pci_ioda_setup_seg(), pnv_pci_ioda_setup_DMA(), and then maybe add the
"partial hotplug" handling.
Or just get "reviewed-by" from Ben :)
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 240 +++++++++++++++++-------------
>   arch/powerpc/platforms/powernv/pci.h      |   1 +
>   2 files changed, 138 insertions(+), 103 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 5e6745f..0bb0056 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -975,6 +975,15 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   				pci_name(dev));
>   			continue;
>   		}
> +
> +		/*
> +		 * In partial hotplug case, the PCI device might be still
> +		 * associated with the PE and needn't be attached to the
> +		 * PE again.
> +		 */
> +		if (pdn->pe_number != IODA_INVALID_PE)
> +			continue;
> +
>   		pdn->pe_number = pe->pe_number;
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>   			pnv_ioda_setup_same_PE(dev->subordinate, pe);
> @@ -992,9 +1001,26 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
>   	struct pnv_ioda_pe *pe = NULL;
> +	int pe_num;
> +
> +	/*
> +	 * In partial hotplug case, the PE instance might be still alive.
> +	 * We should reuse it instead of allocating a new one.
> +	 */
> +	pe_num = phb->ioda.pe_rmap[bus->number << 8];
> +	if (pe_num != IODA_INVALID_PE) {
> +		pe = &phb->ioda.pe_array[pe_num];
> +		pnv_ioda_setup_same_PE(bus, pe);
> +		return NULL;
> +	}
> +
> +	/* PE number for root bus should have been reserved */
> +	if (pci_is_root_bus(bus) &&
> +	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
> +		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
>
>   	/* Check if PE is determined by M64 */
> -	if (phb->pick_m64_pe)
> +	if (!pe && phb->pick_m64_pe)
>   		pe = phb->pick_m64_pe(bus, all);
>
>   	/* The PE number isn't pinned by M64 */
> @@ -1036,46 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	return pe;
>   }
>
> -static void pnv_ioda_setup_PEs(struct pci_bus *bus)
> -{
> -	struct pci_dev *dev;
> -
> -	pnv_ioda_setup_bus_PE(bus, false);
> -
> -	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		if (dev->subordinate) {
> -			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
> -				pnv_ioda_setup_bus_PE(dev->subordinate, true);
> -			else
> -				pnv_ioda_setup_PEs(dev->subordinate);
> -		}
> -	}
> -}
> -
> -/*
> - * Configure PEs so that the downstream PCI buses and devices
> - * could have their associated PE#. Unfortunately, we didn't
> - * figure out the way to identify the PLX bridge yet. So we
> - * simply put the PCI bus and the subordinate behind the root
> - * port to PE# here. The game rule here is expected to be changed
> - * as soon as we can detected PLX bridge correctly.
> - */
> -static void pnv_pci_ioda_setup_PEs(void)
> -{
> -	struct pci_controller *hose, *tmp;
> -	struct pnv_phb *phb;
> -
> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		phb = hose->private_data;
> -
> -		/* M64 layout might affect PE allocation */
> -		if (phb->reserve_m64_pe)
> -			phb->reserve_m64_pe(hose->bus, NULL, true);
> -
> -		pnv_ioda_setup_PEs(hose->bus);
> -	}
> -}
> -
>   #ifdef CONFIG_PCI_IOV
>   static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>   {
> @@ -2391,8 +2377,13 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>   static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   				       struct pnv_ioda_pe *pe)
>   {
> +	unsigned int weight;
>   	int64_t rc;
>
> +	weight = pnv_pci_ioda_pe_dma_weight(pe);
> +	if (!weight)
> +		return;
> +
>   	/* TVE #1 is selected by PCI address bit 59 */
>   	pe->tce_bypass_base = 1ull << 59;
>
> @@ -2424,33 +2415,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>   }
>
> -static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
> -{
> -	struct pnv_ioda_pe *pe;
> -
> -	pnv_pci_ioda_setup_opal_tce_kill(phb);
> -
> -	list_for_each_entry(pe, &phb->ioda.pe_list, list)
> -		pnv_pci_ioda1_setup_dma_pe(phb, pe);
> -}
> -
> -static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
> -{
> -	struct pnv_ioda_pe *pe;
> -	unsigned int weight;
> -
> -	pnv_pci_ioda_setup_opal_tce_kill(phb);
> -
> -	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> -		weight = pnv_pci_ioda_pe_dma_weight(pe);
> -		if (!weight)
> -			continue;
> -
> -		pe_info(pe, "Assign DMA32 space\n");
> -		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> -	}
> -}
> -
>   #ifdef CONFIG_PCI_MSI
>   static void pnv_ioda2_msi_eoi(struct irq_data *d)
>   {
> @@ -2914,37 +2878,6 @@ static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
>   	}
>   }
>
> -static void pnv_pci_ioda_setup_seg(void)
> -{
> -	struct pci_controller *tmp, *hose;
> -	struct pnv_phb *phb;
> -	struct pnv_ioda_pe *pe;
> -
> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		phb = hose->private_data;
> -		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> -			pnv_ioda_setup_pe_seg(pe);
> -		}
> -	}
> -}
> -
> -static void pnv_pci_ioda_setup_DMA(void)
> -{
> -	struct pci_controller *hose, *tmp;
> -	struct pnv_phb *phb;
> -
> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		phb = hose->private_data;
> -		if (phb->type == PNV_PHB_IODA1)
> -			pnv_pci_ioda1_setup_dma(phb);
> -		else
> -			pnv_pci_ioda2_setup_dma(phb);
> -
> -		/* Mark the PHB initialization done */
> -		phb->initialized = 1;
> -	}
> -}
> -
>   static void pnv_pci_ioda_create_dbgfs(void)
>   {
>   #ifdef CONFIG_DEBUG_FS
> @@ -2955,6 +2888,9 @@ static void pnv_pci_ioda_create_dbgfs(void)
>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>   		phb = hose->private_data;
>
> +		/* Notify initialization of PHB done */
> +		phb->initialized = 1;
> +
>   		sprintf(name, "PCI%04x", hose->global_number);
>   		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
>   		if (!phb->dbgfs)
> @@ -2966,10 +2902,6 @@ static void pnv_pci_ioda_create_dbgfs(void)
>
>   static void pnv_pci_ioda_fixup(void)
>   {
> -	pnv_pci_ioda_setup_PEs();
> -	pnv_pci_ioda_setup_seg();
> -	pnv_pci_ioda_setup_DMA();
> -
>   	pnv_pci_ioda_create_dbgfs();
>
>   #ifdef CONFIG_EEH
> @@ -3019,6 +2951,104 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>   	return phb->ioda.io_segsize;
>   }
>
> +/*
> + * We are updating root port or the upstream port of the
> + * bridge behind the root port with PHB's windows in order
> + * to accommodate the changes on required resources during
> + * PCI (slot) hotplug, which is connected to either root
> + * port or the downstream ports of PCIe switch behind the
> + * root port.
> + */
> +static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
> +					   unsigned long type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dev *bridge = bus->self;
> +	struct resource *r, *w;
> +	int i;
> +
> +	/* Check if we need apply fixup to the bridge's windows */
> +	if (!pci_is_root_bus(bridge->bus) &&
> +	    !pci_is_root_bus(bridge->bus->self->bus))
> +		return;
> +
> +	/* Fixup the resoureces */
s/resoureces/resources/
> +	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
> +		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
> +		if (!r->flags || !r->parent)
> +			continue;
> +
> +		w = NULL;
> +		if (r->flags & type & IORESOURCE_IO)
> +			w = &hose->io_resource;
> +		else if (pnv_pci_is_mem_pref_64(r->flags) &&
> +			 (type & IORESOURCE_PREFETCH) &&
> +			 phb->ioda.m64_segsize)
> +			w = &hose->mem_resources[1];
> +		else if (r->flags & type & IORESOURCE_MEM)
> +			w = &hose->mem_resources[0];
> +
> +		r->start = w->start;
> +		r->end = w->end;
> +	}
> +}
> +
> +static void pnv_pci_setup_bridge(struct pci_bus *bus,
> +				 unsigned long type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dev *bridge = bus->self;
> +	struct pnv_ioda_pe *pe;
> +	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
> +
> +	 /* The PE for root bus should be realized before any one else */
> +	if (!phb->ioda.root_pe_populated) {
> +		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
> +		if (pe) {
> +			phb->ioda.root_pe_idx = pe->pe_number;
> +			phb->ioda.root_pe_populated = true;
> +		}
> +	}
> +
> +	/* Extend bridge's windows if necessary */
> +	pnv_pci_fixup_bridge_resources(bus, type);
> +
> +	/* Don't assign PE to PCI bus, which doesn't have subordinate devices */
> +	if (list_empty(&bus->devices))
> +		return;
> +
> +	/* Reserve PEs according to used M64 resources */
> +	if (phb->reserve_m64_pe)
> +		phb->reserve_m64_pe(bus, NULL, all);
> +
> +	/*
> +	 * Assign PE. We might run here because of partial hotplug.
> +	 * For the case, we just pick up the existing PE and should
> +	 * not allocate resources again.
> +	 */
> +	pe = pnv_ioda_setup_bus_PE(bus, all);
> +	if (!pe)
> +		return;
> +
> +	/* Setup MMIO mapping */
> +	pnv_ioda_setup_pe_seg(pe);
> +
> +	/* Setup DMA */
> +	switch (phb->type) {
> +	case PNV_PHB_IODA1:
> +		pnv_pci_ioda1_setup_dma_pe(phb, pe);
> +		break;
> +	case PNV_PHB_IODA2:
> +		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> +		break;
> +	default:
> +		pr_warn("%s: No DMA for PHB#%d (type %d)\n",
> +			__func__, phb->hose->global_number, phb->type);
> +	}
> +}
> +
>   #ifdef CONFIG_PCI_IOV
>   static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>   						      int resno)
> @@ -3095,6 +3125,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>   #endif
>   	.enable_device_hook	= pnv_pci_enable_device_hook,
>   	.window_alignment	= pnv_pci_window_alignment,
> +	.setup_bridge		= pnv_pci_setup_bridge,
>   	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>   	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
>   	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
> @@ -3168,6 +3199,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	if (phb->regs == NULL)
>   		pr_err("  Failed to map registers !\n");
>
> +	/* Initialize TCE kill register */
> +	pnv_pci_ioda_setup_opal_tce_kill(phb);
> +
>   	/* Initialize more IODA stuff */
>   	phb->ioda.total_pe_num = 1;
>   	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index a8ba97f..ef5271a 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -121,6 +121,7 @@ struct pnv_phb {
>   			unsigned int		total_pe_num;
>   			unsigned int		reserved_pe_idx;
>   			unsigned int		root_pe_idx;
> +			bool			root_pe_populated;
>
>   			/* 32-bit MMIO window */
>   			unsigned int		m32_size;
>
-- 
Alexey
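
In pnv_pci_fixup_bridge_resources() as quoted above, w starts as NULL and is
dereferenced right after the if/else-if chain. If a bridge resource's flags
match none of the branches for the given type, w would still appear to be NULL
at that point. Below is a minimal userspace sketch of the same window selection
with an explicit guard. All toy_* names and TOY_RES_* flag bits are hypothetical
stand-ins (the kernel's IORESOURCE_* values are not reproduced), and
toy_is_mem_pref_64() only assumes what pnv_pci_is_mem_pref_64() checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the kernel's IORESOURCE_* flags. */
#define TOY_RES_IO        0x1ul
#define TOY_RES_MEM       0x2ul
#define TOY_RES_PREFETCH  0x4ul
#define TOY_RES_MEM_64    0x8ul

struct toy_res {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Stand-in for the PHB windows: IO, 32-bit MMIO (M32), 64-bit MMIO (M64). */
struct toy_hose {
	struct toy_res io;
	struct toy_res m32;
	struct toy_res m64;
	unsigned long m64_segsize;
};

/* Assumption: "mem pref 64" means memory, prefetchable and 64-bit together. */
static int toy_is_mem_pref_64(unsigned long flags)
{
	unsigned long want = TOY_RES_MEM | TOY_RES_PREFETCH | TOY_RES_MEM_64;

	return (flags & want) == want;
}

/* Same branch order as the quoted loop; returns NULL when nothing matches. */
static struct toy_res *toy_pick_window(struct toy_hose *hose,
				       unsigned long rflags,
				       unsigned long type)
{
	if (rflags & type & TOY_RES_IO)
		return &hose->io;
	if (toy_is_mem_pref_64(rflags) && (type & TOY_RES_PREFETCH) &&
	    hose->m64_segsize)
		return &hose->m64;
	if (rflags & type & TOY_RES_MEM)
		return &hose->m32;
	return NULL;
}

/* Extend the bridge window to the PHB window; returns 1 if it was updated. */
static int toy_extend(struct toy_hose *hose, struct toy_res *r,
		      unsigned long type)
{
	struct toy_res *w = toy_pick_window(hose, r->flags, type);

	if (!w)
		return 0;	/* no matching window: leave the resource alone */

	r->start = w->start;
	r->end = w->end;
	return 1;
}
```

With the guard in place, a memory resource visited while only the IO windows
are being set up is simply skipped rather than dereferencing a NULL window.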
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  4:43           ` Alexey Kardashevskiy
@ 2015-11-17  8:44             ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  8:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 03:43:28PM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 02:04 PM, Gavin Shan wrote:
>>On Tue, Nov 17, 2015 at 01:37:22PM +1100, Alexey Kardashevskiy wrote:
>>>On 11/17/2015 12:42 PM, Gavin Shan wrote:
>>>>On Mon, Nov 16, 2015 at 07:02:18PM +1100, Alexey Kardashevskiy wrote:
>>>>>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>>>>>Different from PHB3 where 16 M64 BARs are supported and each of
>>>>>>them can be owned by one particular PE# exclusively or divided
>>>>>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>>>>>of them are divided to 8 segments. So every P7IOC PHB supports
>>>>>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>>>>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>>>>>M64DT, indicating that one M64 segment can only be pinned to the
>>>>>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>>>>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>>>>>of them is pinned to the fixed PE# by bypassing the function of
>>>>>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>>>>>and PHB3 to support M64.
>>>>>
>>>>>I thought we decided (Ben suggested?) not to push P7IOC code now (or ever) as
>>>>>there is no user for it, has this changed?
>>>>>
>>>>
>>>>Remember that the code is mixed for P7IOC/PHB3. It's not harmful to support
>>>>M64 window on P7IOC, which is much larger than M32.
>>>
>>>
>>>The patchset starts with removing dead code and then adds more dead code.
>>>This is not right...
>>>
>>
>>Sorry, you mean it's fine to break the code on P7IOC as it's going to be dead.
>
>
>I am saying that the _new_ code which implements PCI hotplug on P7IOC is
>dead, not the existing P7IOC support which needs to keep working. Reworks you
>make should keep P7IOC alive but they do not have to add hotplug.
>
>imho it is more likely that we drop P7IOC support in the mainline kernel in
>next 5 years than someone plugs a PCI device to a running P7IOC machine
>anywhere.
>
This patchset doesn't add PCI hotplug support on P7IOC. As the code is mixed
for PHB3/P7IOC, I have to make some changes for P7IOC in order to support
PCI hotplug on PHB3. That's one of the reasons I introduced M64 support
for P7IOC. Another reason is that M32 has limited space (2GB) on P7IOC, so an
adapter that needs more than 2GB of memory resources will fail to work there.
So it's still nice to support M64 on P7IOC.
>>But I'm curious when it's going happen, any idea about that?
>>
>>>>>btw please put ioda1/ioda2/p7ioc/etc to the subject line to make it easier to
>>>>>see how much work is there about particular PHB type. You rename quite many
>>>>>functions and I generally want to ask you to group all renaming patches first
>>>>>but it would also make sense to keep them close to (for example)
>>>>>p7ioc-related patches so having more descriptive subject lines may help.
>>>>>Thanks.
>>>>>
>>>>
>>>>As the code is mixed for P7IOC/PHB3, I'm not following the line (IODA1/IODA2/p7ioc/phb3)
>>>>in this patchset.
>>>
>>>But you should draw the bold line between PHB types imho.
>>>
>>>>Instead, the patchset is ordered by topic: code refactoring,
>>>>IO/M32/M64, DMA, PE allocation/releasing.
>>>
>>>
>>>Some patches from this patchset are about P7IOC only. All I am asking is to
>>>say specifically in the subject line what the patch touches -
>>>IODA1/IODA2/p7ioc/phb3/all_of_them. Or I can walk through all of them, pick
>>>P7IOC's ones, evaluate the amount of code and entropy they actually add and
>>>then ask Ben what we do about it, it will just take longer rather than if you
>>>did it.
>>>
>>
>>Please give me a clear command what key words you need in the subject in next revision.
>>What I understood is you want to see one of them:
>>
>>powerpc/powernv/ioda1:
>
>Yes, looks good.
>
>>powerpc/powernv/ioda2:
>
>
>Yes.
>
>>powerpc/powernv/all:
>
>Just "powerpc/powernv".
>
Thanks for confirming. I'll put them into the subject lines in the next revision.
Thanks,
Gavin
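
For reference, the M64 segment arithmetic from the changelog quoted above
(16 M64 BARs per P7IOC PHB, 8 segments each, 128 segments in total) can be
sketched as below. The helper names are hypothetical, and the identity mapping
from segment number to PE# is only an assumption about what "pinned to the
fixed PE#" means once M64DT is bypassed.

```c
#include <assert.h>

/* Figures taken from the quoted changelog for P7IOC. */
#define P7IOC_M64_BARS      16u
#define P7IOC_SEGS_PER_BAR  8u
#define P7IOC_M64_SEGS      (P7IOC_M64_BARS * P7IOC_SEGS_PER_BAR)

/* Hypothetical helper: flat segment number for (BAR, segment within BAR). */
static unsigned int m64_segment_index(unsigned int bar, unsigned int seg)
{
	assert(bar < P7IOC_M64_BARS);
	assert(seg < P7IOC_SEGS_PER_BAR);

	return bar * P7IOC_SEGS_PER_BAR + seg;
}

/*
 * Assumption: with M64DT bypassed, segment N is pinned to PE# N,
 * which mirrors the fixed segment-to-PE relationship on PHB3.
 */
static unsigned int m64_segment_to_pe(unsigned int segment)
{
	assert(segment < P7IOC_M64_SEGS);

	return segment;
}
```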
* Re: [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3
  2015-11-17  1:07   ` Alexey Kardashevskiy
@ 2015-11-17  8:48     ` Gavin Shan
  2015-11-17 23:59       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  8:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 12:07:17PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>In pnv_ioda_setup_dma(), it's unnecessary to calculate the DMA32
>>segments for PEs on PHB3 as the whole available DMA32 space can
>>be assigned to one specific PE on PHB3.
>>
>>This splits pnv_ioda_setup_dma() to pnv_pci_ioda1_setup_dma() and
>>pnv_pci_ioda2_setup_dma() in order to avoid calculating DMA32
>>segments for PEs on PHB3. No logical changes introduced.
>
>
>This patch is not needed as
>
>[PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation
>
>moves this calculation to another place (which already makes this patch
>unnecessary) and
>
I don't follow your comments. Can you tell me how to split/merge the patches?
>[PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
>
>removes just introduced pnv_pci_ioda1_setup_dma() - if you remove it, then
>there is no point in fixing it in the first place.
>
This function isn't removed in 26/50, could you double check?
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 41 ++++++++++++++++++-------------
>>  1 file changed, 24 insertions(+), 17 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 5a08e20..4c2e023 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2383,7 +2383,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>  }
>>
>>-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>+static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>  	unsigned int residual, remaining, segs, tw, base;
>>@@ -2428,26 +2428,30 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  				segs = remaining;
>>  		}
>>
>>-		/*
>>-		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>-		 * The all available 32-bits DMA space will be assigned to
>>-		 * the specific PE.
>>-		 */
>>-		if (phb->type == PNV_PHB_IODA1) {
>>-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>-				pe->dma_weight, segs);
>>-			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>-		} else {
>>-			pe_info(pe, "Assign DMA32 space\n");
>>-			segs = 0;
>>-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>-		}
>>+		pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>+			pe->dma_weight, segs);
>>+		pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>
>>  		remaining -= segs;
>>  		base += segs;
>>  	}
>>  }
>>
>>+static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
>>+{
>>+	struct pnv_ioda_pe *pe;
>>+
>>+	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>+
>>+	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>+		if (!pe->dma_weight)
>>+			continue;
>>+
>>+		pe_info(pe, "Assign DMA32 space\n");
>>+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>+	}
>>+}
>>+
>>  #ifdef CONFIG_PCI_MSI
>>  static void pnv_ioda2_msi_eoi(struct irq_data *d)
>>  {
>>@@ -2931,10 +2935,13 @@ static void pnv_pci_ioda_setup_DMA(void)
>>  	struct pnv_phb *phb;
>>
>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		pnv_ioda_setup_dma(hose->private_data);
>>+		phb = hose->private_data;
>>+		if (phb->type == PNV_PHB_IODA1)
>>+			pnv_pci_ioda1_setup_dma(phb);
>>+		else
>>+			pnv_pci_ioda2_setup_dma(phb);
>>
>>  		/* Mark the PHB initialization done */
>>-		phb = hose->private_data;
>>  		phb->initialized = 1;
>>  	}
>>  }
>>
>
>
>-- 
>Alexey
>
* Re: [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release
  2015-11-17  5:08   ` Alexey Kardashevskiy
@ 2015-11-17  9:03     ` Gavin Shan
  2015-11-18  0:13       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  9:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, devicetree, linux-pci, panto,
	grant.likely, robherring2, bhelgaas, frowand.list
On Tue, Nov 17, 2015 at 04:08:30PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>In current implementation, the PEs that are allocated or picked
>>from the reserved list are identified by PE number. The PE instance
>>has to be picked according to the PE number eventually. We have
>>same issue when PE is released.
>>
>>For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns
>>PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
>>or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
>>returns the reserved/allocated PE instance to be used in subsequent
>>patches. On the other hand, pnv_ioda_free_pe() uses PE instance
>>(not number) as its argument. No logical changes introduced.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++++++++++++++++--------------
>>  arch/powerpc/platforms/powernv/pci.h      |  2 +-
>>  2 files changed, 46 insertions(+), 37 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 488e0f8..ae82df1 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -152,7 +152,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>  	pnv_ioda_init_pe(phb, pe_no);
>>  }
>>
>>-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  {
>>  	unsigned long pe;
>>
>>@@ -160,19 +160,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>  					phb->ioda.total_pe_num, 0);
>>  		if (pe >= phb->ioda.total_pe_num)
>>-			return IODA_INVALID_PE;
>>+			return NULL;
>>  	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>
>>-	pnv_ioda_init_pe(phb, pe);
>>-	return pe;
>>+	return pnv_ioda_init_pe(phb, pe);
>>  }
>>
>>-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>>  {
>>-	WARN_ON(phb->ioda.pe_array[pe].pdev);
>>+	struct pnv_phb *phb = pe->phb;
>>+
>>+	WARN_ON(pe->pdev);
>>
>>-	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
>>-	clear_bit(pe, phb->ioda.pe_alloc);
>>+	memset(pe, 0, sizeof(struct pnv_ioda_pe));
>>+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
>>  }
>>
>>  /* The default M64 BAR is shared by all PEs */
>>@@ -332,7 +333,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>  	}
>>  }
>>
>>-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  {
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>@@ -342,7 +343,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>
>>  	/* Root bus shouldn't use M64 */
>>  	if (pci_is_root_bus(bus))
>>-		return IODA_INVALID_PE;
>>+		return NULL;
>>
>>  	/* Allocate bitmap */
>>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>@@ -350,7 +351,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	if (!pe_alloc) {
>>  		pr_warn("%s: Out of memory !\n",
>>  			__func__);
>>-		return IODA_INVALID_PE;
>>+		return NULL;
>>  	}
>>
>>  	/* Figure out reserved PE numbers by the PE */
>>@@ -363,7 +364,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	 */
>>  	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>>  		kfree(pe_alloc);
>>-		return IODA_INVALID_PE;
>>+		return NULL;
>>  	}
>>
>>  	/*
>>@@ -409,7 +410,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	}
>>
>>  	kfree(pe_alloc);
>>-	return master_pe->pe_number;
>>+	return master_pe;
>>  }
>>
>>  static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>@@ -988,28 +989,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>   * subordinate PCI devices and buses. The second type of PE is normally
>>   * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>>   */
>>-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  {
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>-	struct pnv_ioda_pe *pe;
>>-	int pe_num = IODA_INVALID_PE;
>>+	struct pnv_ioda_pe *pe = NULL;
>>
>>  	/* Check if PE is determined by M64 */
>>  	if (phb->pick_m64_pe)
>>-		pe_num = phb->pick_m64_pe(bus, all);
>>+		pe = phb->pick_m64_pe(bus, all);
>>
>>  	/* The PE number isn't pinned by M64 */
>>-	if (pe_num == IODA_INVALID_PE)
>>-		pe_num = pnv_ioda_alloc_pe(phb);
>>+	if (!pe)
>>+		pe = pnv_ioda_alloc_pe(phb);
>>
>>-	if (pe_num == IODA_INVALID_PE) {
>>+	if (!pe) {
>>  		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
>>  			__func__, pci_domain_nr(bus), bus->number);
>>-		return;
>>+		return NULL;
>>  	}
>>
>>-	pe = &phb->ioda.pe_array[pe_num];
>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  	pe->pbus = bus;
>>  	pe->pdev = NULL;
>>@@ -1018,17 +1017,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>
>>  	if (all)
>>  		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>-			bus->busn_res.start, bus->busn_res.end, pe_num);
>>+			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>>  	else
>>  		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
>>-			bus->busn_res.start, pe_num);
>>+			bus->busn_res.start, pe->pe_number);
>>
>>  	if (pnv_ioda_configure_pe(phb, pe)) {
>>  		/* XXX What do we do here ? */
>>-		if (pe_num)
>>-			pnv_ioda_free_pe(phb, pe_num);
>>+		pnv_ioda_free_pe(pe);
>>  		pe->pbus = NULL;
>>-		return;
>>+		return NULL;
>>  	}
>>
>>  	/* Associate it with all child devices */
>>@@ -1036,6 +1034,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>
>>  	/* Put PE to the list */
>>  	list_add_tail(&pe->list, &phb->ioda.pe_list);
>>+
>>+	return pe;
>>  }
>>
>>  static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>@@ -1267,7 +1267,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>
>>  		pnv_ioda_deconfigure_pe(phb, pe);
>>
>>-		pnv_ioda_free_pe(phb, pe->pe_number);
>>+		pnv_ioda_free_pe(pe);
>>  	}
>>  }
>>
>>@@ -1276,6 +1276,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>  	struct pci_bus        *bus;
>>  	struct pci_controller *hose;
>>  	struct pnv_phb        *phb;
>>+	struct pnv_ioda_pe    *pe;
>>  	struct pci_dn         *pdn;
>>  	struct pci_sriov      *iov;
>>  	u16                    num_vfs, i;
>>@@ -1300,8 +1301,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>  		/* Release PE numbers */
>>  		if (pdn->m64_single_mode) {
>>  			for (i = 0; i < num_vfs; i++) {
>>-				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>-					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>+				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
>>+					continue;
>>+
>>+				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
>>+				pnv_ioda_free_pe(pe);
>>  			}
>>  		} else
>>  			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>@@ -1354,9 +1358,8 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>
>>  		if (pnv_ioda_configure_pe(phb, pe)) {
>>  			/* XXX What do we do here ? */
>>-			if (pe_num)
>>-				pnv_ioda_free_pe(phb, pe_num);
>>  			pe->pdev = NULL;
>>+			pnv_ioda_free_pe(pe);
>
>
>
>pnv_ioda_free_pe() does WARN_ON(pdev). Before this patch you would free PE
>first and then reset pe->pdev, now you reset it first, then call
>pnv_ioda_free_pe(). This change is not just about "Use PE instead of number
>during setup and release", is/was that a bug?
>
>And I fail to see when pe->pdev could get initialized in
>pnv_ioda_configure_pe() as pnv_pci_dma_dev_setup() should not be called while
>pnv_ioda_setup_vf_PE() is working.
>
It wasn't and isn't a bug, as pe->pdev is initialized in pnv_pci_dma_dev_setup()
in arch/powerpc/platforms/powernv/pci.c.
>
>>  			continue;
>>  		}
>>
>>@@ -1374,6 +1377,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  	struct pci_bus        *bus;
>>  	struct pci_controller *hose;
>>  	struct pnv_phb        *phb;
>>+	struct pnv_ioda_pe    *pe;
>>  	struct pci_dn         *pdn;
>>  	int                    ret;
>>  	u16                    i;
>>@@ -1416,11 +1420,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  		/* Calculate available PE for required VFs */
>>  		if (pdn->m64_single_mode) {
>>  			for (i = 0; i < num_vfs; i++) {
>>-				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
>>-				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
>>+				pe = pnv_ioda_alloc_pe(phb);
>>+				if (!pe) {
>>  					ret = -EBUSY;
>>  					goto m64_failed;
>>  				}
>>+
>>+				pdn->pe_num_map[i] = pe->pe_number;
>>  			}
>>  		} else {
>>  			mutex_lock(&phb->ioda.pe_alloc_mutex);
>>@@ -1465,8 +1471,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  m64_failed:
>>  	if (pdn->m64_single_mode) {
>>  		for (i = 0; i < num_vfs; i++) {
>>-			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>-				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>+			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
>>+				continue;
>>+
>>+			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
>>+			pnv_ioda_free_pe(pe);
>>  		}
>>  	} else
>>  		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 5df945f..e55ab0e 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -105,7 +105,7 @@ struct pnv_phb {
>>  	int (*init_m64)(struct pnv_phb *phb);
>>  	void (*reserve_m64_pe)(struct pci_bus *bus,
>>  			       unsigned long *pe_bitmap, bool all);
>>-	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
>>+	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
>>  	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>>  	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>>  	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>>
Thanks,
Gavin
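
One detail in the pnv_ioda_free_pe() hunk quoted above: memset() runs before
clear_bit(pe->pe_number, ...), so as quoted the bit being cleared would appear
to be bit 0 rather than the PE's own bit. A minimal userspace sketch of
allocating and freeing by PE instance, reading the number before zeroing, is
shown below. The toy_* names are hypothetical, and a byte array stands in for
the kernel bitmap and its find_next_zero_bit()/test_and_set_bit() loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TOTAL_PE 8

struct toy_phb;

struct toy_pe {
	struct toy_phb *phb;
	int pe_number;
	void *pdev;
};

struct toy_phb {
	unsigned char pe_alloc[TOY_TOTAL_PE];	/* one "in use" flag per PE */
	struct toy_pe pe_array[TOY_TOTAL_PE];
};

/* Return the PE instance (not its number), or NULL when none is free. */
static struct toy_pe *toy_alloc_pe(struct toy_phb *phb)
{
	for (int i = 0; i < TOY_TOTAL_PE; i++) {
		if (!phb->pe_alloc[i]) {
			struct toy_pe *pe = &phb->pe_array[i];

			phb->pe_alloc[i] = 1;
			pe->phb = phb;
			pe->pe_number = i;
			pe->pdev = NULL;
			return pe;
		}
	}

	return NULL;
}

/* Free by instance: save what is still needed before wiping the PE. */
static void toy_free_pe(struct toy_pe *pe)
{
	struct toy_phb *phb = pe->phb;
	int pe_number = pe->pe_number;	/* read before memset() zeroes it */

	assert(pe->pdev == NULL);	/* plays the role of WARN_ON(pe->pdev) */

	memset(pe, 0, sizeof(*pe));
	phb->pe_alloc[pe_number] = 0;
}
```

Freeing the second allocated PE here releases flag 1 and leaves flag 0 set,
which is the behaviour the number-based version had before the conversion.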
* Re: [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus
  2015-11-17  6:04   ` Alexey Kardashevskiy
@ 2015-11-17  9:06     ` Gavin Shan
  2015-11-19  0:21       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  9:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 05:04:42PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>We're going to reserve/assign PEs when pcibios_setup_bridge() is
>>called. The function won't be called for root bus as it doesn't
>>have parent bridge. However, the root bus still needs a PE to be
>>covered.
>>
>>This reserves PE numbers that are adjacent to the reserved one
>>for root buses.
>
>
>Somewhere in the patchset you need to describe why you need a separate PE for
>a root bus and why reserved_pe_idx is not enough for this.
>
Please confirm if it's fine to add the description in this patch's changelog.
>
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++++++++++++++++++++++---------
>>  arch/powerpc/platforms/powernv/pci.h      |  1 +
>>  2 files changed, 25 insertions(+), 9 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index eea1c96..5e6745f 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -207,14 +207,14 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>>  	set_bit(phb->ioda.m64_bar_idx, &phb->ioda.m64_bar_alloc);
>>
>>  	/*
>>-	 * Strip off the segment used by the reserved PE, which is
>>-	 * expected to be 0 or last one of PE capabicity.
>>+	 * Exclude the segments for reserved and root bus PE, which
>>+	 * are first or last two PEs.
>>  	 */
>>  	r = &phb->hose->mem_resources[1];
>>  	if (phb->ioda.reserved_pe_idx == 0)
>>-		r->start += phb->ioda.m64_segsize;
>>+		r->start += (2 * phb->ioda.m64_segsize);
>>  	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>-		r->end -= phb->ioda.m64_segsize;
>>+		r->end -= (2 * phb->ioda.m64_segsize);
>>  	else
>>  		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
>>  			phb->ioda.reserved_pe_idx);
>>@@ -294,14 +294,14 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>  	}
>>
>>  	/*
>>-	 * Exclude the segment used by the reserved PE, which
>>-	 * is expected to be 0 or last supported PE#.
>>+	 * Exclude the segments for reserved and root bus PE, which
>>+	 * are first or last two PEs.
>>  	 */
>>  	r = &phb->hose->mem_resources[1];
>>  	if (phb->ioda.reserved_pe_idx == 0)
>>-		r->start += phb->ioda.m64_segsize;
>>+		r->start += (2 * phb->ioda.m64_segsize);
>>  	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>-		r->end -= phb->ioda.m64_segsize;
>>+		r->end -= (2 * phb->ioda.m64_segsize);
>>  	else
>>  		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>  			phb->ioda.reserved_pe_idx);
>>@@ -3231,7 +3231,22 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  			phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>>  	}
>>  	phb->ioda.pe_array = aux + pemap_off;
>>-	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>+
>>+	/*
>>+	 * Choose PE number for root bus, which shouldn't have
>>+	 * M64 resources consumed by its child devices. To pick
>>+	 * the PE number adjacent to the reserved one if possible.
>>+	 */
>>+	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe_idx);
>>+	if (phb->ioda.reserved_pe_idx == 0) {
>>+		phb->ioda.root_pe_idx = 1;
>>+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
>>+	} else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1)) {
>>+		phb->ioda.root_pe_idx = phb->ioda.reserved_pe_idx - 1;
>>+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
>>+	} else {
>>+		phb->ioda.root_pe_idx = IODA_INVALID_PE;
>>+	}
>>
>>  	INIT_LIST_HEAD(&phb->ioda.pe_list);
>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index e55ab0e..a8ba97f 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -120,6 +120,7 @@ struct pnv_phb {
>>  			/* Global bridge info */
>>  			unsigned int		total_pe_num;
>>  			unsigned int		reserved_pe_idx;
>>+			unsigned int		root_pe_idx;
>>
>>  			/* 32-bit MMIO window */
>>  			unsigned int		m32_size;
>>
Thanks,
Gavin
* Re: [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
  2015-11-17  7:57   ` Alexey Kardashevskiy
@ 2015-11-17  9:12     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-17  9:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Tue, Nov 17, 2015 at 06:57:20PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>Currently, the PEs and their associated resources are assigned
>>in ppc_md.pcibios_fixup() except those used by SRIOV VFs. The
>>function is called for once after PCI probing and resources
>>assignment is completed. So it isn't hotplug friendly.
>>
>>This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
>>is called on the event during system bootup and PCI hotplug: updating
>>PCI bridge's windows after resource assignment/reassignment are done.
>>For partial hotplug case, where not all PCI devices belonging to the
>>PE are unplugged and plugged again, we just need unbinding/binding
>>the affected PCI devices with the corresponding PE without creating
>>new one.
>>
>>As there is no upstream bridge for root bus that needs to be covered
>>by PE,
>
>
>Does the "that needs" part relate to the root bus or to an upstream bridge?
>
root bus.
>>we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
>>before any other PEs can be created, as PE for root bus is the ancestor
>>to anyone else.
>>
>>On the other hand, the windows of root port or the upstream port
>
>s/On the other hand, /Also/ ?
>
It's fine to keep "On the other hand".
>>of PCIe switch behind root port are extended to be PHB's aperatuses
>
>apertures?
>
Will fix in next revision.
>>to accommodate the additonal resources needed by newly plugged devices
>
>s/additonal/additional
>
Will fix in next revision.
>>based on the fact: hotpluggable slot is behind root port or downstream
>>port of the PCIe switch behind root port. The extension for those
>>PCI brdiges' windows is done in ppc_md.pcibios_setup_bridge() as
>>well.
>
>
>I find it quite difficult to separate "cut-n-paste" changes from functional
>changes... May be it is just me.
>I would suggest splitting this patch into several. First define the
>setup_bridge() callback, then rework pnv_pci_ioda_setup_PEs(),
>pnv_pci_ioda_setup_seg(), pnv_pci_ioda_setup_DMA(), and then add "partial
>hotplug" handling may be.
>
>Or just get "reviewed-by" from Ben :)
>
Ok. I'll try to split it accordingly in next revision. Nope, I need your
reviewed-by :-)
>
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 240 +++++++++++++++++-------------
>>  arch/powerpc/platforms/powernv/pci.h      |   1 +
>>  2 files changed, 138 insertions(+), 103 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 5e6745f..0bb0056 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -975,6 +975,15 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  				pci_name(dev));
>>  			continue;
>>  		}
>>+
>>+		/*
>>+		 * In partial hotplug case, the PCI device might be still
>>+		 * associated with the PE and needn't be attached to the
>>+		 * PE again.
>>+		 */
>>+		if (pdn->pe_number != IODA_INVALID_PE)
>>+			continue;
>>+
>>  		pdn->pe_number = pe->pe_number;
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>  			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>@@ -992,9 +1001,26 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>  	struct pnv_ioda_pe *pe = NULL;
>>+	int pe_num;
>>+
>>+	/*
>>+	 * In partial hotplug case, the PE instance might be still alive.
>>+	 * We should reuse it instead of allocating a new one.
>>+	 */
>>+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
>>+	if (pe_num != IODA_INVALID_PE) {
>>+		pe = &phb->ioda.pe_array[pe_num];
>>+		pnv_ioda_setup_same_PE(bus, pe);
>>+		return NULL;
>>+	}
>>+
>>+	/* PE number for root bus should have been reserved */
>>+	if (pci_is_root_bus(bus) &&
>>+	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
>>+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
>>
>>  	/* Check if PE is determined by M64 */
>>-	if (phb->pick_m64_pe)
>>+	if (!pe && phb->pick_m64_pe)
>>  		pe = phb->pick_m64_pe(bus, all);
>>
>>  	/* The PE number isn't pinned by M64 */
>>@@ -1036,46 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	return pe;
>>  }
>>
>>-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>-{
>>-	struct pci_dev *dev;
>>-
>>-	pnv_ioda_setup_bus_PE(bus, false);
>>-
>>-	list_for_each_entry(dev, &bus->devices, bus_list) {
>>-		if (dev->subordinate) {
>>-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
>>-				pnv_ioda_setup_bus_PE(dev->subordinate, true);
>>-			else
>>-				pnv_ioda_setup_PEs(dev->subordinate);
>>-		}
>>-	}
>>-}
>>-
>>-/*
>>- * Configure PEs so that the downstream PCI buses and devices
>>- * could have their associated PE#. Unfortunately, we didn't
>>- * figure out the way to identify the PLX bridge yet. So we
>>- * simply put the PCI bus and the subordinate behind the root
>>- * port to PE# here. The game rule here is expected to be changed
>>- * as soon as we can detected PLX bridge correctly.
>>- */
>>-static void pnv_pci_ioda_setup_PEs(void)
>>-{
>>-	struct pci_controller *hose, *tmp;
>>-	struct pnv_phb *phb;
>>-
>>-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		phb = hose->private_data;
>>-
>>-		/* M64 layout might affect PE allocation */
>>-		if (phb->reserve_m64_pe)
>>-			phb->reserve_m64_pe(hose->bus, NULL, true);
>>-
>>-		pnv_ioda_setup_PEs(hose->bus);
>>-	}
>>-}
>>-
>>  #ifdef CONFIG_PCI_IOV
>>  static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>  {
>>@@ -2391,8 +2377,13 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>>  static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  				       struct pnv_ioda_pe *pe)
>>  {
>>+	unsigned int weight;
>>  	int64_t rc;
>>
>>+	weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+	if (!weight)
>>+		return;
>>+
>>  	/* TVE #1 is selected by PCI address bit 59 */
>>  	pe->tce_bypass_base = 1ull << 59;
>>
>>@@ -2424,33 +2415,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>  }
>>
>>-static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>-{
>>-	struct pnv_ioda_pe *pe;
>>-
>>-	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>-
>>-	list_for_each_entry(pe, &phb->ioda.pe_list, list)
>>-		pnv_pci_ioda1_setup_dma_pe(phb, pe);
>>-}
>>-
>>-static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
>>-{
>>-	struct pnv_ioda_pe *pe;
>>-	unsigned int weight;
>>-
>>-	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>-
>>-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>-		weight = pnv_pci_ioda_pe_dma_weight(pe);
>>-		if (!weight)
>>-			continue;
>>-
>>-		pe_info(pe, "Assign DMA32 space\n");
>>-		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>-	}
>>-}
>>-
>>  #ifdef CONFIG_PCI_MSI
>>  static void pnv_ioda2_msi_eoi(struct irq_data *d)
>>  {
>>@@ -2914,37 +2878,6 @@ static void pnv_ioda_setup_pe_seg(struct pnv_ioda_pe *pe)
>>  	}
>>  }
>>
>>-static void pnv_pci_ioda_setup_seg(void)
>>-{
>>-	struct pci_controller *tmp, *hose;
>>-	struct pnv_phb *phb;
>>-	struct pnv_ioda_pe *pe;
>>-
>>-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		phb = hose->private_data;
>>-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>-			pnv_ioda_setup_pe_seg(pe);
>>-		}
>>-	}
>>-}
>>-
>>-static void pnv_pci_ioda_setup_DMA(void)
>>-{
>>-	struct pci_controller *hose, *tmp;
>>-	struct pnv_phb *phb;
>>-
>>-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		phb = hose->private_data;
>>-		if (phb->type == PNV_PHB_IODA1)
>>-			pnv_pci_ioda1_setup_dma(phb);
>>-		else
>>-			pnv_pci_ioda2_setup_dma(phb);
>>-
>>-		/* Mark the PHB initialization done */
>>-		phb->initialized = 1;
>>-	}
>>-}
>>-
>>  static void pnv_pci_ioda_create_dbgfs(void)
>>  {
>>  #ifdef CONFIG_DEBUG_FS
>>@@ -2955,6 +2888,9 @@ static void pnv_pci_ioda_create_dbgfs(void)
>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>  		phb = hose->private_data;
>>
>>+		/* Notify initialization of PHB done */
>>+		phb->initialized = 1;
>>+
>>  		sprintf(name, "PCI%04x", hose->global_number);
>>  		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
>>  		if (!phb->dbgfs)
>>@@ -2966,10 +2902,6 @@ static void pnv_pci_ioda_create_dbgfs(void)
>>
>>  static void pnv_pci_ioda_fixup(void)
>>  {
>>-	pnv_pci_ioda_setup_PEs();
>>-	pnv_pci_ioda_setup_seg();
>>-	pnv_pci_ioda_setup_DMA();
>>-
>>  	pnv_pci_ioda_create_dbgfs();
>>
>>  #ifdef CONFIG_EEH
>>@@ -3019,6 +2951,104 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>  	return phb->ioda.io_segsize;
>>  }
>>
>>+/*
>>+ * We are updating root port or the upstream port of the
>>+ * bridge behind the root port with PHB's windows in order
>>+ * to accommodate the changes on required resources during
>>+ * PCI (slot) hotplug, which is connected to either root
>>+ * port or the downstream ports of PCIe switch behind the
>>+ * root port.
>>+ */
>>+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
>>+					   unsigned long type)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+	struct pci_dev *bridge = bus->self;
>>+	struct resource *r, *w;
>>+	int i;
>>+
>>+	/* Check if we need apply fixup to the bridge's windows */
>>+	if (!pci_is_root_bus(bridge->bus) &&
>>+	    !pci_is_root_bus(bridge->bus->self->bus))
>>+		return;
>>+
>>+	/* Fixup the resoureces */
>
>
>s/resoureces/resources/
>
Will fix.
>>+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
>>+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
>>+		if (!r->flags || !r->parent)
>>+			continue;
>>+
>>+		w = NULL;
>>+		if (r->flags & type & IORESOURCE_IO)
>>+			w = &hose->io_resource;
>>+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
>>+			 (type & IORESOURCE_PREFETCH) &&
>>+			 phb->ioda.m64_segsize)
>>+			w = &hose->mem_resources[1];
>>+		else if (r->flags & type & IORESOURCE_MEM)
>>+			w = &hose->mem_resources[0];
>>+
>>+		r->start = w->start;
>>+		r->end = w->end;
>>+	}
>>+}
>>+
>>+static void pnv_pci_setup_bridge(struct pci_bus *bus,
>>+				 unsigned long type)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+	struct pci_dev *bridge = bus->self;
>>+	struct pnv_ioda_pe *pe;
>>+	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
>>+
>>+	 /* The PE for root bus should be realized before any one else */
>>+	if (!phb->ioda.root_pe_populated) {
>>+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
>>+		if (pe) {
>>+			phb->ioda.root_pe_idx = pe->pe_number;
>>+			phb->ioda.root_pe_populated = true;
>>+		}
>>+	}
>>+
>>+	/* Extend bridge's windows if necessary */
>>+	pnv_pci_fixup_bridge_resources(bus, type);
>>+
>>+	/* Don't assign PE to PCI bus, which doesn't have subordinate devices */
>>+	if (list_empty(&bus->devices))
>>+		return;
>>+
>>+	/* Reserve PEs according to used M64 resources */
>>+	if (phb->reserve_m64_pe)
>>+		phb->reserve_m64_pe(bus, NULL, all);
>>+
>>+	/*
>>+	 * Assign PE. We might run here because of partial hotplug.
>>+	 * For the case, we just pick up the existing PE and should
>>+	 * not allocate resources again.
>>+	 */
>>+	pe = pnv_ioda_setup_bus_PE(bus, all);
>>+	if (!pe)
>>+		return;
>>+
>>+	/* Setup MMIO mapping */
>>+	pnv_ioda_setup_pe_seg(pe);
>>+
>>+	/* Setup DMA */
>>+	switch (phb->type) {
>>+	case PNV_PHB_IODA1:
>>+		pnv_pci_ioda1_setup_dma_pe(phb, pe);
>>+		break;
>>+	case PNV_PHB_IODA2:
>>+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>+		break;
>>+	default:
>>+		pr_warn("%s: No DMA for PHB#%d (type %d)\n",
>>+			__func__, phb->hose->global_number, phb->type);
>>+	}
>>+}
>>+
>>  #ifdef CONFIG_PCI_IOV
>>  static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>  						      int resno)
>>@@ -3095,6 +3125,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>>  #endif
>>  	.enable_device_hook	= pnv_pci_enable_device_hook,
>>  	.window_alignment	= pnv_pci_window_alignment,
>>+	.setup_bridge		= pnv_pci_setup_bridge,
>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>  	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
>>  	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
>>@@ -3168,6 +3199,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (phb->regs == NULL)
>>  		pr_err("  Failed to map registers !\n");
>>
>>+	/* Initialize TCE kill register */
>>+	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>+
>>  	/* Initialize more IODA stuff */
>>  	phb->ioda.total_pe_num = 1;
>>  	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index a8ba97f..ef5271a 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -121,6 +121,7 @@ struct pnv_phb {
>>  			unsigned int		total_pe_num;
>>  			unsigned int		reserved_pe_idx;
>>  			unsigned int		root_pe_idx;
>>+			bool			root_pe_populated;
>>
>>  			/* 32-bit MMIO window */
>>  			unsigned int		m32_size;
>>
Thanks,
Gavin
* Re: [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3
  2015-11-17  8:48     ` Gavin Shan
@ 2015-11-17 23:59       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-17 23:59 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 07:48 PM, Gavin Shan wrote:
> On Tue, Nov 17, 2015 at 12:07:17PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> In pnv_ioda_setup_dma(), it's unnecessary to calculate the DMA32
>>> segments for PEs on PHB3 as the whole available DMA32 space can
>>> be assigned to one specific PE on PHB3.
>>>
>>> This splits pnv_ioda_setup_dma() to pnv_pci_ioda1_setup_dma() and
>>> pnv_pci_ioda2_setup_dma() in order to avoid calculating DMA32
>>> segments for PEs on PHB3. No logical changes introduced.
>>
>>
>> This patch is not needed as
>>
>> [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation
>>
>> moves this calculation to another place (which already makes this patch
>> unnecessary) and
>>
>
> I don't follow your comments, can you tell me how to split/merge the patches?
Remove this patch, it is useless. Just do git rebase, remove this one and 
resolve conflicts in the next ones. I would suggest merging it into 26/50 
but I think you'll have conflicts between 17/50 and 26/50 anyway.
>> [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
>>
>> removes just introduced pnv_pci_ioda1_setup_dma() - if you remove it, then
>> there is no point in fixing it in the first place.
>>
>
> This function isn't removed in 26/50, could you double check?
Sure:
[PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time
@@ -2424,33 +2415,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct 
pnv_phb *phb,
  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
  }
-static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
-{
-	struct pnv_ioda_pe *pe;
-
-	pnv_pci_ioda_setup_opal_tce_kill(phb);
-
-	list_for_each_entry(pe, &phb->ioda.pe_list, list)
-		pnv_pci_ioda1_setup_dma_pe(phb, pe);
-}
>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 41 ++++++++++++++++++-------------
>>>   1 file changed, 24 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 5a08e20..4c2e023 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -2383,7 +2383,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>>   		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>>   }
>>>
>>> -static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>> +static void pnv_pci_ioda1_setup_dma(struct pnv_phb *phb)
>>>   {
>>>   	struct pci_controller *hose = phb->hose;
>>>   	unsigned int residual, remaining, segs, tw, base;
>>> @@ -2428,26 +2428,30 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>>   				segs = remaining;
>>>   		}
>>>
>>> -		/*
>>> -		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>> -		 * The all available 32-bits DMA space will be assigned to
>>> -		 * the specific PE.
>>> -		 */
>>> -		if (phb->type == PNV_PHB_IODA1) {
>>> -			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>> -				pe->dma_weight, segs);
>>> -			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>> -		} else {
>>> -			pe_info(pe, "Assign DMA32 space\n");
>>> -			segs = 0;
>>> -			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> -		}
>>> +		pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>> +			pe->dma_weight, segs);
>>> +		pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>>>
>>>   		remaining -= segs;
>>>   		base += segs;
>>>   	}
>>>   }
>>>
>>> +static void pnv_pci_ioda2_setup_dma(struct pnv_phb *phb)
>>> +{
>>> +	struct pnv_ioda_pe *pe;
>>> +
>>> +	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>> +
>>> +	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>> +		if (!pe->dma_weight)
>>> +			continue;
>>> +
>>> +		pe_info(pe, "Assign DMA32 space\n");
>>> +		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> +	}
>>> +}
>>> +
>>>   #ifdef CONFIG_PCI_MSI
>>>   static void pnv_ioda2_msi_eoi(struct irq_data *d)
>>>   {
>>> @@ -2931,10 +2935,13 @@ static void pnv_pci_ioda_setup_DMA(void)
>>>   	struct pnv_phb *phb;
>>>
>>>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>> -		pnv_ioda_setup_dma(hose->private_data);
>>> +		phb = hose->private_data;
>>> +		if (phb->type == PNV_PHB_IODA1)
>>> +			pnv_pci_ioda1_setup_dma(phb);
>>> +		else
>>> +			pnv_pci_ioda2_setup_dma(phb);
>>>
>>>   		/* Mark the PHB initialization done */
>>> -		phb = hose->private_data;
>>>   		phb->initialized = 1;
>>>   	}
>>>   }
>>>
>>
>>
>> --
>> Alexey
>>
>
-- 
Alexey
* Re: [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release
  2015-11-17  9:03     ` Gavin Shan
@ 2015-11-18  0:13       ` Alexey Kardashevskiy
  2015-11-22 22:52         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  0:13 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, devicetree, linux-pci, panto, grant.likely,
	robherring2, bhelgaas, frowand.list
On 11/17/2015 08:03 PM, Gavin Shan wrote:
> On Tue, Nov 17, 2015 at 04:08:30PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> In current implementation, the PEs that are allocated or picked
>>> from the reserved list are identified by PE number. The PE instance
>>> has to be picked according to the PE number eventually. We have
>>> same issue when PE is released.
>>>
>>> For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns
>>> PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
>>> or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
>>> returns the reserved/allocated PE instance to be used in subsequent
>>> patches. On the other hand, pnv_ioda_free_pe() uses PE instance
>>> (not number) as its argument. No logical changes introduced.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++++++++++++++++--------------
>>>   arch/powerpc/platforms/powernv/pci.h      |  2 +-
>>>   2 files changed, 46 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 488e0f8..ae82df1 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -152,7 +152,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>   	pnv_ioda_init_pe(phb, pe_no);
>>>   }
>>>
>>> -static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>> +static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>   {
>>>   	unsigned long pe;
>>>
>>> @@ -160,19 +160,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>>   					phb->ioda.total_pe_num, 0);
>>>   		if (pe >= phb->ioda.total_pe_num)
>>> -			return IODA_INVALID_PE;
>>> +			return NULL;
>>>   	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>>
>>> -	pnv_ioda_init_pe(phb, pe);
>>> -	return pe;
>>> +	return pnv_ioda_init_pe(phb, pe);
>>>   }
>>>
>>> -static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>> +static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>>>   {
>>> -	WARN_ON(phb->ioda.pe_array[pe].pdev);
>>> +	struct pnv_phb *phb = pe->phb;
>>> +
>>> +	WARN_ON(pe->pdev);
>>>
>>> -	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
>>> -	clear_bit(pe, phb->ioda.pe_alloc);
>>> +	memset(pe, 0, sizeof(struct pnv_ioda_pe));
>>> +	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
>>>   }
>>>
>>>   /* The default M64 BAR is shared by all PEs */
>>> @@ -332,7 +333,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>>   	}
>>>   }
>>>
>>> -static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>> +static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   {
>>>   	struct pci_controller *hose = pci_bus_to_host(bus);
>>>   	struct pnv_phb *phb = hose->private_data;
>>> @@ -342,7 +343,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>
>>>   	/* Root bus shouldn't use M64 */
>>>   	if (pci_is_root_bus(bus))
>>> -		return IODA_INVALID_PE;
>>> +		return NULL;
>>>
>>>   	/* Allocate bitmap */
>>>   	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>> @@ -350,7 +351,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   	if (!pe_alloc) {
>>>   		pr_warn("%s: Out of memory !\n",
>>>   			__func__);
>>> -		return IODA_INVALID_PE;
>>> +		return NULL;
>>>   	}
>>>
>>>   	/* Figure out reserved PE numbers by the PE */
>>> @@ -363,7 +364,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   	 */
>>>   	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>>>   		kfree(pe_alloc);
>>> -		return IODA_INVALID_PE;
>>> +		return NULL;
>>>   	}
>>>
>>>   	/*
>>> @@ -409,7 +410,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   	}
>>>
>>>   	kfree(pe_alloc);
>>> -	return master_pe->pe_number;
>>> +	return master_pe;
>>>   }
>>>
>>>   static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>> @@ -988,28 +989,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>>    * subordinate PCI devices and buses. The second type of PE is normally
>>>    * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>>>    */
>>> -static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>> +static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>   {
>>>   	struct pci_controller *hose = pci_bus_to_host(bus);
>>>   	struct pnv_phb *phb = hose->private_data;
>>> -	struct pnv_ioda_pe *pe;
>>> -	int pe_num = IODA_INVALID_PE;
>>> +	struct pnv_ioda_pe *pe = NULL;
>>>
>>>   	/* Check if PE is determined by M64 */
>>>   	if (phb->pick_m64_pe)
>>> -		pe_num = phb->pick_m64_pe(bus, all);
>>> +		pe = phb->pick_m64_pe(bus, all);
>>>
>>>   	/* The PE number isn't pinned by M64 */
>>> -	if (pe_num == IODA_INVALID_PE)
>>> -		pe_num = pnv_ioda_alloc_pe(phb);
>>> +	if (!pe)
>>> +		pe = pnv_ioda_alloc_pe(phb);
>>>
>>> -	if (pe_num == IODA_INVALID_PE) {
>>> +	if (!pe) {
>>>   		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
>>>   			__func__, pci_domain_nr(bus), bus->number);
>>> -		return;
>>> +		return NULL;
>>>   	}
>>>
>>> -	pe = &phb->ioda.pe_array[pe_num];
>>>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>>   	pe->pbus = bus;
>>>   	pe->pdev = NULL;
>>> @@ -1018,17 +1017,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>
>>>   	if (all)
>>>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>> -			bus->busn_res.start, bus->busn_res.end, pe_num);
>>> +			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>>>   	else
>>>   		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
>>> -			bus->busn_res.start, pe_num);
>>> +			bus->busn_res.start, pe->pe_number);
>>>
>>>   	if (pnv_ioda_configure_pe(phb, pe)) {
>>>   		/* XXX What do we do here ? */
>>> -		if (pe_num)
>>> -			pnv_ioda_free_pe(phb, pe_num);
>>> +		pnv_ioda_free_pe(pe);
>>>   		pe->pbus = NULL;
>>> -		return;
>>> +		return NULL;
>>>   	}
>>>
>>>   	/* Associate it with all child devices */
>>> @@ -1036,6 +1034,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>
>>>   	/* Put PE to the list */
>>>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
>>> +
>>> +	return pe;
>>>   }
>>>
>>>   static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>> @@ -1267,7 +1267,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>
>>>   		pnv_ioda_deconfigure_pe(phb, pe);
>>>
>>> -		pnv_ioda_free_pe(phb, pe->pe_number);
>>> +		pnv_ioda_free_pe(pe);
>>>   	}
>>>   }
>>>
>>> @@ -1276,6 +1276,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>   	struct pci_bus        *bus;
>>>   	struct pci_controller *hose;
>>>   	struct pnv_phb        *phb;
>>> +	struct pnv_ioda_pe    *pe;
>>>   	struct pci_dn         *pdn;
>>>   	struct pci_sriov      *iov;
>>>   	u16                    num_vfs, i;
>>> @@ -1300,8 +1301,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>   		/* Release PE numbers */
>>>   		if (pdn->m64_single_mode) {
>>>   			for (i = 0; i < num_vfs; i++) {
>>> -				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>> -					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>> +				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
>>> +					continue;
>>> +
>>> +				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
>>> +				pnv_ioda_free_pe(pe);
>>>   			}
>>>   		} else
>>>   			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>> @@ -1354,9 +1358,8 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>
>>>   		if (pnv_ioda_configure_pe(phb, pe)) {
>>>   			/* XXX What do we do here ? */
>>> -			if (pe_num)
>>> -				pnv_ioda_free_pe(phb, pe_num);
>>>   			pe->pdev = NULL;
>>> +			pnv_ioda_free_pe(pe);
>>
>>
>>
>> pnv_ioda_free_pe() does WARN_ON(pdev). Before this patch you would free PE
>> first and then reset pe->pdev, now you reset it first, then call
>> pnv_ioda_free_pe(). This change is not just about "Use PE instead of number
>> during setup and release", is/was that a bug?
>>
>> And I fail to see when pe->pdev could get initialized in
>> pnv_ioda_configure_pe() as pnv_pci_dma_dev_setup() should not be called while
>> pnv_ioda_setup_vf_PE() is working.
>>
>
> It wasn't or isn't a bug as
There is an unexplained change in behavior: after the patch, pe->pdev is
cleared before pnv_ioda_free_pe() is called; before the patch it was the
other way around. Your options are:
- remove "No logical changes introduced" from the commit log and explain
the change, or
- move "pe->pdev = NULL;" after pnv_ioda_free_pe().
> pe->pdev is initialized in arch/powerpc/platform/powernv/pci.c::
> pnv_pci_dma_dev_setup()
So when pnv_ioda_setup_vf_PE() starts working, is it valid for pe->pdev to
hold a non-NULL pointer? Nothing in pnv_ioda_setup_vf_PE() calls
pnv_pci_dma_dev_setup(), explicitly or implicitly.
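To make the ordering question concrete, here is a minimal standalone sketch
of the two error paths. It is not kernel code: struct pe_model, warn_count
and the helper names are made up for illustration, the counter stands in for
WARN_ON(pe->pdev), and the "if (pe_num)" guard from the old code is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct pnv_ioda_pe, reduced to two fields. */
struct pe_model {
	void *pdev;
	int pe_number;
};

/* Counts how often the stand-in for WARN_ON(pe->pdev) fired. */
static int warn_count;

/* Models pnv_ioda_free_pe(): warn if a device is still attached, then wipe. */
static void model_free_pe(struct pe_model *pe)
{
	assert(pe != NULL);
	if (pe->pdev)
		warn_count++;
	memset(pe, 0, sizeof(*pe));
}

/* Ordering before the patch: free first, then clear pdev. */
static void error_path_free_then_clear(struct pe_model *pe)
{
	model_free_pe(pe);
	pe->pdev = NULL;
}

/* Ordering after the patch: clear pdev first, so the check cannot fire. */
static void error_path_clear_then_free(struct pe_model *pe)
{
	pe->pdev = NULL;
	model_free_pe(pe);
}
```

If pe->pdev can be non-NULL on entry, only the first ordering reports it; the
second one silently hides it. If it can never be non-NULL there, the two are
equivalent and the reordering is just noise in the patch.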
>>
>>>   			continue;
>>>   		}
>>>
>>> @@ -1374,6 +1377,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>   	struct pci_bus        *bus;
>>>   	struct pci_controller *hose;
>>>   	struct pnv_phb        *phb;
>>> +	struct pnv_ioda_pe    *pe;
>>>   	struct pci_dn         *pdn;
>>>   	int                    ret;
>>>   	u16                    i;
>>> @@ -1416,11 +1420,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>   		/* Calculate available PE for required VFs */
>>>   		if (pdn->m64_single_mode) {
>>>   			for (i = 0; i < num_vfs; i++) {
>>> -				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
>>> -				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
>>> +				pe = pnv_ioda_alloc_pe(phb);
>>> +				if (!pe) {
>>>   					ret = -EBUSY;
>>>   					goto m64_failed;
>>>   				}
>>> +
>>> +				pdn->pe_num_map[i] = pe->pe_number;
>>>   			}
>>>   		} else {
>>>   			mutex_lock(&phb->ioda.pe_alloc_mutex);
>>> @@ -1465,8 +1471,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>   m64_failed:
>>>   	if (pdn->m64_single_mode) {
>>>   		for (i = 0; i < num_vfs; i++) {
>>> -			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>> -				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>> +			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
>>> +				continue;
>>> +
>>> +			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
>>> +			pnv_ioda_free_pe(pe);
>>>   		}
>>>   	} else
>>>   		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 5df945f..e55ab0e 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -105,7 +105,7 @@ struct pnv_phb {
>>>   	int (*init_m64)(struct pnv_phb *phb);
>>>   	void (*reserve_m64_pe)(struct pci_bus *bus,
>>>   			       unsigned long *pe_bitmap, bool all);
>>> -	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
>>> +	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
>>>   	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>>>   	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>>>   	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>>>
>
> Thanks,
> Gavin
>
-- 
Alexey
* Re: [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs
  2015-11-04 13:12 ` [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs Gavin Shan
@ 2015-11-18  2:23   ` Alexey Kardashevskiy
  2015-11-23 23:06     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  2:23 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This adds a reference count of PE, representing the number of PCI
> devices associated with the PE. The reference count is increased
> or decreased when PCI devices join or leave the PE. Once it becomes
> zero, the PE together with its used resources (IO, MMIO, DMA, PELTM,
> PELTV) are released to support PCI hot unplug.
The commit log suggests the patch only adds a counter, initializes it, and
replaces the unconditional release of an object (in this case, a PE) with a
conditional one. But it does more than that...
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 245 ++++++++++++++++++++++++++----
>   arch/powerpc/platforms/powernv/pci.h      |   1 +
>   2 files changed, 218 insertions(+), 28 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 0bb0056..dcffce5 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -129,6 +129,215 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>   }
>
> +static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	struct iommu_table *tbl;
> +	int start, count, i;
> +	int64_t rc;
> +
> +	/* Search for the used DMA32 segments */
> +	start = -1;
> +	count = 0;
> +	for (i = 0; i < phb->ioda.dma32_count; i++) {
> +		if (phb->ioda.dma32_segmap[i] != pe->pe_number)
> +			continue;
> +
> +		count++;
> +		if (start < 0)
> +			start = i;
> +	}
> +
> +	if (!count)
> +		return;
imho checking pe->table_group.tables[0] != NULL is shorter than the loop above.
> +
> +	/* Unlink IOMMU table from group */
> +	tbl = pe->table_group.tables[0];
> +	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
> +	if (pe->table_group.group) {
> +		iommu_group_put(pe->table_group.group);
> +		WARN_ON(pe->table_group.group);
> +	}
> +
> +	/* Release IOMMU table */
> +	pnv_pci_ioda2_table_free_pages(tbl);
This is an IODA2 helper with multilevel support. Does IODA1 support
multilevel TCE tables? If not, it should WARN_ON on levels != 1.
Another thing: you should first unprogram the TVEs (via
opal_pci_map_pe_dma_window), then invalidate the cache (if required, I am
not sure whether this is needed on IODA1), and only then free the actual table.
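To make the ordering concrete, here is a minimal user-space sketch. Every
name in it (struct mock_pe, mock_unmap_tve(), mock_invalidate_cache(),
mock_free_table()) is a hypothetical stand-in, not a kernel or OPAL
interface; it only models the sequence "unmap, invalidate, free" and records
the order in which the steps run:

```c
/* Hypothetical stand-ins: none of these names are kernel or OPAL APIs. */
#define MOCK_MAX_STEPS 16

struct mock_pe {
	int dma_seg_start;	/* first DMA32 segment owned by the PE */
	int dma_seg_count;	/* number of DMA32 segments owned by the PE */
	int table_freed;	/* set once the TCE table memory is released */
};

/* One letter per step, in call order; the buffer stays NUL-terminated. */
static char mock_log[MOCK_MAX_STEPS];
static int mock_log_len;

static void mock_record(char step)
{
	if (mock_log_len < MOCK_MAX_STEPS - 1)
		mock_log[mock_log_len++] = step;
}

/* Models unprogramming one TVE so the hardware stops walking the table. */
static void mock_unmap_tve(struct mock_pe *pe, int seg)
{
	(void)pe;
	(void)seg;
	mock_record('U');
}

/* Models flushing cached translations (possibly a no-op on IODA1). */
static void mock_invalidate_cache(struct mock_pe *pe)
{
	(void)pe;
	mock_record('I');
}

/* Models freeing the table memory; safe only once nothing references it. */
static void mock_free_table(struct mock_pe *pe)
{
	pe->table_freed = 1;
	mock_record('F');
}

static void mock_release_dma(struct mock_pe *pe)
{
	int i;

	if (!pe->dma_seg_count)
		return;

	/* 1. Unmap every segment the PE owns. */
	for (i = pe->dma_seg_start;
	     i < pe->dma_seg_start + pe->dma_seg_count; i++)
		mock_unmap_tve(pe, i);

	/* 2. Invalidate, 3. free the table last. */
	mock_invalidate_cache(pe);
	mock_free_table(pe);
	pe->dma_seg_count = 0;
}
```

With two owned segments the recorded order is "UUIF": both unmaps, then the
invalidation, then the free. The quoted hunk frees the table before the TVE
loop, which is the reverse of this.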
> +	iommu_free_table(tbl, of_node_full_name(pci_bus_to_OF_node(pe->pbus)));
> +
> +	/* Disable TVE */
> +	for (i = start; i < start + count; i++) {
> +		rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> +						i, 0, 0ul, 0ul, 0ul);
> +		if (rc)
> +			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
> +				rc, i);
> +
> +		phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
> +	}
You could implement pnv_pci_ioda1_unset_window/pnv_ioda1_table_free as 
callbacks, change pnv_pci_ioda2_release_dma_pe() to use them (and rename it 
to reflect that it supports IODA1 and IODA2).
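Roughly what I have in mind, as a self-contained user-space sketch. The
struct mock_dma_ops type and all mock_* functions are hypothetical names made
up for illustration; the real callbacks would take the kernel's own types and
do the OPAL calls:

```c
struct mock_pe;

/* Hypothetical per-PHB-type callbacks; the names are illustrative only. */
struct mock_dma_ops {
	long (*unset_window)(struct mock_pe *pe, int num);
	void (*table_free)(struct mock_pe *pe);
};

struct mock_pe {
	const struct mock_dma_ops *ops;
	int has_table;	/* non-zero while a TCE table is attached */
	int unset_by;	/* which variant unmapped the window: 1 or 2 */
	int freed_by;	/* which variant freed the table: 1 or 2 */
};

static long mock_ioda1_unset_window(struct mock_pe *pe, int num)
{
	(void)num;
	pe->unset_by = 1;
	return 0;
}

static void mock_ioda1_table_free(struct mock_pe *pe)
{
	pe->freed_by = 1;
}

static long mock_ioda2_unset_window(struct mock_pe *pe, int num)
{
	(void)num;
	pe->unset_by = 2;
	return 0;
}

static void mock_ioda2_table_free(struct mock_pe *pe)
{
	pe->freed_by = 2;
}

static const struct mock_dma_ops mock_ioda1_ops = {
	.unset_window	= mock_ioda1_unset_window,
	.table_free	= mock_ioda1_table_free,
};

static const struct mock_dma_ops mock_ioda2_ops = {
	.unset_window	= mock_ioda2_unset_window,
	.table_free	= mock_ioda2_table_free,
};

/* Common release path shared by both PHB types. */
static long mock_release_dma_pe(struct mock_pe *pe)
{
	long rc;

	if (!pe->has_table)
		return 0;

	rc = pe->ops->unset_window(pe, 0);
	/* Like the quoted patch, keep going after an unmap error. */
	pe->ops->table_free(pe);
	pe->has_table = 0;
	return rc;
}
```

The point of the design is that the type-specific parts shrink to two small
callbacks, while the ordering and the "nothing to release" check live in one
place.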
> +}
> +
> +static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
> +static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
> +		int num);
> +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
> +
> +static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)
You moved this function and changed it in the same patch. Please do one
thing at a time (which is "change", not "move").
> +{
> +	struct iommu_table *tbl;
> +	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
> +	int64_t rc;
> +
> +	if (!weight)
> +		return;
Checking for pe->table_group.group is better: if we ever change the logic
of what gets included in an IOMMU group, we will have to change the place
where we add devices to a group, but we won't have to touch the releasing
code.
> +
> +	tbl = pe->table_group.tables[0];
> +	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> +	if (rc)
> +		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> +
> +	pnv_pci_ioda2_set_bypass(pe, false);
> +	if (pe->table_group.group) {
> +		iommu_group_put(pe->table_group.group);
> +		WARN_ON(pe->table_group.group);
> +	}
> +
> +	pnv_pci_ioda2_table_free_pages(tbl);
> +	iommu_free_table(tbl, "pnv");
> +}
> +
> +static void pnv_ioda_release_dma_pe(struct pnv_ioda_pe *pe)
Merge this into pnv_ioda_release_pe() - it is small and called just once.
> +{
> +	struct pnv_phb *phb = pe->phb;
> +
> +	switch (phb->type) {
> +	case PNV_PHB_IODA1:
> +		pnv_pci_ioda1_release_dma_pe(pe);
> +		break;
> +	case PNV_PHB_IODA2:
> +		pnv_pci_ioda2_release_dma_pe(pe);
> +		break;
> +	default:
> +		WARN_ON(1);
> +	}
> +}
> +
> +static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	int index, *segmap = NULL;
> +	int64_t rc;
> +
> +	switch (win) {
> +	case OPAL_IO_WINDOW_TYPE:
> +		segmap = phb->ioda.io_segmap;
> +		break;
> +	case OPAL_M32_WINDOW_TYPE:
> +		segmap = phb->ioda.m32_segmap;
> +		break;
> +	case OPAL_M64_WINDOW_TYPE:
> +		if (phb->type != PNV_PHB_IODA1)
> +			return;
> +		segmap = phb->ioda.m64_segmap;
> +		break;
> +	default:
> +		return;
Unnecessary return.
> +	}
> +
> +	for (index = 0; index < phb->ioda.total_pe_num; index++) {
> +		if (segmap[index] != pe->pe_number)
> +			continue;
> +
> +		if (win == OPAL_M64_WINDOW_TYPE)
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +					phb->ioda.reserved_pe_idx, win,
> +					index / PNV_IODA1_M64_SEGS,
> +					index % PNV_IODA1_M64_SEGS);
> +		else
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +					phb->ioda.reserved_pe_idx, win,
> +					0, index);
> +
> +		if (rc != OPAL_SUCCESS)
> +			pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
> +				rc, win, index);
> +
> +		segmap[index] = IODA_INVALID_PE;
> +	}
> +}
> +
> +static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	int win;
> +
> +	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {
> +		if (phb->type == PNV_PHB_IODA2 && win == OPAL_IO_WINDOW_TYPE)
> +			continue;
Move this check to pnv_ioda_release_window(), or move the
OPAL_M64_WINDOW_TYPE check (phb->type != PNV_PHB_IODA1) from that function
to here.
> +
> +		pnv_ioda_release_window(pe, win);
> +	}
> +}
This is shorter and cleaner:
static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win,
				    int *segmap)
{
	struct pnv_phb *phb = pe->phb;
	int index;
	int64_t rc;

	for (index = 0; index < phb->ioda.total_pe_num; index++) {
		if (segmap[index] != pe->pe_number)
			continue;

		if (win == OPAL_M64_WINDOW_TYPE)
			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
					phb->ioda.reserved_pe_idx, win,
					index / PNV_IODA1_M64_SEGS,
					index % PNV_IODA1_M64_SEGS);
		else
			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
					phb->ioda.reserved_pe_idx, win,
					0, index);

		if (rc != OPAL_SUCCESS)
			pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
				rc, win, index);

		segmap[index] = IODA_INVALID_PE;
	}
}

static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
{
	struct pnv_phb *phb = pe->phb;

	pnv_ioda_release_window(pe, OPAL_M32_WINDOW_TYPE,
				phb->ioda.m32_segmap);
	if (phb->type != PNV_PHB_IODA2)
		pnv_ioda_release_window(pe, OPAL_IO_WINDOW_TYPE,
					phb->ioda.io_segmap);
	else
		pnv_ioda_release_window(pe, OPAL_M64_WINDOW_TYPE,
					phb->ioda.m64_segmap);
}
I'd actually merge pnv_ioda_release_pe_seg() into pnv_ioda_release_pe() as 
well as it is also small and called once.
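One caveat about the if/else in pnv_ioda_release_pe_seg() above: as I read
the quoted switch, m64_segmap is only used when phb->type is PNV_PHB_IODA1,
and the IO window is skipped on IODA2. If the intent is to keep exactly that
behaviour, the dispatch could look like the sketch below. It is a user-space
mock under that assumption; all mock_* names, MOCK_NUM_SEGS and
MOCK_INVALID_PE are made up for illustration and the OPAL unmap call is
left out:

```c
#define MOCK_INVALID_PE	(-1)
#define MOCK_NUM_SEGS	4

enum mock_phb_type { MOCK_IODA1, MOCK_IODA2 };

struct mock_phb {
	enum mock_phb_type type;
	int io_segmap[MOCK_NUM_SEGS];
	int m32_segmap[MOCK_NUM_SEGS];
	int m64_segmap[MOCK_NUM_SEGS];
};

struct mock_pe {
	struct mock_phb *phb;
	int pe_number;
};

/* Hand every segment owned by the PE back to the free state. */
static void mock_release_window(struct mock_pe *pe, int *segmap)
{
	int index;

	for (index = 0; index < MOCK_NUM_SEGS; index++) {
		if (segmap[index] != pe->pe_number)
			continue;

		segmap[index] = MOCK_INVALID_PE;
	}
}

static void mock_release_pe_seg(struct mock_pe *pe)
{
	struct mock_phb *phb = pe->phb;

	/* M32 segments exist on both PHB types. */
	mock_release_window(pe, phb->m32_segmap);

	/* IO and segmented M64 are handled on IODA1 only. */
	if (phb->type == MOCK_IODA1) {
		mock_release_window(pe, phb->io_segmap);
		mock_release_window(pe, phb->m64_segmap);
	}
}
```

On a mock IODA2 PHB only the M32 map is touched; on a mock IODA1 PHB all
three maps are cleared for the PE.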
> +
> +static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb,
> +				   struct pnv_ioda_pe *pe);
> +static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe);
> +static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_ioda_pe *tmp, *slave;
> +
> +	/* Release slave PEs in compound PE */
> +	if (pe->flags & PNV_IODA_PE_MASTER) {
> +		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
> +			pnv_ioda_release_pe(slave);
> +	}
> +
> +	/* Remove the PE from the list */
> +	list_del(&pe->list);
> +
> +	/* Release resources */
> +	pnv_ioda_release_dma_pe(pe);
> +	pnv_ioda_release_pe_seg(pe);
> +	pnv_ioda_deconfigure_pe(pe->phb, pe);
> +
> +	pnv_ioda_free_pe(pe);
> +}
> +
> +static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
> +{
> +	if (!pe)
> +		return NULL;
> +
> +	pe->device_count++;
> +	return pe;
> +}
> +
> +static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
Merge this into pnv_pci_release_device() as it is small and called only once.
> +{
> +	if (!pe)
> +		return;
> +
> +	pe->device_count--;
> +	WARN_ON(pe->device_count < 0);
> +	if (pe->device_count == 0)
> +		pnv_ioda_release_pe(pe);
> +}
> +
> +static void pnv_pci_release_device(struct pci_dev *pdev)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +	struct pnv_ioda_pe *pe;
> +
> +	if (pdev->is_virtfn)
> +		return;
> +
> +	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
> +		return;
> +
> +	pe = &phb->ioda.pe_array[pdn->pe_number];
> +	pnv_ioda_pe_put(pe);
> +}
> +
>   static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>   {
>   	phb->ioda.pe_array[pe_no].phb = phb;
> @@ -724,7 +933,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>   	return 0;
>   }
>
> -#ifdef CONFIG_PCI_IOV
>   static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   {
>   	struct pci_dev *parent;
> @@ -759,9 +967,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   		}
>   		rid_end = pe->rid + (count << 8);
>   	} else {
> +#ifdef CONFIG_PCI_IOV
>   		if (pe->flags & PNV_IODA_PE_VF)
>   			parent = pe->parent_dev;
>   		else
> +#endif
>   			parent = pe->pdev->bus->self;
>   		bcomp = OpalPciBusAll;
>   		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
> @@ -799,11 +1009,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>
>   	pe->pbus = NULL;
>   	pe->pdev = NULL;
> +#ifdef CONFIG_PCI_IOV
>   	pe->parent_dev = NULL;
> +#endif
These #ifdef movements seem very much unrelated.
>
>   	return 0;
>   }
> -#endif /* CONFIG_PCI_IOV */
>
>   static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   {
> @@ -985,6 +1196,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   			continue;
>
>   		pdn->pe_number = pe->pe_number;
> +		pnv_ioda_pe_get(pe);
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>   			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>   	}
> @@ -1047,9 +1259,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   			bus->busn_res.start, pe->pe_number);
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
> -		/* XXX What do we do here ? */
> -		pnv_ioda_free_pe(pe);
>   		pe->pbus = NULL;
> +		pnv_ioda_release_pe(pe);
This is an unrelated and unexplained change.
>   		return NULL;
>   	}
>
> @@ -1199,29 +1410,6 @@ m64_failed:
>   	return -EBUSY;
>   }
>
> -static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
> -		int num);
> -static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
> -
> -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
> -{
> -	struct iommu_table    *tbl;
> -	int64_t               rc;
> -
> -	tbl = pe->table_group.tables[0];
> -	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> -	if (rc)
> -		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> -
> -	pnv_pci_ioda2_set_bypass(pe, false);
> -	if (pe->table_group.group) {
> -		iommu_group_put(pe->table_group.group);
> -		BUG_ON(pe->table_group.group);
> -	}
> -	pnv_pci_ioda2_table_free_pages(tbl);
> -	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> -}
> -
>   static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>   {
>   	struct pci_bus        *bus;
> @@ -1242,7 +1430,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>   		if (pe->parent_dev != pdev)
>   			continue;
>
> -		pnv_pci_ioda2_release_dma_pe(pdev, pe);
> +		pnv_pci_ioda2_release_dma_pe(pe);
This is an unrelated change.
>
>   		/* Remove from list */
>   		mutex_lock(&phb->ioda.pe_list_mutex);
> @@ -3124,6 +3312,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>   	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
>   #endif
>   	.enable_device_hook	= pnv_pci_enable_device_hook,
> +	.release_device		= pnv_pci_release_device,
>   	.window_alignment	= pnv_pci_window_alignment,
>   	.setup_bridge		= pnv_pci_setup_bridge,
>   	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index ef5271a..3bb10de 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -30,6 +30,7 @@ struct pnv_phb;
>   struct pnv_ioda_pe {
>   	unsigned long		flags;
>   	struct pnv_phb		*phb;
> +	int			device_count;
Not atomic_t, no kref, no additional mutex, just "int"? Are you sure about
it? If so, add a note to the commit log explaining what guarantees that
there is no race.
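For comparison, this is the shape an atomic counter would take. In the
kernel it would be atomic_t with atomic_dec_and_test(), or a kref; the sketch
below is only a user-space analogue using C11 <stdatomic.h>, and struct
mock_pe and the mock_pe_*() helpers are hypothetical names:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the PE; only the counter matters here. */
struct mock_pe {
	atomic_int device_count;
	int released;	/* set when the last reference is dropped */
};

static void mock_pe_init(struct mock_pe *pe)
{
	atomic_init(&pe->device_count, 0);
	pe->released = 0;
}

static void mock_pe_get(struct mock_pe *pe)
{
	atomic_fetch_add(&pe->device_count, 1);
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int mock_pe_put(struct mock_pe *pe)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&pe->device_count, 1) == 1) {
		pe->released = 1;
		return 1;
	}

	return 0;
}
```

The decrement and the "was that the last one?" test happen in one atomic
step, so two concurrent puts cannot both conclude that they released the PE.
If the PCI rescan/remove lock already serializes every get and put, a plain
int is fine, but then the commit log should say so.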
>
>   	/* A PE can be associated with a single device or an
>   	 * entire bus (& children). In the former case, pdev
>
-- 
Alexey Kardashevskiy
IBM OzLabs, LTC Team
e-mail: aik@au1.ibm.com
notes: Alexey Kardashevskiy/Australia/IBM
* Re: [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  2015-11-04 13:12 ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
@ 2015-11-18  2:43   ` Alexey Kardashevskiy
  2015-11-23 23:08     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  2:43 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
> with names of the weak functions in PCI subsystem, which have the
> prefix "pcibios". No logical changes introduced.
As you mentioned before, the patchset is organized as "code refactoring,
IO/M32/M64, DMA, PE allocation/releasing". This patch fits into the
refactoring category, so it should go at the beginning of the series :)
-- 
Alexey
* Re: [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes()
  2015-11-04 13:12 ` [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
@ 2015-11-18  3:14   ` Alexey Kardashevskiy
  2015-11-23 23:23     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  3:14 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This renames traverse_pci_devices() to pci_traverse_device_nodes().
Why? pci_traverse_device_nodes() is not moved to some more generic header
where it would be required to have a standard prefix. And the ppc-pci.h
header does not use any standard prefix, so the point of renaming is unclear.
traverse_pci_dn() is still there and it has "traverse", "pci" and "device
node" (abbreviated as "dn") in it, so pci_traverse_device_nodes is a more
confusing name than traverse_pci_devices. Can't we just get rid of one of
them?
Also, the subject line says "Export" but nothing gets exported in this patch:
the visibility of pci_traverse_device_nodes() remains unchanged.
> The function traverses all subordinate device nodes of the specified
> one. Also, below cleanup applied to the function. No logical changes
> introduced.
>
>     * Rename "pre" to "fn".
>     * Avoid assignment in if condition reported from checkpatch.pl.
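For reference, the callback contract that both names describe can be
sketched in user space as follows. This is a simplified model, not the
kernel function: struct mock_node and the mock_* helpers are hypothetical,
and the sketch descends into every child, whereas the real code only goes
down through PCI bridges. The walk stops as soon as the callback returns
non-NULL and hands that value back to the caller:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node. */
struct mock_node {
	const char *name;
	struct mock_node *child;
	struct mock_node *sibling;
};

/* Depth-first walk over everything below 'start'. */
static void *mock_traverse(struct mock_node *start,
			   void *(*fn)(struct mock_node *, void *),
			   void *data)
{
	struct mock_node *dn;
	void *ret;

	for (dn = start->child; dn; dn = dn->sibling) {
		if (fn) {
			ret = fn(dn, data);
			if (ret)
				return ret;
		}

		ret = mock_traverse(dn, fn, data);
		if (ret)
			return ret;
	}

	return NULL;
}

/* Example callback: counts nodes and never stops the walk. */
static void *mock_count(struct mock_node *dn, void *data)
{
	(void)dn;
	(*(int *)data)++;
	return NULL;
}

/* Example callback: stops the walk at the node with the given name. */
static void *mock_find(struct mock_node *dn, void *data)
{
	return strcmp(dn->name, (const char *)data) == 0 ? dn : NULL;
}
```

A counting callback visits every node below the start node, while a lookup
callback ends the walk at the first match.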
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
>   arch/powerpc/kernel/pci_dn.c         | 14 +++++++++-----
>   arch/powerpc/platforms/pseries/msi.c |  4 ++--
>   3 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
> index ca0c5bf..8753e4e 100644
> --- a/arch/powerpc/include/asm/ppc-pci.h
> +++ b/arch/powerpc/include/asm/ppc-pci.h
> @@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
>   struct device_node;
>   struct pci_dn;
>
> -typedef void *(*traverse_func)(struct device_node *me, void *data);
> -void *traverse_pci_devices(struct device_node *start, traverse_func pre,
> -		void *data);
> +void *pci_traverse_device_nodes(struct device_node *start,
> +				void *(*fn)(struct device_node *, void *),
> +				void *data);
>   void *traverse_pci_dn(struct pci_dn *root,
>   		      void *(*fn)(struct pci_dn *, void *),
>   		      void *data);
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 7f877a4..aa4110f 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -355,8 +355,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>    * one of these nodes we also assume its siblings are non-pci for
>    * performance.
>    */
> -void *traverse_pci_devices(struct device_node *start, traverse_func pre,
> -		void *data)
> +void *pci_traverse_device_nodes(struct device_node *start,
> +				void *(*fn)(struct device_node *, void *),
> +				void *data)
>   {
>   	struct device_node *dn, *nextdn;
>   	void *ret;
> @@ -371,8 +372,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>   		if (classp)
>   			class = of_read_number(classp, 1);
>
> -		if (pre && ((ret = pre(dn, data)) != NULL))
> -			return ret;
> +		if (fn) {
> +			ret = fn(dn, data);
> +			if (ret)
> +				return ret;
> +		}
>
>   		/* If we are a PCI bridge, go down */
>   		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
> @@ -470,7 +474,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>   	}
>
>   	/* Update dn->phb ptrs for new phb and children devices */
> -	traverse_pci_devices(dn, add_pdn, phb);
> +	pci_traverse_device_nodes(dn, add_pdn, phb);
>   }
>
>   /**
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index 272e9ec..543a638 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>   	memset(&counts, 0, sizeof(struct msi_counts));
>
>   	/* Work out how many devices we have below this PE */
> -	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
> +	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
>
>   	if (counts.num_devices == 0) {
>   		pr_err("rtas_msi: found 0 devices under PE for %s\n",
> @@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>   	/* else, we have some more calculating to do */
>   	counts.requestor = pci_device_to_OF_node(dev);
>   	counts.request = request;
> -	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
> +	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
>
>   	/* If the quota isn't an integer multiple of the total, we can
>   	 * use the remainder as spare MSIs for anyone that wants them. */
>
-- 
Alexey
* Re: [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus()
  2015-11-04 13:12 ` [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
@ 2015-11-18  3:59   ` Alexey Kardashevskiy
  2015-11-23 23:11     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  3:59 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
> avoid conflicts with those PCI subsystem weak function names, which
> have prefix "pcibios". No logical changes introduced.
Could this be merged into [PATCH v7 28/50] powerpc/pci: Rename
pcibios_{add,remove}_pci_devices() and/or moved to the beginning of the
series?
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pci-bridge.h      | 2 +-
>   arch/powerpc/platforms/pseries/pci_dlpar.c | 5 ++---
>   drivers/pci/hotplug/rpadlpar_core.c        | 6 +++---
>   drivers/pci/hotplug/rpaphp_pci.c           | 2 +-
>   4 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index c2360c8..28385cb 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -257,7 +257,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
>   #endif
>
>   /** Find the bus corresponding to the indicated device node */
> -extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
> +extern struct pci_bus *pci_find_bus_by_node(struct device_node *dn);
>
>   /** Remove all of the PCI devices under this bus */
>   extern void pci_remove_pci_devices(struct pci_bus *bus);
> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
> index 5d4a3df..aee22b4 100644
> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
> @@ -54,8 +54,7 @@ find_bus_among_children(struct pci_bus *bus,
>   	return child;
>   }
>
> -struct pci_bus *
> -pcibios_find_pci_bus(struct device_node *dn)
> +struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
>   {
>   	struct pci_dn *pdn = dn->data;
>
> @@ -64,7 +63,7 @@ pcibios_find_pci_bus(struct device_node *dn)
>
>   	return find_bus_among_children(pdn->phb->bus, dn);
>   }
> -EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
> +EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
>
>   struct pci_controller *init_phb_dynamic(struct device_node *dn)
>   {
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> index ebd283b..9aa392b 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -176,7 +176,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
>   	struct pci_dev *dev;
>   	struct pci_controller *phb;
>
> -	if (pcibios_find_pci_bus(dn))
> +	if (pci_find_bus_by_node(dn))
>   		return -EINVAL;
>
>   	/* Add pci bus */
> @@ -213,7 +213,7 @@ static int dlpar_remove_phb(char *drc_name, struct device_node *dn)
>   	struct pci_dn *pdn;
>   	int rc = 0;
>
> -	if (!pcibios_find_pci_bus(dn))
> +	if (!pci_find_bus_by_node(dn))
>   		return -EINVAL;
>
>   	/* If pci slot is hotpluggable, use hotplug to remove it */
> @@ -357,7 +357,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>
>   	pci_lock_rescan_remove();
>
> -	bus = pcibios_find_pci_bus(dn);
> +	bus = pci_find_bus_by_node(dn);
>   	if (!bus) {
>   		ret = -EINVAL;
>   		goto out;
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> index 256066c..e7dd573 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
>   	if (rc)
>   		return rc;
>
> -	bus = pcibios_find_pci_bus(slot->dn);
> +	bus = pci_find_bus_by_node(slot->dn);
>   	if (!bus) {
>   		err("%s: no pci_bus for dn %s\n", __func__, slot->dn->full_name);
>   		return -EINVAL;
>
-- 
Alexey
* Re: [PATCH v7 34/50] powerpc/pci: Delay populating pdn
  2015-11-04 13:12 ` [PATCH v7 34/50] powerpc/pci: Delay populating pdn Gavin Shan
@ 2015-11-18  4:24   ` Alexey Kardashevskiy
  2015-11-23 23:42     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  4:24 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> The pdn (struct pci_dn) instances are allocated from memblock or
> bootmem when creating PCI controller (hoses) in setup_arch(). PCI
> hotplug, which will be supported by following patches, releases
> PCI device nodes and their corresponding pdn on unplugging event.
> The memory chunks for pdn instances allocated from memblock or
> bootmem are hard to reuse after being released.
>
> This delays creating pdn in core_initcall_sync(eeh_dev_phb_init) so
> that they are allocated from slab. In turn, the memory chunks for
> them can be reused after being released without problem. Since the
> pdn and eeh_dev have the same life cycle, the eeh_dev is created when
> pdn is populated. We needn't create eeh_dev with another initcall.
> The time to create PHB PEs is delayed a bit from core_initcall() to
> core_initcall_sync().
Why is it delayed? I mean, what needs to be called before eeh_dev_phb_init()?
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/eeh.h         |  2 +-
>   arch/powerpc/include/asm/ppc-pci.h     |  2 --
>   arch/powerpc/kernel/eeh_dev.c          | 19 ++++-------------
>   arch/powerpc/kernel/pci_dn.c           | 20 ++++++++++++++++--
>   arch/powerpc/platforms/maple/pci.c     | 34 ++++++++++++++++++------------
>   arch/powerpc/platforms/pasemi/pci.c    |  3 ---
>   arch/powerpc/platforms/powermac/pci.c  | 38 +++++++++++++++++++++-------------
>   arch/powerpc/platforms/powernv/pci.c   |  3 ---
>   arch/powerpc/platforms/pseries/setup.c |  6 +-----
>   9 files changed, 69 insertions(+), 58 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index c5eb86f..27352f4 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -268,7 +268,7 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
>   const char *eeh_pe_loc_get(struct eeh_pe *pe);
>   struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
>
> -void *eeh_dev_init(struct pci_dn *pdn, void *data);
> +struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
>   void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
>   int eeh_init(void);
>   int __init eeh_ops_register(struct eeh_ops *ops);
> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
> index 8753e4e..0f73de0 100644
> --- a/arch/powerpc/include/asm/ppc-pci.h
> +++ b/arch/powerpc/include/asm/ppc-pci.h
> @@ -39,8 +39,6 @@ void *pci_traverse_device_nodes(struct device_node *start,
>   void *traverse_pci_dn(struct pci_dn *root,
>   		      void *(*fn)(struct pci_dn *, void *),
>   		      void *data);
> -
> -extern void pci_devs_phb_init(void);
>   extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
>
>   /* From rtas_pci.h */
> diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
> index aabba94..1c4bc35 100644
> --- a/arch/powerpc/kernel/eeh_dev.c
> +++ b/arch/powerpc/kernel/eeh_dev.c
> @@ -44,14 +44,13 @@
>   /**
>    * eeh_dev_init - Create EEH device according to OF node
>    * @pdn: PCI device node
> - * @data: PHB
>    *
>    * It will create EEH device according to the given OF node. The function
>    * might be called by PCI emunation, DR, PHB hotplug.
>    */
> -void *eeh_dev_init(struct pci_dn *pdn, void *data)
> +struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
>   {
> -	struct pci_controller *phb = data;
> +	struct pci_controller *phb = pdn->phb;
>   	struct eeh_dev *edev;
>
>   	/* Allocate EEH device */
> @@ -68,7 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
>   	edev->phb = phb;
>   	INIT_LIST_HEAD(&edev->list);
>
> -	return NULL;
> +	return edev;
>   }
>
>   /**
> @@ -80,16 +79,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
>    */
>   void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
>   {
> -	struct pci_dn *root = phb->pci_data;
> -
>   	/* EEH PE for PHB */
>   	eeh_phb_pe_create(phb);
> -
> -	/* EEH device for PHB */
> -	eeh_dev_init(root, phb);
> -
> -	/* EEH devices for children OF nodes */
> -	traverse_pci_dn(root, eeh_dev_init, phb);
>   }
>
>   /**
> @@ -105,9 +96,7 @@ static int __init eeh_dev_phb_init(void)
>   	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
>   		eeh_dev_phb_init_dynamic(phb);
>
> -	pr_info("EEH: devices created\n");
> -
>   	return 0;
>   }
>
> -core_initcall(eeh_dev_phb_init);
> +core_initcall_sync(eeh_dev_phb_init);
Maybe remove core_initcall_sync and call eeh_dev_phb_init_dynamic()
directly from the loop in pci_devs_phb_init()?
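Something along these lines, shown as a user-space mock. It assumes the only
ordering requirement is "pdn populated before the EEH PHB PE is created",
which is my guess and exactly the thing I am asking about; struct mock_phb
and the mock_* functions are hypothetical stand-ins for the real ones:

```c
/* Hypothetical stand-in for struct pci_controller. */
struct mock_phb {
	int pdn_ready;	/* pci_dn data populated for this PHB */
	int eeh_ready;	/* EEH PHB PE created for this PHB */
};

static void mock_pci_devs_phb_init_dynamic(struct mock_phb *phb)
{
	phb->pdn_ready = 1;
}

/* Fails if it runs before the pdn setup, to make the ordering visible. */
static int mock_eeh_dev_phb_init_dynamic(struct mock_phb *phb)
{
	if (!phb->pdn_ready)
		return -1;

	phb->eeh_ready = 1;
	return 0;
}

/* One loop, one initcall: the order is explicit in the code. */
static int mock_pci_devs_phb_init(struct mock_phb *phbs, int count)
{
	int i, rc;

	for (i = 0; i < count; i++) {
		mock_pci_devs_phb_init_dynamic(&phbs[i]);

		rc = mock_eeh_dev_phb_init_dynamic(&phbs[i]);
		if (rc)
			return rc;
	}

	return 0;
}
```

This way the dependency is expressed by call order inside one function
instead of by the relative position of core_initcall() and
core_initcall_sync(), which is easy to break later without noticing.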
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index aa4110f..581612c 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -272,8 +272,11 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>   	const __be32 *regs;
>   	struct device_node *parent;
>   	struct pci_dn *pdn;
> +#ifdef CONFIG_EEH
> +	struct eeh_dev *edev;
> +#endif
>
> -	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
> +	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
>   	if (pdn == NULL)
>   		return NULL;
>   	dn->data = pdn;
> @@ -302,6 +305,15 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>   	/* Extended config space */
>   	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
>
> +	/* Create EEH device */
> +#ifdef CONFIG_EEH
> +	edev = eeh_dev_init(pdn);
> +	if (!edev) {
> +		kfree(pdn);
> +		return NULL;
> +	}
> +#endif
> +
>   	/* Attach to parent node */
>   	INIT_LIST_HEAD(&pdn->child_list);
>   	INIT_LIST_HEAD(&pdn->list);
> @@ -486,15 +498,19 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>    * pci device found underneath.  This routine runs once,
>    * early in the boot sequence.
>    */
> -void __init pci_devs_phb_init(void)
> +static int __init pci_devs_phb_init(void)
>   {
>   	struct pci_controller *phb, *tmp;
>
>   	/* This must be done first so the device nodes have valid pci info! */
>   	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
>   		pci_devs_phb_init_dynamic(phb);
> +
> +	return 0;
>   }
>
> +core_initcall(pci_devs_phb_init);
> +
>   static void pci_dev_pdn_setup(struct pci_dev *pdev)
>   {
>   	struct pci_dn *pdn;
> diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
> index a923230..a2f89e6 100644
> --- a/arch/powerpc/platforms/maple/pci.c
> +++ b/arch/powerpc/platforms/maple/pci.c
> @@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
>   	DBG(" <- maple_pci_irq_fixup\n");
>   }
>
> +static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
> +	struct device_node *np, *child;
> +
> +	if (hose != u3_agp)
> +		return 0;
> +
> +	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
> +	 * assume there is no P2P bridge on the AGP bus, which should be a
> +	 * safe assumptions hopefully.
> +	 */
> +	np = hose->dn;
> +	PCI_DN(np)->busno = 0xf0;
> +	for_each_child_of_node(np, child)
> +		PCI_DN(child)->busno = 0xf0;
> +
> +	return 0;
> +}
> +
>   void __init maple_pci_init(void)
>   {
>   	struct device_node *np, *root;
> @@ -605,19 +625,7 @@ void __init maple_pci_init(void)
>   	if (ht && maple_add_bridge(ht) != 0)
>   		of_node_put(ht);
>
> -	/* Setup the linkage between OF nodes and PHBs */
> -	pci_devs_phb_init();
> -
> -	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
> -	 * assume there is no P2P bridge on the AGP bus, which should be a
> -	 * safe assumptions hopefully.
> -	 */
> -	if (u3_agp) {
> -		struct device_node *np = u3_agp->dn;
> -		PCI_DN(np)->busno = 0xf0;
> -		for (np = np->child; np; np = np->sibling)
> -			PCI_DN(np)->busno = 0xf0;
> -	}
> +	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
This seems to be an unrelated change.
What is this pcibios_root_bridge_prepare()? How come you do not need one
for the powernv platform but do need it for others? Same question about powermac.
>
>   	/* Tell pci.c to not change any resource allocations.  */
>   	pci_add_flags(PCI_PROBE_ONLY);
> diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
> index f3a68a0..10c4e8f 100644
> --- a/arch/powerpc/platforms/pasemi/pci.c
> +++ b/arch/powerpc/platforms/pasemi/pci.c
> @@ -229,9 +229,6 @@ void __init pas_pci_init(void)
>   			of_node_get(np);
>
>   	of_node_put(root);
> -
> -	/* Setup the linkage between OF nodes and PHBs */
> -	pci_devs_phb_init();
>   }
>
>   void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
> diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
> index 59ab16f..6e06c3b 100644
> --- a/arch/powerpc/platforms/powermac/pci.c
> +++ b/arch/powerpc/platforms/powermac/pci.c
> @@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
>   #endif /* CONFIG_PPC32 */
>   }
>
> +#ifdef CONFIG_PPC64
> +static int pmac_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
> +	struct device_node *np, *child;
> +
> +	if (hose != u3_agp)
> +		return 0;
> +
> +	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
> +	 * assume there is no P2P bridge on the AGP bus, which should be a
> +	 * safe assumptions for now. We should do something better in the
> +	 * future though
> +	 */
> +	np = hose->dn;
> +	PCI_DN(np)->busno = 0xf0;
> +	for_each_child_of_node(np, child)
> +		PCI_DN(child)->busno = 0xf0;
> +
> +	return 0;
> +}
> +#endif /* CONFIG_PPC64 */
> +
>   void __init pmac_pci_init(void)
>   {
>   	struct device_node *np, *root;
> @@ -914,20 +937,7 @@ void __init pmac_pci_init(void)
>   	if (ht && pmac_add_bridge(ht) != 0)
>   		of_node_put(ht);
>
> -	/* Setup the linkage between OF nodes and PHBs */
> -	pci_devs_phb_init();
> -
> -	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
> -	 * assume there is no P2P bridge on the AGP bus, which should be a
> -	 * safe assumptions for now. We should do something better in the
> -	 * future though
> -	 */
> -	if (u3_agp) {
> -		struct device_node *np = u3_agp->dn;
> -		PCI_DN(np)->busno = 0xf0;
> -		for (np = np->child; np; np = np->sibling)
> -			PCI_DN(np)->busno = 0xf0;
> -	}
> +	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
>   	/* pmac_check_ht_link(); */
>
>   #else /* CONFIG_PPC64 */
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index fa99daf..d8832ea 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -807,9 +807,6 @@ void __init pnv_pci_init(void)
>   	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
>   		pnv_pci_init_ioda2_phb(np);
>
> -	/* Setup the linkage between OF nodes and PHBs */
> -	pci_devs_phb_init();
> -
>   	/* Configure IOMMU DMA hooks */
>   	set_pci_dma_ops(&dma_iommu_ops);
>   }
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 6c274cb..bdf93a1 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -262,11 +262,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
>   	case OF_RECONFIG_ATTACH_NODE:
>   		parent = of_get_parent(np);
>   		pdn = parent ? PCI_DN(parent) : NULL;
> -		if (pdn) {
> -			/* Create pdn and EEH device */
> +		if (pdn)
>   			pci_add_device_node_info(pdn->phb, np);
> -			eeh_dev_init(PCI_DN(np), pdn->phb);
> -		}
>
>   		of_node_put(parent);
>   		break;
> @@ -489,7 +486,6 @@ static void __init find_and_init_phbs(void)
>   	}
>
>   	of_node_put(root);
> -	pci_devs_phb_init();
>
>   	/*
>   	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
>
-- 
Alexey
* Re: [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-11-04 13:12 ` [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
@ 2015-11-18  7:33   ` Alexey Kardashevskiy
  2015-11-23 23:16     ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-18  7:33 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, frowand.list
On 11/05/2015 12:12 AM, Gavin Shan wrote:
> This adds standalone driver to support PCI hotplug for PowerPC PowerNV
> platform that runs on top of skiboot firmware. The firmware identifies
> hotpluggable slots and marked their device tree node with proper
> "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans device
> tree nodes to create/register PCI hotplug slot accordingly.
>
> If the skiboot firmware doesn't support slot status retrieval, the PCI
> slot device node shouldn't have property "ibm,reset-by-firmware". In
> that case, none of valid PCI slots will be detected from device tree.
> The skiboot firmware doesn't export the capability to access attention
> LEDs yet and it's something for TBD.
Could you add a few words about what we are actually dealing with and how
child slots can be hotplugged to parent slots?
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   MAINTAINERS                   |   6 +
>   drivers/pci/hotplug/Kconfig   |  12 +
>   drivers/pci/hotplug/Makefile  |   3 +
>   drivers/pci/hotplug/pnv_php.c | 866 ++++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 887 insertions(+)
>   create mode 100644 drivers/pci/hotplug/pnv_php.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9f6685f..10088f1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7931,6 +7931,12 @@ L:	linux-pci@vger.kernel.org
>   S:	Supported
>   F:	Documentation/PCI/pci-error-recovery.txt
>
> +PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
> +M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
> +L:	linux-pci@vger.kernel.org
> +S:	Supported
> +F:	drivers/pci/hotplug/pnv_php.c
> +
>   PCI SUBSYSTEM
>   M:	Bjorn Helgaas <bhelgaas@google.com>
>   L:	linux-pci@vger.kernel.org
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index df8caec..167c8ce 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>
>   	  When in doubt, say N.
>
> +config HOTPLUG_PCI_POWERNV
> +	tristate "PowerPC PowerNV PCI Hotplug driver"
> +	depends on PPC_POWERNV && EEH
> +	help
> +	  Say Y here if you run PowerPC PowerNV platform that supports
> +	  PCI Hotplug
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called pnv-php.
> +
> +	  When in doubt, say N.
> +
>   config HOTPLUG_PCI_RPA
>   	tristate "RPA PCI Hotplug driver"
>   	depends on PPC_PSERIES && EEH
> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> index b616e75..e33cdda 100644
> --- a/drivers/pci/hotplug/Makefile
> +++ b/drivers/pci/hotplug/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
> @@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>   acpiphp-objs		:=	acpiphp_core.o	\
>   				acpiphp_glue.o
>
> +pnv-php-objs		:=	pnv_php.o
> +
>   rpaphp-objs		:=	rpaphp_core.o	\
>   				rpaphp_pci.o	\
>   				rpaphp_slot.o
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> new file mode 100644
> index 0000000..415e9b9
> --- /dev/null
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -0,0 +1,866 @@
> +/*
> + * PCI Hotplug Driver for PowerPC PowerNV platform.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
> +#include <linux/module.h>
> +
> +#include <asm/opal.h>
> +#include <asm/pnv-pci.h>
> +#include <asm/ppc-pci.h>
> +
> +#define DRIVER_VERSION	"0.1"
> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
> +
> +struct pnv_php_slot {
> +	struct hotplug_slot		php_slot;
> +	struct hotplug_slot_info	php_slot_info;
> +	uint64_t			id;
> +	char				*name;
> +	int				slot_no;
> +	struct kref			kref;
> +	int				state;
> +#define PNV_PHP_STATE_INIT		0
INITIALIZED
> +#define PNV_PHP_STATE_REGISTER		1
REGISTERED
> +#define PNV_PHP_STATE_POPULATED		2
This one has "ed" already :)
And usually definitions go before the variable which uses them.
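A minimal sketch of that ordering, assuming the states become an enum with past-tense names. The enum, the trimmed struct pnv_php_slot_sketch and the helper pnv_php_state_name() are hypothetical and not part of the patch:

```c
#include <assert.h>

/* Hypothetical names: past-tense states, declared before their user. */
enum pnv_php_state {
	PNV_PHP_STATE_INITIALIZED = 0,
	PNV_PHP_STATE_REGISTERED  = 1,
	PNV_PHP_STATE_POPULATED   = 2,
};

/* Trimmed stand-in for struct pnv_php_slot; only the state field is kept. */
struct pnv_php_slot_sketch {
	enum pnv_php_state state;
};

/* Returns a printable name for the slot state, e.g. for debug messages. */
static const char *pnv_php_state_name(const struct pnv_php_slot_sketch *slot)
{
	static const char * const names[] = {
		"initialized", "registered", "populated",
	};

	/* The sketch only knows the three states listed above. */
	assert(slot->state <= PNV_PHP_STATE_POPULATED);
	return names[slot->state];
}
```

With the values kept at 0, 1 and 2, existing comparisons against the state field would behave as before.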
> +	struct device_node		*dn;
> +	struct pci_dev			*pdev;
> +	struct pci_bus			*bus;
> +	bool				power_state_check;
> +	int				power_state_confirmed;
> +#define PNV_PHP_POWER_CONFIRMED_INVALID	0
> +#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
> +#define PNV_PHP_POWER_CONFIRMED_FAIL	2
> +	struct opal_msg			*msg;
> +	void				*fdt;
> +	void				*dt;
> +	struct of_changeset		ocs;
> +	struct work_struct		work;
> +	wait_queue_head_t		queue;
> +	struct pnv_php_slot		*parent;
> +	struct list_head		children;
> +	struct list_head		link;
> +};
> +
> +static LIST_HEAD(pnv_php_slot_list);
> +static DEFINE_SPINLOCK(pnv_php_lock);
> +
> +static void pnv_php_register(struct device_node *dn);
> +static void pnv_php_unregister_one(struct device_node *dn);
> +static void pnv_php_unregister(struct device_node *dn);
> +
> +static inline struct pnv_php_slot *pnv_php_get_slot(struct pnv_php_slot *slot)
> +{
> +	if (slot) {
> +		kref_get(&slot->kref);
> +		return slot;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void pnv_php_free_slot(struct kref *kref)
> +{
> +	struct pnv_php_slot *slot = container_of(kref,
> +						 struct pnv_php_slot,
> +						 kref);
> +
> +	WARN_ON(!list_empty(&slot->children));
> +	kfree(slot->name);
> +	kfree(slot);
> +}
> +
> +static inline void pnv_php_put_slot(struct pnv_php_slot *slot)
> +{
> +	if (!slot)
> +		return;
> +
> +	kref_put(&slot->kref, pnv_php_free_slot);
> +}
> +
> +static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
> +					  struct pnv_php_slot *slot)
> +{
> +	struct pnv_php_slot *target, *tmp;
> +
> +	if (slot->dn == dn)
> +		return pnv_php_get_slot(slot);
> +
> +	list_for_each_entry(tmp, &slot->children, link) {
> +		target = pnv_php_match(dn, tmp);
> +		if (target)
> +			return target;
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
> +{
> +	struct pnv_php_slot *slot, *tmp;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
> +		slot = pnv_php_match(dn, tmp);
> +		if (slot) {
> +			spin_unlock_irqrestore(&pnv_php_lock, flags);
> +			return slot;
> +		}
> +	}
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	return NULL;
> +}
> +
> +/*
> + * Remove pdn for all children of the indicated device node.
> + * The function should remove pdn in a depth-first manner.
> + */
> +static void pnv_php_rmv_pdns(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_rmv_pdns(child);
> +
> +		pci_remove_device_node_info(child);
> +	}
> +}
> +
> +/*
> + * Remove all child nodes of the indicated device nodes. The
> + * function should remove device nodes in depth-first manner.
> + */
> +static int pnv_php_rmv_device_nodes(struct device_node *parent)
> +{
> +	struct device_node *dn, *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(parent, dn) {
> +		ret = pnv_php_rmv_device_nodes(dn);
> +		if (ret)
> +			return ret;
> +
> +		child = of_get_next_child(dn, NULL);
> +		if (child) {
> +			of_node_put(child);
> +			of_node_put(dn);
> +			pr_err("%s: Alive children of node <%s>\n",
> +			       __func__, of_node_full_name(dn));
> +			return -EBUSY;
> +		}
> +
> +		of_detach_node(dn);
> +		of_node_put(dn);
> +	}
This loop iterates just once, is this correct? If so, then a loop is not 
needed here...
> +
> +	return 0;
> +}
> +
> +/*
> + * The function processes the message sent by firmware
> + * to remove all device tree nodes beneath the slot's
> + * nodes and the associated auxiliary data.
> + */
> +static void pnv_php_handle_poweroff(struct pnv_php_slot *slot)
> +{
> +	int ret;
> +
> +	pnv_php_rmv_pdns(slot->dn);
> +
> +	/*
> +	 * If the device sub-tree was created from OF changeset, simply
> +	 * to revert that. Otherwise, the device nodes in the sub-tree
> +	 * need to be iterated and detached.
> +	 */
> +	if (slot->fdt) {
> +		of_changeset_destroy(&slot->ocs);
> +		kfree(slot->dt);
> +		kfree(slot->fdt);
> +		slot->dt = NULL;
> +		slot->dn->child = NULL;
> +		slot->fdt = NULL;
> +		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +		goto confirm;
> +	}
} else {
> +
> +	ret = pnv_php_rmv_device_nodes(slot->dn);
> +	if (!ret) {
> +		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	} else {
> +		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
> +		dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
> +			 ret);
Could be one line :)
> +	}
> +
}
and remove the label below?
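A user-space sketch of the if/else shape suggested here, with no label. struct slot_sketch, revert_changeset() and remove_device_nodes() are hypothetical stand-ins for the slot, the changeset teardown and pnv_php_rmv_device_nodes(); the counter replaces wake_up_interruptible():

```c
#include <assert.h>
#include <stddef.h>

enum { CONFIRMED_INVALID, CONFIRMED_SUCCESS, CONFIRMED_FAIL };

/* Stand-in for struct pnv_php_slot with only the fields this path touches. */
struct slot_sketch {
	void *fdt;			/* non-NULL when the sub-tree came from a changeset */
	int power_state_confirmed;
	int woken;			/* counts wake-ups instead of waking a real queue */
};

/* Hypothetical stub for of_changeset_destroy() plus the kfree() calls. */
static void revert_changeset(struct slot_sketch *slot)
{
	slot->fdt = NULL;
}

/* Hypothetical stub for pnv_php_rmv_device_nodes(); returns the given error. */
static int remove_device_nodes(int rmv_ret)
{
	return rmv_ret;
}

static void handle_poweroff(struct slot_sketch *slot, int rmv_ret)
{
	assert(slot != NULL);

	if (slot->fdt) {
		revert_changeset(slot);
		slot->power_state_confirmed = CONFIRMED_SUCCESS;
	} else if (remove_device_nodes(rmv_ret)) {
		slot->power_state_confirmed = CONFIRMED_FAIL;
	} else {
		slot->power_state_confirmed = CONFIRMED_SUCCESS;
	}

	/* Single exit: every branch falls through to the wake-up. */
	slot->woken++;
}
```

All three branches reach the wake-up, so the goto and its label are not needed.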
> +confirm:
> +	wake_up_interruptible(&slot->queue);
> +}
> +
> +static int pnv_php_populate_changeset(struct of_changeset *ocs,
> +				      struct device_node *dn)
> +{
> +	struct device_node *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(dn, child) {
> +		ret = of_changeset_attach_node(ocs, child);
> +		if (ret)
> +			return ret;
> +
> +		ret = pnv_php_populate_changeset(ocs, child);
if (ret) break; maybe?
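A small user-space model of why the early exit matters: without it, an error from a child's sub-tree is overwritten as soon as the next sibling attaches cleanly. struct node_sketch and attach_node() are hypothetical stand-ins for struct device_node and of_changeset_attach_node():

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct device_node: first child plus next sibling. */
struct node_sketch {
	struct node_sketch *child;
	struct node_sketch *sibling;
	int fail;	/* non-zero makes the attach step report this error */
	int attached;	/* set once the node has been attached */
};

/* Hypothetical replacement for of_changeset_attach_node(). */
static int attach_node(struct node_sketch *n)
{
	if (n->fail)
		return n->fail;
	n->attached = 1;
	return 0;
}

/* Attach every descendant of dn, parents first; stop at the first error. */
static int populate(struct node_sketch *dn)
{
	struct node_sketch *child;
	int ret = 0;

	assert(dn != NULL);
	for (child = dn->child; child; child = child->sibling) {
		ret = attach_node(child);
		if (ret)
			break;

		ret = populate(child);
		if (ret)
			break;	/* do not let a later sibling overwrite the error */
	}

	return ret;
}
```

In the kernel, leaving for_each_child_of_node() early also means dropping the reference on the current child with of_node_put().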
> +	}
> +
> +	return ret;
> +}
> +
> +static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
> +{
> +	struct pci_controller *hose = (struct pci_controller *)data;
> +	struct pci_dn *pdn;
> +
> +	pdn = pci_add_device_node_info(hose, dn);
> +	if (!pdn)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return NULL;
> +}
> +
> +static void pnv_php_add_pdns(struct pnv_php_slot *slot)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(slot->bus);
> +
> +	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
> +}
> +
> +static void pnv_php_handle_poweron(struct pnv_php_slot *slot)
> +{
> +	void *fdt, *dt;
> +	uint64_t len;
> +	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
> +	int ret;
> +
> +	/* We don't know the FDT blob size. It tries with incremental
> +	 * sized memory chunk.
> +	 */
> +	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
> +		fdt = kzalloc(len, GFP_KERNEL);
> +		if (!fdt)
> +			break;
> +
> +		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
> +		if (!ret)
> +			break;
> +
> +		kfree(fdt);
> +	}
> +
> +	if (len > 0x10000) {
> +		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
> +		goto out;
This seems like an error but slot->power_state_confirmed will be set to 
PNV_PHP_POWER_CONFIRMED_SUCCESS anyway, is that correct?
> +	}
I'd redo the chunk above like this:
fdt1 = kzalloc(0x10000, GFP_KERNEL);
if (!fdt1)
	goto out;
ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt1, 0x10000);
if (ret)
	goto out;
fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
if (!fdt)
	goto out;
memcpy(fdt, fdt1, fdt_totalsize(fdt1));
kfree(fdt1);
This way you end up using less memory after setup has completed.
And what is the usual size of the returned blob?
> +
> +	/* Unflatten device tree blob */
> +	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
> +	if (!dt) {
> +		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
> +		goto free_fdt;
> +	}
> +
> +	/* Initialize and apply the changeset */
> +	of_changeset_init(&slot->ocs);
> +	ret = pnv_php_populate_changeset(&slot->ocs, slot->dn);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
> +			 ret);
> +		goto free_dt;
> +	}
> +
> +	slot->dn->child = NULL;
> +	ret = of_changeset_apply(&slot->ocs);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
> +			 ret);
> +		goto destroy_changeset;
> +	}
> +
> +	/* Add device node firmware data */
> +	pnv_php_add_pdns(slot);
> +	slot->fdt = fdt;
> +	slot->dt = dt;
> +	goto out;
> +
> +destroy_changeset:
> +	of_changeset_destroy(&slot->ocs);
> +free_dt:
> +	kfree(dt);
> +	slot->dn->child = NULL;
> +free_fdt:
> +	kfree(fdt);
> +	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
> +out:
> +	/* Confirm status change */
> +	slot->power_state_confirmed = confirm;
> +	wake_up_interruptible(&slot->queue);
> +}
> +
> +static void pnv_php_work(struct work_struct *data)
> +{
> +	struct pnv_php_slot *slot = container_of(data,
> +						 struct pnv_php_slot, work);
> +	uint64_t event = be64_to_cpu(slot->msg->params[0]);
> +
> +	if (event == OPAL_PCI_SLOT_POWER_OFF)
> +		pnv_php_handle_poweroff(slot);
> +	else
> +		pnv_php_handle_poweron(slot);
> +
> +	pnv_php_put_slot(slot);
> +}
> +
> +static int pnv_php_handle_msg(struct notifier_block *nb,
> +			      unsigned long type,
> +			      void *message)
> +{
> +	phandle h;
> +	struct device_node *dn;
> +	struct pnv_php_slot *slot;
> +	struct opal_msg *msg = message;
> +
> +	if (type != OPAL_MSG_PCI_HOTPLUG) {
> +		pr_warn("%s: Invalid message %ld received!\n",
> +			__func__, type);
> +		return NOTIFY_DONE;
> +	}
> +
> +	h = (phandle)be64_to_cpu(msg->params[1]);
> +	dn = of_find_node_by_phandle(h);
> +	if (!dn) {
> +		pr_warn("%s: No device node for phandle 0x%x\n",
> +			__func__, h);
> +		return NOTIFY_DONE;
> +	}
> +
> +	slot = pnv_php_find_slot(dn);
> +	of_node_put(dn);
> +	if (!slot) {
> +		pr_warn("%s: No slot found for node <%s>\n",
> +			__func__, of_node_full_name(dn));
> +		of_node_put(dn);
You already put the node 5 lines above, is this correct?
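A user-space model of the balanced version, assuming one reference taken by the phandle lookup and exactly one put on each path. struct ref_node, node_get(), node_put() and lookup_slot() are hypothetical stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct device_node with a bare reference counter. */
struct ref_node {
	int refcount;
	const char *name;
};

/* Like of_node_get(): take one reference. */
static struct ref_node *node_get(struct ref_node *n)
{
	n->refcount++;
	return n;
}

/* Like of_node_put(): drop one reference. */
static void node_put(struct ref_node *n)
{
	assert(n->refcount > 0);	/* an unbalanced extra put would trip here */
	n->refcount--;
}

/* Hypothetical lookup: non-zero when a slot exists for the node. */
static int lookup_slot(const struct ref_node *dn, int slot_exists)
{
	(void)dn;
	return slot_exists;
}

/* Models the handler: one get from the lookup, one put on every path. */
static int handle_msg(struct ref_node *node, int slot_exists)
{
	struct ref_node *dn = node_get(node);	/* of_find_node_by_phandle() */

	if (!lookup_slot(dn, slot_exists)) {
		/* Read the name for the warning before the reference is dropped. */
		const char *name = dn->name;

		(void)name;
		node_put(dn);
		return 0;	/* NOTIFY_DONE in the driver */
	}

	node_put(dn);
	return 1;		/* NOTIFY_OK in the driver */
}
```

Moving the put into each branch also keeps the node alive while its name is printed in the warning.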
> +		return NOTIFY_DONE;
> +	}
> +
> +	slot->msg = msg;
> +	schedule_work(&slot->work);
> +	return NOTIFY_OK;
> +}
> +
> +static int pnv_php_set_power_state(struct hotplug_slot *php_slot, u8 state)
> +{
> +	struct pnv_php_slot *slot = php_slot->private;
Most instances of "struct pnv_php_slot" are called "slot".
Most instances of "struct hotplug_slot" are called "php_slot".
When I read this code, I have to remind myself that a "php_slot" variable 
(which has "php" in it) is NOT of the type with "php" (i.e. NOT 
"pnv_php_slot").
I would suggest swapping slot <-> php_slot.
> +	int ret;
> +
> +	slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> +	ret = pnv_pci_set_power_state(slot->id, state);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
> +			 ret, state ? "on" : "off");
> +		return ret;
> +	}
> +
> +	/* Continue to PCI probing after finalized device-tree. The
> +	 * device-tree might have been updated completely at this
> +	 * point. Thus we don't have to always waiting for that.
s/always waiting/wait forever/ ?
> +	 */
> +	if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> +		return 0;
> +	else if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
No need for "else" here.
> +		return -EBUSY;
> +
> +	ret = wait_event_timeout(slot->queue,
> +				 slot->power_state_confirmed, 10 * HZ);
The code flow is unclear in this case.
The queue is woken from pnv_php_handle_poweron(), which runs as a work item
scheduled by pnv_php_handle_msg(), and it is not obvious what code calls
pnv_php_handle_msg().
> +	if (!ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d waiting for power-%s\n",
> +			 ret, state ? "on" : "off");
> +		return -EBUSY;
> +	}
> +
> +	if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
> +		return 0;
> +
> +	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
> +		 slot->power_state_confirmed, state ? "on" : "off");
> +	return -EBUSY;
> +}
> +
> +static int pnv_php_get_power_state(struct hotplug_slot *php_slot, u8 *state)
> +{
> +	struct pnv_php_slot *slot = php_slot->private;
> +	uint8_t power_state;
> +	int ret;
> +
> +	/*
> +	 * Retrieve power status from firmware. If we fail
> +	 * getting that, the power status fails back to
> +	 * be on.
> +	 */
> +	ret = pnv_pci_get_power_state(slot->id, &power_state);
> +	if (ret) {
> +		*state = OPAL_PCI_SLOT_POWER_ON;
> +		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
> +			 ret);
> +	} else {
> +		*state = power_state;
> +		php_slot->info->power_status = power_state;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_php_get_adapter_state(struct hotplug_slot *php_slot, u8 *state)
> +{
> +	struct pnv_php_slot *slot = php_slot->private;
> +	uint8_t presence;
> +	int ret;
> +
> +	/*
> +	 * Retrieve presence status from firmware. If we can't
> +	 * get that, it will fail back to be empty.
> +	 */
> +	ret = pnv_pci_get_presence_state(slot->id, &presence);
> +	if (ret >= 0) {
> +		*state = presence;
> +		php_slot->info->adapter_status = presence;
> +		ret = 0;
> +	} else {
> +		*state = OPAL_PCI_SLOT_EMPTY;
> +		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
> +			 ret);
> +	}
> +
> +	return ret;
> +}
> +
> +static int pnv_php_set_attention_state(struct hotplug_slot *php_slot, u8 state)
> +{
> +	/* FIXME: Make it real once firmware supports it */
> +	php_slot->info->attention_status = state;
> +
> +	return 0;
> +}
> +
> +static int pnv_php_enable(struct pnv_php_slot *slot, bool rescan)
> +{
> +	struct hotplug_slot *php_slot = &slot->php_slot;
> +	uint8_t presence, power_status;
> +	int ret;
> +
> +	/* Check if the slot has been configured */
> +	if (slot->state != PNV_PHP_STATE_REGISTER)
> +		return 0;
> +
> +	/* Retrieve slot presence status */
> +	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
Here and in other places there is no point in dereferencing ops; just call
pnv_php_get_adapter_state() directly, as you decided not to have a
separate source file for pnv_php_slot.
> +	if (ret)
> +		return ret;
> +
> +	/* Proceed if there have nothing behind the slot */
> +	if (presence == OPAL_PCI_SLOT_EMPTY)
> +		goto scan;
> +
> +	/*
> +	 * If we don't detect something behind the slot, we need
> +	 * make sure the power suply to the slot is on.
Is this correct - "don't detect" -> "make sure it is on"?
> Otherwise,
> +	 * the slot downstream PCIe linkturn should be down.
> +	 *
> +	 * On the first time, we don't change the power status to
> +	 * boost system boot with assumption that the firmware
Out of curiosity - does it really boost booting? :)
> +	 * supplies consistent slot power status: empty slot always
> +	 * has its power off and non-empty slot has its power on.
> +	 */
> +	if (!slot->power_state_check) {
> +		slot->power_state_check = true;
> +		goto scan;
> +	}
> +
> +	/* Check the power status. Scan the slot if that's already on */
> +	ret = php_slot->ops->get_power_status(php_slot, &power_status);
> +	if (ret)
> +		return ret;
> +
> +	if (power_status == OPAL_PCI_SLOT_POWER_ON)
> +		goto scan;
> +
> +	/* Power is off, turn it on and then scan the slot */
> +	ret = pnv_php_set_power_state(php_slot, OPAL_PCI_SLOT_POWER_ON);
> +	if (ret)
> +		return ret;
> +
> +scan:
> +	if (presence == OPAL_PCI_SLOT_PRESENT) {
> +		if (rescan) {
> +			pci_lock_rescan_remove();
> +			pci_add_pci_devices(slot->bus);
> +			pci_unlock_rescan_remove();
> +		}
> +
> +		/* Rescan for child hotpluggable slots */
> +		slot->state = PNV_PHP_STATE_POPULATED;
> +		if (rescan)
> +			pnv_php_register(slot->dn);
The chunk above adds a parent slot (a physical slot) and then scans for 
children slots (a mighty extended with extra physical slots)? :)
> +	} else {
> +		slot->state = PNV_PHP_STATE_POPULATED;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pnv_php_enable_slot(struct hotplug_slot *php_slot)
> +{
> +	struct pnv_php_slot *slot = container_of(php_slot,
> +						 struct pnv_php_slot,
> +						 php_slot);
> +
> +	return pnv_php_enable(slot, true);
> +}
> +
> +static int pnv_php_disable_slot(struct hotplug_slot *php_slot)
> +{
> +	struct pnv_php_slot *slot = php_slot->private;
> +	uint8_t power_state;
> +	int ret;
> +
> +	if (slot->state != PNV_PHP_STATE_POPULATED)
> +		return 0;
> +
> +	/* Remove all devices behind the slot */
> +	pci_lock_rescan_remove();
> +	pci_remove_pci_devices(slot->bus);
> +	pci_unlock_rescan_remove();
> +
> +	/* Detach the child hotpluggable slots */
> +	pnv_php_unregister(slot->dn);
> +
> +	/*
> +	 * Check the power status and turn it off if necessary. If we
> +	 * fail to get the power status, the power will be forced to
> +	 * be off.
> +	 */
> +	ret = php_slot->ops->get_power_status(php_slot, &power_state);
> +	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
> +		ret = pnv_php_set_power_state(php_slot,
> +					      OPAL_PCI_SLOT_POWER_OFF);
> +		if (ret)
> +			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
> +				 ret);
> +	}
> +
> +	/* Update slot state */
> +	slot->state = PNV_PHP_STATE_REGISTER;
> +	return 0;
> +}
> +
> +static struct hotplug_slot_ops php_slot_ops = {
> +	.get_power_status	= pnv_php_get_power_state,
> +	.get_adapter_status	= pnv_php_get_adapter_state,
> +	.set_attention_status	= pnv_php_set_attention_state,
> +	.enable_slot		= pnv_php_enable_slot,
> +	.disable_slot		= pnv_php_disable_slot,
> +};
> +
> +static void pnv_php_release(struct hotplug_slot *hp_slot)
> +{
> +	struct pnv_php_slot *slot = hp_slot->private;
> +	unsigned long flags;
> +
> +	/* Remove from global or child list */
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	list_del(&slot->link);
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	/* Detach from parent */
> +	pnv_php_put_slot(slot);
> +	pnv_php_put_slot(slot->parent);
> +}
> +
> +static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
> +{
> +	struct device_node *parent = dn;
> +	const __be64 *prop64;
> +	const __be32 *prop32;
> +
> +	/*
> +	 * The hotpluggable slot always has a compound Id, which
> +	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
> +	 * number, and compound indicator
> +	 */
> +	*id = (0x1ul << 63);
> +
> +	/* Bus/Slot/Function number */
> +	prop32 = of_get_property(dn, "reg", NULL);
> +	if (!prop32)
> +		return -ENXIO;
> +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
> +
> +	/* PHB Id */
> +	while ((parent = of_get_parent(parent))) {
> +		if (!PCI_DN(parent)) {
> +			of_node_put(parent);
> +			break;
> +		}
> +
> +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
> +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
> +			of_node_put(parent);
> +			continue;
> +		}
> +
> +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
> +		if (!prop64) {
> +			of_node_put(parent);
> +			return -ENXIO;
> +		}
> +
> +		*id |= be64_to_cpup(prop64);
> +		of_node_put(parent);
> +		return 0;
> +	}
> +
> +	return -ENODEV;
> +}
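For reference, a worked example of the compound Id layout built above, as a user-space sketch. compose_slot_id() is a hypothetical helper, and the assumption that the PHB Id fits in the low 16 bits comes from the comment in the patch, not from the firmware documentation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose the compound slot id the way pnv_php_get_slot_id() does:
 * bit 63 is the compound indicator, bits 31..16 carry the
 * bus/device/function taken from bits 23..8 of "reg", and the PHB id
 * from "ibm,opal-phbid" is OR-ed in unshifted.
 */
static uint64_t compose_slot_id(uint32_t reg, uint64_t phb_id)
{
	uint64_t id = UINT64_C(1) << 63;

	/* Assumption of this sketch: the PHB id stays below bit 16. */
	assert(phb_id <= 0xffff);

	id |= ((uint64_t)(reg & 0x00ffff00)) << 8;
	id |= phb_id;
	return id;
}
```

For example, reg = 0x00010800 (bus 1, device 1, function 0) on PHB 1 gives 0x8000000001080001.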
> +
> +static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
> +{
> +	struct pnv_php_slot *slot;
> +	struct pci_bus *bus;
> +	const char *label;
> +	uint64_t id;
> +
> +	label = of_get_property(dn, "ibm,slot-label", NULL);
> +	if (!label)
> +		return NULL;
> +
> +	if (pnv_php_get_slot_id(dn, &id))
> +		return NULL;
> +
> +	bus = pci_find_bus_by_node(dn);
> +	if (!bus)
> +		return NULL;
> +
> +	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
> +	if (!slot)
> +		return NULL;
> +
> +	slot->name = kstrdup(label, GFP_KERNEL);
> +	if (!slot->name) {
> +		kfree(slot);
> +		return NULL;
> +	}
> +
> +	if (dn->child && PCI_DN(dn->child))
> +		slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
> +	else
> +		slot->slot_no = -1;   /* Placeholder slot */
> +
> +	kref_init(&slot->kref);
> +	slot->state	            = PNV_PHP_STATE_INIT;
> +	slot->dn	            = dn;
> +	slot->pdev	            = bus->self;
> +	slot->bus	            = bus;
> +	slot->id	            = id;
> +	slot->power_state_check     = false;
> +	slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
> +	slot->php_slot.ops          = &php_slot_ops;
> +	slot->php_slot.info         = &slot->php_slot_info;
> +	slot->php_slot.release      = pnv_php_release;
> +	slot->php_slot.private      = slot;
> +
> +	INIT_WORK(&slot->work, pnv_php_work);
> +	init_waitqueue_head(&slot->queue);
> +	INIT_LIST_HEAD(&slot->children);
> +	INIT_LIST_HEAD(&slot->link);
> +
> +	return slot;
> +}
> +
> +static int pnv_php_register_slot(struct pnv_php_slot *slot)
> +{
> +	struct pnv_php_slot *parent;
> +	struct device_node *dn = slot->dn;
> +	unsigned long flags;
> +	int ret;
> +
> +	/* Check if the slot exists or not */
s/exists/is registered/
> +	parent = pnv_php_find_slot(slot->dn);
> +	if (parent) {
> +		pnv_php_put_slot(parent);
> +		return -EEXIST;
> +	}
> +
> +	/* Register PCI slot */
> +	ret = pci_hp_register(&slot->php_slot, slot->bus,
> +			      slot->slot_no, slot->name);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
> +			 ret);
> +		return ret;
> +	}
> +
> +	/* Attach to the parent's child list or global list */
> +	while ((dn = of_get_parent(dn))) {
> +		if (!PCI_DN(dn)) {
> +			of_node_put(dn);
> +			break;
> +		}
> +
> +		parent = pnv_php_find_slot(dn);
> +		if (parent) {
> +			of_node_put(dn);
> +			break;
> +		}
This is missing here:
of_node_put(dn);
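A user-space model of the walk with every reference paired, assuming get_parent() behaves like of_get_parent() and returns the parent with a reference held. struct walk_node, the has_slot flag and the helpers are hypothetical, and the PCI_DN() check is left out:

```c
#include <assert.h>
#include <stddef.h>

struct walk_node {
	struct walk_node *parent;
	int refcount;
	int has_slot;	/* hypothetical: non-zero if a slot is registered for it */
};

/* Like of_get_parent(): returns the parent with a reference taken. */
static struct walk_node *get_parent(struct walk_node *n)
{
	if (!n->parent)
		return NULL;
	n->parent->refcount++;
	return n->parent;
}

/* Like of_node_put(): drop one reference. */
static void node_put(struct walk_node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/* Walk upwards until a node with a slot is found; each get has a put. */
static struct walk_node *find_parent_slot(struct walk_node *dn)
{
	struct walk_node *found = NULL;

	while ((dn = get_parent(dn))) {
		if (dn->has_slot)
			found = dn;
		node_put(dn);	/* dropped on every iteration, not only on exit */
		if (found)
			break;
	}

	return found;
}
```

The kernel also has of_get_next_parent(), which returns the parent and drops the reference on the node passed in, so it may fit this kind of loop as well.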
> +	}
> +
> +	spin_lock_irqsave(&pnv_php_lock, flags);
> +	slot->parent = parent;
> +	if (parent)
> +		list_add_tail(&slot->link, &parent->children);
> +	else
> +		list_add_tail(&slot->link, &pnv_php_slot_list);
> +	spin_unlock_irqrestore(&pnv_php_lock, flags);
> +
> +	slot->state = PNV_PHP_STATE_REGISTER;
> +	return 0;
> +}
> +
> +static int pnv_php_register_one(struct device_node *dn)
> +{
> +	struct pnv_php_slot *slot;
> +	const __be32 *prop32;
> +	int ret;
> +
> +	/* Check if it's hotpluggable slot */
> +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	slot = pnv_php_alloc_slot(dn);
> +	if (!slot)
> +		return -ENODEV;
> +
> +	ret = pnv_php_register_slot(slot);
> +	if (ret)
> +		goto free_slot;
> +
> +	ret = pnv_php_enable(slot, false);
> +	if (ret)
> +		goto unregister_slot;
> +
> +	return 0;
> +
> +unregister_slot:
> +	pnv_php_unregister_one(slot->dn);
> +free_slot:
> +	pnv_php_put_slot(slot);
> +	return ret;
> +}
> +
> +static void pnv_php_register(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/*
> +	 * The parent slots should be registered before their
> +	 * child slots.
> +	 */
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_register_one(child);
> +		pnv_php_register(child);
> +	}
> +}
> +
> +static void pnv_php_unregister_one(struct device_node *dn)
> +{
> +	struct pnv_php_slot *slot;
> +
> +	slot = pnv_php_find_slot(dn);
> +	if (!slot)
> +		return;
> +
> +	pnv_php_put_slot(slot);
> +	pci_hp_deregister(&slot->php_slot);
> +}
> +
> +static void pnv_php_unregister(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/* The child slots should go before their parent slots */
> +	for_each_child_of_node(dn, child) {
> +		pnv_php_unregister(child);
> +		pnv_php_unregister_one(child);
> +	}
> +}
> +
> +static struct notifier_block php_msg_nb = {
> +	.notifier_call	= pnv_php_handle_msg,
> +	.next		= NULL,
> +	.priority	= 0,
> +};
> +
> +static int __init pnv_php_init(void)
> +{
> +	struct device_node *dn;
> +	int ret;
> +
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +
> +	/* Register hotplug message handler */
> +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
> +	if (ret) {
> +		pr_warn("%s: Error %d registering hotplug notifier\n",
> +			__func__, ret);
> +		return ret;
> +	}
> +
> +	/* Scan PHB nodes and their children */
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_register(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_register(dn);
> +
> +	return 0;
> +}
> +
> +static void __exit pnv_php_exit(void)
> +{
> +	struct device_node *dn;
> +
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		pnv_php_unregister(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		pnv_php_unregister(dn);
> +
> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
> +}
> +
> +module_init(pnv_php_init);
> +module_exit(pnv_php_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
>
-- 
Alexey
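
As an illustration of the ordering that pnv_php_register() and
pnv_php_unregister() in the quoted patch aim for (a slot is registered
before the slots beneath it, and removed after them), here is a minimal
standalone sketch. The struct node type, the visit log and the function
names are assumptions made for this example; they are not the kernel's
struct device_node or the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a device-tree node (not struct device_node). */
struct node {
	const char *name;
	struct node *children[MAX_CHILDREN];
	int nr_children;
};

/* Records the order in which nodes are handled, e.g. "A B C ". */
static char visit_log[128];

static void log_visit(const struct node *n)
{
	strncat(visit_log, n->name, sizeof(visit_log) - strlen(visit_log) - 1);
	strncat(visit_log, " ", sizeof(visit_log) - strlen(visit_log) - 1);
}

/* Pre-order walk: a child is handled before its own descendants. */
static void register_all(const struct node *parent)
{
	assert(parent != NULL);
	for (int i = 0; i < parent->nr_children; i++) {
		log_visit(parent->children[i]);
		register_all(parent->children[i]);
	}
}

/* Post-order walk: descendants are handled before the child itself. */
static void unregister_all(const struct node *parent)
{
	assert(parent != NULL);
	for (int i = 0; i < parent->nr_children; i++) {
		unregister_all(parent->children[i]);
		log_visit(parent->children[i]);
	}
}
```

For a PHB node with children A and B, where B has one child C, the
registration walk handles A, B, C and the unregistration walk handles
A, C, B, so C always goes away before its parent B.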
* Re: [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption
  2015-11-17  1:04     ` Gavin Shan
@ 2015-11-19  0:10       ` Alexey Kardashevskiy
  2015-11-23 22:42         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-19  0:10 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 12:04 PM, Gavin Shan wrote:
> On Mon, Nov 16, 2015 at 07:01:59PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> As we track M32 segment consumption, this introduces an array to
>>> the PHB to track the mapping between M64 segment and PE number.
>>> The information is going to be used to find M64 segment from the
>>> PE number during PCI unplugging time in subsequent patches.
>>
>>
>> It would not hurt to put a few words about how we managed to live without
>> such a mapping for M64 before but we needed mapping for M32.
>>
>
> The M32 mapping (phb->ioda.m32_segmap[]) isn't used for anything before
> this patchset. Similarly, there's no need for M64 mapping before this
> patchset, so no need to add the words.
After years I learned that reviewers ask fewer questions about new but as yet
unused code when I put a few words in the commit log confirming that it is
not used now but will be used for <here I put what it is for> later.
It is also not obvious that m32_segmap is not used. And m64_segmap only
starts being used 15 patches later, in:
[PATCH v7 27/50] powerpc/powernv: Dynamically release PEs
which is quite far away and complicates reviewing. 12/50 would be better moved
there (to make it 26/50) or just merged into 27/50.
-- 
Alexey
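
To make the purpose of such a map concrete: the commit log says the array
records which PE number owns each M64 segment, so that the segments can be
found again from the PE number at unplug time. Below is a minimal sketch of
that idea. The array size, the INVALID_PE value and the helper names are
assumptions for illustration only, not the code from patch 12/50 or 27/50.

```c
#include <assert.h>

#define NUM_M64_SEGS	128	/* assumed number of M64 segments */
#define INVALID_PE	(-1)	/* stand-in for IODA_INVALID_PE */

/* m64_segmap[seg] holds the PE number owning segment seg, or INVALID_PE. */
static int m64_segmap[NUM_M64_SEGS];

/* Mark every segment as unowned. */
static void segmap_init(void)
{
	for (int seg = 0; seg < NUM_M64_SEGS; seg++)
		m64_segmap[seg] = INVALID_PE;
}

/* Record that segment seg is now consumed by PE number pe. */
static void segmap_assign(int seg, int pe)
{
	assert(seg >= 0 && seg < NUM_M64_SEGS);
	m64_segmap[seg] = pe;
}

/*
 * Release every segment owned by PE number pe, as would be needed when
 * the PE goes away. Returns the number of segments released.
 */
static int segmap_release_pe(int pe)
{
	int released = 0;

	for (int seg = 0; seg < NUM_M64_SEGS; seg++) {
		if (m64_segmap[seg] != pe)
			continue;
		m64_segmap[seg] = INVALID_PE;
		released++;
	}
	return released;
}
```

Without the map, the release path would have no record of which segments
a departing PE had consumed.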
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-17  1:37     ` Gavin Shan
@ 2015-11-19  0:18       ` Alexey Kardashevskiy
  2015-11-22 22:46         ` Gavin Shan
  0 siblings, 1 reply; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-19  0:18 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 12:37 PM, Gavin Shan wrote:
> On Mon, Nov 16, 2015 at 07:01:46PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> This enables M64 window on P7IOC, which has been enabled on PHB3.
>>> Different from PHB3 where 16 M64 BARs are supported and each of
>>> them can be owned by one particular PE# exclusively or divided
>>> evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>> of them are divided to 8 segments. So every P7IOC PHB supports
>>> 128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>> M64DT, indicating that one M64 segment can only be pinned to the
>>> fixed PE#. In order to have same code to support M64 on P7IOC and
>>> PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>> of them is pinned to the fixed PE# by bypassing the function of
>>> M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>> and PHB3 to support M64.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>>   arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>>   2 files changed, 86 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 1f7d985..bfe69f1 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>   	}
>>>   }
>>>
>>> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>> +{
>>> +	struct resource *r;
>>> +	int index;
>>> +
>>> +	/*
>>> +	 * There are 16 M64 BARs, each of which has 8 segments. So
>>> +	 * there are as many M64 segments as the maximum number of
>>> +	 * PEs, which is 128.
>>> +	 */
>>> +	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>> +		unsigned long base, segsz = phb->ioda.m64_segsize;
>>> +		int64_t rc;
>>> +
>>> +		base = phb->ioda.m64_base +
>>> +		       index * PNV_IODA1_M64_SEGS * segsz;
>>> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>> +				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>> +				PNV_IODA1_M64_SEGS * segsz);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, index);
>>> +			goto fail;
>>> +		}
>>> +
>>> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>> +				OPAL_M64_WINDOW_TYPE, index,
>>> +				OPAL_ENABLE_M64_SPLIT);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, index);
>>> +			goto fail;
>>> +		}
>>> +	}
>>> +
>>> +	/*
>>> +	 * Exclude the segment used by the reserved PE, which
>>> +	 * is expected to be 0 or last supported PE#.
>>> +	 */
>>> +	r = &phb->hose->mem_resources[1];
>>> +	if (phb->ioda.reserved_pe_idx == 0)
>>> +		r->start += phb->ioda.m64_segsize;
>>> +	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>> +		r->end -= phb->ioda.m64_segsize;
>>> +	else
>>> +		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>> +			phb->ioda.reserved_pe_idx);
>>> +
>>> +	return 0;
>>> +
>>> +fail:
>>> +	for ( ; index >= 0; index--)
>>> +		opal_pci_phb_mmio_enable(phb->opal_id,
>>> +			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
>>> +
>>> +	return -EIO;
>>> +}
>>> +
>>>   static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>>   				    unsigned long *pe_bitmap,
>>>   				    bool all)
>>> @@ -325,6 +383,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   			pe->master = master_pe;
>>>   			list_add_tail(&pe->list, &master_pe->slaves);
>>>   		}
>>> +
>>> +		/*
>>> +		 * P7IOC supports M64DT, which helps mapping M64 segment
>>> +		 * to one particular PE#. However, PHB3 has fixed mapping
>>> +		 * between M64 segment and PE#. In order to have same logic
>>> +		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>> +		 * segment and PE# on P7IOC.
>>> +		 */
>>> +		if (phb->type == PNV_PHB_IODA1) {
>>> +			int64_t rc;
>>> +
>>> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> +					pe->pe_number, OPAL_M64_WINDOW_TYPE,
>>> +					pe->pe_number / PNV_IODA1_M64_SEGS,
>>> +					pe->pe_number % PNV_IODA1_M64_SEGS);
>>> +			if (rc != OPAL_SUCCESS)
>>> +				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>> +					__func__, rc, phb->hose->global_number,
>>> +					pe->pe_number);
>>> +		}
>>>   	}
>>>
>>>   	kfree(pe_alloc);
>>> @@ -339,8 +417,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>   	const u32 *r;
>>>   	u64 pci_addr;
>>>
>>> -	/* FIXME: Support M64 for P7IOC */
>>> -	if (phb->type != PNV_PHB_IODA2) {
>>> +	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>>>   		pr_info("  Not support M64 window\n");
>>>   		return;
>>>   	}
>>> @@ -373,7 +450,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>
>>>   	/* Use last M64 BAR to cover M64 window */
>>>   	phb->ioda.m64_bar_idx = 15;
>>> -	phb->init_m64 = pnv_ioda2_init_m64;
>>> +	if (phb->type == PNV_PHB_IODA1)
>>> +		phb->init_m64 = pnv_ioda1_init_m64;
>>> +	else
>>> +		phb->init_m64 = pnv_ioda2_init_m64;
>>>   	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>>   	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>
>>
>> Nit: the callbacks initialization does not seem to relate to parsing any
>> window :) They could all go to where pnv_ioda_parse_m64_window() is called,
>> no separate patch is needed.
>>
>
> Well, what's the benefit of that? I personally prefer the way I had it:
> initialize all callbacks in one place, not in separate places.
One place is good, agreed. That should have been a single pnv_phb_ops
instance, as is done everywhere else in the kernel. And the existing
pnv_phb callbacks are already initialized in various places, not a single
one.
> However, if you have good
> reason to support your suggestion, I'll change accordingly for sure.
Nah, leave it as is for now, we will probably change this later.
>
>>
>>>   }
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 671fd13..c4019ac 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -78,6 +78,9 @@ struct pnv_ioda_pe {
>>>   	struct list_head	list;
>>>   };
>>>
>>> +#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>> +#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>> +
>>>   #define PNV_PHB_FLAG_EEH	(1 << 0)
>>>
>>>   struct pnv_phb {
>>>
>
> Thanks,
> Gavin
>
-- 
Alexey
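
The fixed mapping described in the commit log (16 M64 BARs of 8 segments
each, so 128 segments, one per PE#) comes down to a division and a
remainder, plus a base address per BAR. The sketch below restates that
arithmetic outside the kernel. The constants mirror PNV_IODA1_M64_NUM and
PNV_IODA1_M64_SEGS from the patch; the struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define M64_NUM_BARS		16	/* mirrors PNV_IODA1_M64_NUM */
#define M64_SEGS_PER_BAR	8	/* mirrors PNV_IODA1_M64_SEGS */

/* Hypothetical pair: which BAR, and which segment inside that BAR. */
struct m64_slot {
	int bar;
	int seg;
};

/* Fixed mapping: PE number N always uses segment N of the 128. */
static struct m64_slot pe_to_m64_slot(int pe_number)
{
	struct m64_slot slot;

	assert(pe_number >= 0 &&
	       pe_number < M64_NUM_BARS * M64_SEGS_PER_BAR);
	slot.bar = pe_number / M64_SEGS_PER_BAR;
	slot.seg = pe_number % M64_SEGS_PER_BAR;
	return slot;
}

/* Start address of a BAR: each BAR spans 8 segments of segsz bytes. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz, int bar)
{
	assert(bar >= 0 && bar < M64_NUM_BARS);
	return m64_base + (uint64_t)bar * M64_SEGS_PER_BAR * segsz;
}
```

For example, PE#9 lands in BAR 1, segment 1, and PE#127 in BAR 15,
segment 7.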
* Re: [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus
  2015-11-17  9:06     ` Gavin Shan
@ 2015-11-19  0:21       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-19  0:21 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/17/2015 08:06 PM, Gavin Shan wrote:
> On Tue, Nov 17, 2015 at 05:04:42PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> We're going to reserve/assign PEs when pcibios_setup_bridge() is
>>> called. The function won't be called for root bus as it doesn't
>>> have parent bridge. However, the root bus still needs a PE to be
>>> covered.
>>>
>>> This reserves PE numbers that are adjacent to the reserved one
>>> for root buses.
>>
>>
>> Somewhere in the patchset you need to describe why you need a separate PE for
>> a root bus and why reserved_pe_idx is not enough for this.
>>
>
> Please confirm if it's fine to add the description in this patch's changelog.
Yes, it is fine. Thanks!
-- 
Alexey
* Re: [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation
  2015-11-04 13:12 ` [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation Gavin Shan
@ 2015-11-20  3:14   ` Daniel Axtens
  0 siblings, 0 replies; 157+ messages in thread
From: Daniel Axtens @ 2015-11-20  3:14 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: devicetree, aik, linux-pci, panto, Gavin Shan, grant.likely,
	robherring2, bhelgaas, frowand.list
> +	total_weight = pnv_pci_ioda_total_dma_weight(phb);
> +	weight = pnv_pci_ioda_pe_dma_weight(pe);
> +	if (!total_weight || !weight)
> +		return;
> +
> +	segs = (weight * phb->ioda.dma32_count) / total_weight;
> +	if (!segs)
> +		segs = 1;
I'm a little bit concerned about rounding here. Having said that I've
also lost track of dma32_count: if it's big then rounding won't
matter. What's a typical dma32_count?
> +
> +	/*
> +	 * Allocate continuous DMA32 segments. We begin with the expected
Very much a nit pick, but I think you mean s/continuous/contiguous/.
> +	 * number of segments. With one more attempt, the number of DMA32
> +	 * segments to be allocated is decreased by one until one segment
> +	 * is allocated successfully.
> +	 */
> +	while (segs) {
> +		found = false;
> +		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
> +			for (i = base; i < base + segs; i++) {
> +				if (phb->ioda.dma32_segmap[i] !=
> +				    IODA_INVALID_PE)
> +					break;
> +			}
> +
> +			if (i >= base + segs) {
How would `i' ever be greater than base + segs? Should the test just
be 'if (i == base + segs) {'?
> +				found = true;
> +				break;
> +			}
> +		}
Regards,
Daniel
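
For readers following the two questions above, here is a minimal
standalone sketch of the calculation under discussion: a weighted share
that is rounded down but never below one segment, followed by a search
for a contiguous free run that retries with one segment fewer each time.
The function names, the INVALID_PE value and the step that marks the run
as owned are assumptions for illustration; only the share formula and the
loop shape are taken from the quoted hunk.

```c
#include <assert.h>

#define INVALID_PE	(-1)	/* stand-in for IODA_INVALID_PE */

/* Weighted share of the DMA32 segments, rounded down, minimum one. */
static int dma32_expected_segs(int weight, int total_weight, int dma32_count)
{
	int segs;

	if (!weight || !total_weight)
		return 0;
	segs = (weight * dma32_count) / total_weight;
	return segs ? segs : 1;
}

/* First base index of a run of segs free entries, or -1 if none. */
static int find_free_run(const int *segmap, int count, int segs)
{
	assert(segs > 0);
	for (int base = 0; base <= count - segs; base++) {
		int i;

		for (i = base; i < base + segs; i++) {
			if (segmap[i] != INVALID_PE)
				break;
		}
		/* i stops at exactly base + segs when the whole run is free */
		if (i == base + segs)
			return base;
	}
	return -1;
}

/*
 * Try segs, then segs - 1, and so on down to 1. On success the run is
 * marked as owned by pe, *granted is set to its length and the base
 * index is returned. On failure returns -1 with *granted set to 0.
 */
static int dma32_alloc(int *segmap, int count, int segs, int pe, int *granted)
{
	for (; segs > 0; segs--) {
		int base = find_free_run(segmap, count, segs);

		if (base < 0)
			continue;
		for (int i = base; i < base + segs; i++)
			segmap[i] = pe;
		*granted = segs;
		return base;
	}
	*granted = 0;
	return -1;
}
```

With 8 segments and a total weight of 100, a PE of weight 1 computes
(1 * 8) / 100 = 0 and is bumped to one segment, which is the rounding
case raised above. The inner loop can only leave `i' equal to
base + segs, never larger, so the equality test is sufficient in this
sketch.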
* Re: [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC
  2015-11-19  0:18       ` Alexey Kardashevskiy
@ 2015-11-22 22:46         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-22 22:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Thu, Nov 19, 2015 at 11:18:46AM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 12:37 PM, Gavin Shan wrote:
>>On Mon, Nov 16, 2015 at 07:01:46PM +1100, Alexey Kardashevskiy wrote:
>>>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>>>Different from PHB3 where 16 M64 BARs are supported and each of
>>>>them can be owned by one particular PE# exclusively or divided
>>>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>>>of them are divided to 8 segments. So every P7IOC PHB supports
>>>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>>>M64DT, indicating that one M64 segment can only be pinned to the
>>>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>>>of them is pinned to the fixed PE# by bypassing the function of
>>>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>>>and PHB3 to support M64.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +++++++++++++++++++++++++++++--
>>>>  arch/powerpc/platforms/powernv/pci.h      |  3 ++
>>>>  2 files changed, 86 insertions(+), 3 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 1f7d985..bfe69f1 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -256,6 +256,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>>  	}
>>>>  }
>>>>
>>>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>>>+{
>>>>+	struct resource *r;
>>>>+	int index;
>>>>+
>>>>+	/*
>>>>+	 * There are 16 M64 BARs, each of which has 8 segments. So
>>>>+	 * there are as many M64 segments as the maximum number of
>>>>+	 * PEs, which is 128.
>>>>+	 */
>>>>+	for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>>>+		unsigned long base, segsz = phb->ioda.m64_segsize;
>>>>+		int64_t rc;
>>>>+
>>>>+		base = phb->ioda.m64_base +
>>>>+		       index * PNV_IODA1_M64_SEGS * segsz;
>>>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>>>+				OPAL_M64_WINDOW_TYPE, index, base, 0,
>>>>+				PNV_IODA1_M64_SEGS * segsz);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>>>+				rc, phb->hose->global_number, index);
>>>>+			goto fail;
>>>>+		}
>>>>+
>>>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>+				OPAL_M64_WINDOW_TYPE, index,
>>>>+				OPAL_ENABLE_M64_SPLIT);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>>>+				rc, phb->hose->global_number, index);
>>>>+			goto fail;
>>>>+		}
>>>>+	}
>>>>+
>>>>+	/*
>>>>+	 * Exclude the segment used by the reserved PE, which
>>>>+	 * is expected to be 0 or last supported PE#.
>>>>+	 */
>>>>+	r = &phb->hose->mem_resources[1];
>>>>+	if (phb->ioda.reserved_pe_idx == 0)
>>>>+		r->start += phb->ioda.m64_segsize;
>>>>+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>>>+		r->end -= phb->ioda.m64_segsize;
>>>>+	else
>>>>+		pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>>>+			phb->ioda.reserved_pe_idx);
>>>>+
>>>>+	return 0;
>>>>+
>>>>+fail:
>>>>+	for ( ; index >= 0; index--)
>>>>+		opal_pci_phb_mmio_enable(phb->opal_id,
>>>>+			OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
>>>>+
>>>>+	return -EIO;
>>>>+}
>>>>+
>>>>  static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>>>  				    unsigned long *pe_bitmap,
>>>>  				    bool all)
>>>>@@ -325,6 +383,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  			pe->master = master_pe;
>>>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>>>  		}
>>>>+
>>>>+		/*
>>>>+		 * P7IOC supports M64DT, which helps mapping M64 segment
>>>>+		 * to one particular PE#. However, PHB3 has fixed mapping
>>>>+		 * between M64 segment and PE#. In order to have same logic
>>>>+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>>>+		 * segment and PE# on P7IOC.
>>>>+		 */
>>>>+		if (phb->type == PNV_PHB_IODA1) {
>>>>+			int64_t rc;
>>>>+
>>>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>+					pe->pe_number, OPAL_M64_WINDOW_TYPE,
>>>>+					pe->pe_number / PNV_IODA1_M64_SEGS,
>>>>+					pe->pe_number % PNV_IODA1_M64_SEGS);
>>>>+			if (rc != OPAL_SUCCESS)
>>>>+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>>>+					__func__, rc, phb->hose->global_number,
>>>>+					pe->pe_number);
>>>>+		}
>>>>  	}
>>>>
>>>>  	kfree(pe_alloc);
>>>>@@ -339,8 +417,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>>  	const u32 *r;
>>>>  	u64 pci_addr;
>>>>
>>>>-	/* FIXME: Support M64 for P7IOC */
>>>>-	if (phb->type != PNV_PHB_IODA2) {
>>>>+	if (phb->type != PNV_PHB_IODA1 && phb->type != PNV_PHB_IODA2) {
>>>>  		pr_info("  Not support M64 window\n");
>>>>  		return;
>>>>  	}
>>>>@@ -373,7 +450,10 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>>
>>>>  	/* Use last M64 BAR to cover M64 window */
>>>>  	phb->ioda.m64_bar_idx = 15;
>>>>-	phb->init_m64 = pnv_ioda2_init_m64;
>>>>+	if (phb->type == PNV_PHB_IODA1)
>>>>+		phb->init_m64 = pnv_ioda1_init_m64;
>>>>+	else
>>>>+		phb->init_m64 = pnv_ioda2_init_m64;
>>>>  	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>>>  	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>>
>>>
>>>Nit: the callbacks initialization does not seem to relate to parsing any
>>>window :) They could all go to where pnv_ioda_parse_m64_window() is called,
>>>no separate patch is needed.
>>>
>>
>>Well, what's the benefit of that? I personally prefer the way I had it:
>>initialize all callbacks in one place, not in separate places.
>
>One place is good, agreed. That should have been a single pnv_phb_ops
>instance, as is done everywhere else in the kernel. And the existing
>pnv_phb callbacks are already initialized in various places, not a single
>one.
>
OK, but that's a separate issue not related to this patchset. I guess the issue
you're talking about can be fixed in the future, not in this patchset.
>>However, if you have good
>>reason to support your suggestion, I'll change accordingly for sure.
>
>Nah, leave it as is for now, we will probably change this later.
>
Thanks.
>>
>>>
>>>>  }
>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>index 671fd13..c4019ac 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>@@ -78,6 +78,9 @@ struct pnv_ioda_pe {
>>>>  	struct list_head	list;
>>>>  };
>>>>
>>>>+#define PNV_IODA1_M64_NUM	16	/* Number of M64 BARs   */
>>>>+#define PNV_IODA1_M64_SEGS	8	/* Segments per M64 BAR */
>>>>+
>>>>  #define PNV_PHB_FLAG_EEH	(1 << 0)
>>>>
>>>>  struct pnv_phb {
>>>>
Thanks,
Gavin
* Re: [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release
  2015-11-18  0:13       ` Alexey Kardashevskiy
@ 2015-11-22 22:52         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-22 22:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, devicetree, linux-pci, panto,
	grant.likely, robherring2, bhelgaas, frowand.list
On Wed, Nov 18, 2015 at 11:13:55AM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 08:03 PM, Gavin Shan wrote:
>>On Tue, Nov 17, 2015 at 04:08:30PM +1100, Alexey Kardashevskiy wrote:
>>>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>In current implementation, the PEs that are allocated or picked
>>>>from the reserved list are identified by PE number. The PE instance
>>>>has to be picked according to the PE number eventually. We have
>>>>same issue when PE is released.
>>>>
>>>>For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns
>>>>PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
>>>>or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
>>>>returns the reserved/allocated PE instance to be used in subsequent
>>>>patches. On the other hand, pnv_ioda_free_pe() uses PE instance
>>>>(not number) as its argument. No logical changes introduced.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++++++++++++++++--------------
>>>>  arch/powerpc/platforms/powernv/pci.h      |  2 +-
>>>>  2 files changed, 46 insertions(+), 37 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 488e0f8..ae82df1 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -152,7 +152,7 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>>  	pnv_ioda_init_pe(phb, pe_no);
>>>>  }
>>>>
>>>>-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>>+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>>  {
>>>>  	unsigned long pe;
>>>>
>>>>@@ -160,19 +160,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>>  		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>>>  					phb->ioda.total_pe_num, 0);
>>>>  		if (pe >= phb->ioda.total_pe_num)
>>>>-			return IODA_INVALID_PE;
>>>>+			return NULL;
>>>>  	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>>>
>>>>-	pnv_ioda_init_pe(phb, pe);
>>>>-	return pe;
>>>>+	return pnv_ioda_init_pe(phb, pe);
>>>>  }
>>>>
>>>>-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>>>+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>>>>  {
>>>>-	WARN_ON(phb->ioda.pe_array[pe].pdev);
>>>>+	struct pnv_phb *phb = pe->phb;
>>>>+
>>>>+	WARN_ON(pe->pdev);
>>>>
>>>>-	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
>>>>-	clear_bit(pe, phb->ioda.pe_alloc);
>>>>+	memset(pe, 0, sizeof(struct pnv_ioda_pe));
>>>>+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
>>>>  }
>>>>
>>>>  /* The default M64 BAR is shared by all PEs */
>>>>@@ -332,7 +333,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>>>  	}
>>>>  }
>>>>
>>>>-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  {
>>>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>@@ -342,7 +343,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>
>>>>  	/* Root bus shouldn't use M64 */
>>>>  	if (pci_is_root_bus(bus))
>>>>-		return IODA_INVALID_PE;
>>>>+		return NULL;
>>>>
>>>>  	/* Allocate bitmap */
>>>>  	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>>>@@ -350,7 +351,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  	if (!pe_alloc) {
>>>>  		pr_warn("%s: Out of memory !\n",
>>>>  			__func__);
>>>>-		return IODA_INVALID_PE;
>>>>+		return NULL;
>>>>  	}
>>>>
>>>>  	/* Figure out reserved PE numbers by the PE */
>>>>@@ -363,7 +364,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  	 */
>>>>  	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
>>>>  		kfree(pe_alloc);
>>>>-		return IODA_INVALID_PE;
>>>>+		return NULL;
>>>>  	}
>>>>
>>>>  	/*
>>>>@@ -409,7 +410,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  	}
>>>>
>>>>  	kfree(pe_alloc);
>>>>-	return master_pe->pe_number;
>>>>+	return master_pe;
>>>>  }
>>>>
>>>>  static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>>@@ -988,28 +989,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>>>   * subordinate PCI devices and buses. The second type of PE is normally
>>>>   * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>>>>   */
>>>>-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>>+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>>  {
>>>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>-	struct pnv_ioda_pe *pe;
>>>>-	int pe_num = IODA_INVALID_PE;
>>>>+	struct pnv_ioda_pe *pe = NULL;
>>>>
>>>>  	/* Check if PE is determined by M64 */
>>>>  	if (phb->pick_m64_pe)
>>>>-		pe_num = phb->pick_m64_pe(bus, all);
>>>>+		pe = phb->pick_m64_pe(bus, all);
>>>>
>>>>  	/* The PE number isn't pinned by M64 */
>>>>-	if (pe_num == IODA_INVALID_PE)
>>>>-		pe_num = pnv_ioda_alloc_pe(phb);
>>>>+	if (!pe)
>>>>+		pe = pnv_ioda_alloc_pe(phb);
>>>>
>>>>-	if (pe_num == IODA_INVALID_PE) {
>>>>+	if (!pe) {
>>>>  		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
>>>>  			__func__, pci_domain_nr(bus), bus->number);
>>>>-		return;
>>>>+		return NULL;
>>>>  	}
>>>>
>>>>-	pe = &phb->ioda.pe_array[pe_num];
>>>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>>>  	pe->pbus = bus;
>>>>  	pe->pdev = NULL;
>>>>@@ -1018,17 +1017,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>>
>>>>  	if (all)
>>>>  		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>>>-			bus->busn_res.start, bus->busn_res.end, pe_num);
>>>>+			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>>>>  	else
>>>>  		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
>>>>-			bus->busn_res.start, pe_num);
>>>>+			bus->busn_res.start, pe->pe_number);
>>>>
>>>>  	if (pnv_ioda_configure_pe(phb, pe)) {
>>>>  		/* XXX What do we do here ? */
>>>>-		if (pe_num)
>>>>-			pnv_ioda_free_pe(phb, pe_num);
>>>>+		pnv_ioda_free_pe(pe);
>>>>  		pe->pbus = NULL;
>>>>-		return;
>>>>+		return NULL;
>>>>  	}
>>>>
>>>>  	/* Associate it with all child devices */
>>>>@@ -1036,6 +1034,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>>
>>>>  	/* Put PE to the list */
>>>>  	list_add_tail(&pe->list, &phb->ioda.pe_list);
>>>>+
>>>>+	return pe;
>>>>  }
>>>>
>>>>  static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>>>@@ -1267,7 +1267,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>>
>>>>  		pnv_ioda_deconfigure_pe(phb, pe);
>>>>
>>>>-		pnv_ioda_free_pe(phb, pe->pe_number);
>>>>+		pnv_ioda_free_pe(pe);
>>>>  	}
>>>>  }
>>>>
>>>>@@ -1276,6 +1276,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>>  	struct pci_bus        *bus;
>>>>  	struct pci_controller *hose;
>>>>  	struct pnv_phb        *phb;
>>>>+	struct pnv_ioda_pe    *pe;
>>>>  	struct pci_dn         *pdn;
>>>>  	struct pci_sriov      *iov;
>>>>  	u16                    num_vfs, i;
>>>>@@ -1300,8 +1301,11 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>>  		/* Release PE numbers */
>>>>  		if (pdn->m64_single_mode) {
>>>>  			for (i = 0; i < num_vfs; i++) {
>>>>-				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>>>-					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>>>+				if (pdn->pe_num_map[i] == IODA_INVALID_PE)
>>>>+					continue;
>>>>+
>>>>+				pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
>>>>+				pnv_ioda_free_pe(pe);
>>>>  			}
>>>>  		} else
>>>>  			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>>>@@ -1354,9 +1358,8 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>
>>>>  		if (pnv_ioda_configure_pe(phb, pe)) {
>>>>  			/* XXX What do we do here ? */
>>>>-			if (pe_num)
>>>>-				pnv_ioda_free_pe(phb, pe_num);
>>>>  			pe->pdev = NULL;
>>>>+			pnv_ioda_free_pe(pe);
>>>
>>>
>>>
>>>pnv_ioda_free_pe() does WARN_ON(pdev). Before this patch you would free PE
>>>first and then reset pe->pdev, now you reset it first, then call
>>>pnv_ioda_free_pe(). This change is not just about "Use PE instead of number
>>>during setup and release", is/was that a bug?
>>>
>>>And I fail to see when pe->pdev could get initialized in
>>>pnv_ioda_configure_pe() as pnv_pci_dma_dev_setup() should not be called while
>>>pnv_ioda_setup_vf_PE() is working.
>>>
>>
>>It wasn't or isn't a bug as
>
>
>There is an unexplained change in behavior - after the patch pe->pdev gets
>cleaned before pnv_ioda_free_pe(), before the patch it was opposite. Your
>options are:
>- remove "No logical changes introduced" from the commit log and explain the
>change or
>- move "pe->pdev = NULL;" after pnv_ioda_free_pe().
>
I'll take option#1: to drop this specific change in next revision.
>
>
>>pe->pdev is initialized in arch/powerpc/platform/powernv/pci.c::
>>pnv_pci_dma_dev_setup()
>
>So when pnv_ioda_setup_vf_PE() starts working, it is valid for pe->pdev to
>have not-NULL pointer? Because nothing in pnv_ioda_setup_vf_PE() calls
>pnv_pci_dma_dev_setup(), explicitly or implicitly.
>
Correct.
>
>
>>>
>>>>  			continue;
>>>>  		}
>>>>
>>>>@@ -1374,6 +1377,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>  	struct pci_bus        *bus;
>>>>  	struct pci_controller *hose;
>>>>  	struct pnv_phb        *phb;
>>>>+	struct pnv_ioda_pe    *pe;
>>>>  	struct pci_dn         *pdn;
>>>>  	int                    ret;
>>>>  	u16                    i;
>>>>@@ -1416,11 +1420,13 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>  		/* Calculate available PE for required VFs */
>>>>  		if (pdn->m64_single_mode) {
>>>>  			for (i = 0; i < num_vfs; i++) {
>>>>-				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
>>>>-				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
>>>>+				pe = pnv_ioda_alloc_pe(phb);
>>>>+				if (!pe) {
>>>>  					ret = -EBUSY;
>>>>  					goto m64_failed;
>>>>  				}
>>>>+
>>>>+				pdn->pe_num_map[i] = pe->pe_number;
>>>>  			}
>>>>  		} else {
>>>>  			mutex_lock(&phb->ioda.pe_alloc_mutex);
>>>>@@ -1465,8 +1471,11 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>  m64_failed:
>>>>  	if (pdn->m64_single_mode) {
>>>>  		for (i = 0; i < num_vfs; i++) {
>>>>-			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>>>-				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>>>+			if (pdn->pe_num_map[i] == IODA_INVALID_PE)
>>>>+				continue;
>>>>+
>>>>+			pe = &phb->ioda.pe_array[pdn->pe_num_map[i]];
>>>>+			pnv_ioda_free_pe(pe);
>>>>  		}
>>>>  	} else
>>>>  		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>index 5df945f..e55ab0e 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>@@ -105,7 +105,7 @@ struct pnv_phb {
>>>>  	int (*init_m64)(struct pnv_phb *phb);
>>>>  	void (*reserve_m64_pe)(struct pci_bus *bus,
>>>>  			       unsigned long *pe_bitmap, bool all);
>>>>-	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
>>>>+	struct pnv_ioda_pe *(*pick_m64_pe)(struct pci_bus *bus, bool all);
>>>>  	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>>>>  	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>>>>  	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>>>>
>>
Thanks,
Gavin
* Re: [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption
  2015-11-19  0:10       ` Alexey Kardashevskiy
@ 2015-11-23 22:42         ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-23 22:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Thu, Nov 19, 2015 at 11:10:42AM +1100, Alexey Kardashevskiy wrote:
>On 11/17/2015 12:04 PM, Gavin Shan wrote:
>>On Mon, Nov 16, 2015 at 07:01:59PM +1100, Alexey Kardashevskiy wrote:
>>>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>>>As we track M32 segment consumption, this introduces an array to
>>>>the PHB to track the mapping between M64 segment and PE number.
>>>>The information is going to be used to find M64 segment from the
>>>>PE number during PCI unplugging time in subsequent patches.
>>>
>>>
>>>It would not hurt to put a few words about how we managed to live without
>>>such a mapping for M64 before but we needed mapping for M32.
>>>
>>
>>The M32 mapping (phb->ioda.m32_segmap[]) isn't used for anything before
>>this patchset. Similarly, there is no need for an M64 mapping before this
>>patchset, so there is no need to add the words.
>
>After years I learned that reviewers ask fewer questions about new but as yet
>unused code when I put a few words in the commit log confirming that it is not
>used now but will be used for <here I put what it is for> later.
>
>And it is not obvious that m32_segmap is not used. And m64_segmap only starts
>being used 15 patches later in:
>
>[PATCH v7 27/50] powerpc/powernv: Dynamically release PEs
>
>which is quite far and complicates reviewing. 12/50 would be better moved there
>(to make it 26/50) or just merged into 27/50.
>
It doesn't make sense to me. As said in PATCH[00/50], the patchset consists of
3 separate parts: PowerNV PCI rework; Using PCI slot; Hotplug standalone driver.
For the first part ("PowerNV PCI rework"), the patches are organized in order:
refactor/cleanup, IO/M32/M64, DMA, PE allocation/deallocation. So I don't think
I need to move the patch around unless you have a stronger reason.
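
For anyone following the thread, the idea behind the segment map can be
sketched in a few lines of standalone C. This is not the kernel code: the
array size, the INVALID_PE marker and the helper names are assumptions made
up for illustration. It only shows why a segment-to-PE array lets the unplug
path find what a PE consumed.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SEGS   8      /* hypothetical number of M64 segments */
#define INVALID_PE (-1)   /* hypothetical "unowned" marker */

/* segmap[i] holds the PE number that owns segment i. */
static int segmap[NUM_SEGS];

static void segmap_init(void)
{
	for (size_t i = 0; i < NUM_SEGS; i++)
		segmap[i] = INVALID_PE;
}

/* Record that segment 'seg' is consumed by PE 'pe_number'. */
static void segmap_assign(size_t seg, int pe_number)
{
	assert(seg < NUM_SEGS);
	segmap[seg] = pe_number;
}

/*
 * On unplug, walk the map, release every segment owned by the PE
 * and return how many segments were released.
 */
static int segmap_release_pe(int pe_number)
{
	int released = 0;

	for (size_t i = 0; i < NUM_SEGS; i++) {
		if (segmap[i] != pe_number)
			continue;
		segmap[i] = INVALID_PE;
		released++;
	}
	return released;
}
```

Without such a map, the release path would have no record of which segments
belong to the PE that is going away.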
Thanks,
Gavin
* Re: [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs
  2015-11-18  2:23   ` Alexey Kardashevskiy
@ 2015-11-23 23:06     ` Gavin Shan
  2015-11-24  0:22       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2015-11-23 23:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Wed, Nov 18, 2015 at 01:23:05PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This adds a reference count of PE, representing the number of PCI
>>devices associated with the PE. The reference count is increased
>>or decreased when PCI devices join or leave the PE. Once it becomes
>>zero, the PE together with its used resources (IO, MMIO, DMA, PELTM,
>>PELTV) are released to support PCI hot unplug.
>
>
>The commit log suggests the patch only adds a counter, initializes it, and
>replaces unconditional release of an object (in this case - PE) with the
>conditional one. But it is more than that...
>
Yes, it's more than that as stated in the commit log.
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 245 ++++++++++++++++++++++++++----
>>  arch/powerpc/platforms/powernv/pci.h      |   1 +
>>  2 files changed, 218 insertions(+), 28 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 0bb0056..dcffce5 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -129,6 +129,215 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>  		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>  }
>>
>>+static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+	struct iommu_table *tbl;
>>+	int start, count, i;
>>+	int64_t rc;
>>+
>>+	/* Search for the used DMA32 segments */
>>+	start = -1;
>>+	count = 0;
>>+	for (i = 0; i < phb->ioda.dma32_count; i++) {
>>+		if (phb->ioda.dma32_segmap[i] != pe->pe_number)
>>+			continue;
>>+
>>+		count++;
>>+		if (start < 0)
>>+			start = i;
>>+	}
>>+
>>+	if (!count)
>>+		return;
>
>
>imho checking pe->table_group.tables[0] != NULL is shorter than the loop above.
>
Will use it in next revision.
>>+
>>+	/* Unlink IOMMU table from group */
>>+	tbl = pe->table_group.tables[0];
>>+	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>>+	if (pe->table_group.group) {
>>+		iommu_group_put(pe->table_group.group);
>>+		WARN_ON(pe->table_group.group);
>>+	}
>>+
>>+	/* Release IOMMU table */
>>+	pnv_pci_ioda2_table_free_pages(tbl);
>
>
>This is IODA2 helper with multilevel support, does IODA1 support multilevel
>TCE tables? If not, it should WARN_ON on levels!=1.
>
>Another thing is you should first unprogram TVEs (via
>opal_pci_map_pe_dma_window), then invalidate the cache (if required, not sure
>if this is needed on IODA1), only then free the actual table.
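
The ordering asked for here (unprogram, invalidate, then free) can be shown
with a small standalone sketch. Everything in it is a stand-in: struct
dma_window and both helpers are hypothetical and do not correspond to any
kernel or OPAL interface. The point is only that the memory is freed last,
after nothing can reference it any more.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for hardware state; not a kernel type. */
struct dma_window {
	int *table;        /* translation table the hardware walks */
	int  programmed;   /* non-zero while hardware points at 'table' */
	int  cache_valid;  /* non-zero while cached translations may exist */
};

static int window_setup(struct dma_window *w, size_t entries)
{
	w->table = calloc(entries, sizeof(*w->table));
	if (!w->table)
		return -1;
	w->programmed = 1;
	w->cache_valid = 1;
	return 0;
}

static void window_teardown(struct dma_window *w)
{
	/* 1. Stop the hardware from referencing the table. */
	w->programmed = 0;

	/* 2. Drop any cached translations that still point into it. */
	w->cache_valid = 0;

	/* 3. Only now is it safe to free the backing memory. */
	assert(!w->programmed && !w->cache_valid);
	free(w->table);
	w->table = NULL;
}
```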
>
>
>>+	iommu_free_table(tbl, of_node_full_name(pci_bus_to_OF_node(pe->pbus)));
>>+
>>+	/* Disable TVE */
>>+	for (i = start; i < start + count; i++) {
>>+		rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
>>+						i, 0, 0ul, 0ul, 0ul);
>>+		if (rc)
>>+			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
>>+				rc, i);
>>+
>>+		phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>>+	}
>
>
>You could implement pnv_pci_ioda1_unset_window/pnv_ioda1_table_free as
>callbacks, change pnv_pci_ioda2_release_dma_pe() to use them (and rename it
>to reflect that it supports IODA1 and IODA2).
>
>
>>+}
>>+
>>+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
>>+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
>>+		int num);
>>+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
>>+
>>+static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)
>
>
>You moved this function and changed it, please do one thing at a time (which is
>"change", not "move").
>
>>+{
>>+	struct iommu_table *tbl;
>>+	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+	int64_t rc;
>>+
>>+	if (!weight)
>>+		return;
>
>
>Checking for pe->table_group.group is better because if we ever change the
>logic of what gets included to an IOMMU group, we will have to do the change
>where we add devices to a group but we won't have to touch releasing code.
>
>
>>+
>>+	tbl = pe->table_group.tables[0];
>>+	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>>+	if (rc)
>>+		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>+
>>+	pnv_pci_ioda2_set_bypass(pe, false);
>>+	if (pe->table_group.group) {
>>+		iommu_group_put(pe->table_group.group);
>>+		WARN_ON(pe->table_group.group);
>>+	}
>>+
>>+	pnv_pci_ioda2_table_free_pages(tbl);
>>+	iommu_free_table(tbl, "pnv");
>>+}
>>+
>>+static void pnv_ioda_release_dma_pe(struct pnv_ioda_pe *pe)
>
>Merge this into pnv_ioda_release_pe() - it is small and called just once.
>
>
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+
>>+	switch (phb->type) {
>>+	case PNV_PHB_IODA1:
>>+		pnv_pci_ioda1_release_dma_pe(pe);
>>+		break;
>>+	case PNV_PHB_IODA2:
>>+		pnv_pci_ioda2_release_dma_pe(pe);
>>+		break;
>>+	default:
>>+		WARN_ON(1);
>>+	}
>>+}
>>+
>>+static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win)
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+	int index, *segmap = NULL;
>>+	int64_t rc;
>>+
>>+	switch (win) {
>>+	case OPAL_IO_WINDOW_TYPE:
>>+		segmap = phb->ioda.io_segmap;
>>+		break;
>>+	case OPAL_M32_WINDOW_TYPE:
>>+		segmap = phb->ioda.m32_segmap;
>>+		break;
>>+	case OPAL_M64_WINDOW_TYPE:
>>+		if (phb->type != PNV_PHB_IODA1)
>>+			return;
>>+		segmap = phb->ioda.m64_segmap;
>>+		break;
>>+	default:
>>+		return;
>
>Unnecessary return.
>
>
>>+	}
>>+
>>+	for (index = 0; index < phb->ioda.total_pe_num; index++) {
>>+		if (segmap[index] != pe->pe_number)
>>+			continue;
>>+
>>+		if (win == OPAL_M64_WINDOW_TYPE)
>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+					phb->ioda.reserved_pe_idx, win,
>>+					index / PNV_IODA1_M64_SEGS,
>>+					index % PNV_IODA1_M64_SEGS);
>>+		else
>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+					phb->ioda.reserved_pe_idx, win,
>>+					0, index);
>>+
>>+		if (rc != OPAL_SUCCESS)
>>+			pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
>>+				rc, win, index);
>>+
>>+		segmap[index] = IODA_INVALID_PE;
>>+	}
>>+}
>>+
>>+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+	int win;
>>+
>>+	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {
>>+		if (phb->type == PNV_PHB_IODA2 && win == OPAL_IO_WINDOW_TYPE)
>>+			continue;
>
>Move this check to pnv_ioda_release_window() or move case(win ==
>OPAL_M64_WINDOW_TYPE):if(phb->type != PNV_PHB_IODA1) from that function here.
>
>
>>+
>>+		pnv_ioda_release_window(pe, win);
>>+	}
>>+}
>
>This is shorter and cleaner:
>
>
>static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win,
>                                    int *segmap)
>{
>        struct pnv_phb *phb = pe->phb;
>        int index;
>        int64_t rc;
>
>        for (index = 0; index < phb->ioda.total_pe_num; index++) {
>                if (segmap[index] != pe->pe_number)
>                        continue;
>
>                if (win == OPAL_M64_WINDOW_TYPE)
>                        rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>                                        phb->ioda.reserved_pe_idx, win,
>                                        index / PNV_IODA1_M64_SEGS,
>                                        index % PNV_IODA1_M64_SEGS);
>                else
>                        rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>                                        phb->ioda.reserved_pe_idx, win,
>                                        0, index);
>
>                if (rc != OPAL_SUCCESS)
>                        pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
>                                rc, win, index);
>
>                segmap[index] = IODA_INVALID_PE;
>        }
>}
>
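>static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
>{
>        struct pnv_phb *phb = pe->phb;
>
>        pnv_ioda_release_window(pe, OPAL_M32_WINDOW_TYPE,
>                                phb->ioda.m32_segmap);
>        /* The quoted patch skips the IO window on IODA2 */
>        if (phb->type != PNV_PHB_IODA2)
>                pnv_ioda_release_window(pe, OPAL_IO_WINDOW_TYPE,
>                                phb->ioda.io_segmap);
>        /* ... and only releases M64 segments on IODA1 */
>        if (phb->type == PNV_PHB_IODA1)
>                pnv_ioda_release_window(pe, OPAL_M64_WINDOW_TYPE,
>                                phb->ioda.m64_segmap);
>}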
>
>
>I'd actually merge pnv_ioda_release_pe_seg() into pnv_ioda_release_pe() as
>well as it is also small and called once.
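
A side note on the "index / PNV_IODA1_M64_SEGS" and "index %
PNV_IODA1_M64_SEGS" arguments in the code above: they appear to split a flat
segment index into a window number and a segment inside that window. A
worked sketch follows, assuming 8 segments per window purely for
illustration (the thread does not give the constant's value); the struct and
function names are made up.

```c
#include <assert.h>

#define SEGS_PER_WINDOW 8  /* assumed value, for illustration only */

struct seg_pos {
	int window;   /* which M64 window the segment lives in */
	int segment;  /* position inside that window */
};

/* Split a flat segment index into (window, segment). */
static struct seg_pos seg_split(int index)
{
	struct seg_pos pos;

	assert(index >= 0);
	pos.window  = index / SEGS_PER_WINDOW;
	pos.segment = index % SEGS_PER_WINDOW;
	return pos;
}

/* Inverse mapping: rebuild the flat index. */
static int seg_join(struct seg_pos pos)
{
	return pos.window * SEGS_PER_WINDOW + pos.segment;
}
```

With this assumed size, flat index 19 lands in window 2, segment 3, and
seg_join() maps it back to 19.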
>
>
>>+
>>+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb,
>>+				   struct pnv_ioda_pe *pe);
>>+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe);
>>+static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
>>+{
>>+	struct pnv_ioda_pe *tmp, *slave;
>>+
>>+	/* Release slave PEs in compound PE */
>>+	if (pe->flags & PNV_IODA_PE_MASTER) {
>>+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
>>+			pnv_ioda_release_pe(slave);
>>+	}
>>+
>>+	/* Remove the PE from the list */
>>+	list_del(&pe->list);
>>+
>>+	/* Release resources */
>>+	pnv_ioda_release_dma_pe(pe);
>>+	pnv_ioda_release_pe_seg(pe);
>>+	pnv_ioda_deconfigure_pe(pe->phb, pe);
>>+
>>+	pnv_ioda_free_pe(pe);
>>+}
>>+
>>+static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
>>+{
>>+	if (!pe)
>>+		return NULL;
>>+
>>+	pe->device_count++;
>>+	return pe;
>>+}
>>+
>>+static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
>
>
>Merge this into pnv_pci_release_device() as it is small and called only once.
>
I don't think so. The functions pnv_ioda_pe_{get,put}() are paired, and I think
it's reasonable to keep a separate function for the logic in pnv_ioda_pe_put().
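
To illustrate the pairing outside the kernel, here is a minimal standalone
sketch. struct obj and its helpers are hypothetical; only the counting logic
mirrors what the patch does with device_count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object; only the counting logic mirrors the patch. */
struct obj {
	int device_count;  /* number of users */
	int released;      /* set once the release path has run */
};

static void obj_release(struct obj *o)
{
	o->released = 1;
}

/* Take a reference; NULL is passed through untouched. */
static struct obj *obj_get(struct obj *o)
{
	if (!o)
		return NULL;
	o->device_count++;
	return o;
}

/* Drop a reference; the last put triggers the release. */
static void obj_put(struct obj *o)
{
	if (!o)
		return;
	assert(o->device_count > 0);  /* catch an unbalanced put */
	if (--o->device_count == 0)
		obj_release(o);
}
```

Keeping the put side as its own helper makes the symmetry with the get side
visible at each call site, even when there is only one caller today.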
>>+{
>>+	if (!pe)
>>+		return;
>>+
>>+	pe->device_count--;
>>+	WARN_ON(pe->device_count < 0);
>>+	if (pe->device_count == 0)
>>+		pnv_ioda_release_pe(pe);
>>+}
>>+
>>+static void pnv_pci_release_device(struct pci_dev *pdev)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>+	struct pnv_ioda_pe *pe;
>>+
>>+	if (pdev->is_virtfn)
>>+		return;
>>+
>>+	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
>>+		return;
>>+
>>+	pe = &phb->ioda.pe_array[pdn->pe_number];
>>+	pnv_ioda_pe_put(pe);
>>+}
>>+
>>  static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>>  {
>>  	phb->ioda.pe_array[pe_no].phb = phb;
>>@@ -724,7 +933,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>>  	return 0;
>>  }
>>
>>-#ifdef CONFIG_PCI_IOV
>>  static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>  {
>>  	struct pci_dev *parent;
>>@@ -759,9 +967,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>  		}
>>  		rid_end = pe->rid + (count << 8);
>>  	} else {
>>+#ifdef CONFIG_PCI_IOV
>>  		if (pe->flags & PNV_IODA_PE_VF)
>>  			parent = pe->parent_dev;
>>  		else
>>+#endif
>>  			parent = pe->pdev->bus->self;
>>  		bcomp = OpalPciBusAll;
>>  		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
>>@@ -799,11 +1009,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>
>>  	pe->pbus = NULL;
>>  	pe->pdev = NULL;
>>+#ifdef CONFIG_PCI_IOV
>>  	pe->parent_dev = NULL;
>>+#endif
>
>
>These #ifdef movements seem very much unrelated.
>
It's related: pnv_ioda_deconfigure_pe() was used for VF PE only. Now it's used by all
types of PEs. pe->parent_dev is declared as below:
#ifdef CONFIG_PCI_IOV
        struct pci_dev          *parent_dev;
#endif
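
A minimal standalone sketch of the constraint, with made-up names
(FEATURE_IOV stands in for CONFIG_PCI_IOV, struct pe for the real
structure): once a function is compiled unconditionally, every access to a
conditionally declared member has to carry the same guard, otherwise the
build breaks when the option is off.

```c
#include <assert.h>
#include <stddef.h>

/* Define FEATURE_IOV (hypothetical switch) to compile the extra member in. */
struct pe {
	void *pdev;
#ifdef FEATURE_IOV
	void *parent_dev;
#endif
};

/* Shared teardown path: must build with and without FEATURE_IOV. */
static void pe_clear(struct pe *pe)
{
	assert(pe != NULL);
	pe->pdev = NULL;
#ifdef FEATURE_IOV
	pe->parent_dev = NULL;
#endif
}
```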
>
>>
>>  	return 0;
>>  }
>>-#endif /* CONFIG_PCI_IOV */
>>
>>  static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>  {
>>@@ -985,6 +1196,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  			continue;
>>
>>  		pdn->pe_number = pe->pe_number;
>>+		pnv_ioda_pe_get(pe);
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>  			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>  	}
>>@@ -1047,9 +1259,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  			bus->busn_res.start, pe->pe_number);
>>
>>  	if (pnv_ioda_configure_pe(phb, pe)) {
>>-		/* XXX What do we do here ? */
>>-		pnv_ioda_free_pe(pe);
>>  		pe->pbus = NULL;
>>+		pnv_ioda_release_pe(pe);
>
>
>This is unrelated unexplained change.
>
Will drop it in next revision.
>>  		return NULL;
>>  	}
>>
>>@@ -1199,29 +1410,6 @@ m64_failed:
>>  	return -EBUSY;
>>  }
>>
>>-static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
>>-		int num);
>>-static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
>>-
>>-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
>>-{
>>-	struct iommu_table    *tbl;
>>-	int64_t               rc;
>>-
>>-	tbl = pe->table_group.tables[0];
>>-	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>>-	if (rc)
>>-		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>-
>>-	pnv_pci_ioda2_set_bypass(pe, false);
>>-	if (pe->table_group.group) {
>>-		iommu_group_put(pe->table_group.group);
>>-		BUG_ON(pe->table_group.group);
>>-	}
>>-	pnv_pci_ioda2_table_free_pages(tbl);
>>-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>-}
>>-
>>  static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>  {
>>  	struct pci_bus        *bus;
>>@@ -1242,7 +1430,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>  		if (pe->parent_dev != pdev)
>>  			continue;
>>
>>-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
>>+		pnv_pci_ioda2_release_dma_pe(pe);
>
>
>This is unrelated change.
>
>>
>>  		/* Remove from list */
>>  		mutex_lock(&phb->ioda.pe_list_mutex);
>>@@ -3124,6 +3312,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>>  	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
>>  #endif
>>  	.enable_device_hook	= pnv_pci_enable_device_hook,
>>+	.release_device		= pnv_pci_release_device,
>>  	.window_alignment	= pnv_pci_window_alignment,
>>  	.setup_bridge		= pnv_pci_setup_bridge,
>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index ef5271a..3bb10de 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -30,6 +30,7 @@ struct pnv_phb;
>>  struct pnv_ioda_pe {
>>  	unsigned long		flags;
>>  	struct pnv_phb		*phb;
>>+	int			device_count;
>
>Not atomic_t, no kref, no additional mutex, just "int"? Sure about it? If so,
>put a note to the commit log about what provides a guarantee that there is no
>race.
>
>
It was a kref. I changed it based on what you suggested on v5, as below:
| You do not need kref here. You call kref_put() in a single location and can do
| stuff directly, without kref. Just have an "unsigned int" counter and that's
| it (it does not even have to be atomic if you do not have races but I am not
| sure you do not).
|
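
For reference, the difference being discussed can be sketched with C11
atomics standing in for the kernel's atomic_t. The type and function names
are hypothetical. With a plain int, the decrement and the test for zero are
separate steps, so two concurrent puts could both miss or both see zero;
atomic_fetch_sub() returns the old value, so exactly one caller observes the
final drop. Whether the callers here are already serialized is exactly the
open question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter; C11 atomics stand in for the kernel's atomic_t. */
struct counted {
	atomic_int users;
};

static void counted_get(struct counted *c)
{
	atomic_fetch_add(&c->users, 1);
}

/* Returns true for exactly one caller: the one that drops the last user. */
static bool counted_put(struct counted *c)
{
	int old = atomic_fetch_sub(&c->users, 1);

	assert(old > 0);
	return old == 1;
}
```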
>>
>>  	/* A PE can be associated with a single device or an
>>  	 * entire bus (& children). In the former case, pdev
>>
Thanks,
Gavin
* Re: [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()
  2015-11-18  2:43   ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add,remove}_pci_devices() Alexey Kardashevskiy
@ 2015-11-23 23:08     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-23 23:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Wed, Nov 18, 2015 at 01:43:06PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
>>with names of the weak functions in PCI subsystem, which have the
>>prefix "pcibios". No logical changes introduced.
>
>
>As you mentioned before, the patchset is organized as "code refactoring,
>IO/M32/M64, DMA, PE allocation/releasing". This patch fits into the
>refactoring category so it goes to the beginning of the series :)
>
I don't think so. As said in PATCH[00/50], this patchset consists of 3
separate parts: PowerNV PCI rework; Using PCI slot; Hotplug driver. This
patch belongs to the second part, so I don't think it needs to be moved to
the beginning of the series.
Thanks,
Gavin
* Re: [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus()
  2015-11-18  3:59   ` Alexey Kardashevskiy
@ 2015-11-23 23:11     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-23 23:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Wed, Nov 18, 2015 at 02:59:32PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
>>avoid conflicts with those PCI subsystem weak function names, which
>>have prefix "pcibios". No logical changes introduced.
>
>Could this be merged into [PATCH v7 28/50] powerpc/pci: Rename
>pcibios_{add,remove}_pci_devices() and/or moved to the beginning of the
>series?
>
I don't think it should be merged into [PATCH 28/50]. If it needs to be merged
into another patch, that would be [PATCH 30/50], but I prefer to keep it
separate so that each patch stays simple (doing one thing if possible).
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h      | 2 +-
>>  arch/powerpc/platforms/pseries/pci_dlpar.c | 5 ++---
>>  drivers/pci/hotplug/rpadlpar_core.c        | 6 +++---
>>  drivers/pci/hotplug/rpaphp_pci.c           | 2 +-
>>  4 files changed, 7 insertions(+), 8 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index c2360c8..28385cb 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -257,7 +257,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
>>  #endif
>>
>>  /** Find the bus corresponding to the indicated device node */
>>-extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
>>+extern struct pci_bus *pci_find_bus_by_node(struct device_node *dn);
>>
>>  /** Remove all of the PCI devices under this bus */
>>  extern void pci_remove_pci_devices(struct pci_bus *bus);
>>diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
>>index 5d4a3df..aee22b4 100644
>>--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
>>+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
>>@@ -54,8 +54,7 @@ find_bus_among_children(struct pci_bus *bus,
>>  	return child;
>>  }
>>
>>-struct pci_bus *
>>-pcibios_find_pci_bus(struct device_node *dn)
>>+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
>>  {
>>  	struct pci_dn *pdn = dn->data;
>>
>>@@ -64,7 +63,7 @@ pcibios_find_pci_bus(struct device_node *dn)
>>
>>  	return find_bus_among_children(pdn->phb->bus, dn);
>>  }
>>-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
>>+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
>>
>>  struct pci_controller *init_phb_dynamic(struct device_node *dn)
>>  {
>>diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
>>index ebd283b..9aa392b 100644
>>--- a/drivers/pci/hotplug/rpadlpar_core.c
>>+++ b/drivers/pci/hotplug/rpadlpar_core.c
>>@@ -176,7 +176,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
>>  	struct pci_dev *dev;
>>  	struct pci_controller *phb;
>>
>>-	if (pcibios_find_pci_bus(dn))
>>+	if (pci_find_bus_by_node(dn))
>>  		return -EINVAL;
>>
>>  	/* Add pci bus */
>>@@ -213,7 +213,7 @@ static int dlpar_remove_phb(char *drc_name, struct device_node *dn)
>>  	struct pci_dn *pdn;
>>  	int rc = 0;
>>
>>-	if (!pcibios_find_pci_bus(dn))
>>+	if (!pci_find_bus_by_node(dn))
>>  		return -EINVAL;
>>
>>  	/* If pci slot is hotpluggable, use hotplug to remove it */
>>@@ -357,7 +357,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>>
>>  	pci_lock_rescan_remove();
>>
>>-	bus = pcibios_find_pci_bus(dn);
>>+	bus = pci_find_bus_by_node(dn);
>>  	if (!bus) {
>>  		ret = -EINVAL;
>>  		goto out;
>>diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
>>index 256066c..e7dd573 100644
>>--- a/drivers/pci/hotplug/rpaphp_pci.c
>>+++ b/drivers/pci/hotplug/rpaphp_pci.c
>>@@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
>>  	if (rc)
>>  		return rc;
>>
>>-	bus = pcibios_find_pci_bus(slot->dn);
>>+	bus = pci_find_bus_by_node(slot->dn);
>>  	if (!bus) {
>>  		err("%s: no pci_bus for dn %s\n", __func__, slot->dn->full_name);
>>  		return -EINVAL;
>>
Thanks,
Gavin
* Re: [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-11-18  7:33   ` Alexey Kardashevskiy
@ 2015-11-23 23:16     ` Gavin Shan
  0 siblings, 0 replies; 157+ messages in thread
From: Gavin Shan @ 2015-11-23 23:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Wed, Nov 18, 2015 at 06:33:08PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This adds a standalone driver to support PCI hotplug for the PowerPC PowerNV
>>platform that runs on top of skiboot firmware. The firmware identifies
>>hotpluggable slots and marks their device tree nodes with the proper
>>"ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The driver scans
>>device tree nodes to create/register PCI hotplug slots accordingly.
>>
>>If the skiboot firmware doesn't support slot status retrieval, the PCI
>>slot device node shouldn't have the property "ibm,reset-by-firmware". In
>>that case, no valid PCI slots will be detected from the device tree.
>>The skiboot firmware doesn't export the capability to access attention
>>LEDs yet; that is still to be done.
>
>
>Could you add a few words about what we are actually dealing with and how child
>slots can be hotplugged to parent slots?
>
Sure, will do. All comments you gave will be reflected in the next revision.
Please let me know when you finish the review so that I can start the respin
for the next revision.
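
As a starting point for that description, the slot discovery described in
the commit log can be sketched in standalone C. The node structure and
helpers below are toy stand-ins, not struct device_node or the OF API, and
requiring both properties on a node is an assumption drawn from the commit
log wording rather than from the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy device-tree node; the real kernel type is struct device_node. */
struct node {
	const char  *name;
	const char **props;    /* NULL-terminated list of property names */
	struct node *child;    /* first child */
	struct node *sibling;  /* next sibling */
};

static int has_prop(const struct node *n, const char *prop)
{
	assert(prop != NULL);
	for (const char **p = n->props; p && *p; p++)
		if (strcmp(*p, prop) == 0)
			return 1;
	return 0;
}

/* Count nodes, over siblings and descendants, carrying both properties. */
static int count_slots(const struct node *n)
{
	int count = 0;

	for (; n; n = n->sibling) {
		if (has_prop(n, "ibm,slot-pluggable") &&
		    has_prop(n, "ibm,reset-by-firmware"))
			count++;
		count += count_slots(n->child);
	}
	return count;
}
```

Because the walk recurses into children, a pluggable slot found beneath
another pluggable slot is counted as well, which is the parent/child slot
relationship asked about above.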
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>---
>>  MAINTAINERS                   |   6 +
>>  drivers/pci/hotplug/Kconfig   |  12 +
>>  drivers/pci/hotplug/Makefile  |   3 +
>>  drivers/pci/hotplug/pnv_php.c | 866 ++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 887 insertions(+)
>>  create mode 100644 drivers/pci/hotplug/pnv_php.c
>>
>>diff --git a/MAINTAINERS b/MAINTAINERS
>>index 9f6685f..10088f1 100644
>>--- a/MAINTAINERS
>>+++ b/MAINTAINERS
>>@@ -7931,6 +7931,12 @@ L:	linux-pci@vger.kernel.org
>>  S:	Supported
>>  F:	Documentation/PCI/pci-error-recovery.txt
>>
>>+PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
>>+M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
>>+L:	linux-pci@vger.kernel.org
>>+S:	Supported
>>+F:	drivers/pci/hotplug/pnv_php.c
>>+
>>  PCI SUBSYSTEM
>>  M:	Bjorn Helgaas <bhelgaas@google.com>
>>  L:	linux-pci@vger.kernel.org
>>diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>index df8caec..167c8ce 100644
>>--- a/drivers/pci/hotplug/Kconfig
>>+++ b/drivers/pci/hotplug/Kconfig
>>@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>
>>  	  When in doubt, say N.
>>
>>+config HOTPLUG_PCI_POWERNV
>>+	tristate "PowerPC PowerNV PCI Hotplug driver"
>>+	depends on PPC_POWERNV && EEH
>>+	help
>>+	  Say Y here if you run PowerPC PowerNV platform that supports
>>+	  PCI Hotplug
>>+
>>+	  To compile this driver as a module, choose M here: the
>>+	  module will be called pnv-php.
>>+
>>+	  When in doubt, say N.
>>+
>>  config HOTPLUG_PCI_RPA
>>  	tristate "RPA PCI Hotplug driver"
>>  	depends on PPC_PSERIES && EEH
>>diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>index b616e75..e33cdda 100644
>>--- a/drivers/pci/hotplug/Makefile
>>+++ b/drivers/pci/hotplug/Makefile
>>@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>  obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= pnv-php.o
>>  obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>  obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>  obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>@@ -50,6 +51,8 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>  acpiphp-objs		:=	acpiphp_core.o	\
>>  				acpiphp_glue.o
>>
>>+pnv-php-objs		:=	pnv_php.o
>>+
>>  rpaphp-objs		:=	rpaphp_core.o	\
>>  				rpaphp_pci.o	\
>>  				rpaphp_slot.o
>>diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>>new file mode 100644
>>index 0000000..415e9b9
>>--- /dev/null
>>+++ b/drivers/pci/hotplug/pnv_php.c
>>@@ -0,0 +1,866 @@
>>+/*
>>+ * PCI Hotplug Driver for PowerPC PowerNV platform.
>>+ *
>>+ * Copyright Gavin Shan, IBM Corporation 2015.
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include <linux/pci.h>
>>+#include <linux/pci_hotplug.h>
>>+#include <linux/module.h>
>>+
>>+#include <asm/opal.h>
>>+#include <asm/pnv-pci.h>
>>+#include <asm/ppc-pci.h>
>>+
>>+#define DRIVER_VERSION	"0.1"
>>+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>>+
>>+struct pnv_php_slot {
>>+	struct hotplug_slot		php_slot;
>>+	struct hotplug_slot_info	php_slot_info;
>>+	uint64_t			id;
>>+	char				*name;
>>+	int				slot_no;
>>+	struct kref			kref;
>>+	int				state;
>>+#define PNV_PHP_STATE_INIT		0
>
>INITIALIZED
>
>>+#define PNV_PHP_STATE_REGISTER		1
>
>REGISTERED
>
>
>>+#define PNV_PHP_STATE_POPULATED		2
>
>This one has "ed" already :)
>
>And usually definitions go before the variable which uses them.
>
>>+	struct device_node		*dn;
>>+	struct pci_dev			*pdev;
>>+	struct pci_bus			*bus;
>>+	bool				power_state_check;
>>+	int				power_state_confirmed;
>>+#define PNV_PHP_POWER_CONFIRMED_INVALID	0
>>+#define PNV_PHP_POWER_CONFIRMED_SUCCESS	1
>>+#define PNV_PHP_POWER_CONFIRMED_FAIL	2
>>+	struct opal_msg			*msg;
>>+	void				*fdt;
>>+	void				*dt;
>>+	struct of_changeset		ocs;
>>+	struct work_struct		work;
>>+	wait_queue_head_t		queue;
>>+	struct pnv_php_slot		*parent;
>>+	struct list_head		children;
>>+	struct list_head		link;
>>+};
>>+
>>+static LIST_HEAD(pnv_php_slot_list);
>>+static DEFINE_SPINLOCK(pnv_php_lock);
>>+
>>+static void pnv_php_register(struct device_node *dn);
>>+static void pnv_php_unregister_one(struct device_node *dn);
>>+static void pnv_php_unregister(struct device_node *dn);
>>+
>>+static inline struct pnv_php_slot *pnv_php_get_slot(struct pnv_php_slot *slot)
>>+{
>>+	if (slot) {
>>+		kref_get(&slot->kref);
>>+		return slot;
>>+	}
>>+
>>+	return NULL;
>>+}
>>+
>>+static void pnv_php_free_slot(struct kref *kref)
>>+{
>>+	struct pnv_php_slot *slot = container_of(kref,
>>+						 struct pnv_php_slot,
>>+						 kref);
>>+
>>+	WARN_ON(!list_empty(&slot->children));
>>+	kfree(slot->name);
>>+	kfree(slot);
>>+}
>>+
>>+static inline void pnv_php_put_slot(struct pnv_php_slot *slot)
>>+{
>>+	if (!slot)
>>+		return;
>>+
>>+	kref_put(&slot->kref, pnv_php_free_slot);
>>+}
>>+
>>+static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
>>+					  struct pnv_php_slot *slot)
>>+{
>>+	struct pnv_php_slot *target, *tmp;
>>+
>>+	if (slot->dn == dn)
>>+		return pnv_php_get_slot(slot);
>>+
>>+	list_for_each_entry(tmp, &slot->children, link) {
>>+		target = pnv_php_match(dn, tmp);
>>+		if (target)
>>+			return target;
>>+	}
>>+
>>+	return NULL;
>>+}
>>+
>>+static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *slot, *tmp;
>>+	unsigned long flags;
>>+
>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>+	list_for_each_entry(tmp, &pnv_php_slot_list, link) {
>>+		slot = pnv_php_match(dn, tmp);
>>+		if (slot) {
>>+			spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+			return slot;
>>+		}
>>+	}
>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+
>>+	return NULL;
>>+}
>>+
>>+/*
>>+ * Remove pdn for all children of the indicated device node.
>>+ * The function should remove pdn in a depth-first manner.
>>+ */
>>+static void pnv_php_rmv_pdns(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	for_each_child_of_node(dn, child) {
>>+		pnv_php_rmv_pdns(child);
>>+
>>+		pci_remove_device_node_info(child);
>>+	}
>>+}
>>+
>>+/*
>>+ * Remove all child nodes of the indicated device nodes. The
>>+ * function should remove device nodes in depth-first manner.
>>+ */
>>+static int pnv_php_rmv_device_nodes(struct device_node *parent)
>>+{
>>+	struct device_node *dn, *child;
>>+	int ret = 0;
>>+
>>+	for_each_child_of_node(parent, dn) {
>>+		ret = pnv_php_rmv_device_nodes(dn);
>>+		if (ret)
>>+			return ret;
>>+
>>+		child = of_get_next_child(dn, NULL);
>>+		if (child) {
>>+			of_node_put(child);
>>+			of_node_put(dn);
>>+			pr_err("%s: Alive children of node <%s>\n",
>>+			       __func__, of_node_full_name(dn));
>>+			return -EBUSY;
>>+		}
>>+
>>+		of_detach_node(dn);
>>+		of_node_put(dn);
>>+	}
>
>
>This loop iterates just once, is this correct? If so, then a loop is not
>needed here...
>
>
>>+
>>+	return 0;
>>+}
>>+
>>+/*
>>+ * The function processes the message sent by firmware
>>+ * to remove all device tree nodes beneath the slot's
>>+ * nodes and the associated auxiliary data.
>>+ */
>>+static void pnv_php_handle_poweroff(struct pnv_php_slot *slot)
>>+{
>>+	int ret;
>>+
>>+	pnv_php_rmv_pdns(slot->dn);
>>+
>>+	/*
>>+	 * If the device sub-tree was created from OF changeset, simply
>>+	 * to revert that. Otherwise, the device nodes in the sub-tree
>>+	 * need to be iterated and detached.
>>+	 */
>>+	if (slot->fdt) {
>>+		of_changeset_destroy(&slot->ocs);
>>+		kfree(slot->dt);
>>+		kfree(slot->fdt);
>>+		slot->dt = NULL;
>>+		slot->dn->child = NULL;
>>+		slot->fdt = NULL;
>>+		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>+		goto confirm;
>>+	}
>
>} else {
>
>>+
>>+	ret = pnv_php_rmv_device_nodes(slot->dn);
>>+	if (!ret) {
>>+		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>+	} else {
>>+		slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_FAIL;
>>+		dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
>>+			 ret);
>
>Could be one line :)
>
>
>>+	}
>>+
>
>}
>and remove the label below?
>
>
>>+confirm:
>
>
>>+	wake_up_interruptible(&slot->queue);
>>+}
>>+
>>+static int pnv_php_populate_changeset(struct of_changeset *ocs,
>>+				      struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+	int ret = 0;
>>+
>>+	for_each_child_of_node(dn, child) {
>>+		ret = of_changeset_attach_node(ocs, child);
>>+		if (ret)
>>+			return ret;
>>+
>>+		ret = pnv_php_populate_changeset(ocs, child);
>
>if (ret) break; maybe?
>
>
>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static void *pnv_php_add_one_pdn(struct device_node *dn, void *data)
>>+{
>>+	struct pci_controller *hose = (struct pci_controller *)data;
>>+	struct pci_dn *pdn;
>>+
>>+	pdn = pci_add_device_node_info(hose, dn);
>>+	if (!pdn)
>>+		return ERR_PTR(-ENOMEM);
>>+
>>+	return NULL;
>>+}
>>+
>>+static void pnv_php_add_pdns(struct pnv_php_slot *slot)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(slot->bus);
>>+
>>+	pci_traverse_device_nodes(slot->dn, pnv_php_add_one_pdn, hose);
>>+}
>>+
>>+static void pnv_php_handle_poweron(struct pnv_php_slot *slot)
>>+{
>>+	void *fdt, *dt;
>>+	uint64_t len;
>>+	int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>+	int ret;
>>+
>>+	/* We don't know the FDT blob size. It tries with incremental
>>+	 * sized memory chunk.
>>+	 */
>>+	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
>>+		fdt = kzalloc(len, GFP_KERNEL);
>>+		if (!fdt)
>>+			break;
>>+
>>+		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
>>+		if (!ret)
>>+			break;
>>+
>>+		kfree(fdt);
>>+	}
>>+
>>+	if (len > 0x10000) {
>>+		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
>>+		goto out;
>
>This seems like an error but slot->power_state_confirmed will be set to
>PNV_PHP_POWER_CONFIRMED_SUCCESS anyway, is that correct?
>
>
>>+	}
>
>I'd redo the chunk above like this:
>
>fdt1 = kzalloc(0x10000, GFP_KERNEL);
>if (!fdt1)
>	goto out;
>ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt1, 0x10000);
>if (ret) {
>	kfree(fdt1);
>	goto out;
>}
>fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
>if (!fdt) {
>	kfree(fdt1);
>	goto out;
>}
>memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>kfree(fdt1);
>
>
>This way you end up using less memory after setup has completed.
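The two-step allocation suggested above can be tried out in userspace. This is only a sketch under stated assumptions: fake_get_blob() and blob_totalsize() are hypothetical stand-ins for pnv_pci_get_device_tree() and fdt_totalsize(), and the length header is a simplification, not the real FDT header layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 0x10000u	/* upper bound used by the loop in the patch */

/* Hypothetical firmware call: writes a blob whose first 4 bytes hold its
 * total size. Returns 0 on success, -1 if the buffer is too small. */
static int fake_get_blob(void *buf, size_t len)
{
	const uint32_t total = 0x1800;	/* pretend blob size */

	if (len < total)
		return -1;
	memset(buf, 0xab, total);
	memcpy(buf, &total, sizeof(total));
	return 0;
}

/* Hypothetical counterpart of fdt_totalsize() for the fake header. */
static uint32_t blob_totalsize(const void *buf)
{
	uint32_t total;

	memcpy(&total, buf, sizeof(total));
	return total;
}

/* Fetch into a worst-case scratch buffer, keep only a right-sized copy. */
static void *fetch_trimmed_blob(size_t *out_len)
{
	void *scratch, *blob = NULL;
	uint32_t total;

	assert(out_len != NULL);

	scratch = calloc(1, SCRATCH_SIZE);
	if (!scratch)
		return NULL;

	if (fake_get_blob(scratch, SCRATCH_SIZE))
		goto free_scratch;

	total = blob_totalsize(scratch);
	blob = malloc(total);
	if (!blob)
		goto free_scratch;

	memcpy(blob, scratch, total);
	*out_len = total;

free_scratch:
	free(scratch);	/* released on success and on every error path */
	return blob;
}
```

The scratch buffer lives only for the duration of the call, so the long-lived allocation is as large as the blob itself rather than the worst case.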
>
>And what is the usual size of the returned blob?
>
>
>>+
>>+	/* Unflatten device tree blob */
>>+	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
>>+	if (!dt) {
>>+		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
>>+		goto free_fdt;
>>+	}
>>+
>>+	/* Initialize and apply the changeset */
>>+	of_changeset_init(&slot->ocs);
>>+	ret = pnv_php_populate_changeset(&slot->ocs, slot->dn);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
>>+			 ret);
>>+		goto free_dt;
>>+	}
>>+
>>+	slot->dn->child = NULL;
>>+	ret = of_changeset_apply(&slot->ocs);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
>>+			 ret);
>>+		goto destroy_changeset;
>>+	}
>>+
>>+	/* Add device node firmware data */
>>+	pnv_php_add_pdns(slot);
>>+	slot->fdt = fdt;
>>+	slot->dt = dt;
>>+	goto out;
>>+
>>+destroy_changeset:
>>+	of_changeset_destroy(&slot->ocs);
>>+free_dt:
>>+	kfree(dt);
>>+	slot->dn->child = NULL;
>>+free_fdt:
>>+	kfree(fdt);
>>+	confirm = PNV_PHP_POWER_CONFIRMED_FAIL;
>>+out:
>>+	/* Confirm status change */
>>+	slot->power_state_confirmed = confirm;
>>+	wake_up_interruptible(&slot->queue);
>>+}
>>+
>>+static void pnv_php_work(struct work_struct *data)
>>+{
>>+	struct pnv_php_slot *slot = container_of(data,
>>+						 struct pnv_php_slot, work);
>>+	uint64_t event = be64_to_cpu(slot->msg->params[0]);
>>+
>>+	if (event == OPAL_PCI_SLOT_POWER_OFF)
>>+		pnv_php_handle_poweroff(slot);
>>+	else
>>+		pnv_php_handle_poweron(slot);
>>+
>>+	pnv_php_put_slot(slot);
>>+}
>>+
>>+static int pnv_php_handle_msg(struct notifier_block *nb,
>>+			      unsigned long type,
>>+			      void *message)
>>+{
>>+	phandle h;
>>+	struct device_node *dn;
>>+	struct pnv_php_slot *slot;
>>+	struct opal_msg *msg = message;
>>+
>>+	if (type != OPAL_MSG_PCI_HOTPLUG) {
>>+		pr_warn("%s: Invalid message %ld received!\n",
>>+			__func__, type);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	h = (phandle)be64_to_cpu(msg->params[1]);
>>+	dn = of_find_node_by_phandle(h);
>>+	if (!dn) {
>>+		pr_warn("%s: No device node for phandle 0x%x\n",
>>+			__func__, h);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	slot = pnv_php_find_slot(dn);
>>+	of_node_put(dn);
>>+	if (!slot) {
>>+		pr_warn("%s: No slot found for node <%s>\n",
>>+			__func__, of_node_full_name(dn));
>>+		of_node_put(dn);
>
>You already put the node 5 lines above, is this correct?
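The reference imbalance pointed out here can be modelled in userspace. A minimal sketch, assuming a hypothetical struct node with a plain integer counter in place of the kernel's device_node refcounting: the reference is taken once and dropped exactly once, after the last use of the node.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct device_node with a visible refcount. */
struct node {
	const char *name;
	int refcount;
};

/* Models of_find_node_by_phandle(): hands out a held reference. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Models of_node_put(): a put without a matching get trips the assert. */
static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* slot_exists models the result of the slot lookup. */
static int handle_event(struct node *n, int slot_exists)
{
	struct node *dn = node_get(n);
	int ret = 0;

	if (!slot_exists) {
		/* The name is still needed here, so the put comes later. */
		fprintf(stderr, "No slot found for node <%s>\n", dn->name);
		ret = -1;
	}

	node_put(dn);	/* exactly one put, after the last use */
	return ret;
}
```

Both the error path and the success path leave the counter where it started, which is the property the double put in the quoted code would break.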
>
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	slot->msg = msg;
>>+	schedule_work(&slot->work);
>>+	return NOTIFY_OK;
>>+}
>>+
>>+static int pnv_php_set_power_state(struct hotplug_slot *php_slot, u8 state)
>>+{
>>+	struct pnv_php_slot *slot = php_slot->private;
>
>
>Most instances of "struct pnv_php_slot" are called "slot".
>Most instances of "struct hotplug_slot" are called "php_slot".
>
>When I read this code, I have to remind myself that a "php_slot" variable
>(which has "php" in it) is NOT of the type with "php" (i.e. NOT
>"pnv_php_slot").
>
>I would suggest swapping slot <-> php_slot.
>
>
>>+	int ret;
>>+
>>+	slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>+	ret = pnv_pci_set_power_state(slot->id, state);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
>>+			 ret, state ? "on" : "off");
>>+		return ret;
>>+	}
>>+
>>+	/* Continue to PCI probing after finalized device-tree. The
>>+	 * device-tree might have been updated completely at this
>>+	 * point. Thus we don't have to always waiting for that.
>
>s/always waiting/wait forever/ ?
>
>>+	 */
>>+	if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>+		return 0;
>>+	else if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_FAIL)
>
>
>No need for "else" here.
>
>
>>+		return -EBUSY;
>>+
>>+	ret = wait_event_timeout(slot->queue,
>>+				 slot->power_state_confirmed, 10 * HZ);
>
>The code flow is unclear in this case.
>
>The queue is signaled from pnv_php_handle_poweron() which is "work" and
>scheduled by pnv_php_handle_msg() and it is not obvious what code calls
>pnv_php_handle_msg().
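To make the flow easier to follow, the decision logic of pnv_php_set_power_state() can be written as a pure function. This is a sketch only: the enum values are hypothetical (the patch does not show the PNV_PHP_POWER_CONFIRMED_* definitions in this hunk), and it assumes the "invalid" value is 0, which the wait condition appears to rely on.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical values; only "invalid == 0" matters for the wait condition. */
enum confirmed {
	CONFIRMED_INVALID = 0,
	CONFIRMED_SUCCESS = 1,
	CONFIRMED_FAIL    = 2,
};

/*
 * before_wait: state seen right after the firmware call returned
 * wait_ret:    models wait_event_timeout(); 0 means the timeout expired
 * after_wait:  state seen once the work item has woken the waiter
 */
static int set_power_result(enum confirmed before_wait, long wait_ret,
			    enum confirmed after_wait)
{
	/* The message may already have been handled: no need to wait. */
	if (before_wait == CONFIRMED_SUCCESS)
		return 0;
	if (before_wait == CONFIRMED_FAIL)
		return -EBUSY;

	/* Still unconfirmed: the caller sleeps until woken or timed out. */
	if (!wait_ret)
		return -EBUSY;

	return after_wait == CONFIRMED_SUCCESS ? 0 : -EBUSY;
}
```

In the patch the waker is the work item scheduled from the notifier callback, so the "before" and "after" values come from the same field read at two different times.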
>
>
>
>>+	if (!ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d waiting for power-%s\n",
>>+			 ret, state ? "on" : "off");
>>+		return -EBUSY;
>>+	}
>>+
>>+	if (slot->power_state_confirmed == PNV_PHP_POWER_CONFIRMED_SUCCESS)
>>+		return 0;
>>+
>>+	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
>>+		 slot->power_state_confirmed, state ? "on" : "off");
>>+	return -EBUSY;
>>+}
>>+
>>+static int pnv_php_get_power_state(struct hotplug_slot *php_slot, u8 *state)
>>+{
>>+	struct pnv_php_slot *slot = php_slot->private;
>>+	uint8_t power_state;
>>+	int ret;
>>+
>>+	/*
>>+	 * Retrieve power status from firmware. If we fail
>>+	 * getting that, the power status fails back to
>>+	 * be on.
>>+	 */
>>+	ret = pnv_pci_get_power_state(slot->id, &power_state);
>>+	if (ret) {
>>+		*state = OPAL_PCI_SLOT_POWER_ON;
>>+		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
>>+			 ret);
>>+	} else {
>>+		*state = power_state;
>>+		php_slot->info->power_status = power_state;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_get_adapter_state(struct hotplug_slot *php_slot, u8 *state)
>>+{
>>+	struct pnv_php_slot *slot = php_slot->private;
>>+	uint8_t presence;
>>+	int ret;
>>+
>>+	/*
>>+	 * Retrieve presence status from firmware. If we can't
>>+	 * get that, it will fail back to be empty.
>>+	 */
>>+	ret = pnv_pci_get_presence_state(slot->id, &presence);
>>+	if (ret >= 0) {
>>+		*state = presence;
>>+		php_slot->info->adapter_status = presence;
>>+		ret = 0;
>>+	} else {
>>+		*state = OPAL_PCI_SLOT_EMPTY;
>>+		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
>>+			 ret);
>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static int pnv_php_set_attention_state(struct hotplug_slot *php_slot, u8 state)
>>+{
>>+	/* FIXME: Make it real once firmware supports it */
>>+	php_slot->info->attention_status = state;
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_enable(struct pnv_php_slot *slot, bool rescan)
>>+{
>>+	struct hotplug_slot *php_slot = &slot->php_slot;
>>+	uint8_t presence, power_status;
>>+	int ret;
>>+
>>+	/* Check if the slot has been configured */
>>+	if (slot->state != PNV_PHP_STATE_REGISTER)
>>+		return 0;
>>+
>>+	/* Retrieve slot presence status */
>>+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
>
>
>Here and in other places there is no point in dereferencing ops, just call
>pnv_php_get_adapter_state() here directly as you decided not to have a
>separate source file for pnv_php_slot.
>
>
>>+	if (ret)
>>+		return ret;
>>+
>>+	/* Proceed if there have nothing behind the slot */
>>+	if (presence == OPAL_PCI_SLOT_EMPTY)
>>+		goto scan;
>>+
>>+	/*
>>+	 * If we don't detect something behind the slot, we need
>>+	 * make sure the power suply to the slot is on.
>
>Is this correct - "don't detect" -> "make sure it is on"?
>
>
>>Otherwise,
>>+	 * the slot downstream PCIe linkturn should be down.
>>+	 *
>>+	 * On the first time, we don't change the power status to
>>+	 * boost system boot with assumption that the firmware
>
>Out of curiosity - does it really boost booting? :)
>
>
>>+	 * supplies consistent slot power status: empty slot always
>>+	 * has its power off and non-empty slot has its power on.
>>+	 */
>>+	if (!slot->power_state_check) {
>>+		slot->power_state_check = true;
>>+		goto scan;
>>+	}
>>+
>>+	/* Check the power status. Scan the slot if that's already on */
>>+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
>>+	if (ret)
>>+		return ret;
>>+
>>+	if (power_status == OPAL_PCI_SLOT_POWER_ON)
>>+		goto scan;
>>+
>>+	/* Power is off, turn it on and then scan the slot */
>>+	ret = pnv_php_set_power_state(php_slot, OPAL_PCI_SLOT_POWER_ON);
>>+	if (ret)
>>+		return ret;
>>+
>>+scan:
>>+	if (presence == OPAL_PCI_SLOT_PRESENT) {
>>+		if (rescan) {
>>+			pci_lock_rescan_remove();
>>+			pci_add_pci_devices(slot->bus);
>>+			pci_unlock_rescan_remove();
>>+		}
>>+
>>+		/* Rescan for child hotpluggable slots */
>>+		slot->state = PNV_PHP_STATE_POPULATED;
>>+		if (rescan)
>>+			pnv_php_register(slot->dn);
>
>
>The chunk above adds a parent slot (a physical slot) and then scans for
>children slots (a mighty extended with extra physical slots)? :)
>
>
>>+	} else {
>>+		slot->state = PNV_PHP_STATE_POPULATED;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_enable_slot(struct hotplug_slot *php_slot)
>>+{
>>+	struct pnv_php_slot *slot = container_of(php_slot,
>>+						 struct pnv_php_slot,
>>+						 php_slot);
>>+
>>+	return pnv_php_enable(slot, true);
>>+}
>>+
>>+static int pnv_php_disable_slot(struct hotplug_slot *php_slot)
>>+{
>>+	struct pnv_php_slot *slot = php_slot->private;
>>+	uint8_t power_state;
>>+	int ret;
>>+
>>+	if (slot->state != PNV_PHP_STATE_POPULATED)
>>+		return 0;
>>+
>>+	/* Remove all devices behind the slot */
>>+	pci_lock_rescan_remove();
>>+	pci_remove_pci_devices(slot->bus);
>>+	pci_unlock_rescan_remove();
>>+
>>+	/* Detach the child hotpluggable slots */
>>+	pnv_php_unregister(slot->dn);
>>+
>>+	/*
>>+	 * Check the power status and turn it off if necessary. If we
>>+	 * fail to get the power status, the power will be forced to
>>+	 * be off.
>>+	 */
>>+	ret = php_slot->ops->get_power_status(php_slot, &power_state);
>>+	if (ret || power_state == OPAL_PCI_SLOT_POWER_ON) {
>>+		ret = pnv_php_set_power_state(php_slot,
>>+					      OPAL_PCI_SLOT_POWER_OFF);
>>+		if (ret)
>>+			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
>>+				 ret);
>>+	}
>>+
>>+	/* Update slot state */
>>+	slot->state = PNV_PHP_STATE_REGISTER;
>>+	return 0;
>>+}
>>+
>>+static struct hotplug_slot_ops php_slot_ops = {
>>+	.get_power_status	= pnv_php_get_power_state,
>>+	.get_adapter_status	= pnv_php_get_adapter_state,
>>+	.set_attention_status	= pnv_php_set_attention_state,
>>+	.enable_slot		= pnv_php_enable_slot,
>>+	.disable_slot		= pnv_php_disable_slot,
>>+};
>>+
>>+static void pnv_php_release(struct hotplug_slot *hp_slot)
>>+{
>>+	struct pnv_php_slot *slot = hp_slot->private;
>>+	unsigned long flags;
>>+
>>+	/* Remove from global or child list */
>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>+	list_del(&slot->link);
>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+
>>+	/* Detach from parent */
>>+	pnv_php_put_slot(slot);
>>+	pnv_php_put_slot(slot->parent);
>>+}
>>+
>>+static int pnv_php_get_slot_id(struct device_node *dn, uint64_t *id)
>>+{
>>+	struct device_node *parent = dn;
>>+	const __be64 *prop64;
>>+	const __be32 *prop32;
>>+
>>+	/*
>>+	 * The hotpluggable slot always has a compound Id, which
>>+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
>>+	 * number, and compound indicator
>>+	 */
>>+	*id = (0x1ul << 63);
>>+
>>+	/* Bus/Slot/Function number */
>>+	prop32 = of_get_property(dn, "reg", NULL);
>>+	if (!prop32)
>>+		return -ENXIO;
>>+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
>>+
>>+	/* PHB Id */
>>+	while ((parent = of_get_parent(parent))) {
>>+		if (!PCI_DN(parent)) {
>>+			of_node_put(parent);
>>+			break;
>>+		}
>>+
>>+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
>>+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
>>+			of_node_put(parent);
>>+			continue;
>>+		}
>>+
>>+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
>>+		if (!prop64) {
>>+			of_node_put(parent);
>>+			return -ENXIO;
>>+		}
>>+
>>+		*id |= be64_to_cpup(prop64);
>>+		of_node_put(parent);
>>+		return 0;
>>+	}
>>+
>>+	return -ENODEV;
>>+}
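As a worked example of the compound Id built by pnv_php_get_slot_id(), the bit arithmetic can be reproduced in userspace. The helper name compose_slot_id() and the sample values are hypothetical; the masks and shifts are copied from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* reg:    first cell of the "reg" property
 * phb_id: value read from "ibm,opal-phbid" */
static uint64_t compose_slot_id(uint32_t reg, uint64_t phb_id)
{
	uint64_t id = UINT64_C(1) << 63;		/* compound indicator */

	id |= ((uint64_t)reg & 0x00ffff00) << 8;	/* bus/slot/function */
	id |= phb_id;					/* PHB Id */
	return id;
}
```

For reg = 0x00010800 and PHB Id 2, the masked value 0x00010800 shifted left by 8 gives 0x01080000, so the result is 0x8000000001080002. Bits outside the 0x00ffff00 mask in "reg" do not affect the Id.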
>>+
>>+static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *slot;
>>+	struct pci_bus *bus;
>>+	const char *label;
>>+	uint64_t id;
>>+
>>+	label = of_get_property(dn, "ibm,slot-label", NULL);
>>+	if (!label)
>>+		return NULL;
>>+
>>+	if (pnv_php_get_slot_id(dn, &id))
>>+		return NULL;
>>+
>>+	bus = pci_find_bus_by_node(dn);
>>+	if (!bus)
>>+		return NULL;
>>+
>>+	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
>>+	if (!slot)
>>+		return NULL;
>>+
>>+	slot->name = kstrdup(label, GFP_KERNEL);
>>+	if (!slot->name) {
>>+		kfree(slot);
>>+		return NULL;
>>+	}
>>+
>>+	if (dn->child && PCI_DN(dn->child))
>>+		slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>+	else
>>+		slot->slot_no = -1;   /* Placeholder slot */
>>+
>>+	kref_init(&slot->kref);
>>+	slot->state	            = PNV_PHP_STATE_INIT;
>>+	slot->dn	            = dn;
>>+	slot->pdev	            = bus->self;
>>+	slot->bus	            = bus;
>>+	slot->id	            = id;
>>+	slot->power_state_check     = false;
>>+	slot->power_state_confirmed = PNV_PHP_POWER_CONFIRMED_INVALID;
>>+	slot->php_slot.ops          = &php_slot_ops;
>>+	slot->php_slot.info         = &slot->php_slot_info;
>>+	slot->php_slot.release      = pnv_php_release;
>>+	slot->php_slot.private      = slot;
>>+
>>+	INIT_WORK(&slot->work, pnv_php_work);
>>+	init_waitqueue_head(&slot->queue);
>>+	INIT_LIST_HEAD(&slot->children);
>>+	INIT_LIST_HEAD(&slot->link);
>>+
>>+	return slot;
>>+}
>>+
>>+static int pnv_php_register_slot(struct pnv_php_slot *slot)
>>+{
>>+	struct pnv_php_slot *parent;
>>+	struct device_node *dn = slot->dn;
>>+	unsigned long flags;
>>+	int ret;
>>+
>>+	/* Check if the slot exists or not */
>
>s/exists/is registered/
>
>
>>+	parent = pnv_php_find_slot(slot->dn);
>>+	if (parent) {
>>+		pnv_php_put_slot(parent);
>>+		return -EEXIST;
>>+	}
>>+
>>+	/* Register PCI slot */
>>+	ret = pci_hp_register(&slot->php_slot, slot->bus,
>>+			      slot->slot_no, slot->name);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
>>+			 ret);
>>+		return ret;
>>+	}
>>+
>>+	/* Attach to the parent's child list or global list */
>>+	while ((dn = of_get_parent(dn))) {
>>+		if (!PCI_DN(dn)) {
>>+			of_node_put(dn);
>>+			break;
>>+		}
>>+
>>+		parent = pnv_php_find_slot(dn);
>>+		if (parent) {
>>+			of_node_put(dn);
>>+			break;
>>+		}
>
>This is missing here:
>
>of_node_put(dn);
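The balanced parent walk can be checked with a small userspace model. This is a sketch, not the kernel code: struct node, get_parent() and has_slot are hypothetical stand-ins for the device node, of_get_parent() and the slot lookup, and it takes the next reference before dropping the current one so that no node is used after its put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with a visible refcount and a "slot registered" flag. */
struct node {
	struct node *parent;
	int refcount;
	int has_slot;
};

/* Models of_get_parent(): returns the parent with a reference held. */
static struct node *get_parent(struct node *n)
{
	if (!n || !n->parent)
		return NULL;
	n->parent->refcount++;
	return n->parent;
}

/* Models of_node_put(). */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Return the nearest ancestor with a slot, or NULL. Every reference
 * taken during the walk is dropped before returning. */
static struct node *find_parent_slot(struct node *n)
{
	struct node *dn = get_parent(n), *next;

	while (dn) {
		if (dn->has_slot) {
			node_put(dn);
			return dn;
		}
		next = get_parent(dn);	/* take the next reference first */
		node_put(dn);		/* then drop the current one */
		dn = next;
	}
	return NULL;
}
```

Without the put on the "no slot found on this level" path, every intermediate node visited by the walk would keep an extra reference.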
>
>
>>+	}
>>+
>>+	spin_lock_irqsave(&pnv_php_lock, flags);
>>+	slot->parent = parent;
>>+	if (parent)
>>+		list_add_tail(&slot->link, &parent->children);
>>+	else
>>+		list_add_tail(&slot->link, &pnv_php_slot_list);
>>+	spin_unlock_irqrestore(&pnv_php_lock, flags);
>>+
>>+	slot->state = PNV_PHP_STATE_REGISTER;
>>+	return 0;
>>+}
>>+
>>+static int pnv_php_register_one(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *slot;
>>+	const __be32 *prop32;
>>+	int ret;
>>+
>>+	/* Check if it's hotpluggable slot */
>>+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
>>+	if (!prop32 || !of_read_number(prop32, 1))
>>+		return -ENXIO;
>>+
>>+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
>>+	if (!prop32 || !of_read_number(prop32, 1))
>>+		return -ENXIO;
>>+
>>+	slot = pnv_php_alloc_slot(dn);
>>+	if (!slot)
>>+		return -ENODEV;
>>+
>>+	ret = pnv_php_register_slot(slot);
>>+	if (ret)
>>+		goto free_slot;
>>+
>>+	ret = pnv_php_enable(slot, false);
>>+	if (ret)
>>+		goto unregister_slot;
>>+
>>+	return 0;
>>+
>>+unregister_slot:
>>+	pnv_php_unregister_one(slot->dn);
>>+free_slot:
>>+	pnv_php_put_slot(slot);
>>+	return ret;
>>+}
>>+
>>+static void pnv_php_register(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	/*
>>+	 * The parent slots should be registered before their
>>+	 * child slots.
>>+	 */
>>+	for_each_child_of_node(dn, child) {
>>+		pnv_php_register_one(child);
>>+		pnv_php_register(child);
>>+	}
>>+}
>>+
>>+static void pnv_php_unregister_one(struct device_node *dn)
>>+{
>>+	struct pnv_php_slot *slot;
>>+
>>+	slot = pnv_php_find_slot(dn);
>>+	if (!slot)
>>+		return;
>>+
>>+	pnv_php_put_slot(slot);
>>+	pci_hp_deregister(&slot->php_slot);
>>+}
>>+
>>+static void pnv_php_unregister(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	/* The child slots should go before their parent slots */
>>+	for_each_child_of_node(dn, child) {
>>+		pnv_php_unregister(child);
>>+		pnv_php_unregister_one(child);
>>+	}
>>+}
>>+
>>+static struct notifier_block php_msg_nb = {
>>+	.notifier_call	= pnv_php_handle_msg,
>>+	.next		= NULL,
>>+	.priority	= 0,
>>+};
>>+
>>+static int __init pnv_php_init(void)
>>+{
>>+	struct device_node *dn;
>>+	int ret;
>>+
>>+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>>+
>>+	/* Register hotplug message handler */
>>+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
>>+	if (ret) {
>>+		pr_warn("%s: Error %d registering hotplug notifier\n",
>>+			__func__, ret);
>>+		return ret;
>>+	}
>>+
>>+	/* Scan PHB nodes and their children */
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>+		pnv_php_register(dn);
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>+		pnv_php_register(dn);
>>+
>>+	return 0;
>>+}
>>+
>>+static void __exit pnv_php_exit(void)
>>+{
>>+	struct device_node *dn;
>>+
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>+		pnv_php_unregister(dn);
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>+		pnv_php_unregister(dn);
>>+
>>+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>>+}
>>+
>>+module_init(pnv_php_init);
>>+module_exit(pnv_php_exit);
>>+
>>+MODULE_VERSION(DRIVER_VERSION);
>>+MODULE_LICENSE("GPL v2");
>>+MODULE_AUTHOR(DRIVER_AUTHOR);
>>+MODULE_DESCRIPTION(DRIVER_DESC);
>>
>
>
>-- 
>Alexey
>
* Re: [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes()
  2015-11-18  3:14   ` Alexey Kardashevskiy
@ 2015-11-23 23:23     ` Gavin Shan
From: Gavin Shan @ 2015-11-23 23:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Wed, Nov 18, 2015 at 02:14:59PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>This renames traverse_pci_devices() to pci_traverse_device_nodes().
>
>Why? pci_traverse_device_nodes() is not moved to some more generic header
>where it would be required to have a standard prefix. And the ppc-pci.h
>header does not use any standard prefix, so the point of renaming is unclear.
>
As the function is going to be exported, it's worth giving it a more generic
name.
>traverse_pci_dn() is still there and it has "traverse", "pci" and "device
>node" (abbreviated as "dn") in it, so pci_traverse_device_nodes is a more
>confusing name than traverse_pci_devices. Can't we just get rid of one of
>them?
>
traverse_pci_dn() traverses pdn (PCI_DN) instances, not device nodes (struct device_node).
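The callback contract of the device-node traversal (visit each subordinate node, stop as soon as the callback returns non-NULL) can be sketched in userspace. This is a simplified model under stated assumptions: struct node and traverse_nodes() are hypothetical, the walk is recursive, and it descends into every child rather than only into PCI bridges as the kernel function does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node. */
struct node {
	int id;
	struct node *child;
	struct node *sibling;
};

typedef void *(*visit_fn)(struct node *, void *);

/* Pre-order walk over all nodes below start; the first non-NULL value
 * returned by fn ends the walk and is passed back to the caller. */
static void *traverse_nodes(struct node *start, visit_fn fn, void *data)
{
	struct node *dn;
	void *ret;

	for (dn = start->child; dn; dn = dn->sibling) {
		if (fn) {
			ret = fn(dn, data);
			if (ret)
				return ret;
		}

		ret = traverse_nodes(dn, fn, data);
		if (ret)
			return ret;
	}

	return NULL;
}

/* Callback that never stops the walk: counts visited nodes. */
static void *count_node(struct node *dn, void *data)
{
	(void)dn;
	(*(int *)data)++;
	return NULL;
}

/* Callback that stops the walk at the node with the wanted id. */
static void *find_id(struct node *dn, void *data)
{
	return dn->id == *(int *)data ? dn : NULL;
}
```

A callback such as pnv_php_add_one_pdn() in the hotplug patch follows the same convention: NULL to continue, a non-NULL value to abort the walk.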
>Also the subject line says "Export" but nothing gets exported in this patch -
>the visibility of pci_traverse_device_nodes() remains unchanged.
>
Yes, the EXPORT_SYMBOL() part is missing from this patch. I'll fix it in the
next revision.
>>The function traverses all subordinate device nodes of the specified
>>one. Also, below cleanup applied to the function. No logical changes
>>introduced.
>>
>>    * Rename "pre" to "fn".
>>    * Avoid assignment in if condition reported from checkpatch.pl.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
>>  arch/powerpc/kernel/pci_dn.c         | 14 +++++++++-----
>>  arch/powerpc/platforms/pseries/msi.c |  4 ++--
>>  3 files changed, 14 insertions(+), 10 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
>>index ca0c5bf..8753e4e 100644
>>--- a/arch/powerpc/include/asm/ppc-pci.h
>>+++ b/arch/powerpc/include/asm/ppc-pci.h
>>@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
>>  struct device_node;
>>  struct pci_dn;
>>
>>-typedef void *(*traverse_func)(struct device_node *me, void *data);
>>-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>-		void *data);
>>+void *pci_traverse_device_nodes(struct device_node *start,
>>+				void *(*fn)(struct device_node *, void *),
>>+				void *data);
>>  void *traverse_pci_dn(struct pci_dn *root,
>>  		      void *(*fn)(struct pci_dn *, void *),
>>  		      void *data);
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index 7f877a4..aa4110f 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -355,8 +355,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>>   * one of these nodes we also assume its siblings are non-pci for
>>   * performance.
>>   */
>>-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>-		void *data)
>>+void *pci_traverse_device_nodes(struct device_node *start,
>>+				void *(*fn)(struct device_node *, void *),
>>+				void *data)
>>  {
>>  	struct device_node *dn, *nextdn;
>>  	void *ret;
>>@@ -371,8 +372,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>  		if (classp)
>>  			class = of_read_number(classp, 1);
>>
>>-		if (pre && ((ret = pre(dn, data)) != NULL))
>>-			return ret;
>>+		if (fn) {
>>+			ret = fn(dn, data);
>>+			if (ret)
>>+				return ret;
>>+		}
>>
>>  		/* If we are a PCI bridge, go down */
>>  		if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
>>@@ -470,7 +474,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>>  	}
>>
>>  	/* Update dn->phb ptrs for new phb and children devices */
>>-	traverse_pci_devices(dn, add_pdn, phb);
>>+	pci_traverse_device_nodes(dn, add_pdn, phb);
>>  }
>>
>>  /**
>>diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
>>index 272e9ec..543a638 100644
>>--- a/arch/powerpc/platforms/pseries/msi.c
>>+++ b/arch/powerpc/platforms/pseries/msi.c
>>@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>>  	memset(&counts, 0, sizeof(struct msi_counts));
>>
>>  	/* Work out how many devices we have below this PE */
>>-	traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
>>+	pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
>>
>>  	if (counts.num_devices == 0) {
>>  		pr_err("rtas_msi: found 0 devices under PE for %s\n",
>>@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
>>  	/* else, we have some more calculating to do */
>>  	counts.requestor = pci_device_to_OF_node(dev);
>>  	counts.request = request;
>>-	traverse_pci_devices(pe_dn, count_spare_msis, &counts);
>>+	pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
>>
>>  	/* If the quota isn't an integer multiple of the total, we can
>>  	 * use the remainder as spare MSIs for anyone that wants them. */
>>
Thanks,
Gavin
* Re: [PATCH v7 34/50] powerpc/pci: Delay populating pdn
  2015-11-18  4:24   ` Alexey Kardashevskiy
@ 2015-11-23 23:42     ` Gavin Shan
From: Gavin Shan @ 2015-11-23 23:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, frowand.list
On Wed, Nov 18, 2015 at 03:24:35PM +1100, Alexey Kardashevskiy wrote:
>On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>The pdn (struct pci_dn) instances are allocated from memblock or
>>bootmem when creating PCI controller (hoses) in setup_arch(). PCI
>>hotplug, which will be supported by proceeding patches, release
>>PCI device nodes and their corresponding pdn on unplugging event.
>>The memory chunks for pdn instances allocated from memblock or
>>bootmem are hard to reused after being released.
>>
>>This delays creating pdn in core_initcall_sync(eeh_dev_phb_init) so
>>that they are allocated from slab. In turn, the memory chunks for
>>them can be reused after being released without problem. Since the
>>pdn and eeh_dev has same life cycle, the eeh_dev is created when
>>pdn is populated. We needn't create eeh_dev with another initcall.
>>The time to create PHB PEs is delayed a bit from core_initcall() to
>>core_initcall_sync().
>
>Why is it delayed? I mean, what needs to be called before eeh_dev_phb_init()?
>
I think the changelog explains the "why". eeh_dev_phb_init() creates the
ancestor PEs for all other PEs. The ancestor PEs should be created before the
other PEs. eeh_dev_phb_init() depends on PHBs (struct pci_controller) only.
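The ordering that the patch relies on, core_initcall() entries running before core_initcall_sync() entries, can be illustrated with a tiny userspace model. This is a sketch only: the table-driven runner is hypothetical, the two functions are stand-ins that merely record when they ran, and only three initcall levels are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One registered initcall: its level string and the function to run. */
struct initcall {
	const char *level;
	int (*fn)(void);
};

/* Subset of the levels in execution order: "1" is core_initcall(),
 * "1s" is core_initcall_sync(), "2" is postcore_initcall(). */
static const char *const level_order[] = { "1", "1s", "2" };

static char trace[64];	/* records the order in which the stand-ins ran */

/* Stand-ins for the two initcalls touched by the patch. */
static int pci_devs_phb_init(void) { strcat(trace, "pdn;"); return 0; }
static int eeh_dev_phb_init(void)  { strcat(trace, "eeh;"); return 0; }

/* Run every registered call, one level at a time. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	size_t l, i;

	for (l = 0; l < sizeof(level_order) / sizeof(level_order[0]); l++)
		for (i = 0; i < n; i++)
			if (!strcmp(calls[i].level, level_order[l]))
				calls[i].fn();
}
```

Whatever the registration order, the level "1" entry runs before the level "1s" entry, so in this model the pdn setup always precedes the EEH PHB setup.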
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/eeh.h         |  2 +-
>>  arch/powerpc/include/asm/ppc-pci.h     |  2 --
>>  arch/powerpc/kernel/eeh_dev.c          | 19 ++++-------------
>>  arch/powerpc/kernel/pci_dn.c           | 20 ++++++++++++++++--
>>  arch/powerpc/platforms/maple/pci.c     | 34 ++++++++++++++++++------------
>>  arch/powerpc/platforms/pasemi/pci.c    |  3 ---
>>  arch/powerpc/platforms/powermac/pci.c  | 38 +++++++++++++++++++++-------------
>>  arch/powerpc/platforms/powernv/pci.c   |  3 ---
>>  arch/powerpc/platforms/pseries/setup.c |  6 +-----
>>  9 files changed, 69 insertions(+), 58 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>>index c5eb86f..27352f4 100644
>>--- a/arch/powerpc/include/asm/eeh.h
>>+++ b/arch/powerpc/include/asm/eeh.h
>>@@ -268,7 +268,7 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
>>  const char *eeh_pe_loc_get(struct eeh_pe *pe);
>>  struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
>>
>>-void *eeh_dev_init(struct pci_dn *pdn, void *data);
>>+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
>>  void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
>>  int eeh_init(void);
>>  int __init eeh_ops_register(struct eeh_ops *ops);
>>diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
>>index 8753e4e..0f73de0 100644
>>--- a/arch/powerpc/include/asm/ppc-pci.h
>>+++ b/arch/powerpc/include/asm/ppc-pci.h
>>@@ -39,8 +39,6 @@ void *pci_traverse_device_nodes(struct device_node *start,
>>  void *traverse_pci_dn(struct pci_dn *root,
>>  		      void *(*fn)(struct pci_dn *, void *),
>>  		      void *data);
>>-
>>-extern void pci_devs_phb_init(void);
>>  extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
>>
>>  /* From rtas_pci.h */
>>diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
>>index aabba94..1c4bc35 100644
>>--- a/arch/powerpc/kernel/eeh_dev.c
>>+++ b/arch/powerpc/kernel/eeh_dev.c
>>@@ -44,14 +44,13 @@
>>  /**
>>   * eeh_dev_init - Create EEH device according to OF node
>>   * @pdn: PCI device node
>>- * @data: PHB
>>   *
>>   * It will create EEH device according to the given OF node. The function
>>   * might be called by PCI emunation, DR, PHB hotplug.
>>   */
>>-void *eeh_dev_init(struct pci_dn *pdn, void *data)
>>+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
>>  {
>>-	struct pci_controller *phb = data;
>>+	struct pci_controller *phb = pdn->phb;
>>  	struct eeh_dev *edev;
>>
>>  	/* Allocate EEH device */
>>@@ -68,7 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
>>  	edev->phb = phb;
>>  	INIT_LIST_HEAD(&edev->list);
>>
>>-	return NULL;
>>+	return edev;
>>  }
>>
>>  /**
>>@@ -80,16 +79,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
>>   */
>>  void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
>>  {
>>-	struct pci_dn *root = phb->pci_data;
>>-
>>  	/* EEH PE for PHB */
>>  	eeh_phb_pe_create(phb);
>>-
>>-	/* EEH device for PHB */
>>-	eeh_dev_init(root, phb);
>>-
>>-	/* EEH devices for children OF nodes */
>>-	traverse_pci_dn(root, eeh_dev_init, phb);
>>  }
>>
>>  /**
>>@@ -105,9 +96,7 @@ static int __init eeh_dev_phb_init(void)
>>  	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
>>  		eeh_dev_phb_init_dynamic(phb);
>>
>>-	pr_info("EEH: devices created\n");
>>-
>>  	return 0;
>>  }
>>
>>-core_initcall(eeh_dev_phb_init);
>>+core_initcall_sync(eeh_dev_phb_init);
>
>
>Maybe remove core_initcall_sync and call eeh_dev_phb_init_dynamic() directly
>from the loop in pci_devs_phb_init()?
>
We can't do that, as eeh_dev_phb_init_dynamic() can also be called for a newly added PHB.
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index aa4110f..581612c 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -272,8 +272,11 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>>  	const __be32 *regs;
>>  	struct device_node *parent;
>>  	struct pci_dn *pdn;
>>+#ifdef CONFIG_EEH
>>+	struct eeh_dev *edev;
>>+#endif
>>
>>-	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
>>+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
>>  	if (pdn == NULL)
>>  		return NULL;
>>  	dn->data = pdn;
>>@@ -302,6 +305,15 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>>  	/* Extended config space */
>>  	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
>>
>>+	/* Create EEH device */
>>+#ifdef CONFIG_EEH
>>+	edev = eeh_dev_init(pdn);
>>+	if (!edev) {
>>+		kfree(pdn);
>>+		return NULL;
>>+	}
>>+#endif
>>+
>>  	/* Attach to parent node */
>>  	INIT_LIST_HEAD(&pdn->child_list);
>>  	INIT_LIST_HEAD(&pdn->list);
>>@@ -486,15 +498,19 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>>   * pci device found underneath.  This routine runs once,
>>   * early in the boot sequence.
>>   */
>>-void __init pci_devs_phb_init(void)
>>+static int __init pci_devs_phb_init(void)
>>  {
>>  	struct pci_controller *phb, *tmp;
>>
>>  	/* This must be done first so the device nodes have valid pci info! */
>>  	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
>>  		pci_devs_phb_init_dynamic(phb);
>>+
>>+	return 0;
>>  }
>>
>>+core_initcall(pci_devs_phb_init);
>>+
>>  static void pci_dev_pdn_setup(struct pci_dev *pdev)
>>  {
>>  	struct pci_dn *pdn;
>>diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
>>index a923230..a2f89e6 100644
>>--- a/arch/powerpc/platforms/maple/pci.c
>>+++ b/arch/powerpc/platforms/maple/pci.c
>>@@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
>>  	DBG(" <- maple_pci_irq_fixup\n");
>>  }
>>
>>+static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
>>+	struct device_node *np, *child;
>>+
>>+	if (hose != u3_agp)
>>+		return 0;
>>+
>>+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
>>+	 * assume there is no P2P bridge on the AGP bus, which should be a
>>+	 * safe assumptions hopefully.
>>+	 */
>>+	np = hose->dn;
>>+	PCI_DN(np)->busno = 0xf0;
>>+	for_each_child_of_node(np, child)
>>+		PCI_DN(child)->busno = 0xf0;
>>+
>>+	return 0;
>>+}
>>+
>>  void __init maple_pci_init(void)
>>  {
>>  	struct device_node *np, *root;
>>@@ -605,19 +625,7 @@ void __init maple_pci_init(void)
>>  	if (ht && maple_add_bridge(ht) != 0)
>>  		of_node_put(ht);
>>
>>-	/* Setup the linkage between OF nodes and PHBs */
>>-	pci_devs_phb_init();
>>-
>>-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
>>-	 * assume there is no P2P bridge on the AGP bus, which should be a
>>-	 * safe assumptions hopefully.
>>-	 */
>>-	if (u3_agp) {
>>-		struct device_node *np = u3_agp->dn;
>>-		PCI_DN(np)->busno = 0xf0;
>>-		for (np = np->child; np; np = np->sibling)
>>-			PCI_DN(np)->busno = 0xf0;
>>-	}
>>+	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
>
>
>This seems an unrelated change.
>
>What is this pcibios_root_bridge_prepare()? How come you do not need one for
>the powernv platform but do need for others? Same question about powermac.
>
The function fixes up the pdn for the U3 AGP device. As pdn creation is now
delayed to core_initcall(), the pdn doesn't exist yet when maple_pci_init() is
called, so the fixup is deferred to ppc_md.pcibios_root_bridge_prepare().
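
A minimal sketch of that deferral, assuming hypothetical demo_* names (none
of them are the kernel's real structures or hooks): platform init only
registers a callback, and the callback touches the pdn later, once it exists.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dn and the OF device node. */
struct demo_pdn { int busno; };

struct demo_node {
	struct demo_pdn *pdn;		/* NULL until the initcall creates it */
	struct demo_node *child, *sibling;
};

/* Hypothetical stand-in for the ppc_md machine description. */
struct demo_machdep {
	int (*root_bridge_prepare)(struct demo_node *root);
};

static struct demo_machdep demo_md;

/* Late fixup: only valid once the root node has a pdn attached. */
static int demo_agp_fixup(struct demo_node *root)
{
	struct demo_node *np;

	if (!root->pdn)
		return -1;		/* called too early */

	root->pdn->busno = 0xf0;
	for (np = root->child; np; np = np->sibling)
		if (np->pdn)
			np->pdn->busno = 0xf0;
	return 0;
}

/* Platform init: too early to touch any pdn, so only register the hook. */
static void demo_platform_init(void)
{
	demo_md.root_bridge_prepare = demo_agp_fixup;
}

/* Root bridge scan: runs after the initcall and invokes the hook if set. */
static int demo_scan_root(struct demo_node *root)
{
	if (demo_md.root_bridge_prepare)
		return demo_md.root_bridge_prepare(root);
	return 0;
}
```

Platforms that have nothing to fix up simply leave the hook unset, in which
case demo_scan_root() does nothing.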
>>
>>  	/* Tell pci.c to not change any resource allocations.  */
>>  	pci_add_flags(PCI_PROBE_ONLY);
>>diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
>>index f3a68a0..10c4e8f 100644
>>--- a/arch/powerpc/platforms/pasemi/pci.c
>>+++ b/arch/powerpc/platforms/pasemi/pci.c
>>@@ -229,9 +229,6 @@ void __init pas_pci_init(void)
>>  			of_node_get(np);
>>
>>  	of_node_put(root);
>>-
>>-	/* Setup the linkage between OF nodes and PHBs */
>>-	pci_devs_phb_init();
>>  }
>>
>>  void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
>>diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
>>index 59ab16f..6e06c3b 100644
>>--- a/arch/powerpc/platforms/powermac/pci.c
>>+++ b/arch/powerpc/platforms/powermac/pci.c
>>@@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
>>  #endif /* CONFIG_PPC32 */
>>  }
>>
>>+#ifdef CONFIG_PPC64
>>+static int pmac_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
>>+	struct device_node *np, *child;
>>+
>>+	if (hose != u3_agp)
>>+		return 0;
>>+
>>+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
>>+	 * assume there is no P2P bridge on the AGP bus, which should be a
>>+	 * safe assumptions for now. We should do something better in the
>>+	 * future though
>>+	 */
>>+	np = hose->dn;
>>+	PCI_DN(np)->busno = 0xf0;
>>+	for_each_child_of_node(np, child)
>>+		PCI_DN(child)->busno = 0xf0;
>>+
>>+	return 0;
>>+}
>>+#endif /* CONFIG_PPC64 */
>>+
>>  void __init pmac_pci_init(void)
>>  {
>>  	struct device_node *np, *root;
>>@@ -914,20 +937,7 @@ void __init pmac_pci_init(void)
>>  	if (ht && pmac_add_bridge(ht) != 0)
>>  		of_node_put(ht);
>>
>>-	/* Setup the linkage between OF nodes and PHBs */
>>-	pci_devs_phb_init();
>>-
>>-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
>>-	 * assume there is no P2P bridge on the AGP bus, which should be a
>>-	 * safe assumptions for now. We should do something better in the
>>-	 * future though
>>-	 */
>>-	if (u3_agp) {
>>-		struct device_node *np = u3_agp->dn;
>>-		PCI_DN(np)->busno = 0xf0;
>>-		for (np = np->child; np; np = np->sibling)
>>-			PCI_DN(np)->busno = 0xf0;
>>-	}
>>+	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
>>  	/* pmac_check_ht_link(); */
>>
>>  #else /* CONFIG_PPC64 */
>>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>index fa99daf..d8832ea 100644
>>--- a/arch/powerpc/platforms/powernv/pci.c
>>+++ b/arch/powerpc/platforms/powernv/pci.c
>>@@ -807,9 +807,6 @@ void __init pnv_pci_init(void)
>>  	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
>>  		pnv_pci_init_ioda2_phb(np);
>>
>>-	/* Setup the linkage between OF nodes and PHBs */
>>-	pci_devs_phb_init();
>>-
>>  	/* Configure IOMMU DMA hooks */
>>  	set_pci_dma_ops(&dma_iommu_ops);
>>  }
>>diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>>index 6c274cb..bdf93a1 100644
>>--- a/arch/powerpc/platforms/pseries/setup.c
>>+++ b/arch/powerpc/platforms/pseries/setup.c
>>@@ -262,11 +262,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
>>  	case OF_RECONFIG_ATTACH_NODE:
>>  		parent = of_get_parent(np);
>>  		pdn = parent ? PCI_DN(parent) : NULL;
>>-		if (pdn) {
>>-			/* Create pdn and EEH device */
>>+		if (pdn)
>>  			pci_add_device_node_info(pdn->phb, np);
>>-			eeh_dev_init(PCI_DN(np), pdn->phb);
>>-		}
>>
>>  		of_node_put(parent);
>>  		break;
>>@@ -489,7 +486,6 @@ static void __init find_and_init_phbs(void)
>>  	}
>>
>>  	of_node_put(root);
>>-	pci_devs_phb_init();
>>
>>  	/*
>>  	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
>>
Thanks,
Gavin
* Re: [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs
  2015-11-23 23:06     ` Gavin Shan
@ 2015-11-24  0:22       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 157+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-24  0:22 UTC (permalink / raw)
  To: Gavin Shan, Alexey Kardashevskiy
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, frowand.list
On 11/24/2015 10:06 AM, Gavin Shan wrote:
> On Wed, Nov 18, 2015 at 01:23:05PM +1100, Alexey Kardashevskiy wrote:
>> On 11/05/2015 12:12 AM, Gavin Shan wrote:
>>> This adds a reference count of PE, representing the number of PCI
>>> devices associated with the PE. The reference count is increased
>>> or decreased when PCI devices join or leave the PE. Once it becomes
>>> zero, the PE together with its used resources (IO, MMIO, DMA, PELTM,
>>> PELTV) are released to support PCI hot unplug.
>>
>>
>> The commit log suggests the patch only adds a counter, initializes it, and
>> replaces unconditional release of an object (in this case - PE) with the
>> conditional one. But it is more than that...
>>
>
> Yes, it's more than that as stated in the commit log.
More? The commit log only talks about reference counting.
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 245 ++++++++++++++++++++++++++----
>>>   arch/powerpc/platforms/powernv/pci.h      |   1 +
>>>   2 files changed, 218 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 0bb0056..dcffce5 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -129,6 +129,215 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>>   }
>>>
>>> +static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
>>> +{
>>> +	struct pnv_phb *phb = pe->phb;
>>> +	struct iommu_table *tbl;
>>> +	int start, count, i;
>>> +	int64_t rc;
>>> +
>>> +	/* Search for the used DMA32 segments */
>>> +	start = -1;
>>> +	count = 0;
>>> +	for (i = 0; i < phb->ioda.dma32_count; i++) {
>>> +		if (phb->ioda.dma32_segmap[i] != pe->pe_number)
>>> +			continue;
>>> +
>>> +		count++;
>>> +		if (start < 0)
>>> +			start = i;
>>> +	}
>>> +
>>> +	if (!count)
>>> +		return;
>>
>>
>> imho checking pe->table_group.tables[0] != NULL is shorter than the loop above.
>>
>
> Will use it in next revision.
>
>>> +
>>> +	/* Unlink IOMMU table from group */
>>> +	tbl = pe->table_group.tables[0];
>>> +	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>>> +	if (pe->table_group.group) {
>>> +		iommu_group_put(pe->table_group.group);
>>> +		WARN_ON(pe->table_group.group);
>>> +	}
>>> +
>>> +	/* Release IOMMU table */
>>> +	pnv_pci_ioda2_table_free_pages(tbl);
>>
>>
>> This is IODA2 helper with multilevel support, does IODA1 support multilevel
>> TCE tables? If not, it should WARN_ON on levels!=1.
>>
>> Another thing is you should first unprogram TVEs (via
>> opal_pci_map_pe_dma_window), then invalidate the cache (if required, not sure
>> if this is needed on IODA1), only then free the actual table.
>>
>>
>>> +	iommu_free_table(tbl, of_node_full_name(pci_bus_to_OF_node(pe->pbus)));
>>> +
>>> +	/* Disable TVE */
>>> +	for (i = start; i < start + count; i++) {
>>> +		rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
>>> +						i, 0, 0ul, 0ul, 0ul);
>>> +		if (rc)
>>> +			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
>>> +				rc, i);
>>> +
>>> +		phb->ioda.dma32_segmap[i] = IODA_INVALID_PE;
>>> +	}
>>
>>
>> You could implement pnv_pci_ioda1_unset_window/pnv_ioda1_table_free as
>> callbacks, change pnv_pci_ioda2_release_dma_pe() to use them (and rename it
>> to reflect that it supports IODA1 and IODA2).
>>
>>
>>> +}
>>> +
>>> +static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
>>> +static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
>>> +		int num);
>>> +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
>>> +
>>> +static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)
>>
>>
>> You moved this function and changed it, please do one thing at once (which is
>> "change", not "move").
>>
>>> +{
>>> +	struct iommu_table *tbl;
>>> +	unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
>>> +	int64_t rc;
>>> +
>>> +	if (!weight)
>>> +		return;
>>
>>
>> Checking for pe->table_group.group is better because if we ever change the
>> logic of what gets included to an IOMMU group, we will have to do the change
>> where we add devices to a group but we won't have to touch releasing code.
>>
>>
>>> +
>>> +	tbl = pe->table_group.tables[0];
>>> +	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>>> +	if (rc)
>>> +		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>> +
>>> +	pnv_pci_ioda2_set_bypass(pe, false);
>>> +	if (pe->table_group.group) {
>>> +		iommu_group_put(pe->table_group.group);
>>> +		WARN_ON(pe->table_group.group);
>>> +	}
>>> +
>>> +	pnv_pci_ioda2_table_free_pages(tbl);
>>> +	iommu_free_table(tbl, "pnv");
>>> +}
>>> +
>>> +static void pnv_ioda_release_dma_pe(struct pnv_ioda_pe *pe)
>>
>> Merge this into pnv_ioda_release_pe() - it is small and called just once.
>>
>>
>>> +{
>>> +	struct pnv_phb *phb = pe->phb;
>>> +
>>> +	switch (phb->type) {
>>> +	case PNV_PHB_IODA1:
>>> +		pnv_pci_ioda1_release_dma_pe(pe);
>>> +		break;
>>> +	case PNV_PHB_IODA2:
>>> +		pnv_pci_ioda2_release_dma_pe(pe);
>>> +		break;
>>> +	default:
>>> +		WARN_ON(1);
>>> +	}
>>> +}
>>> +
>>> +static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win)
>>> +{
>>> +	struct pnv_phb *phb = pe->phb;
>>> +	int index, *segmap = NULL;
>>> +	int64_t rc;
>>> +
>>> +	switch (win) {
>>> +	case OPAL_IO_WINDOW_TYPE:
>>> +		segmap = phb->ioda.io_segmap;
>>> +		break;
>>> +	case OPAL_M32_WINDOW_TYPE:
>>> +		segmap = phb->ioda.m32_segmap;
>>> +		break;
>>> +	case OPAL_M64_WINDOW_TYPE:
>>> +		if (phb->type != PNV_PHB_IODA1)
>>> +			return;
>>> +		segmap = phb->ioda.m64_segmap;
>>> +		break;
>>> +	default:
>>> +		return;
>>
>> Unnecessary return.
>>
>>
>>> +	}
>>> +
>>> +	for (index = 0; index < phb->ioda.total_pe_num; index++) {
>>> +		if (segmap[index] != pe->pe_number)
>>> +			continue;
>>> +
>>> +		if (win == OPAL_M64_WINDOW_TYPE)
>>> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> +					phb->ioda.reserved_pe_idx, win,
>>> +					index / PNV_IODA1_M64_SEGS,
>>> +					index % PNV_IODA1_M64_SEGS);
>>> +		else
>>> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> +					phb->ioda.reserved_pe_idx, win,
>>> +					0, index);
>>> +
>>> +		if (rc != OPAL_SUCCESS)
>>> +			pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
>>> +				rc, win, index);
>>> +
>>> +		segmap[index] = IODA_INVALID_PE;
>>> +	}
>>> +}
>>> +
>>> +static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
>>> +{
>>> +	struct pnv_phb *phb = pe->phb;
>>> +	int win;
>>> +
>>> +	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {
>>> +		if (phb->type == PNV_PHB_IODA2 && win == OPAL_IO_WINDOW_TYPE)
>>> +			continue;
>>
>> Move this check to pnv_ioda_release_window() or move case(win ==
>> OPAL_M64_WINDOW_TYPE):if(phb->type != PNV_PHB_IODA1) from that function here.
>>
>>
>>> +
>>> +		pnv_ioda_release_window(pe, win);
>>> +	}
>>> +}
>>
>> This is shorter and cleaner:
>>
>>
>> static void pnv_ioda_release_window(struct pnv_ioda_pe *pe, int win,
>>                                     int *segmap)
>> {
>>         struct pnv_phb *phb = pe->phb;
>>         int index;
>>         int64_t rc;
>>
>>         for (index = 0; index < phb->ioda.total_pe_num; index++) {
>>                 if (segmap[index] != pe->pe_number)
>>                         continue;
>>
>>                 if (win == OPAL_M64_WINDOW_TYPE)
>>                         rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>                                         phb->ioda.reserved_pe_idx, win,
>>                                         index / PNV_IODA1_M64_SEGS,
>>                                         index % PNV_IODA1_M64_SEGS);
>>                 else
>>                         rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>                                         phb->ioda.reserved_pe_idx, win,
>>                                         0, index);
>>
>>                 if (rc != OPAL_SUCCESS)
>>                         pe_warn(pe, "Error %ld unmapping (%d) segment#%d\n",
>>                                 rc, win, index);
>>
>>                 segmap[index] = IODA_INVALID_PE;
>>         }
>> }
>>
>> static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
>> {
>>         struct pnv_phb *phb = pe->phb;
>>
>>         pnv_ioda_release_window(pe, OPAL_M32_WINDOW_TYPE,
>>                                 phb->ioda.m32_segmap);
>>         if (phb->type != PNV_PHB_IODA2)
>>                 pnv_ioda_release_window(pe, OPAL_IO_WINDOW_TYPE,
>>                                 phb->ioda.io_segmap);
>>         else
>>                 pnv_ioda_release_window(pe, OPAL_M64_WINDOW_TYPE,
>>                                 phb->ioda.m64_segmap);
>> }
>>
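
Note that the outline above releases the M64 segment map on IODA2, while the
posted patch touches the IO and M64 segment maps only on IODA1. A standalone
sketch of the patch's behaviour, assuming hypothetical demo_* types and
leaving out the OPAL unmap call entirely, could look like this:

```c
#include <stddef.h>

#define DEMO_INVALID_PE	(-1)

enum demo_phb_type { DEMO_IODA1, DEMO_IODA2 };

/* Hypothetical, reduced view of the per-PHB segment maps. */
struct demo_phb {
	enum demo_phb_type type;
	int nsegs;
	int *io_segmap, *m32_segmap, *m64_segmap;
};

/* Give back every segment owned by pe_number; return how many were freed. */
static int demo_release_window(int pe_number, int *segmap, int nsegs)
{
	int i, freed = 0;

	for (i = 0; i < nsegs; i++) {
		if (segmap[i] != pe_number)
			continue;

		/* The real code would unmap the segment through OPAL here. */
		segmap[i] = DEMO_INVALID_PE;
		freed++;
	}
	return freed;
}

static int demo_release_pe_seg(struct demo_phb *phb, int pe_number)
{
	int freed;

	/* M32 segments are released on both PHB types. */
	freed = demo_release_window(pe_number, phb->m32_segmap, phb->nsegs);

	/* As in the posted patch: IO and M64 segment maps only on IODA1. */
	if (phb->type == DEMO_IODA1) {
		freed += demo_release_window(pe_number, phb->io_segmap,
					     phb->nsegs);
		freed += demo_release_window(pe_number, phb->m64_segmap,
					     phb->nsegs);
	}
	return freed;
}
```

Passing the segment map in as a parameter keeps the window-type switch out of
the loop, which is the point of the suggestion; only the per-type policy in
demo_release_pe_seg() differs from the outline.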
>>
>> I'd actually merge pnv_ioda_release_pe_seg() into pnv_ioda_release_pe() as
>> well as it is also small and called once.
>>
>>
>>> +
>>> +static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb,
>>> +				   struct pnv_ioda_pe *pe);
>>> +static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe);
>>> +static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
>>> +{
>>> +	struct pnv_ioda_pe *tmp, *slave;
>>> +
>>> +	/* Release slave PEs in compound PE */
>>> +	if (pe->flags & PNV_IODA_PE_MASTER) {
>>> +		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
>>> +			pnv_ioda_release_pe(slave);
>>> +	}
>>> +
>>> +	/* Remove the PE from the list */
>>> +	list_del(&pe->list);
>>> +
>>> +	/* Release resources */
>>> +	pnv_ioda_release_dma_pe(pe);
>>> +	pnv_ioda_release_pe_seg(pe);
>>> +	pnv_ioda_deconfigure_pe(pe->phb, pe);
>>> +
>>> +	pnv_ioda_free_pe(pe);
>>> +}
>>> +
>>> +static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
>>> +{
>>> +	if (!pe)
>>> +		return NULL;
>>> +
>>> +	pe->device_count++;
>>> +	return pe;
>>> +}
>>> +
>>> +static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
>>
>>
>> Merge this into pnv_pci_release_device() as it is small and called only once.
>>
>
> I don't think so. The functions pnv_ioda_pe_{get,put}() are paired. I think it's
> good enough to have separate function for the logic included in pnv_ioda_pe_put().
Ok. Another thing, just out of curiosity: is it possible and OK for pe to be
NULL in pnv_ioda_pe_put()/pnv_ioda_pe_get()? If it is NULL, doesn't that mean
something went wrong, so we want a WARN_ON or something like that?
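
A rough userspace sketch of what is meant, with a hypothetical DEMO_WARN_ON()
standing in for the kernel's WARN_ON() and a flag standing in for the call to
pnv_ioda_release_pe():

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, reduced PE: only the counter and a "was released" flag. */
struct demo_pe {
	int device_count;
	int released;
};

static int demo_warnings;

/* Stand-in for WARN_ON(): count and log the hit, evaluate to the condition. */
#define DEMO_WARN_ON(cond) \
	((cond) ? (demo_warnings++, fprintf(stderr, "WARN: %s\n", #cond), 1) : 0)

static struct demo_pe *demo_pe_get(struct demo_pe *pe)
{
	if (DEMO_WARN_ON(!pe))
		return NULL;

	pe->device_count++;
	return pe;
}

static void demo_pe_put(struct demo_pe *pe)
{
	if (DEMO_WARN_ON(!pe))
		return;

	/* An unbalanced put is reported instead of driving the count negative. */
	if (DEMO_WARN_ON(pe->device_count <= 0))
		return;

	if (--pe->device_count == 0)
		pe->released = 1;	/* real code: pnv_ioda_release_pe(pe) */
}
```

With this shape a NULL pe is still tolerated, but it no longer passes
silently.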
>
>>> +{
>>> +	if (!pe)
>>> +		return;
>>> +
>>> +	pe->device_count--;
>>> +	WARN_ON(pe->device_count < 0);
>>> +	if (pe->device_count == 0)
>>> +		pnv_ioda_release_pe(pe);
>>> +}
>>> +
>>> +static void pnv_pci_release_device(struct pci_dev *pdev)
>>> +{
>>> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>> +	struct pnv_phb *phb = hose->private_data;
>>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>> +	struct pnv_ioda_pe *pe;
>>> +
>>> +	if (pdev->is_virtfn)
>>> +		return;
>>> +
>>> +	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
>>> +		return;
>>> +
>>> +	pe = &phb->ioda.pe_array[pdn->pe_number];
>>> +	pnv_ioda_pe_put(pe);
>>> +}
>>> +
>>>   static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>>>   {
>>>   	phb->ioda.pe_array[pe_no].phb = phb;
>>> @@ -724,7 +933,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>>>   	return 0;
>>>   }
>>>
>>> -#ifdef CONFIG_PCI_IOV
>>>   static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>>   {
>>>   	struct pci_dev *parent;
>>> @@ -759,9 +967,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>>   		}
>>>   		rid_end = pe->rid + (count << 8);
>>>   	} else {
>>> +#ifdef CONFIG_PCI_IOV
>>>   		if (pe->flags & PNV_IODA_PE_VF)
>>>   			parent = pe->parent_dev;
>>>   		else
>>> +#endif
>>>   			parent = pe->pdev->bus->self;
>>>   		bcomp = OpalPciBusAll;
>>>   		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
>>> @@ -799,11 +1009,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>>
>>>   	pe->pbus = NULL;
>>>   	pe->pdev = NULL;
>>> +#ifdef CONFIG_PCI_IOV
>>>   	pe->parent_dev = NULL;
>>> +#endif
>>
>>
>> These #ifdef movements seem very much unrelated.
>>
>
> It's related: pnv_ioda_deconfigure_pe() was used for VF PE only. Now it's used by all
> types of PEs.
The commit log does not mention either VF or PF.
> pe->parent_dev is declared as below:
>
> #ifdef CONFIG_PCI_IOV
>          struct pci_dev          *parent_dev;
> #endif
>
>>
>>>
>>>   	return 0;
>>>   }
>>> -#endif /* CONFIG_PCI_IOV */
>>>
>>>   static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>>   {
>>> @@ -985,6 +1196,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>>   			continue;
>>>
>>>   		pdn->pe_number = pe->pe_number;
>>> +		pnv_ioda_pe_get(pe);
>>>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>>   			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>>   	}
>>> @@ -1047,9 +1259,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>   			bus->busn_res.start, pe->pe_number);
>>>
>>>   	if (pnv_ioda_configure_pe(phb, pe)) {
>>> -		/* XXX What do we do here ? */
>>> -		pnv_ioda_free_pe(pe);
>>>   		pe->pbus = NULL;
>>> +		pnv_ioda_release_pe(pe);
>>
>>
>> This is unrelated unexplained change.
>>
>
> Will drop it in next revision.
>
>>>   		return NULL;
>>>   	}
>>>
>>> @@ -1199,29 +1410,6 @@ m64_failed:
>>>   	return -EBUSY;
>>>   }
>>>
>>> -static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
>>> -		int num);
>>> -static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
>>> -
>>> -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
>>> -{
>>> -	struct iommu_table    *tbl;
>>> -	int64_t               rc;
>>> -
>>> -	tbl = pe->table_group.tables[0];
>>> -	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>>> -	if (rc)
>>> -		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>> -
>>> -	pnv_pci_ioda2_set_bypass(pe, false);
>>> -	if (pe->table_group.group) {
>>> -		iommu_group_put(pe->table_group.group);
>>> -		BUG_ON(pe->table_group.group);
>>> -	}
>>> -	pnv_pci_ioda2_table_free_pages(tbl);
>>> -	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>> -}
>>> -
>>>   static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>   {
>>>   	struct pci_bus        *bus;
>>> @@ -1242,7 +1430,7 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>   		if (pe->parent_dev != pdev)
>>>   			continue;
>>>
>>> -		pnv_pci_ioda2_release_dma_pe(pdev, pe);
>>> +		pnv_pci_ioda2_release_dma_pe(pe);
>>
>>
>> This is unrelated change.
>>
>
>
>>>
>>>   		/* Remove from list */
>>>   		mutex_lock(&phb->ioda.pe_list_mutex);
>>> @@ -3124,6 +3312,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>>>   	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
>>>   #endif
>>>   	.enable_device_hook	= pnv_pci_enable_device_hook,
>>> +	.release_device		= pnv_pci_release_device,
>>>   	.window_alignment	= pnv_pci_window_alignment,
>>>   	.setup_bridge		= pnv_pci_setup_bridge,
>>>   	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index ef5271a..3bb10de 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -30,6 +30,7 @@ struct pnv_phb;
>>>   struct pnv_ioda_pe {
>>>   	unsigned long		flags;
>>>   	struct pnv_phb		*phb;
>>> +	int			device_count;
>>
>> Not atomic_t, no kref, no additional mutex, just "int"? Sure about it? If so,
>> put a note to the commit log about what provides a guarantee that there is no
>> race.
>>
>>
>
> It was a kref. Something you suggested on v5 as below:
>
> | You do not need kref here. You call kref_put() in a single location and can do
> | stuff directly, without kref. Just have an "unsigned int" counter and that's
> | it (it does not even have to be atomic if you do not have races but I am not
> | sure you do not).
Aaaaand I still do not see any mention of why there is no race here.
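
For comparison, if it turns out that get/put can run concurrently, an atomic
counter is enough to make "last put releases" safe without an extra mutex.
A sketch in plain C11 (demo_ape_* names are hypothetical; the kernel would
use atomic_t or kref rather than <stdatomic.h>):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced PE carrying only the reference counter. */
struct demo_ape {
	atomic_int device_count;
};

static void demo_ape_get(struct demo_ape *pe)
{
	atomic_fetch_add(&pe->device_count, 1);
}

/*
 * Returns true for exactly one caller: the one whose decrement took the
 * counter from 1 to 0. Only that caller may release the PE.
 */
static bool demo_ape_put(struct demo_ape *pe)
{
	return atomic_fetch_sub(&pe->device_count, 1) == 1;
}
```

If a plain int stays, the commit log should name the lock or the calling
context that serializes these two paths.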
> |
>
>>>
>>>   	/* A PE can be associated with a single device or an
>>>   	 * entire bus (& children). In the former case, pdev
>>>
>
> Thanks,
> Gavin
>
-- 
Alexey
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-11-04 13:12 ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
  2015-11-04 16:07   ` Rob Herring
@ 2015-12-06 20:28   ` Rob Herring
  2015-12-06 21:49     ` Guenter Roeck
  2015-12-06 23:54     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 157+ messages in thread
From: Rob Herring @ 2015-12-06 20:28 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand, Guenter Roeck
+Guenter
On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> In current implementation, unflatten_dt_node() is called recursively
> to unflatten device nodes in FDT blob. It's stress to limited stack
> capacity.
>
> This avoids calling the function recursively, meaning the device
> nodes are unflattened in one call on unflatten_dt_node(): two arrays
> are introduced to track the parent path size and the device node of
> current level of depth, which will be used by the device node on next
> level of depth to be unflattened. Also, the parameter "poffset" and
> "fpsize" are unused and dropped.
Do you plan to respin the OF parts at least soon? There's another
problem Guenter found that of_fdt_unflatten_tree is not re-entrant due
to "depth" being static and this series fixes that. So I'd rather
apply this and avoid adding a mutex if possible.
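
To illustrate why the non-recursive form is re-entrant, here is a small
sketch of the same idea outside the kernel. Everything in it is hypothetical
(demo_* names, a plain array of pre-order depths instead of an FDT blob), and
unlike the patch it appends children at the tail, so no reversal pass is
needed:

```c
#include <stddef.h>

#define DEMO_MAX_DEPTH	64

struct demo_node {
	int id;
	struct demo_node *parent, *child, *sibling;
};

/*
 * depths[i] is the depth of the i-th node in pre-order (root = 0).
 * pool must hold at least count nodes. All walk state lives on the
 * caller's stack, so concurrent calls do not interfere.
 */
static struct demo_node *demo_unflatten(const int *depths, int count,
					struct demo_node *pool)
{
	struct demo_node *nps[DEMO_MAX_DEPTH];	/* current node per depth */
	struct demo_node *last[DEMO_MAX_DEPTH];	/* last child of nps[d] */
	int i;

	for (i = 0; i < count; i++) {
		int d = depths[i];
		struct demo_node *np = &pool[i];

		/* Reject malformed input: too deep, second root, skipped level. */
		if (d < 0 || d >= DEMO_MAX_DEPTH)
			return NULL;
		if (i == 0 ? d != 0 : (d == 0 || d > depths[i - 1] + 1))
			return NULL;

		np->id = i;
		np->parent = d ? nps[d - 1] : NULL;
		np->child = NULL;
		np->sibling = NULL;

		/* Append to the parent's child list in source order. */
		if (np->parent) {
			if (np->parent->child)
				last[d - 1]->sibling = np;
			else
				np->parent->child = np;
			last[d - 1] = np;
		}

		nps[d] = np;
	}

	return count > 0 ? &pool[0] : NULL;
}
```

The per-depth arrays play the role of the nps[]/fpsizes[] arrays in the
patch: the node at depth d always finds its parent in nps[d - 1], so neither
recursion nor a static depth variable is required.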
Rob
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 94 +++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 56 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 173b036..f4793d0 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -355,61 +355,82 @@ static unsigned long populate_node(const void *blob,
>         return fpsize;
>  }
>
> +static void reverse_nodes(struct device_node *parent)
> +{
> +       struct device_node *child, *next;
> +
> +       /* In-depth first */
> +       child = parent->child;
> +       while (child) {
> +               reverse_nodes(child);
> +
> +               child = child->sibling;
> +       }
> +
> +       /* Reverse the nodes in the child list */
> +       child = parent->child;
> +       parent->child = NULL;
> +       while (child) {
> +               next = child->sibling;
> +
> +               child->sibling = parent->child;
> +               parent->child = child;
> +               child = next;
> +       }
> +}
> +
>  /**
>   * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>   * @blob: The parent device tree blob
>   * @mem: Memory chunk to use for allocating device nodes and properties
> - * @poffset: pointer to node in flat tree
>   * @dad: Parent struct device_node
>   * @nodepp: The device_node tree created by the call
> - * @fpsize: Size of the node path up at the current depth.
>   * @dryrun: If true, do not allocate device nodes but still calculate needed
>   * memory size
>   */
>  static void *unflatten_dt_node(const void *blob,
>                                void *mem,
> -                              int *poffset,
>                                struct device_node *dad,
>                                struct device_node **nodepp,
> -                              unsigned long fpsize,
>                                bool dryrun)
>  {
> -       struct device_node *np;
> -       static int depth;
> -       int old_depth;
> -
> -       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
> -       if (!fpsize)
> -               return mem;
> +       struct device_node *root;
> +       int offset = 0, depth = 0;
> +       unsigned long fpsizes[64];
> +       struct device_node *nps[64];
>
> -       old_depth = depth;
> -       *poffset = fdt_next_node(blob, *poffset, &depth);
> -       if (depth < 0)
> -               depth = 0;
> -       while (*poffset > 0 && depth > old_depth)
> -               mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
> -                                       fpsize, dryrun);
> +       if (nodepp)
> +               *nodepp = NULL;
> +
> +       root = dad;
> +       fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
> +       nps[depth++] = dad;
> +       while (offset >= 0 && depth < 64) {
> +               fpsizes[depth] = populate_node(blob, offset, &mem,
> +                                              nps[depth - 1],
> +                                              fpsizes[depth - 1],
> +                                              &nps[depth], dryrun);
> +               if (!fpsizes[depth])
> +                       return mem;
> +
> +               if (!dryrun && nodepp && !*nodepp)
> +                       *nodepp = nps[depth];
> +               if (!dryrun && !root)
> +                       root = nps[depth];
> +
> +               offset = fdt_next_node(blob, offset, &depth);
> +       }
>
> -       if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
> -               pr_err("unflatten: error %d processing FDT\n", *poffset);
> +       if (offset < 0 && offset != -FDT_ERR_NOTFOUND)
> +               pr_err("%s: Error %d processing FDT\n",
> +                      __func__, offset);
>
>         /*
>          * Reverse the child list. Some drivers assumes node order matches .dts
>          * node order
>          */
> -       if (!dryrun && np->child) {
> -               struct device_node *child = np->child;
> -               np->child = NULL;
> -               while (child) {
> -                       struct device_node *next = child->sibling;
> -                       child->sibling = np->child;
> -                       np->child = child;
> -                       child = next;
> -               }
> -       }
> -
> -       if (nodepp)
> -               *nodepp = np;
> +       if (!dryrun)
> +               reverse_nodes(root);
>
>         return mem;
>  }
> @@ -431,7 +452,6 @@ static void __unflatten_device_tree(const void *blob,
>                              void * (*dt_alloc)(u64 size, u64 align))
>  {
>         unsigned long size;
> -       int start;
>         void *mem;
>
>         pr_debug(" -> unflatten_device_tree()\n");
> @@ -452,8 +472,7 @@ static void __unflatten_device_tree(const void *blob,
>         }
>
>         /* First pass, scan for size */
> -       start = 0;
> -       size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
> +       size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
>         size = ALIGN(size, 4);
>
>         pr_debug("  size is %lx, allocating...\n", size);
> @@ -467,8 +486,7 @@ static void __unflatten_device_tree(const void *blob,
>         pr_debug("  unflattening %p...\n", mem);
>
>         /* Second pass, do actual unflattening */
> -       start = 0;
> -       unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
> +       unflatten_dt_node(blob, mem, NULL, mynodes, false);
>         if (be32_to_cpup(mem + size) != 0xdeadbeef)
>                 pr_warning("End of tree marker overwritten: %08x\n",
>                            be32_to_cpup(mem + size));
> --
> 2.1.0
>
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-12-06 20:28   ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Rob Herring
@ 2015-12-06 21:49     ` Guenter Roeck
  2015-12-06 23:54     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 157+ messages in thread
From: Guenter Roeck @ 2015-12-06 21:49 UTC (permalink / raw)
  To: Rob Herring, Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On 12/06/2015 12:28 PM, Rob Herring wrote:
> +Guenter
>
> On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> In the current implementation, unflatten_dt_node() is called recursively
>> to unflatten device nodes in the FDT blob, which puts pressure on the
>> limited stack capacity.
>>
>> This avoids calling the function recursively, meaning the device
>> nodes are unflattened in one call to unflatten_dt_node(): two arrays
>> are introduced to track the parent path size and the device node at
>> the current level of depth, which are used when unflattening the
>> device node at the next level of depth. Also, the parameters "poffset"
>> and "fpsize" are unused and dropped.
>
> Do you plan to respin the OF parts at least soon? There's another
> problem Guenter found that of_fdt_unflatten_tree is not re-entrant due
> to "depth" being static and this series fixes that. So I'd rather
> apply this and avoid adding a mutex if possible.
>
Hi Rob,
We see this problem in 4.1, so whatever patch you accept should be
back-ported to at least that release.
Any idea when this patch will be accepted? We actively see the problem
in our kernel, so I'll need a solution soon. Otherwise I'll have to apply
my patch to our kernel and revert it as soon as the 'real' patch has been
back-ported.
Thanks,
Guenter
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-12-06 20:28   ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Rob Herring
  2015-12-06 21:49     ` Guenter Roeck
@ 2015-12-06 23:54     ` Benjamin Herrenschmidt
  2015-12-07  2:21       ` Guenter Roeck
  1 sibling, 1 reply; 157+ messages in thread
From: Benjamin Herrenschmidt @ 2015-12-06 23:54 UTC (permalink / raw)
  To: Rob Herring, Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Michael Ellerman, aik, Bjorn Helgaas,
	Grant Likely, Pantelis Antoniou, Frank Rowand, Guenter Roeck
On Sun, 2015-12-06 at 14:28 -0600, Rob Herring wrote:
> 
> Do you plan to respin the OF parts at least soon? There's another
> problem Guenter found that of_fdt_unflatten_tree is not re-entrant due
> to "depth" being static and this series fixes that. So I'd rather
> apply this and avoid adding a mutex if possible.
Gavin is on vacation until next year.
Cheers,
Ben.
> Rob
> 
> > 
> > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > ---
> >  drivers/of/fdt.c | 94 +++++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 56 insertions(+), 38 deletions(-)
> > 
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 173b036..f4793d0 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -355,61 +355,82 @@ static unsigned long populate_node(const void *blob,
> >         return fpsize;
> >  }
> > 
> > +static void reverse_nodes(struct device_node *parent)
> > +{
> > +       struct device_node *child, *next;
> > +
> > +       /* In-depth first */
> > +       child = parent->child;
> > +       while (child) {
> > +               reverse_nodes(child);
> > +
> > +               child = child->sibling;
> > +       }
> > +
> > +       /* Reverse the nodes in the child list */
> > +       child = parent->child;
> > +       parent->child = NULL;
> > +       while (child) {
> > +               next = child->sibling;
> > +
> > +               child->sibling = parent->child;
> > +               parent->child = child;
> > +               child = next;
> > +       }
> > +}
> > +
> >  /**
> >   * unflatten_dt_node - Alloc and populate a device_node from the flat tree
> >   * @blob: The parent device tree blob
> >   * @mem: Memory chunk to use for allocating device nodes and properties
> > - * @poffset: pointer to node in flat tree
> >   * @dad: Parent struct device_node
> >   * @nodepp: The device_node tree created by the call
> > - * @fpsize: Size of the node path up at the current depth.
> >   * @dryrun: If true, do not allocate device nodes but still calculate needed
> >   * memory size
> >   */
> >  static void *unflatten_dt_node(const void *blob,
> >                                void *mem,
> > -                              int *poffset,
> >                                struct device_node *dad,
> >                                struct device_node **nodepp,
> > -                              unsigned long fpsize,
> >                                bool dryrun)
> >  {
> > -       struct device_node *np;
> > -       static int depth;
> > -       int old_depth;
> > -
> > -       fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
> > -       if (!fpsize)
> > -               return mem;
> > +       struct device_node *root;
> > +       int offset = 0, depth = 0;
> > +       unsigned long fpsizes[64];
> > +       struct device_node *nps[64];
> > 
> > -       old_depth = depth;
> > -       *poffset = fdt_next_node(blob, *poffset, &depth);
> > -       if (depth < 0)
> > -               depth = 0;
> > -       while (*poffset > 0 && depth > old_depth)
> > -               mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
> > -                                       fpsize, dryrun);
> > +       if (nodepp)
> > +               *nodepp = NULL;
> > +
> > +       root = dad;
> > +       fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
> > +       nps[depth++] = dad;
> > +       while (offset >= 0 && depth < 64) {
> > +               fpsizes[depth] = populate_node(blob, offset, &mem,
> > +                                              nps[depth - 1],
> > +                                              fpsizes[depth - 1],
> > +                                              &nps[depth], dryrun);
> > +               if (!fpsizes[depth])
> > +                       return mem;
> > +
> > +               if (!dryrun && nodepp && !*nodepp)
> > +                       *nodepp = nps[depth];
> > +               if (!dryrun && !root)
> > +                       root = nps[depth];
> > +
> > +               offset = fdt_next_node(blob, offset, &depth);
> > +       }
> > 
> > -       if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
> > -               pr_err("unflatten: error %d processing FDT\n", *poffset);
> > +       if (offset < 0 && offset != -FDT_ERR_NOTFOUND)
> > +               pr_err("%s: Error %d processing FDT\n",
> > +                      __func__, offset);
> > 
> >         /*
> >          * Reverse the child list. Some drivers assumes node order matches .dts
> >          * node order
> >          */
> > -       if (!dryrun && np->child) {
> > -               struct device_node *child = np->child;
> > -               np->child = NULL;
> > -               while (child) {
> > -                       struct device_node *next = child->sibling;
> > -                       child->sibling = np->child;
> > -                       np->child = child;
> > -                       child = next;
> > -               }
> > -       }
> > -
> > -       if (nodepp)
> > -               *nodepp = np;
> > +       if (!dryrun)
> > +               reverse_nodes(root);
> > 
> >         return mem;
> >  }
> > @@ -431,7 +452,6 @@ static void __unflatten_device_tree(const void *blob,
> >                              void * (*dt_alloc)(u64 size, u64 align))
> >  {
> >         unsigned long size;
> > -       int start;
> >         void *mem;
> > 
> >         pr_debug(" -> unflatten_device_tree()\n");
> > @@ -452,8 +472,7 @@ static void __unflatten_device_tree(const void *blob,
> >         }
> > 
> >         /* First pass, scan for size */
> > -       start = 0;
> > -       size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
> > +       size = (unsigned long)unflatten_dt_node(blob, NULL, NULL, NULL, true);
> >         size = ALIGN(size, 4);
> > 
> >         pr_debug("  size is %lx, allocating...\n", size);
> > @@ -467,8 +486,7 @@ static void __unflatten_device_tree(const void *blob,
> >         pr_debug("  unflattening %p...\n", mem);
> > 
> >         /* Second pass, do actual unflattening */
> > -       start = 0;
> > -       unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
> > +       unflatten_dt_node(blob, mem, NULL, mynodes, false);
> >         if (be32_to_cpup(mem + size) != 0xdeadbeef)
> >                 pr_warning("End of tree marker overwritten: %08x\n",
> >                            be32_to_cpup(mem + size));
> > --
> > 2.1.0
> > 
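The approach in the quoted patch (one loop, a per-depth array of "current node" pointers, children prepended and then reversed) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel implementation: struct node, struct flat, build() and reverse_children() are hypothetical names, the input is a pre-parsed list of (name, depth) pairs in depth-first order rather than an FDT blob, and nodes come from a caller-supplied pool instead of the dt_alloc() memory chunk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64	/* same bound as the arrays in the quoted patch */

struct node {
	const char *name;
	struct node *parent, *child, *sibling;
};

/* One entry of the (hypothetical) flattened input; depth 1 is the root. */
struct flat {
	const char *name;
	int depth;
};

/* Restore source order: children were prepended, so each list is reversed. */
void reverse_children(struct node *parent)
{
	struct node *child, *next;

	for (child = parent->child; child; child = child->sibling)
		reverse_children(child);

	child = parent->child;
	parent->child = NULL;
	while (child) {
		next = child->sibling;
		child->sibling = parent->child;
		parent->child = child;
		child = next;
	}
}

/* Build the tree in a single loop; at_depth[d] is the latest node at depth d. */
struct node *build(const struct flat *in, size_t n, struct node *pool)
{
	struct node *at_depth[MAX_DEPTH] = { NULL };
	struct node *root = NULL;
	int prev = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct node *np = &pool[i];
		int d = in[i].depth;

		/* Reject depths outside the arrays or that skip a level. */
		if (d < 1 || d >= MAX_DEPTH || d > prev + 1)
			return NULL;

		np->name = in[i].name;
		np->parent = at_depth[d - 1];
		np->child = NULL;
		np->sibling = NULL;
		if (np->parent) {
			/* Prepend: cheap, but leaves siblings in reverse order. */
			np->sibling = np->parent->child;
			np->parent->child = np;
		} else if (!root) {
			root = np;
		}
		at_depth[d] = np;
		prev = d;
	}

	if (root)
		reverse_children(root);
	return root;
}
```

As in the patch, the reversal pass itself still recurses over the finished tree; only the unflattening loop is iterative. The fixed-size array is what bounds the supported depth at 64 and what later shows up in the stack-frame size.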
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-12-06 23:54     ` Benjamin Herrenschmidt
@ 2015-12-07  2:21       ` Guenter Roeck
  2015-12-07  2:33         ` Rob Herring
  0 siblings, 1 reply; 157+ messages in thread
From: Guenter Roeck @ 2015-12-07  2:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Rob Herring, Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Michael Ellerman, aik, Bjorn Helgaas,
	Grant Likely, Pantelis Antoniou, Frank Rowand
On 12/06/2015 03:54 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2015-12-06 at 14:28 -0600, Rob Herring wrote:
>>
>> Do you plan to respin the OF parts at least soon? There's another
>> problem Guenter found that of_fdt_unflatten_tree is not re-entrant due
>> to "depth" being static and this series fixes that. So I'd rather
>> apply this and avoid adding a mutex if possible.
>
> Gavin is on vacation until next year.
>
That is a bit more than the timeline I am looking for.
Rob, any chance to accept my patch for now? After all, it can be
reverted after the rework is complete, and it would be easier
to apply to earlier kernels.
Thanks,
Guenter
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-12-07  2:21       ` Guenter Roeck
@ 2015-12-07  2:33         ` Rob Herring
  2015-12-07  3:40           ` Guenter Roeck
  0 siblings, 1 reply; 157+ messages in thread
From: Rob Herring @ 2015-12-07  2:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Benjamin Herrenschmidt, Gavin Shan, linuxppc-dev,
	linux-pci@vger.kernel.org, devicetree@vger.kernel.org,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Sun, Dec 6, 2015 at 8:21 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> On 12/06/2015 03:54 PM, Benjamin Herrenschmidt wrote:
>>
>> On Sun, 2015-12-06 at 14:28 -0600, Rob Herring wrote:
>>>
>>>
>>> Do you plan to respin the OF parts at least soon? There's another
>>> problem Guenter found that of_fdt_unflatten_tree is not re-entrant due
>>> to "depth" being static and this series fixes that. So I'd rather
>>> apply this and avoid adding a mutex if possible.
>>
>>
>> Gavin is on vacation until next year.
>>
>
> That is a bit more than the timeline I am looking for.
>
> Rob, any chance to accept my patch for now? After all, it can be
> reverted after the rework is complete, and it would be easier
> to apply to earlier kernels.
Yes, will do. It's only 4.1 and later that it should be marked for stable?
Rob
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-12-07  2:33         ` Rob Herring
@ 2015-12-07  3:40           ` Guenter Roeck
  0 siblings, 0 replies; 157+ messages in thread
From: Guenter Roeck @ 2015-12-07  3:40 UTC (permalink / raw)
  To: Rob Herring
  Cc: Benjamin Herrenschmidt, Gavin Shan, linuxppc-dev,
	linux-pci@vger.kernel.org, devicetree@vger.kernel.org,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On 12/06/2015 06:33 PM, Rob Herring wrote:
> On Sun, Dec 6, 2015 at 8:21 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>> On 12/06/2015 03:54 PM, Benjamin Herrenschmidt wrote:
>>>
>>> On Sun, 2015-12-06 at 14:28 -0600, Rob Herring wrote:
>>>>
>>>>
>>>> Do you plan to respin the OF parts at least soon? There's another
>>>> problem Guenter found that of_fdt_unflatten_tree is not re-entrant due
>>>> to "depth" being static and this series fixes that. So I'd rather
>>>> apply this and avoid adding a mutex if possible.
>>>
>>>
>>> Gavin is on vacation until next year.
>>>
>>
>> That is a bit more than the timeline I am looking for.
>>
>> Rob, any chance to accept my patch for now? After all, it can be
>> reverted after the rework is complete, and it would be easier
>> to apply to earlier kernels.
>
> Yes, will do. It's only 4.1 and later that it should be marked for stable?
>
Yes, I think so. Earlier kernels would need a manual backport.
Thanks,
Guenter
* Re: [v7,49/50] drivers/of: Export OF changeset functions
  2015-11-04 13:12 ` [PATCH v7 49/50] drivers/of: Export OF changeset functions Gavin Shan
  2015-11-04 16:12   ` Rob Herring
@ 2016-01-13 13:54   ` Wolfram Sang
  2016-01-13 21:18     ` Michael Ellerman
  1 sibling, 1 reply; 157+ messages in thread
From: Wolfram Sang @ 2016-01-13 13:54 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, devicetree, aik, linux-pci, panto, grant.likely,
	robherring2, bhelgaas, frowand.list
On Thu, Nov 05, 2015 at 12:12:49AM +1100, Gavin Shan wrote:
> The PowerNV PCI hotplug driver is going to use the OF changeset
> to manage the changed device sub-tree. This exports those OF
> changeset functions for that.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Rob Herring <robh@kernel.org>
I needed something like this, too [1] and rebased my series on top of
this patch. So:
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Thanks,
   Wolfram
[1] https://lkml.org/lkml/2016/1/6/385
* Re: [v7,49/50] drivers/of: Export OF changeset functions
  2016-01-13 13:54   ` [v7,49/50] " Wolfram Sang
@ 2016-01-13 21:18     ` Michael Ellerman
  2016-01-13 21:20       ` Wolfram Sang
  0 siblings, 1 reply; 157+ messages in thread
From: Michael Ellerman @ 2016-01-13 21:18 UTC (permalink / raw)
  To: Wolfram Sang, Gavin Shan
  Cc: devicetree, frowand.list, aik, linux-pci, panto, bhelgaas,
	robherring2, grant.likely, linuxppc-dev
On Wed, 2016-01-13 at 14:54 +0100, Wolfram Sang wrote:
> On Thu, Nov 05, 2015 at 12:12:49AM +1100, Gavin Shan wrote:
> > The PowerNV PCI hotplug driver is going to use the OF changeset
> > to manage the changed device sub-tree. This exports those OF
> > changeset functions for that.
> > 
> > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > Acked-by: Rob Herring <robh@kernel.org>
> 
> I needed something like this, too [1] and rebased my series on top of
> this patch. So:
> 
> Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
OK. This series is kind of in limbo, it might hit 4.6, but it might not. So if
you need this patch before then you should probably pull it in to your series,
or ask Rob to merge it pre-emptively.
cheers
* Re: [v7,49/50] drivers/of: Export OF changeset functions
  2016-01-13 21:18     ` Michael Ellerman
@ 2016-01-13 21:20       ` Wolfram Sang
  2016-01-13 23:53         ` Rob Herring
  0 siblings, 1 reply; 157+ messages in thread
From: Wolfram Sang @ 2016-01-13 21:20 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Gavin Shan, devicetree, frowand.list, aik, linux-pci, panto,
	bhelgaas, robherring2, grant.likely, linuxppc-dev
On Thu, Jan 14, 2016 at 08:18:06AM +1100, Michael Ellerman wrote:
> On Wed, 2016-01-13 at 14:54 +0100, Wolfram Sang wrote:
> > On Thu, Nov 05, 2015 at 12:12:49AM +1100, Gavin Shan wrote:
> > > The PowerNV PCI hotplug driver is going to use the OF changeset
> > > to manage the changed device sub-tree. This exports those OF
> > > changeset functions for that.
> > > 
> > > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > > Acked-by: Rob Herring <robh@kernel.org>
> > 
> > I needed something like this, too [1] and rebased my series on top of
> > this patch. So:
> > 
> > Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> > Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> 
> OK. This series is kind of in limbo, it might hit 4.6, but it might not. So if
> you need this patch before then you should probably pull it in to your series,
> or ask Rob to merge it pre-emptively.
Yup, this is what I had in mind, too.
Thanks,
   Wolfram
* Re: [v7,49/50] drivers/of: Export OF changeset functions
  2016-01-13 21:20       ` Wolfram Sang
@ 2016-01-13 23:53         ` Rob Herring
  2016-01-14  7:28           ` Wolfram Sang
  0 siblings, 1 reply; 157+ messages in thread
From: Rob Herring @ 2016-01-13 23:53 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Michael Ellerman, Gavin Shan, devicetree@vger.kernel.org,
	Frank Rowand, aik, linux-pci@vger.kernel.org, Pantelis Antoniou,
	Bjorn Helgaas, Grant Likely, linuxppc-dev
On Wed, Jan 13, 2016 at 3:20 PM, Wolfram Sang <wsa@the-dreams.de> wrote:
> On Thu, Jan 14, 2016 at 08:18:06AM +1100, Michael Ellerman wrote:
>> On Wed, 2016-01-13 at 14:54 +0100, Wolfram Sang wrote:
>> > On Thu, Nov 05, 2015 at 12:12:49AM +1100, Gavin Shan wrote:
>> > > The PowerNV PCI hotplug driver is going to use the OF changeset
>> > > to manage the changed device sub-tree. This exports those OF
>> > > changeset functions for that.
>> > >
>> > > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> > > Acked-by: Rob Herring <robh@kernel.org>
>> >
>> > I needed something like this, too [1] and rebased my series on top of
>> > this patch. So:
>> >
>> > Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>> > Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>>
>> OK. This series is kind of in limbo, it might hit 4.6, but it might not. So if
>> you need this patch before then you should probably pull it in to your series,
>> or ask Rob to merge it pre-emptively.
>
> Yup, this is what I had in mind, too.
Given this has been on the list some time and still works, I've
applied this for 4.5. That should simplify dependencies for 4.6.
Rob
* Re: [v7,49/50] drivers/of: Export OF changeset functions
  2016-01-13 23:53         ` Rob Herring
@ 2016-01-14  7:28           ` Wolfram Sang
  0 siblings, 0 replies; 157+ messages in thread
From: Wolfram Sang @ 2016-01-14  7:28 UTC (permalink / raw)
  To: Rob Herring
  Cc: Michael Ellerman, Gavin Shan, devicetree@vger.kernel.org,
	Frank Rowand, aik, linux-pci@vger.kernel.org, Pantelis Antoniou,
	Bjorn Helgaas, Grant Likely, linuxppc-dev
> Given this has been on the list some time and still works, I've
> applied this for 4.5. That should simplify dependencies for 4.6.
Cool, thanks Rob!
* Re: [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node()
  2015-11-04 16:07   ` Rob Herring
  2015-11-04 23:23     ` Gavin Shan
@ 2016-05-13  7:16     ` Geert Uytterhoeven
  2016-05-13 11:31       ` [PATCH] drivers/of: Fix build warning in populate_node() Gavin Shan
  1 sibling, 1 reply; 157+ messages in thread
From: Geert Uytterhoeven @ 2016-05-13  7:16 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, aik, Bjorn Helgaas, Grant Likely,
	Pantelis Antoniou, Frank Rowand
On Wed, Nov 4, 2015 at 5:07 PM, Rob Herring <robherring2@gmail.com> wrote:
> On Wed, Nov 4, 2015 at 7:12 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> In the current implementation, unflatten_dt_node() is called recursively
>> to unflatten device nodes in the FDT blob, which puts pressure on the
>> limited stack capacity.
>
> Did you actually hit a problem?
>
> Now we have a max depth of 64. Seems like that should be plenty... Any
> idea how this compares to when we run out of stack space?
FWIW, on arm64:
drivers/of/fdt.c:443:1: warning: the frame size of 1136 bytes is
larger than 1024 bytes [-Wframe-larger-than=]
Gr{oetje,eeting}s,
                        Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
* [PATCH] drivers/of: Fix build warning in populate_node()
  2016-05-13  7:16     ` Geert Uytterhoeven
@ 2016-05-13 11:31       ` Gavin Shan
  2016-05-16 14:11         ` Rob Herring
  0 siblings, 1 reply; 157+ messages in thread
From: Gavin Shan @ 2016-05-13 11:31 UTC (permalink / raw)
  To: devicetree
  Cc: linuxppc-dev, linux-pci, geert, robherring2, benh, mpe, aik,
	bhelgaas, grant.likely, panto, frowand.list, Gavin Shan
Function populate_node() is used to unflatten the FDT blob into a device
tree. It supports at most 64 levels of device nodes. The array
@fpsizes[64] tracks the full name length of the last unflattened
device node at the corresponding level (index of the element in the
array - 1). With CONFIG_FRAME_WARN=1024, the build warning below is
seen on ARM64, as Geert reported. The issue can be reproduced on
PPC64 as well.
  $ make drivers/of/fdt.o
  drivers/of/fdt.c:443:1: warning: the frame size of 1136 bytes is \
  larger than 1024 bytes [-Wframe-larger-than=]
This changes the data type of @fpsizes[i] from "unsigned long" to
"unsigned int" to avoid the build warning. The return type of
populate_node() and the type of its @fpsize argument are adjusted
accordingly. With this applied, 256 bytes are saved from the stack
frame on ARM64 and PPC64 platforms and the above warning is no longer
seen.
Fixes: 9ffa9eb ("drivers/of: Avoid recursively calling unflatten_dt_node()")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c95054c..34344a8 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -264,13 +264,13 @@ static void populate_properties(const void *blob,
 		*pprev = NULL;
 }
 
-static unsigned long populate_node(const void *blob,
-				   int offset,
-				   void **mem,
-				   struct device_node *dad,
-				   unsigned long fpsize,
-				   struct device_node **pnp,
-				   bool dryrun)
+static unsigned int populate_node(const void *blob,
+				  int offset,
+				  void **mem,
+				  struct device_node *dad,
+				  unsigned int fpsize,
+				  struct device_node **pnp,
+				  bool dryrun)
 {
 	struct device_node *np;
 	const char *pathp;
@@ -397,7 +397,7 @@ static int unflatten_dt_nodes(const void *blob,
 	struct device_node *root;
 	int offset = 0, depth = 0;
 #define FDT_MAX_DEPTH	64
-	unsigned long fpsizes[FDT_MAX_DEPTH];
+	unsigned int fpsizes[FDT_MAX_DEPTH];
 	struct device_node *nps[FDT_MAX_DEPTH];
 	void *base = mem;
 	bool dryrun = !base;
-- 
2.1.0
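The 256-byte figure in the commit message follows from the array length and the element sizes. A small sketch of that arithmetic, assuming an LP64 target (8-byte unsigned long, 4-byte unsigned int) as on arm64 and ppc64; depth_array_bytes() and bytes_saved() are hypothetical helpers, and the actual frame size after the change depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

#define FDT_MAX_DEPTH 64

/* Stack bytes taken by one per-depth array with elements of elem_size bytes. */
size_t depth_array_bytes(size_t elem_size)
{
	return FDT_MAX_DEPTH * elem_size;
}

/* Bytes saved by narrowing each element from old_size to new_size bytes. */
size_t bytes_saved(size_t old_size, size_t new_size)
{
	assert(old_size >= new_size);
	return depth_array_bytes(old_size) - depth_array_bytes(new_size);
}
```

With 8-byte and 4-byte elements, bytes_saved(8, 4) is 64 * (8 - 4) = 256. Taking the reported 1136-byte frame at face value, that leaves roughly 880 bytes, below the 1024-byte CONFIG_FRAME_WARN threshold. On a target where unsigned long and unsigned int have the same size, the change saves nothing.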
* Re: [PATCH] drivers/of: Fix build warning in populate_node()
  2016-05-13 11:31       ` [PATCH] drivers/of: Fix build warning in populate_node() Gavin Shan
@ 2016-05-16 14:11         ` Rob Herring
  0 siblings, 0 replies; 157+ messages in thread
From: Rob Herring @ 2016-05-16 14:11 UTC (permalink / raw)
  To: Gavin Shan
  Cc: devicetree, linuxppc-dev, linux-pci, geert, robherring2, benh,
	mpe, aik, bhelgaas, grant.likely, panto, frowand.list
On Fri, May 13, 2016 at 09:31:39PM +1000, Gavin Shan wrote:
> Function populate_node() is used to unflatten the FDT blob into a device
> tree. It supports at most 64 levels of device nodes. The array
> @fpsizes[64] tracks the full name length of the last unflattened
> device node at the corresponding level (index of the element in the
> array - 1). With CONFIG_FRAME_WARN=1024, the build warning below is
> seen on ARM64, as Geert reported. The issue can be reproduced on
> PPC64 as well.
> 
>   $ make drivers/of/fdt.o
>   drivers/of/fdt.c:443:1: warning: the frame size of 1136 bytes is \
>   larger than 1024 bytes [-Wframe-larger-than=]
> 
> This changes the data type of @fpsizes[i] from "unsigned long" to
> "unsigned int" to avoid the build warning. The return type of
> populate_node() and the type of its @fpsize argument are adjusted
> accordingly. With this applied, 256 bytes are saved from the stack
> frame on ARM64 and PPC64 platforms and the above warning is no longer
> seen.
> 
> Fixes: 9ffa9eb ("drivers/of: Avoid recursively calling unflatten_dt_node()")
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
Applied, thanks.
Rob
Thread overview: 157+ messages
2015-11-04 13:12 [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
2015-11-04 13:12 ` [PATCH v7 01/50] PCI: Add pcibios_setup_bridge() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 02/50] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
2015-11-05 22:27   ` Daniel Axtens
2015-11-05 23:44     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 03/50] powerpc/pci: Cleanup on struct pci_controller_ops Gavin Shan
2015-11-05 22:32   ` Daniel Axtens
2015-11-05 23:45     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 04/50] powerpc/powernv: Cleanup on pnv_pci_ioda_controller_ops Gavin Shan
2015-11-05 22:28   ` Daniel Axtens
2015-11-06  1:09     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 05/50] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 06/50] powerpc/powernv: Drop phb->bdfn_to_pe() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 07/50] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
2015-11-04 13:12 ` [PATCH v7 08/50] powerpc/powernv: Rename PE# " Gavin Shan
2015-11-16  8:01   ` Alexey Kardashevskiy
2015-11-17  1:22     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 09/50] powerpc/powernv: Fix initial IO and M32 segmap Gavin Shan
2015-11-04 13:12 ` [PATCH v7 10/50] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
2015-11-05 22:56   ` Daniel Axtens
2015-11-05 23:52     ` Gavin Shan
2015-11-16  8:01       ` Alexey Kardashevskiy
2015-11-17  0:54         ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 11/50] powerpc/powernv: IO and M32 mapping based on PCI device resources Gavin Shan
2015-11-12  3:30   ` Daniel Axtens
2015-11-12  4:55     ` Gavin Shan
2015-11-16  8:01       ` Alexey Kardashevskiy
2015-11-17  1:33         ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 12/50] powerpc/powernv: Track M64 segment consumption Gavin Shan
2015-11-12  4:18   ` Daniel Axtens
2015-11-16  8:01   ` Alexey Kardashevskiy
2015-11-17  1:04     ` Gavin Shan
2015-11-19  0:10       ` Alexey Kardashevskiy
2015-11-23 22:42         ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 13/50] powerpc/powernv: Rename M64 related functions Gavin Shan
2015-11-04 13:12 ` [PATCH v7 14/50] powerpc/powernv: M64 support on P7IOC Gavin Shan
2015-11-16  8:01   ` Alexey Kardashevskiy
2015-11-17  1:37     ` Gavin Shan
2015-11-19  0:18       ` Alexey Kardashevskiy
2015-11-22 22:46         ` Gavin Shan
2015-11-16  8:02   ` Alexey Kardashevskiy
2015-11-17  1:38     ` Gavin Shan
2015-11-17  2:11       ` Alexey Kardashevskiy
2015-11-17  2:44         ` Gavin Shan
2015-11-16  8:02   ` Alexey Kardashevskiy
2015-11-17  1:42     ` Gavin Shan
2015-11-17  2:37       ` Alexey Kardashevskiy
2015-11-17  3:04         ` Gavin Shan
2015-11-17  3:40           ` Benjamin Herrenschmidt
2015-11-17  4:43           ` Alexey Kardashevskiy
2015-11-17  8:44             ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 15/50] powerpc/powernv: Rename pnv_pci_ioda_setup_dma_pe() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 16/50] powerpc/powernv: Define PNV_IODA1_DMA32_SEGSIZE Gavin Shan
2015-11-04 13:12 ` [PATCH v7 17/50] powerpc/powernv: Avoid calculating DMA32 segments on PHB3 Gavin Shan
2015-11-17  1:07   ` Alexey Kardashevskiy
2015-11-17  8:48     ` Gavin Shan
2015-11-17 23:59       ` Alexey Kardashevskiy
2015-11-04 13:12 ` [PATCH v7 18/50] powerpc/powernv: Remove DMA32 PE list Gavin Shan
2015-11-17  1:54   ` Alexey Kardashevskiy
2015-11-17  2:01     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 19/50] powerpc/powernv: Track DMA32 segment consumption Gavin Shan
2015-11-17  0:28   ` Daniel Axtens
2015-11-17  1:55     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 20/50] powerpc/powernv: Improve DMA32 segment calculation Gavin Shan
2015-11-20  3:14   ` Daniel Axtens
2015-11-04 13:12 ` [PATCH v7 21/50] powerpc/powernv: Increase PE# capacity Gavin Shan
2015-11-17  0:29   ` Daniel Axtens
2015-11-17  1:56     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 22/50] powerpc/powernv: Introduce pnv_ioda_init_pe() Gavin Shan
2015-11-17  0:30   ` Daniel Axtens
2015-11-17  1:58     ` Gavin Shan
2015-11-17  2:37       ` Alexey Kardashevskiy
2015-11-17  2:53         ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 23/50] powerpc/powernv: Use PE instead of number during setup and release Gavin Shan
2015-11-17  5:08   ` Alexey Kardashevskiy
2015-11-17  9:03     ` Gavin Shan
2015-11-18  0:13       ` Alexey Kardashevskiy
2015-11-22 22:52         ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 24/50] powerpc/powernv: Allocate PE# in reverse order Gavin Shan
2015-11-04 13:12 ` [PATCH v7 25/50] powerpc/powernv: Reserve PE for root bus Gavin Shan
2015-11-17  6:04   ` Alexey Kardashevskiy
2015-11-17  9:06     ` Gavin Shan
2015-11-19  0:21       ` Alexey Kardashevskiy
2015-11-04 13:12 ` [PATCH v7 26/50] powerpc/powernv: Create PEs at PCI hot plugging time Gavin Shan
2015-11-17  7:57   ` Alexey Kardashevskiy
2015-11-17  9:12     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 27/50] powerpc/powernv: Dynamically release PEs Gavin Shan
2015-11-18  2:23   ` Alexey Kardashevskiy
2015-11-23 23:06     ` Gavin Shan
2015-11-24  0:22       ` Alexey Kardashevskiy
2015-11-04 13:12 ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add, remove}_pci_devices() Gavin Shan
2015-11-18  2:43   ` [PATCH v7 28/50] powerpc/pci: Rename pcibios_{add,remove}_pci_devices() Alexey Kardashevskiy
2015-11-23 23:08     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 29/50] powerpc/pci: Rename pcibios_find_pci_bus() Gavin Shan
2015-11-18  3:59   ` Alexey Kardashevskiy
2015-11-23 23:11     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 30/50] powerpc/pci: Move pci_find_bus_by_node() around Gavin Shan
2015-11-04 13:12 ` [PATCH v7 31/50] powerpc/pci: Export pci_add_device_node_info() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 32/50] powerpc/pci: Introduce pci_remove_device_node_info() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 33/50] powerpc/pci: Export pci_traverse_device_nodes() Gavin Shan
2015-11-18  3:14   ` Alexey Kardashevskiy
2015-11-23 23:23     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 34/50] powerpc/pci: Delay populating pdn Gavin Shan
2015-11-18  4:24   ` Alexey Kardashevskiy
2015-11-23 23:42     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 35/50] powerpc/pci: Don't scan empty slot Gavin Shan
2015-11-04 13:12 ` [PATCH v7 36/50] powerpc/pci: Update bridge windows on PCI plug Gavin Shan
2015-11-04 13:12 ` [PATCH v7 37/50] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
2015-11-12  5:11   ` Daniel Axtens
2015-11-12  6:11     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 38/50] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() Gavin Shan
2015-11-12 22:59   ` Daniel Axtens
2015-11-12 23:25     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 39/50] powerpc/powernv: Fundamental reset " Gavin Shan
2015-11-12  6:15   ` Gavin Shan
2015-11-13  0:08   ` Daniel Axtens
2015-11-13  0:20     ` Gavin Shan
2015-11-13  0:23     ` Benjamin Herrenschmidt
2015-11-13  0:23   ` Daniel Axtens
2015-11-04 13:12 ` [PATCH v7 40/50] powerpc/powernv: Support PCI slot ID Gavin Shan
2015-11-04 13:12 ` [PATCH v7 41/50] powerpc/powernv: Use firmware PCI slot reset infrastructure Gavin Shan
2015-11-04 13:12 ` [PATCH v7 42/50] powerpc/powernv: Functions to get/set PCI slot status Gavin Shan
2015-11-04 13:12 ` [PATCH v7 43/50] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
2015-11-04 13:12 ` [PATCH v7 44/50] drivers/of: Split unflatten_dt_node() Gavin Shan
2015-11-04 18:43   ` Rob Herring
2015-11-04 23:05     ` Gavin Shan
2015-11-04 13:12 ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Gavin Shan
2015-11-04 16:07   ` Rob Herring
2015-11-04 23:23     ` Gavin Shan
2015-11-04 23:26       ` Gavin Shan
2016-05-13  7:16     ` Geert Uytterhoeven
2016-05-13 11:31       ` [PATCH] drivers/of: Fix build warning in populate_node() Gavin Shan
2016-05-16 14:11         ` Rob Herring
2015-12-06 20:28   ` [PATCH v7 45/50] drivers/of: Avoid recursively calling unflatten_dt_node() Rob Herring
2015-12-06 21:49     ` Guenter Roeck
2015-12-06 23:54     ` Benjamin Herrenschmidt
2015-12-07  2:21       ` Guenter Roeck
2015-12-07  2:33         ` Rob Herring
2015-12-07  3:40           ` Guenter Roeck
2015-11-04 13:12 ` [PATCH v7 46/50] drivers/of: Rename unflatten_dt_node() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 47/50] drivers/of: Specify parent node in of_fdt_unflatten_tree() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 48/50] drivers/of: Return allocated memory from of_fdt_unflatten_tree() Gavin Shan
2015-11-04 13:12 ` [PATCH v7 49/50] drivers/of: Export OF changeset functions Gavin Shan
2015-11-04 16:12   ` Rob Herring
2015-11-04 23:23     ` Gavin Shan
2016-01-13 13:54   ` [v7,49/50] " Wolfram Sang
2016-01-13 21:18     ` Michael Ellerman
2016-01-13 21:20       ` Wolfram Sang
2016-01-13 23:53         ` Rob Herring
2016-01-14  7:28           ` Wolfram Sang
2015-11-04 13:12 ` [PATCH v7 50/50] PCI/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
2015-11-18  7:33   ` Alexey Kardashevskiy
2015-11-23 23:16     ` Gavin Shan
2015-11-09  3:09 ` [PATCH v7 00/50] powerpc/powernv: PCI hotplug support Gavin Shan
2015-11-09  4:24   ` Pramod Sudheendra
2015-11-09  4:29     ` Gavin Shan
2015-11-09  6:43       ` Benjamin Herrenschmidt