* [PATCH v2 0/2] PCI hotplug driver fixes
@ 2024-05-14 13:52 Krishna Kumar
2024-05-14 13:52 ` [PATCH v2 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv Krishna Kumar
2024-05-14 13:52 ` [PATCH v2 2/2] powerpc: hotplug driver bridge support Krishna Kumar
0 siblings, 2 replies; 10+ messages in thread
From: Krishna Kumar @ 2024-05-14 13:52 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Krishna Kumar, linux-pci, linux-kernel, mahesh
The fix of Powerpc hotplug driver (drivers/pci/hotplug/pnv_php.c)
addresses below two issues.
1. Kernel Crash during hot unplug of bridge/switch slot.
2. DPC-Support Enablement - Previously, when we do a hot-unplug
operation on a bridge slot, all the ports and devices behind the
bridge-ports would be hot-unplugged/offline, but when we do a hot-plug
operation on the same bridge slot, all the ports and devices behind the
bridge would not get hot-plugged/online. In this case, Only the first
port of the bridge gets enabled and the remaining port/devices remain
unplugged/offline. After the fix, The hot-unplug and hot-plug
operations on the slot associated with the bridge started behaving
correctly and became in sync. Now, after the hot plug operation on the
same slot, all the bridge ports and devices behind the bridge become
hot-plugged/online/restored in the same manner as it was before the
hot-unplug operation.
Krishna Kumar (2):
pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
powerpc: hotplug driver bridge support
arch/powerpc/include/asm/ppc-pci.h | 4 ++++
arch/powerpc/kernel/pci-hotplug.c | 5 ++---
arch/powerpc/kernel/pci_dn.c | 32 ++++++++++++++++++++++++++++++
drivers/pci/hotplug/pnv_php.c | 3 +--
4 files changed, 39 insertions(+), 5 deletions(-)
Changelog:
==========
v2: 14 May 2024
- Used of_property_read_u32() in place of of_get_property() and
of_read_number(). [patch2]
- Removed some unnecessary variable and changed the function return
type from void* to void. [patch2]
- Removed the export declaration of
pci_traverse_sibling_nodes_and_scan_slot() as its not needed.
[patch2]
--
2.45.0
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v2 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv 2024-05-14 13:52 [PATCH v2 0/2] PCI hotplug driver fixes Krishna Kumar @ 2024-05-14 13:52 ` Krishna Kumar 2024-05-14 13:52 ` [PATCH v2 2/2] powerpc: hotplug driver bridge support Krishna Kumar 1 sibling, 0 replies; 10+ messages in thread From: Krishna Kumar @ 2024-05-14 13:52 UTC (permalink / raw) To: linuxppc-dev Cc: Nathan Lynch, Gaurav Batra, linux-pci, Brian King, linux-kernel, Nicholas Piggin, mahesh, Aneesh Kumar K.V, Bjorn Helgaas, Krishna Kumar Description of the problem: The hotplug driver for powerpc (pci/hotplug/pnv_php.c) gives kernel crash when we try to hot-unplug/disable the PCIe switch/bridge from the PHB. Root Cause of Crash: The crash is due to the reason that, though the msi data structure has been released during disable/hot-unplug path and it has been assigned with NULL, still during unregistartion the code was again trying to explicitly disable the msi which causes the Null pointer dereference and kernel crash. Proposed Fix : The fix is to correct the check during unregistration path so that the code should not try to invoke pci_disable_msi/msix() if its data structure is already freed. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Gaurav Batra <gbatra@linux.ibm.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Krishna Kumar <krishnak@linux.ibm.com> --- Command used for reproducing the bug: echo 0 > /sys/bus/pci/slots/C5/power where C5 is slot associated with bridge. Snippet of Crash: Kernel attempted to read user page (10) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000010 Faulting instruction address: 0xc000000000fad7d4 Oops: Kernel access of bad area, sig: 11 [#1] Hardware name: 5105-22E POWER9 0x4e1203 opal:v7.0-39-g4660e63a PowerNV NIP [c000000000fad7d4] mutex_lock+0x34/0x88 LR [c000000000fad7c8] mutex_lock+0x28/0x88 Call Trace: [c00000017075f940] [c000000000fad7c8] mutex_lock+0x28/0x88 (unreliable) [c00000017075f970] [c000000000214464] msi_lock_descs+0x28/0x3c [c00000017075f990] [c0000000008e8be8] pci_disable_msi+0x68/0xa8 [c00000017075f9c0] [c00000000090f0a4] pnv_php_disable_irq+0x2a0/0x2b0 [c00000017075fab0] [c00000000090f128] pnv_php_free_slot+0x74/0xc4 [c00000017075fb30] [c000000000912184] pnv_php_disable_slot.part.0+0x1b8/0x24c [c00000017075fc00] [c000000000902df0] power_write_file+0xf8/0x18c [c00000017075fc80] [c0000000008f84d8] pci_slot_attr_store+0x40/0x5c [c00000017075fca0] [c0000000006b4834] sysfs_kf_write+0x64/0x78 [c00000017075fcc0] [c0000000006b3300] kernfs_fop_write_iter+0x1b8/0x2dc [c00000017075fd10] [c0000000005b3eb0] vfs_write+0x224/0x4e8 [c00000017075fdc0] [c0000000005b44b0] ksys_write+0x88/0x150 [c00000017075fe10] [c000000000030864] system_call_exception+0x124/0x320 [c00000017075fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fffb9748774 Root-Cause: Its safe to call pci_disable_msi() if its associated data structre are not freed (during bailout path). But when the driver code disables the bridge during hot-unplug operation, its msi data structure becomes NULL (php_slot->pdev->dev.msi.data:0000000000000000). This happens before unregistration and during disable path in function msi_device_data_release(). In this case, its not safe to explicitly call pci_disable_msi/msix() due to NULL pointer dereference. But since the current code does so, the crash is happening at the line mutex_lock(&dev->msi.data->mutex). FIX: In the current code, there are two paths to invoke pci_disable_msi/msix(). In the error/bailout path, first argument of the check - if(disable_device || irq > 0), i.e. disable_device is true, so it will always invoke pci_disable_msi/msix(), it will never depend on second argument. In this path it's fine to call pci_disable_msi/msix(). During the slot releasing/disable/hot-unpug path the disable_device is false, irq is having old value which is making the overall check true and causing the crash. Of course, we should not choose the old/stale value of irq but should choose php_slot->irq for check. Also, since php_slot->irq value is always 0 before the check, so it does not matter if it will not be included into the check. So, the check can be formed with only one argument i.e. disable_device. Based on its value pci_disable_msi/msix() will be invoked and this is the fix for the crash. During the bailout path its value will be true and during the hot-unplug operation on the bridge slot, its value will be false. drivers/pci/hotplug/pnv_php.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 694349be9d0a..573a41869c15 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -40,7 +40,6 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot, bool disable_device) { struct pci_dev *pdev = php_slot->pdev; - int irq = php_slot->irq; u16 ctrl; if (php_slot->irq > 0) { @@ -59,7 +58,7 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot, php_slot->wq = NULL; } - if (disable_device || irq > 0) { + if (disable_device) { if (pdev->msix_enabled) pci_disable_msix(pdev); else if (pdev->msi_enabled) -- 2.45.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-14 13:52 [PATCH v2 0/2] PCI hotplug driver fixes Krishna Kumar 2024-05-14 13:52 ` [PATCH v2 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv Krishna Kumar @ 2024-05-14 13:52 ` Krishna Kumar 2024-05-17 2:42 ` Oliver O'Halloran 1 sibling, 1 reply; 10+ messages in thread From: Krishna Kumar @ 2024-05-14 13:52 UTC (permalink / raw) To: linuxppc-dev Cc: Nathan Lynch, Gaurav Batra, linux-pci, Brian King, linux-kernel, Nicholas Piggin, mahesh, Aneesh Kumar K.V, Bjorn Helgaas, Krishna Kumar There is an issue with the hotplug operation when it's done on the bridge/switch slot. The bridge-port and devices behind the bridge, which become offline by hot-unplug operation, don't get hot-plugged/enabled by doing hot-plug operation on that slot. Only the first port of the bridge gets enabled and the remaining port/devices remain unplugged. The hot plug/unplug operation is done by the hotplug driver (drivers/pci/hotplug/pnv_php.c). Root Cause Analysis: This behavior is due to missing code for the DPC switch/bridge. The existing driver depends on pci_hp_add_devices() function for device enablement. This function calls pci_scan_slot() on only one device-node/port of the bridge, not on all the siblings' device-node/port. The missing code needs to be added which will find all the sibling device-nodes/bridge-ports and will run explicit pci_scan_slot() on those. A new function has been added for this purpose which gets invoked from pci_hp_add_devices(). This new function pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling bridge-ports by traversal and explicitly invokes pci_scan_slot on them. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Gaurav Batra <gbatra@linux.ibm.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Krishna Kumar <krishnak@linux.ibm.com> --- Command for reproducing the issue : For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power For hot plug/enable - echo 1 > /sys/bus/pci/slots/C5/power where C5 is slot associated with bridge. Scenario/Tests: Output of lspci -nn before test is given below. This snippet contains devices used for testing on Powernv machine. 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:08:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) 0004:09:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) Output of lspci -tv before test is as follows: # lspci -tv +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | \-03.0-[0a-0e]-- | \-00.1 PMC-Sierra Inc. Device 4052 C5(bridge) and C6(End Point) slot address are as below: # cat /sys/bus/pci/slots/C5/address 0004:02:00 # cat /sys/bus/pci/slots/C6/address 0004:09:00 Hot-unplug operation on slot associated with bridge: # echo 0 > /sys/bus/pci/slots/C5/power # lspci -tv +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]-- | \-00.1 PMC-Sierra Inc. Device 4052 From the above lspci -tv output, it can be observed that hot unplug operation has removed all the PMC-Sierra bridge ports like: 00.0-[03-07], 01.0-[08], 02.0-[09], 03.0-[0a-0e] and the SAS devices behind the bridge-port. Without the fix, when the hot plug operation is done on the same slot, it adds only the first bridge port and doesn't restore all the bridge-ports and devices that it unplugged earlier. Below snippet shows this. Hot-plug operation on the bridge slot without the fix: # echo 1 > /sys/bus/pci/slots/C5/power # lspci -tv +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- | \-00.1 PMC-Sierra Inc. Device 4052 After the fix, it restores all the devices in the same manner how it unplugged them earlier during the hot unplug operation. The below snippet shows the same. Hot-plug operation on bridge slot with the fix: # echo 1 > /sys/bus/pci/slots/C5/power # lspci -tv +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | \-03.0-[0a-0e]-- | \-00.1 PMC-Sierra Inc. Device 4052 Removal of End point device behind bridge are also intact and behaving correctly. Hot-unplug operation on Endpoint device C6: # echo 0 > /sys/bus/pci/slots/C6/power # lspci -tv +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | +-02.0-[09]-- | | \-03.0-[0a-0e]-- | \-00.1 PMC-Sierra Inc. Device 4052 Hot-plug operation on Endpoint device C6: # echo 1 > /sys/bus/pci/slots/C6/power # lspci -tv +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | \-03.0-[0a-0e]-- | \-00.1 PMC-Sierra Inc. Device 4052 arch/powerpc/include/asm/ppc-pci.h | 4 ++++ arch/powerpc/kernel/pci-hotplug.c | 5 ++--- arch/powerpc/kernel/pci_dn.c | 32 ++++++++++++++++++++++++++++++ 3 files changed, 38 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h index a8b7e8682f5b..83db8d0798ac 100644 --- a/arch/powerpc/include/asm/ppc-pci.h +++ b/arch/powerpc/include/asm/ppc-pci.h @@ -28,6 +28,10 @@ struct pci_dn; void *pci_traverse_device_nodes(struct device_node *start, void *(*fn)(struct device_node *, void *), void *data); + +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, + struct pci_bus *bus); + extern void pci_devs_phb_init_dynamic(struct pci_controller *phb); #if defined(CONFIG_IOMMU_API) && (defined(CONFIG_PPC_PSERIES) || \ diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c index 0fe251c6ac2c..639a3d592fe2 100644 --- a/arch/powerpc/kernel/pci-hotplug.c +++ b/arch/powerpc/kernel/pci-hotplug.c @@ -106,7 +106,7 @@ EXPORT_SYMBOL_GPL(pci_hp_remove_devices); */ void pci_hp_add_devices(struct pci_bus *bus) { - int slotno, mode, max; + int mode, max; struct pci_dev *dev; struct pci_controller *phb; struct device_node *dn = pci_bus_to_OF_node(bus); @@ -129,8 +129,7 @@ void pci_hp_add_devices(struct pci_bus *bus) * order for fully rescan all the way down to pick them up. * They can have been removed during partial hotplug. */ - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); - pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); + pci_traverse_sibling_nodes_and_scan_slot(dn, bus); max = bus->busn_res.start; /* * Scan bridges that are already configured. We don't touch diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index 38561d6a2079..bea612759832 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev) pdn = pci_get_pdn(pdev); pdev->dev.archdata.pci_data = pdn; } + +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus) +{ + struct device_node *dn; + int slotno; + + u32 class = 0; + + if (!of_property_read_u32(start->child, "class-code", &class)) { + /* Call of pci_scan_slot for non-bridge/EP case */ + if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) { + slotno = PCI_SLOT(PCI_DN(start->child)->devfn); + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); + return; + } + } + + /* Iterate all siblings */ + for_each_child_of_node(start, dn) { + class = 0; + + if (!of_property_read_u32(start->child, "class-code", &class)) { + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */ + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) { + slotno = PCI_SLOT(PCI_DN(dn)->devfn); + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); + } + } + } + +} + DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup); -- 2.45.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-14 13:52 ` [PATCH v2 2/2] powerpc: hotplug driver bridge support Krishna Kumar @ 2024-05-17 2:42 ` Oliver O'Halloran 2024-05-17 11:14 ` krishna kumar 0 siblings, 1 reply; 10+ messages in thread From: Oliver O'Halloran @ 2024-05-17 2:42 UTC (permalink / raw) To: Krishna Kumar Cc: Nathan Lynch, Gaurav Batra, linux-kernel, Nicholas Piggin, mahesh, Aneesh Kumar K.V, Brian King, linux-pci, Bjorn Helgaas, linuxppc-dev On Tue, May 14, 2024 at 11:54 PM Krishna Kumar <krishnak@linux.ibm.com> wrote: > > There is an issue with the hotplug operation when it's done on the > bridge/switch slot. The bridge-port and devices behind the bridge, which > become offline by hot-unplug operation, don't get hot-plugged/enabled by > doing hot-plug operation on that slot. Only the first port of the bridge > gets enabled and the remaining port/devices remain unplugged. The hot > plug/unplug operation is done by the hotplug driver > (drivers/pci/hotplug/pnv_php.c). > > Root Cause Analysis: This behavior is due to missing code for the DPC > switch/bridge. I don't see anything touching DPC in this series? > *snip* > > Command for reproducing the issue : > > For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power > For hot plug/enable - echo 1 > /sys/bus/pci/slots/C5/power > > where C5 is slot associated with bridge. > > Scenario/Tests: > Output of lspci -nn before test is given below. This snippet contains > devices used for testing on Powernv machine. > > 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:08:00.0 Serial Attached SCSI controller [0107]: > Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) > 0004:09:00.0 Serial Attached SCSI controller [0107]: > Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) > > Output of lspci -tv before test is as follows: > > # lspci -tv > +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- > | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 > | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 > | | \-03.0-[0a-0e]-- > | \-00.1 PMC-Sierra Inc. Device 4052 > > C5(bridge) and C6(End Point) slot address are as below: > # cat /sys/bus/pci/slots/C5/address > 0004:02:00 > # cat /sys/bus/pci/slots/C6/address > 0004:09:00 Uh, if I'm reading this right it looks like your "slot" C5 is actually the PCIe switch's internal bus which is definitely not hot pluggable. I find it helps to look at the PCI topology in terms of where the physical PCIe links are. Here we've got: - A link between the PHB (0004:00:00.0) and the switch upstream port (0004:01:00.0) - A link from switch downstream port 0 (0004:02:00.0) to nothing - A link from switch downstream port 1 (0004:02:01.0) to a SAS card - A link from switch downstream port 2 (0004:02:02.0) to a SAS card - A link from switch downstream port 2 (0004:02:03.0) to nothing Note that there's no PCIe link between the switch upstream port (0004:01:00.0) and the downstream ports on bus 0004:02. The connection between those is invisible to us because it's custom bus logic internal to the PCIe switch ASIC. What I think has happened here is that system firmware has supplied bad PCIe slot information to OPAL which has resulted in pnv_php advertising a slot in the wrong place. Assuming this following the usual IBM convention I'd expect the bridge device for C5 to be the PHB's root port and the bus should be 0004:01. It might be worth adding some logic to pnv_php to verify the PCI bridge upstream of the slot actually has the PCIe slot capability to guard against this problem. > Hot-unplug operation on slot associated with bridge: > # echo 0 > /sys/bus/pci/slots/C5/power > # lspci -tv > +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]-- > | \-00.1 PMC-Sierra Inc. Device 4052 Yep, "powering off" C5 doesn't remove the upstream port device. This would create problems if you physically removed the card from C5 since the kernel would assume the switch device is still present. > *snip* > diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c > index 38561d6a2079..bea612759832 100644 > --- a/arch/powerpc/kernel/pci_dn.c > +++ b/arch/powerpc/kernel/pci_dn.c > @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev) > pdn = pci_get_pdn(pdev); > pdev->dev.archdata.pci_data = pdn; > } > + > +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus) > +{ > + struct device_node *dn; > + int slotno; > + > + u32 class = 0; > + > + if (!of_property_read_u32(start->child, "class-code", &class)) { > + /* Call of pci_scan_slot for non-bridge/EP case */ > + if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) { > + slotno = PCI_SLOT(PCI_DN(start->child)->devfn); > + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); > + return; > + } > + } > + > + /* Iterate all siblings */ > + for_each_child_of_node(start, dn) { > + class = 0; > + > + if (!of_property_read_u32(start->child, "class-code", &class)) { > + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */ > + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) { > + slotno = PCI_SLOT(PCI_DN(dn)->devfn); > + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); > + } > + } > + } If you're going to iterate over all the DT nodes why not just scan all of them rather than special casing bridges? IIRC current logic is the way it is because PowerVM only puts single devices under a PHB and in the PowerNV (pnv_php) case the PCIe spec guarantees that only device 0 will be present on the end of a link. If you want to handle the more generic case then feel free, but do it properly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-17 2:42 ` Oliver O'Halloran @ 2024-05-17 11:14 ` krishna kumar 2024-05-20 14:59 ` Oliver O'Halloran 0 siblings, 1 reply; 10+ messages in thread From: krishna kumar @ 2024-05-17 11:14 UTC (permalink / raw) To: Oliver O'Halloran Cc: Nathan Lynch, Aneesh Kumar K.V, linux-pci, linux-kernel, Nicholas Piggin, mahesh, Gaurav Batra, Bjorn Helgaas, Brian King, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 8935 bytes --] Hi Oliver, Thanks for review. Please find my response below - On 5/17/24 08:12, Oliver O'Halloran wrote: > On Tue, May 14, 2024 at 11:54 PM Krishna Kumar<krishnak@linux.ibm.com> wrote: >> There is an issue with the hotplug operation when it's done on the >> bridge/switch slot. The bridge-port and devices behind the bridge, which >> become offline by hot-unplug operation, don't get hot-plugged/enabled by >> doing hot-plug operation on that slot. Only the first port of the bridge >> gets enabled and the remaining port/devices remain unplugged. The hot >> plug/unplug operation is done by the hotplug driver >> (drivers/pci/hotplug/pnv_php.c). >> >> Root Cause Analysis: This behavior is due to missing code for the DPC >> switch/bridge. > I don't see anything touching DPC in this series? I apologize for the confusion. When I mentioned DPC, I was referring to the downward bridge port (the sibling ones) that were not enabled after the hot plug operation. I didn't mean to refer to DPC events. It seems that this caused some confusion, and I will remove this word in my next version of the patch. Thank you! > >> *snip* >> >> Command for reproducing the issue : >> >> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power >> For hot plug/enable - echo 1 > /sys/bus/pci/slots/C5/power >> >> where C5 is slot associated with bridge. >> >> Scenario/Tests: >> Output of lspci -nn before test is given below. This snippet contains >> devices used for testing on Powernv machine. >> >> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:08:00.0 Serial Attached SCSI controller [0107]: >> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) >> 0004:09:00.0 Serial Attached SCSI controller [0107]: >> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) >> >> Output of lspci -tv before test is as follows: >> >> # lspci -tv >> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- >> | | +-01.0-[08]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 >> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 >> | | \-03.0-[0a-0e]-- >> | \-00.1 PMC-Sierra Inc. Device 4052 >> >> C5(bridge) and C6(End Point) slot address are as below: >> # cat /sys/bus/pci/slots/C5/address >> 0004:02:00 >> # cat /sys/bus/pci/slots/C6/address >> 0004:09:00 > Uh, if I'm reading this right it looks like your "slot" C5 is actually > the PCIe switch's internal bus which is definitely not hot pluggable. It's a hotplug slot. Please see the snippet below: :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- :~$ :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- :~$ :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- :~$ :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- :~$ > I find it helps to look at the PCI topology in terms of where the > physical PCIe links are. Here we've got: > > - A link between the PHB (0004:00:00.0) and the switch upstream port > (0004:01:00.0) > - A link from switch downstream port 0 (0004:02:00.0) to nothing > - A link from switch downstream port 1 (0004:02:01.0) to a SAS card > - A link from switch downstream port 2 (0004:02:02.0) to a SAS card > - A link from switch downstream port 2 (0004:02:03.0) to nothing > > Note that there's no PCIe link between the switch upstream port > (0004:01:00.0) and the downstream ports on bus 0004:02. The connection > between those is invisible to us because it's custom bus logic > internal to the PCIe switch ASIC. What I think has happened here is > that system firmware has supplied bad PCIe slot information to OPAL > which has resulted in pnv_php advertising a slot in the wrong place. > Assuming this following the usual IBM convention I'd expect the bridge > device for C5 to be the PHB's root port and the bus should be 0004:01. It seems like your explanation about the missing 0004:01:00.0 may be correct and could be due to a firmware bug. However, the scope of this patch does not relate to this issue. Additionally, if it starts with 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug operations will remain inconsistent. This patch aims to address the inconsistent behavior of hot-unplug and hot-plug. > It might be worth adding some logic to pnv_php to verify the PCI > bridge upstream of the slot actually has the PCIe slot capability to > guard against this problem. We can have a look at this problem in another patch. > >> Hot-unplug operation on slot associated with bridge: >> # echo 0 > /sys/bus/pci/slots/C5/power >> # lspci -tv >> +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]-- >> | \-00.1 PMC-Sierra Inc. Device 4052 > Yep, "powering off" C5 doesn't remove the upstream port device. This > would create problems if you physically removed the card from C5 since > the kernel would assume the switch device is still present. Powering off C5 does removes 0004:02:00.0 to 0004:02:00.3 (all the downstream sibling bridge ports) and SAS devices behind these bridge ports. But Powering on does not enable them. The behavior should be in sync. Please see the snippet in patch. > >> *snip* > >> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c >> index 38561d6a2079..bea612759832 100644 >> --- a/arch/powerpc/kernel/pci_dn.c >> +++ b/arch/powerpc/kernel/pci_dn.c >> @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev) >> pdn = pci_get_pdn(pdev); >> pdev->dev.archdata.pci_data = pdn; >> } >> + >> +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, struct pci_bus *bus) >> +{ >> + struct device_node *dn; >> + int slotno; >> + >> + u32 class = 0; >> + >> + if (!of_property_read_u32(start->child, "class-code", &class)) { >> + /* Call of pci_scan_slot for non-bridge/EP case */ >> + if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) { >> + slotno = PCI_SLOT(PCI_DN(start->child)->devfn); >> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); >> + return; >> + } >> + } >> + >> + /* Iterate all siblings */ >> + for_each_child_of_node(start, dn) { >> + class = 0; >> + >> + if (!of_property_read_u32(start->child, "class-code", &class)) { >> + /* Call of pci_scan_slot on each sibling-nodes/bridge-ports */ >> + if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) { >> + slotno = PCI_SLOT(PCI_DN(dn)->devfn); >> + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); >> + } >> + } >> + } > If you're going to iterate over all the DT nodes why not just scan all > of them rather than special casing bridges? IIRC current logic is the > way it is because PowerVM only puts single devices under a PHB and in > the PowerNV (pnv_php) case the PCIe spec guarantees that only device 0 > will be present on the end of a link. If you want to handle the more > generic case then feel free, but do it properly. We wanted to handle the more generic case and did not want to be confined to only one device assumption. We want to fix the current inconsistent behavior more generically. Regarding the fix, the fix is obvious: We have to traverse and find the bridge ports from DT and invoke pci_scan_slot() on them. This will discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on the given bus- 0004:02). There is already an existing function, pci_scan_bridge() which is doing invocation of pci_scan_slot () for the devices behind the bridge, in this case for SAS device. So eventually, we are doing a scan of all the entities behind the slot. Would you like me to combine the non-bridge and bridge cases into one? I can attempt to do this. Hopefully, if we incorporate the iterate sibling logic case correctly, we may not need to maintain these two separate cases for bridge and non-bridge. I will attempt this, and if it works, I will include it in the next patch. Thanks. Best Regards, Krishna [-- Attachment #2: Type: text/html, Size: 12472 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-17 11:14 ` krishna kumar @ 2024-05-20 14:59 ` Oliver O'Halloran 2024-05-23 14:52 ` Krishna Kumar 0 siblings, 1 reply; 10+ messages in thread From: Oliver O'Halloran @ 2024-05-20 14:59 UTC (permalink / raw) To: krishna kumar Cc: Nathan Lynch, Aneesh Kumar K.V, linux-pci, linux-kernel, Nicholas Piggin, mahesh, Gaurav Batra, Bjorn Helgaas, Brian King, linuxppc-dev On Fri, May 17, 2024 at 9:15 PM krishna kumar <krishnak@linux.ibm.com> wrote: > > > Uh, if I'm reading this right it looks like your "slot" C5 is actually > > the PCIe switch's internal bus which is definitely not hot pluggable. > > It's a hotplug slot. Please see the snippet below: > > :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug > SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- > :~$ > > :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug > SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- > :~$ > > :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug > SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- > :~$ > > :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug > SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- > :~$ All this is showing is that the switch downstream ports on bus 0004:02 have a slot capability. I already know that (see what I said previously about physical links). The fact the downstream ports have a slot capability also has absolutely nothing to do with anything I was saying. Look at the lspci output for 0004:01:00.0 which is the switch's upstream port. The upstream port device will not have a slot capability because it's a bridge into the virtual PCI bus that is internal to the switch. > It seems like your explanation about the missing 0004:01:00.0 may be > correct and could be due to a firmware bug. However, the scope of this > patch does not relate to this issue. Additionally, if it starts with > 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug > operations will remain inconsistent. This patch aims to address the > inconsistent behavior of hot-unplug and hot-plug. > > *snip* > > > It might be worth adding some logic to pnv_php to verify the PCI > > bridge upstream of the slot actually has the PCIe slot capability to > > guard against this problem. > > We can have a look at this problem in another patch. The point of this series is to fix the behaviour of pnv_php, is it not? Powering off a PCI(e) slot is supposed to render it safe to remove the card in that slot. Currently if you "power off" C5, the kernel is still going to have active references to the switch's upstream port device (at 0004:01:00.0) and the switch management function (at 0004:01:00.1). If the kernel has active references to PCI devices physically located in the slot we supposedly powered off, then the hotplug driver isn't doing its job. The asymmetry between hot add and removal that you're trying to fix here is a side effect of the fact that pnv_php is advertising the wrong thing as a slot. I think you should stop pnv_php from advertising something as a slot when it's not actually a slot because that's the root of all your problems. > We wanted to handle the more generic case and did not want to be confined to > only one device assumption. We want to fix the current inconsistent behavior > more generically. Right, as I said above I don't think handing the more generic case is actually required if pnv_php is doing its job properly. It doesn't hurt though. > Regarding the fix, the fix is obvious: really? > We have to traverse > and find the bridge ports from DT and invoke pci_scan_slot() on them. This will > discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on > the given bus- 0004:02). There is already an existing function, pci_scan_bridge() > which is doing invocation of pci_scan_slot () for the devices behind the bridge, > in this case for SAS device. So eventually, we are doing a scan of all the entities > behind the slot. I already read your patch so I'm not sure why you feel the need to re-describe it in tedious detail. > Would you like me to combine the non-bridge and bridge cases into one? I can attempt > to do this. Hopefully, if we incorporate the iterate sibling logic case correctly, > we may not need to maintain these two separate cases for bridge and non-bridge. I > will attempt this, and if it works, I will include it in the next patch. Thanks. Yes, do that. Also, do not post HTML emails to linux development lists. It breaks plain text inline quoting which makes your messages annoying to reply to. Some linux development lists will also silently drop HTML emails. Please talk to the other LTC engineers about how to set up your mail client to send plain text emails to avoid these problems in the future. Oliver ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-20 14:59 ` Oliver O'Halloran @ 2024-05-23 14:52 ` Krishna Kumar 2024-05-31 10:44 ` Krishna Kumar 0 siblings, 1 reply; 10+ messages in thread From: Krishna Kumar @ 2024-05-23 14:52 UTC (permalink / raw) To: Oliver O'Halloran, krishna kumar Cc: Nathan Lynch, Gaurav Batra, linux-pci, mahesh, Nicholas Piggin, linux-kernel, Aneesh Kumar K.V, Brian King, Bjorn Helgaas, linuxppc-dev Hi Oliver, Thanks for your suggestions. Pls find my response: On 5/20/24 20:29, Oliver O'Halloran wrote: > On Fri, May 17, 2024 at 9:15 PM krishna kumar <krishnak@linux.ibm.com> wrote: >>> Uh, if I'm reading this right it looks like your "slot" C5 is actually >>> the PCIe switch's internal bus which is definitely not hot pluggable. >> It's a hotplug slot. Please see the snippet below: >> >> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug >> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >> :~$ >> >> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug >> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >> :~$ >> >> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug >> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >> :~$ >> >> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug >> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >> :~$ > All this is showing is that the switch downstream ports on bus 0004:02 > have a slot capability. I already know that (see what I said > previously about physical links). The fact the downstream ports have a > slot capability also has absolutely nothing to do with anything I was > saying. Look at the lspci output for 0004:01:00.0 which is the > switch's upstream port. The upstream port device will not have a slot > capability because it's a bridge into the virtual PCI bus that is > internal to the switch. Let me try to understand your suggestion and what needs to be done now: lspci -nn snippet in current scenario: 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] lspci -tv snippet in current scenario: +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) | \-00.1 PMC-Sierra Inc. Device 4052 C5 bus address: [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address 0004:02:00 [root@ltczzess2-lp1 ~]# 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does have this capability. Below snippet tells about this: [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color HotPlug [root@ltczzess2-lp1 ~]# [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- [root@ltczzess2-lp1 ~]# In Function - pnv_php_register_one() is responsible for slot creation from hotplug capable device node: Below is the current code that does check the device node for hot plug capability and takes the decision /* Check if it's hotpluggable slot */ ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32); if (ret || !prop32){ return -ENXIO; } Its obvious that 0004:01:00.0 does not get above criteria fulfilled but 0004:02:00.0 does, so is the current behavior (Upstream port is not became C5 slot but downstream port became C5 slot). I am summarizing your suggested changes. Please let me know if I've got it right: 1. Do you want me to modify the code so that the C5 device-bdf and bus-address become 0004:01:00/0004:01 instead of 0004:02:00/0004:01? 2. When performing a hot-unplug operation on C5, should all devices from 0004:01 be removed? And should all devices from 0004:02 also be removed? I think the answer is yes, but please confirm. 3. When performing a hot-plug operation on C5, should all the devices removed earlier from 0004:01 and 0004:02 be re-attached? 4. Will there be any PCIe topology changes in this workflow? Once you confirm the above requirements, we can discuss how to proceed further. I have some follow up questions from your last mail and I am putting these questions in below paragraphs as inline statements. It will confirm me if we should do above things or not. >> It seems like your explanation about the missing 0004:01:00.0 may be >> correct and could be due to a firmware bug. However, the scope of this >> patch does not relate to this issue. Additionally, if it starts with >> 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug >> operations will remain inconsistent. This patch aims to address the >> inconsistent behavior of hot-unplug and hot-plug. >> >> *snip* >> >>> It might be worth adding some logic to pnv_php to verify the PCI >>> bridge upstream of the slot actually has the PCIe slot capability to >>> guard against this problem. >> We can have a look at this problem in another patch. > The point of this series is to fix the behaviour of pnv_php, is it > not? Yes and we will do necessary things. > Powering off a PCI(e) slot is supposed to render it safe to > remove the card in that slot. Do you mean physical-removal of the device after power-off ? > Currently if you "power off" C5, the > kernel is still going to have active references to the switch's > upstream port device (at 0004:01:00.0) and the switch management > function (at 0004:01:00.1). Yes, since we are only operating on the downstream port of C5, upstream ports' reference to the kernel will remain the same. > If the kernel has active references to PCI > devices physically located in the slot we supposedly powered off, then > the hotplug driver isn't doing its job. We have only powered off the downstream ports, not the upstream port. The upstream port will remain powered on. Do you mean to say that it will cause a problem if we physically remove the device while the upstream port is powered on and the downstream port is powered off? Will it cause a kernel crash? Is this the reason for designating the upstream port as a C5 slot and performing a hot-plug operation on it? Is it correct to select a device port that is not hot-pluggable, designate it as a C5 slot, and perform a hot-plug operation on it? > The asymmetry between hot add > and removal that you're trying to fix here is a side effect of the > fact that pnv_php is advertising the wrong thing as a slot. Pnv-php is displaying the information, what it receives from the device node property. We will attempt to modify the code path that is responsible for this. I am not sure yet what additional code is needed for this, but I will figure it out. Is it okay to change this code? > I think > you should stop pnv_php from advertising something as a slot when it's > not actually a slot because that's the root of all your problems. Okay, I am aligned but need some more clarification. Currently, we are observing this behavior with the PMC-Sierra bridge. Will this behavior occur with all bridges? In other words, will the upstream port capability not be hot-pluggable for all bridges and switches, and therefore not be considered for slot selection? In a previous email, you mentioned that this problem is due to a firmware bug, causing the driver to behave incorrectly and advertise the wrong port as a slot. Assuming the firmware bug is not present, what will be the behavior? Will there be any expected PCI-topology changes in the above "lspci -tv" command? Also, if the firmware bug is not present, do we still need to make changes to the driver code? Best Regards, Krishna >> We wanted to handle the more generic case and did not want to be confined to >> only one device assumption. We want to fix the current inconsistent behavior >> more generically. > Right, as I said above I don't think handing the more generic case is > actually required if pnv_php is doing its job properly. It doesn't > hurt though. > >> Regarding the fix, the fix is obvious: > really? > >> We have to traverse >> and find the bridge ports from DT and invoke pci_scan_slot() on them. This will >> discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on >> the given bus- 0004:02). There is already an existing function, pci_scan_bridge() >> which is doing invocation of pci_scan_slot () for the devices behind the bridge, >> in this case for SAS device. So eventually, we are doing a scan of all the entities >> behind the slot. > I already read your patch so I'm not sure why you feel the need to > re-describe it in tedious detail. > >> Would you like me to combine the non-bridge and bridge cases into one? I can attempt >> to do this. Hopefully, if we incorporate the iterate sibling logic case correctly, >> we may not need to maintain these two separate cases for bridge and non-bridge. I >> will attempt this, and if it works, I will include it in the next patch. Thanks. > Yes, do that. > > Also, do not post HTML emails to linux development lists. It breaks > plain text inline quoting which makes your messages annoying to reply > to. Some linux development lists will also silently drop HTML emails. > Please talk to the other LTC engineers about how to set up your mail > client to send plain text emails to avoid these problems in the > future. > > Oliver ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-23 14:52 ` Krishna Kumar @ 2024-05-31 10:44 ` Krishna Kumar 2024-05-31 13:52 ` Krishna Kumar 0 siblings, 1 reply; 10+ messages in thread From: Krishna Kumar @ 2024-05-31 10:44 UTC (permalink / raw) To: Oliver O'Halloran, krishna kumar Cc: Nathan Lynch, Gaurav Batra, linux-pci, mahesh, Nicholas Piggin, linux-kernel, Aneesh Kumar K.V, Brian King, Bjorn Helgaas, linuxppc-dev On 5/23/24 20:22, Krishna Kumar wrote: > > > Hi Oliver, > > Thanks for your suggestions. Pls find my response: > > On 5/20/24 20:29, Oliver O'Halloran wrote: >> On Fri, May 17, 2024 at 9:15 PM krishna kumar <krishnak@linux.ibm.com> wrote: >>>> Uh, if I'm reading this right it looks like your "slot" C5 is actually >>>> the PCIe switch's internal bus which is definitely not hot pluggable. >>> It's a hotplug slot. Please see the snippet below: >>> >>> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug >>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>> :~$ >>> >>> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug >>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>> :~$ >>> >>> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug >>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>> :~$ >>> >>> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug >>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>> :~$ >> All this is showing is that the switch downstream ports on bus 0004:02 >> have a slot capability. I already know that (see what I said >> previously about physical links). The fact the downstream ports have a >> slot capability also has absolutely nothing to do with anything I was >> saying. Look at the lspci output for 0004:01:00.0 which is the >> switch's upstream port. The upstream port device will not have a slot >> capability because it's a bridge into the virtual PCI bus that is >> internal to the switch. > > Let me try to understand your suggestion and what needs to be done now: > > lspci -nn snippet in current scenario: > > 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > > lspci -tv snippet in current scenario: > > +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- > | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 > | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) > | \-00.1 PMC-Sierra Inc. Device 4052 > > C5 bus address: > > [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address > 0004:02:00 > [root@ltczzess2-lp1 ~]# > > 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does > have this capability. Below snippet tells about this: > > [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color HotPlug > [root@ltczzess2-lp1 ~]# > [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug > SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- > [root@ltczzess2-lp1 ~]# > > In Function - pnv_php_register_one() is responsible for slot creation from > hotplug capable device node: > > Below is the current code that does check the device node for hot plug > capability and takes the decision > > /* Check if it's hotpluggable slot */ > ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32); > if (ret || !prop32){ > return -ENXIO; > } > > Its obvious that 0004:01:00.0 does not get above criteria fulfilled but > 0004:02:00.0 does, so is the current behavior (Upstream port is not became > C5 slot but downstream port became C5 slot). > > I am summarizing your suggested changes. Please let > me know if I've got it right: > > 1. Do you want me to modify the code so that the C5 > device-bdf and bus-address become 0004:01:00/0004:01 > instead of 0004:02:00/0004:01? > > 2. When performing a hot-unplug operation on C5, > should all devices from 0004:01 be removed? And > should all devices from 0004:02 also be removed? > I think the answer is yes, but please confirm. > > 3. When performing a hot-plug operation on C5, > should all the devices removed earlier from 0004:01 > and 0004:02 be re-attached? > > 4. Will there be any PCIe topology changes in this workflow? > > Once you confirm the above requirements, we can discuss > how to proceed further. > I have some follow up questions from your last mail and I am > putting these questions in below paragraphs as inline statements. > It will confirm me if we should do above things or not. > > >>> It seems like your explanation about the missing 0004:01:00.0 may be >>> correct and could be due to a firmware bug. However, the scope of this >>> patch does not relate to this issue. Additionally, if it starts with >>> 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug >>> operations will remain inconsistent. This patch aims to address the >>> inconsistent behavior of hot-unplug and hot-plug. >>> >>> *snip* >>> >>>> It might be worth adding some logic to pnv_php to verify the PCI >>>> bridge upstream of the slot actually has the PCIe slot capability to >>>> guard against this problem. >>> We can have a look at this problem in another patch. >> The point of this series is to fix the behaviour of pnv_php, is it >> not? > > Yes and we will do necessary things. > >> Powering off a PCI(e) slot is supposed to render it safe to >> remove the card in that slot. > > Do you mean physical-removal of the device after power-off ? > >> Currently if you "power off" C5, the >> kernel is still going to have active references to the switch's >> upstream port device (at 0004:01:00.0) and the switch management >> function (at 0004:01:00.1). > > Yes, since we are only operating on the downstream port of C5, > upstream ports' reference to the kernel will remain the same. > >> If the kernel has active references to PCI >> devices physically located in the slot we supposedly powered off, then >> the hotplug driver isn't doing its job. > > We have only powered off the downstream ports, not the upstream port. > The upstream port will remain powered on. Do you mean to say that it > will cause a problem if we physically remove the device while the > upstream port is powered on and the downstream port is powered off? > Will it cause a kernel crash? Is this the reason for designating the > upstream port as a C5 slot and performing a hot-plug operation on it? > Is it correct to select a device port that is not hot-pluggable, > designate it as a C5 slot, and perform a hot-plug operation on it? > > >> The asymmetry between hot add >> and removal that you're trying to fix here is a side effect of the >> fact that pnv_php is advertising the wrong thing as a slot. > > Pnv-php is displaying the information, what it receives from the > device node property. We will attempt to modify the code > path that is responsible for this. I am not sure yet what > additional code is needed for this, but I will figure it > out. Is it okay to change this code? > >> I think >> you should stop pnv_php from advertising something as a slot when it's >> not actually a slot because that's the root of all your problems. > > Okay, I am aligned but need some more clarification. Currently, > we are observing this behavior with the PMC-Sierra bridge. > Will this behavior occur with all bridges? In other words, > will the upstream port capability not be hot-pluggable for > all bridges and switches, and therefore not be considered > for slot selection? > > In a previous email, you mentioned that this problem is due > to a firmware bug, causing the driver to behave incorrectly > and advertise the wrong port as a slot. Assuming the firmware > bug is not present, what will be the behavior? Will there be > any expected PCI-topology changes in the above "lspci -tv" > command? Also, if the firmware bug is not present, do we still > need to make changes to the driver code? > > > Best Regards, > Krishna While I am still waiting for a response on the above points, I would like to add a few more points here: 1. The connection between the upstream and downstream ports is vendor-specific, and we cannot control this. 2. When running "lspci -vvv" on the upstream port, it neither shows its a slot nor a hotplug slot. This is the reason pnv_php does not advertise the upstream port as a slot. I have observed similar behavior for upstream and downstream ports on other architectures and with other switches as well. Snippet for Upstream Port, showing it is neither a slot nor a hotplug slot. # lspci -vvv -s 0004:01:00.0 | grep -i slot ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- [root@ltczzess2-lp1 ~]# # lspci -vvv -s 0004:01:00.0 | grep -i Upstream Capabilities: [40] Express (v2) Upstream Port, MSI 00 Retimer- 2Retimers- CrosslinkRes: Upstream Port ACSCap: SrcValid- TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans+ ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- # Downstream Port snippet : Its a slot and hotplug slot too. # # lspci -vvv -s 0004:02:00.0 | grep -i slot Physical Slot: C5 Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- Slot #1, PowerLimit 0.000W; Interlock- NoCompl+ # # lspci -vvv -s 0004:02:00.0 | grep -i hot SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) # 3. The devices are connected to the downstream port, and hotplug operations should only occur on these ports, not on the upstream port. 4. The upstream port and downstream port remain on different buses. I have observed this behavior with other architectures and switches too. 5. Performing a poweroff operation on different ports of the bridge and the devices behind them does not cause any problems for the available upstream port. I have not encountered any tests/scenarios where this could create a problem. Taking all of the above points into consideration, I do not see any need for further code changes in the pnv_php driver regarding this matter. > >>> We wanted to handle the more generic case and did not want to be confined to >>> only one device assumption. We want to fix the current inconsistent behavior >>> more generically. >> Right, as I said above I don't think handing the more generic case is >> actually required if pnv_php is doing its job properly. It doesn't >> hurt though. >> >>> Regarding the fix, the fix is obvious: >> really? >> >>> We have to traverse >>> and find the bridge ports from DT and invoke pci_scan_slot() on them. This will >>> discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on >>> the given bus- 0004:02). There is already an existing function, pci_scan_bridge() >>> which is doing invocation of pci_scan_slot () for the devices behind the bridge, >>> in this case for SAS device. So eventually, we are doing a scan of all the entities >>> behind the slot. >> I already read your patch so I'm not sure why you feel the need to >> re-describe it in tedious detail. >> >>> Would you like me to combine the non-bridge and bridge cases into one? I can attempt >>> to do this. Hopefully, if we incorporate the iterate sibling logic case correctly, >>> we may not need to maintain these two separate cases for bridge and non-bridge. I >>> will attempt this, and if it works, I will include it in the next patch. Thanks. >> Yes, do that. A single call of pci_scan_slot is sufficient to power on the devices in the scenario with multiple ports on the same card. However, it is not enough for a switch containing multiple ports. If the check is removed and we rely on the logic to traverse all the sibling device nodes and invoke pci_scan_slot() on each, in this case, device initialization of NIC ports (represented below) in terms of bar region and so will occur multiple times. Although this is not a problem and it works fine, we have to make a choice whether to proceed with or without the check. Snippet showing multiple port from a single card. This is not on bridge but on same card. +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) | \-00.1 PMC-Sierra Inc. Device 4052 Best regards, Krishna >> Also, do not post HTML emails to linux development lists. It breaks >> plain text inline quoting which makes your messages annoying to reply >> to. Some linux development lists will also silently drop HTML emails. >> Please talk to the other LTC engineers about how to set up your mail >> client to send plain text emails to avoid these problems in the >> future. >> >> Oliver ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-31 10:44 ` Krishna Kumar @ 2024-05-31 13:52 ` Krishna Kumar 2024-06-25 12:10 ` Krishna Kumar 0 siblings, 1 reply; 10+ messages in thread From: Krishna Kumar @ 2024-05-31 13:52 UTC (permalink / raw) To: Oliver O'Halloran, krishna kumar Cc: Nathan Lynch, Gaurav Batra, linux-pci, mahesh, Nicholas Piggin, linux-kernel, Aneesh Kumar K.V, Brian King, Bjorn Helgaas, linuxppc-dev On 5/31/24 16:14, Krishna Kumar wrote: > On 5/23/24 20:22, Krishna Kumar wrote: >> >> Hi Oliver, >> >> Thanks for your suggestions. Pls find my response: >> >> On 5/20/24 20:29, Oliver O'Halloran wrote: >>> On Fri, May 17, 2024 at 9:15 PM krishna kumar <krishnak@linux.ibm.com> wrote: >>>>> Uh, if I'm reading this right it looks like your "slot" C5 is actually >>>>> the PCIe switch's internal bus which is definitely not hot pluggable. >>>> It's a hotplug slot. Please see the snippet below: >>>> >>>> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug >>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>> :~$ >>>> >>>> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug >>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>> :~$ >>>> >>>> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug >>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>> :~$ >>>> >>>> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug >>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>> :~$ >>> All this is showing is that the switch downstream ports on bus 0004:02 >>> have a slot capability. I already know that (see what I said >>> previously about physical links). The fact the downstream ports have a >>> slot capability also has absolutely nothing to do with anything I was >>> saying. Look at the lspci output for 0004:01:00.0 which is the >>> switch's upstream port. The upstream port device will not have a slot >>> capability because it's a bridge into the virtual PCI bus that is >>> internal to the switch. >> Let me try to understand your suggestion and what needs to be done now: >> >> lspci -nn snippet in current scenario: >> >> 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >> >> lspci -tv snippet in current scenario: >> >> +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- >> | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 >> | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) >> | \-00.1 PMC-Sierra Inc. Device 4052 >> >> C5 bus address: >> >> [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address >> 0004:02:00 >> [root@ltczzess2-lp1 ~]# >> >> 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does >> have this capability. Below snippet tells about this: >> >> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color HotPlug >> [root@ltczzess2-lp1 ~]# >> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug >> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >> [root@ltczzess2-lp1 ~]# >> >> In Function - pnv_php_register_one() is responsible for slot creation from >> hotplug capable device node: >> >> Below is the current code that does check the device node for hot plug >> capability and takes the decision >> >> /* Check if it's hotpluggable slot */ >> ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32); >> if (ret || !prop32){ >> return -ENXIO; >> } >> >> Its obvious that 0004:01:00.0 does not get above criteria fulfilled but >> 0004:02:00.0 does, so is the current behavior (Upstream port is not became >> C5 slot but downstream port became C5 slot). >> >> I am summarizing your suggested changes. Please let >> me know if I've got it right: >> >> 1. Do you want me to modify the code so that the C5 >> device-bdf and bus-address become 0004:01:00/0004:01 >> instead of 0004:02:00/0004:01? >> >> 2. When performing a hot-unplug operation on C5, >> should all devices from 0004:01 be removed? And >> should all devices from 0004:02 also be removed? >> I think the answer is yes, but please confirm. >> >> 3. When performing a hot-plug operation on C5, >> should all the devices removed earlier from 0004:01 >> and 0004:02 be re-attached? >> >> 4. Will there be any PCIe topology changes in this workflow? >> >> Once you confirm the above requirements, we can discuss >> how to proceed further. >> I have some follow up questions from your last mail and I am >> putting these questions in below paragraphs as inline statements. >> It will confirm me if we should do above things or not. >> >> >>>> It seems like your explanation about the missing 0004:01:00.0 may be >>>> correct and could be due to a firmware bug. However, the scope of this >>>> patch does not relate to this issue. Additionally, if it starts with >>>> 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug >>>> operations will remain inconsistent. This patch aims to address the >>>> inconsistent behavior of hot-unplug and hot-plug. >>>> >>>> *snip* >>>> >>>>> It might be worth adding some logic to pnv_php to verify the PCI >>>>> bridge upstream of the slot actually has the PCIe slot capability to >>>>> guard against this problem. >>>> We can have a look at this problem in another patch. >>> The point of this series is to fix the behaviour of pnv_php, is it >>> not? >> Yes and we will do necessary things. >> >>> Powering off a PCI(e) slot is supposed to render it safe to >>> remove the card in that slot. >> Do you mean physical-removal of the device after power-off ? >> >>> Currently if you "power off" C5, the >>> kernel is still going to have active references to the switch's >>> upstream port device (at 0004:01:00.0) and the switch management >>> function (at 0004:01:00.1). >> Yes, since we are only operating on the downstream port of C5, >> upstream ports' reference to the kernel will remain the same. >> >>> If the kernel has active references to PCI >>> devices physically located in the slot we supposedly powered off, then >>> the hotplug driver isn't doing its job. >> We have only powered off the downstream ports, not the upstream port. >> The upstream port will remain powered on. Do you mean to say that it >> will cause a problem if we physically remove the device while the >> upstream port is powered on and the downstream port is powered off? >> Will it cause a kernel crash? Is this the reason for designating the >> upstream port as a C5 slot and performing a hot-plug operation on it? >> Is it correct to select a device port that is not hot-pluggable, >> designate it as a C5 slot, and perform a hot-plug operation on it? >> >> >>> The asymmetry between hot add >>> and removal that you're trying to fix here is a side effect of the >>> fact that pnv_php is advertising the wrong thing as a slot. >> Pnv-php is displaying the information, what it receives from the >> device node property. We will attempt to modify the code >> path that is responsible for this. I am not sure yet what >> additional code is needed for this, but I will figure it >> out. Is it okay to change this code? >> >>> I think >>> you should stop pnv_php from advertising something as a slot when it's >>> not actually a slot because that's the root of all your problems. >> Okay, I am aligned but need some more clarification. Currently, >> we are observing this behavior with the PMC-Sierra bridge. >> Will this behavior occur with all bridges? In other words, >> will the upstream port capability not be hot-pluggable for >> all bridges and switches, and therefore not be considered >> for slot selection? >> >> In a previous email, you mentioned that this problem is due >> to a firmware bug, causing the driver to behave incorrectly >> and advertise the wrong port as a slot. Assuming the firmware >> bug is not present, what will be the behavior? Will there be >> any expected PCI-topology changes in the above "lspci -tv" >> command? Also, if the firmware bug is not present, do we still >> need to make changes to the driver code? >> >> >> Best Regards, >> Krishna > > While I am still waiting for a response on the above points, > I would like to add a few more points here: > > 1. The connection between the upstream and downstream > ports is vendor-specific, and we cannot control this. > > 2. When running "lspci -vvv" on the upstream port, it neither > shows its a slot nor a hotplug slot. This is the reason pnv_php > does not advertise the upstream port as a slot. I have observed > similar behavior for upstream and downstream ports on other > architectures and with other switches as well. > > > Snippet for Upstream Port, showing it is neither a slot nor a hotplug > slot. > > # lspci -vvv -s 0004:01:00.0 | grep -i slot > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > [root@ltczzess2-lp1 ~]# > > # lspci -vvv -s 0004:01:00.0 | grep -i Upstream > Capabilities: [40] Express (v2) Upstream Port, MSI 00 > Retimer- 2Retimers- CrosslinkRes: Upstream Port > ACSCap: SrcValid- TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans+ > ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- > # > > Downstream Port snippet : Its a slot and hotplug slot too. > > # > # lspci -vvv -s 0004:02:00.0 | grep -i slot > Physical Slot: C5 > Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00 > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > Slot #1, PowerLimit 0.000W; Interlock- NoCompl+ > # > > # lspci -vvv -s 0004:02:00.0 | grep -i hot > SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) > # > > 3. The devices are connected to the downstream port, and hotplug > operations should only occur on these ports, not on the upstream port. > > 4. The upstream port and downstream port remain on different buses. > I have observed this behavior with other architectures and switches too. > > 5. Performing a poweroff operation on different ports of the bridge > and the devices behind them does not cause any problems for the > available upstream port. I have not encountered any tests/scenarios > where this could create a problem. > > Taking all of the above points into consideration, I do not see any need > for further code changes in the pnv_php driver regarding this matter. > > I got one more setup having below configuration. I did some experiment and here are my observations - # lspci -tv +-[0033:00]---00.0-[01-17]----00.0-[02-17]--+-01.0-[03-07]-- | +-04.0-[08-0c]-- | +-05.0-[0d]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM173X | +-06.0-[0e-12]-- | \-07.0-[13-17]-- # lspci -vvv -s 0033:01:00.0 | grep -i slot Physical Slot: WIO Slot3-1 ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 75.000W TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- VC0: Caps: PATOffset=03 MaxTimeSlots=1 RejSnoopTrans- [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# lspci -vvv -s 0033:01:00.0 | grep -i Upstream Capabilities: [68] Express (v2) Upstream Port, MSI 00 [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# lspci -vvv -s 0033:01:00.0 | grep -i hot Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# lspci -nn | grep 0033 0033:00:00.0 PCI bridge [0604]: IBM POWER9 Host Bridge (PHB4) [1014:04c1] 0033:01:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) 0033:02:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) 0033:02:04.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) 0033:02:05.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) 0033:02:06.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) 0033:02:07.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) 0033:0d:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM173X [144d:a824] [root@ltc-boston11 ~]# cat /sys/bus/pci/slots/WIO\ Slot3-1/address 0033:01:00 Hot-Unplug from above upstream port : # echo 0 > /sys/bus/pci/slots/WIO\ Slot3-1/power # lspci -tv +-[0033:00]---00.0-[01-17]-- Hot-plug from above upstream port : [root@ltc-boston11 ~]# echo 1 > /sys/bus/pci/slots/WIO\ Slot3-1/power [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# [root@ltc-boston11 ~]# lspci -tv +-[0033:00]---00.0-[01-17]----00.0-[02-17]--+-01.0-[03-07]-- | +-04.0-[08-0c]-- | +-05.0-[0d]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM173X | +-06.0-[0e-12]-- | \-07.0-[13-17]-- What I want to say from above experiment is - if the upstream port is having the slot capability, pnv_php driver is able to do hot-plug/unplug operation on this. In our older case the upstream-port doesn't have slot capability, so there is no point of having hot-plug operation on this. Do you want to change the code logic to consider upstream-port always a slot? Is it even possible if lspci doesn't advertise it as a slot ? Best Regards, Krishna >>>> We wanted to handle the more generic case and did not want to be confined to >>>> only one device assumption. We want to fix the current inconsistent behavior >>>> more generically. >>> Right, as I said above I don't think handing the more generic case is >>> actually required if pnv_php is doing its job properly. It doesn't >>> hurt though. >>> >>>> Regarding the fix, the fix is obvious: >>> really? >>> >>>> We have to traverse >>>> and find the bridge ports from DT and invoke pci_scan_slot() on them. This will >>>> discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on >>>> the given bus- 0004:02). There is already an existing function, pci_scan_bridge() >>>> which is doing invocation of pci_scan_slot () for the devices behind the bridge, >>>> in this case for SAS device. So eventually, we are doing a scan of all the entities >>>> behind the slot. >>> I already read your patch so I'm not sure why you feel the need to >>> re-describe it in tedious detail. >>> >>>> Would you like me to combine the non-bridge and bridge cases into one? I can attempt >>>> to do this. Hopefully, if we incorporate the iterate sibling logic case correctly, >>>> we may not need to maintain these two separate cases for bridge and non-bridge. I >>>> will attempt this, and if it works, I will include it in the next patch. Thanks. >>> Yes, do that. > A single call of pci_scan_slot is sufficient to power on the devices in the scenario > with multiple ports on the same card. However, it is not enough for a switch > containing multiple ports. If the check is removed and we rely on the logic to > traverse all the sibling device nodes and invoke pci_scan_slot() on each, in > this case, device initialization of NIC ports (represented below) in terms of bar region > and so will occur multiple times. Although this is not a problem and it works fine, we > have to make a choice whether to proceed with or without the check. > > > Snippet showing multiple port from a single card. This is not on bridge but on same > card. > > +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- > | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe > | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 > | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) > | \-00.1 PMC-Sierra Inc. Device 4052 > > > Best regards, > Krishna > > >>> Also, do not post HTML emails to linux development lists. It breaks >>> plain text inline quoting which makes your messages annoying to reply >>> to. Some linux development lists will also silently drop HTML emails. >>> Please talk to the other LTC engineers about how to set up your mail >>> client to send plain text emails to avoid these problems in the >>> future. >>> >>> Oliver ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support 2024-05-31 13:52 ` Krishna Kumar @ 2024-06-25 12:10 ` Krishna Kumar 0 siblings, 0 replies; 10+ messages in thread From: Krishna Kumar @ 2024-06-25 12:10 UTC (permalink / raw) To: Oliver O'Halloran, krishna kumar Cc: Nathan Lynch, Aneesh Kumar K.V, linux-pci, linux-kernel, Nicholas Piggin, mahesh, Gaurav Batra, Bjorn Helgaas, Brian King, linuxppc-dev On 5/31/24 7:22 PM, Krishna Kumar wrote: > On 5/31/24 16:14, Krishna Kumar wrote: >> On 5/23/24 20:22, Krishna Kumar wrote: >>> Hi Oliver, >>> >>> Thanks for your suggestions. Pls find my response: >>> >>> On 5/20/24 20:29, Oliver O'Halloran wrote: >>>> On Fri, May 17, 2024 at 9:15 PM krishna kumar <krishnak@linux.ibm.com> wrote: >>>>>> Uh, if I'm reading this right it looks like your "slot" C5 is actually >>>>>> the PCIe switch's internal bus which is definitely not hot pluggable. >>>>> It's a hotplug slot. Please see the snippet below: >>>>> >>>>> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug >>>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>>> :~$ >>>>> >>>>> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug >>>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>>> :~$ >>>>> >>>>> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug >>>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>>> :~$ >>>>> >>>>> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug >>>>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>>>> :~$ >>>> All this is showing is that the switch downstream ports on bus 0004:02 >>>> have a slot capability. I already know that (see what I said >>>> previously about physical links). The fact the downstream ports have a >>>> slot capability also has absolutely nothing to do with anything I was >>>> saying. Look at the lspci output for 0004:01:00.0 which is the >>>> switch's upstream port. The upstream port device will not have a slot >>>> capability because it's a bridge into the virtual PCI bus that is >>>> internal to the switch. >>> Let me try to understand your suggestion and what needs to be done now: >>> >>> lspci -nn snippet in current scenario: >>> >>> 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >>> 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052] >>> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >>> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >>> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >>> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] >>> >>> lspci -tv snippet in current scenario: >>> >>> +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- >>> | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >>> | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >>> | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >>> | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >>> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 >>> | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) >>> | \-00.1 PMC-Sierra Inc. Device 4052 >>> >>> C5 bus address: >>> >>> [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address >>> 0004:02:00 >>> [root@ltczzess2-lp1 ~]# >>> >>> 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does >>> have this capability. Below snippet tells about this: >>> >>> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color HotPlug >>> [root@ltczzess2-lp1 ~]# >>> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug >>> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >>> [root@ltczzess2-lp1 ~]# >>> >>> In Function - pnv_php_register_one() is responsible for slot creation from >>> hotplug capable device node: >>> >>> Below is the current code that does check the device node for hot plug >>> capability and takes the decision >>> >>> /* Check if it's hotpluggable slot */ >>> ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32); >>> if (ret || !prop32){ >>> return -ENXIO; >>> } >>> >>> Its obvious that 0004:01:00.0 does not get above criteria fulfilled but >>> 0004:02:00.0 does, so is the current behavior (Upstream port is not became >>> C5 slot but downstream port became C5 slot). >>> >>> I am summarizing your suggested changes. Please let >>> me know if I've got it right: >>> >>> 1. Do you want me to modify the code so that the C5 >>> device-bdf and bus-address become 0004:01:00/0004:01 >>> instead of 0004:02:00/0004:01? >>> >>> 2. When performing a hot-unplug operation on C5, >>> should all devices from 0004:01 be removed? And >>> should all devices from 0004:02 also be removed? >>> I think the answer is yes, but please confirm. >>> >>> 3. When performing a hot-plug operation on C5, >>> should all the devices removed earlier from 0004:01 >>> and 0004:02 be re-attached? >>> >>> 4. Will there be any PCIe topology changes in this workflow? >>> >>> Once you confirm the above requirements, we can discuss >>> how to proceed further. >>> I have some follow up questions from your last mail and I am >>> putting these questions in below paragraphs as inline statements. >>> It will confirm me if we should do above things or not. >>> >>> >>>>> It seems like your explanation about the missing 0004:01:00.0 may be >>>>> correct and could be due to a firmware bug. However, the scope of this >>>>> patch does not relate to this issue. Additionally, if it starts with >>>>> 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug >>>>> operations will remain inconsistent. This patch aims to address the >>>>> inconsistent behavior of hot-unplug and hot-plug. >>>>> >>>>> *snip* >>>>> >>>>>> It might be worth adding some logic to pnv_php to verify the PCI >>>>>> bridge upstream of the slot actually has the PCIe slot capability to >>>>>> guard against this problem. >>>>> We can have a look at this problem in another patch. >>>> The point of this series is to fix the behaviour of pnv_php, is it >>>> not? >>> Yes and we will do necessary things. >>> >>>> Powering off a PCI(e) slot is supposed to render it safe to >>>> remove the card in that slot. >>> Do you mean physical-removal of the device after power-off ? >>> >>>> Currently if you "power off" C5, the >>>> kernel is still going to have active references to the switch's >>>> upstream port device (at 0004:01:00.0) and the switch management >>>> function (at 0004:01:00.1). >>> Yes, since we are only operating on the downstream port of C5, >>> upstream ports' reference to the kernel will remain the same. >>> >>>> If the kernel has active references to PCI >>>> devices physically located in the slot we supposedly powered off, then >>>> the hotplug driver isn't doing its job. >>> We have only powered off the downstream ports, not the upstream port. >>> The upstream port will remain powered on. Do you mean to say that it >>> will cause a problem if we physically remove the device while the >>> upstream port is powered on and the downstream port is powered off? >>> Will it cause a kernel crash? Is this the reason for designating the >>> upstream port as a C5 slot and performing a hot-plug operation on it? >>> Is it correct to select a device port that is not hot-pluggable, >>> designate it as a C5 slot, and perform a hot-plug operation on it? >>> >>> >>>> The asymmetry between hot add >>>> and removal that you're trying to fix here is a side effect of the >>>> fact that pnv_php is advertising the wrong thing as a slot. >>> Pnv-php is displaying the information, what it receives from the >>> device node property. We will attempt to modify the code >>> path that is responsible for this. I am not sure yet what >>> additional code is needed for this, but I will figure it >>> out. Is it okay to change this code? >>> >>>> I think >>>> you should stop pnv_php from advertising something as a slot when it's >>>> not actually a slot because that's the root of all your problems. >>> Okay, I am aligned but need some more clarification. Currently, >>> we are observing this behavior with the PMC-Sierra bridge. >>> Will this behavior occur with all bridges? In other words, >>> will the upstream port capability not be hot-pluggable for >>> all bridges and switches, and therefore not be considered >>> for slot selection? >>> >>> In a previous email, you mentioned that this problem is due >>> to a firmware bug, causing the driver to behave incorrectly >>> and advertise the wrong port as a slot. Assuming the firmware >>> bug is not present, what will be the behavior? Will there be >>> any expected PCI-topology changes in the above "lspci -tv" >>> command? Also, if the firmware bug is not present, do we still >>> need to make changes to the driver code? >>> >>> >>> Best Regards, >>> Krishna >> While I am still waiting for a response on the above points, >> I would like to add a few more points here: >> >> 1. The connection between the upstream and downstream >> ports is vendor-specific, and we cannot control this. >> >> 2. When running "lspci -vvv" on the upstream port, it neither >> shows its a slot nor a hotplug slot. This is the reason pnv_php >> does not advertise the upstream port as a slot. I have observed >> similar behavior for upstream and downstream ports on other >> architectures and with other switches as well. >> >> >> Snippet for Upstream Port, showing it is neither a slot nor a hotplug >> slot. >> >> # lspci -vvv -s 0004:01:00.0 | grep -i slot >> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W >> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- >> [root@ltczzess2-lp1 ~]# >> >> # lspci -vvv -s 0004:01:00.0 | grep -i Upstream >> Capabilities: [40] Express (v2) Upstream Port, MSI 00 >> Retimer- 2Retimers- CrosslinkRes: Upstream Port >> ACSCap: SrcValid- TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans+ >> ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- >> # >> >> Downstream Port snippet : Its a slot and hotplug slot too. >> >> # >> # lspci -vvv -s 0004:02:00.0 | grep -i slot >> Physical Slot: C5 >> Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00 >> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- >> Slot #1, PowerLimit 0.000W; Interlock- NoCompl+ >> # >> >> # lspci -vvv -s 0004:02:00.0 | grep -i hot >> SltCap: AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- >> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) >> # >> >> 3. The devices are connected to the downstream port, and hotplug >> operations should only occur on these ports, not on the upstream port. >> >> 4. The upstream port and downstream port remain on different buses. >> I have observed this behavior with other architectures and switches too. >> >> 5. Performing a poweroff operation on different ports of the bridge >> and the devices behind them does not cause any problems for the >> available upstream port. I have not encountered any tests/scenarios >> where this could create a problem. >> >> Taking all of the above points into consideration, I do not see any need >> for further code changes in the pnv_php driver regarding this matter. >> >> > I got one more setup having below configuration. I did some experiment > and here are my observations - > > # lspci -tv > +-[0033:00]---00.0-[01-17]----00.0-[02-17]--+-01.0-[03-07]-- > | +-04.0-[08-0c]-- > | +-05.0-[0d]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM173X > | +-06.0-[0e-12]-- > | \-07.0-[13-17]-- > # lspci -vvv -s 0033:01:00.0 | grep -i slot > Physical Slot: WIO Slot3-1 > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 75.000W > TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- > VC0: Caps: PATOffset=03 MaxTimeSlots=1 RejSnoopTrans- > [root@ltc-boston11 ~]# > [root@ltc-boston11 ~]# lspci -vvv -s 0033:01:00.0 | grep -i Upstream > Capabilities: [68] Express (v2) Upstream Port, MSI 00 > [root@ltc-boston11 ~]# > [root@ltc-boston11 ~]# > [root@ltc-boston11 ~]# lspci -vvv -s 0033:01:00.0 | grep -i hot > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) > [root@ltc-boston11 ~]# > > [root@ltc-boston11 ~]# lspci -nn | grep 0033 > 0033:00:00.0 PCI bridge [0604]: IBM POWER9 Host Bridge (PHB4) [1014:04c1] > 0033:01:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) > 0033:02:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) > 0033:02:04.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) > 0033:02:05.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) > 0033:02:06.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) > 0033:02:07.0 PCI bridge [0604]: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:9733] (rev aa) > 0033:0d:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM173X [144d:a824] > > [root@ltc-boston11 ~]# cat /sys/bus/pci/slots/WIO\ Slot3-1/address > 0033:01:00 > > Hot-Unplug from above upstream port : > > # echo 0 > /sys/bus/pci/slots/WIO\ Slot3-1/power > # lspci -tv > +-[0033:00]---00.0-[01-17]-- > > > Hot-plug from above upstream port : > > [root@ltc-boston11 ~]# echo 1 > /sys/bus/pci/slots/WIO\ Slot3-1/power > [root@ltc-boston11 ~]# > [root@ltc-boston11 ~]# > [root@ltc-boston11 ~]# > [root@ltc-boston11 ~]# lspci -tv > +-[0033:00]---00.0-[01-17]----00.0-[02-17]--+-01.0-[03-07]-- > | +-04.0-[08-0c]-- > | +-05.0-[0d]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM173X > | +-06.0-[0e-12]-- > | \-07.0-[13-17]-- > > > What I want to say from above experiment is - if the upstream port is > having the slot capability, pnv_php driver is able to do hot-plug/unplug > operation on this. > > In our older case the upstream-port doesn't have slot capability, so there > is no point of having hot-plug operation on this. > > Do you want to change the code logic to consider upstream-port always a slot? Is it even possible > if lspci doesn't advertise it as a slot ? > > > Best Regards, > > Krishna I have uploaded patch version 3, and removed the DPC keyword from it. Did not do any change in upstream port logic since driver does take care of this if the upstream port is having slot property (explained above). Also I have left the check removal, since if I remove the check, the initialization for non-bridge-port (i.e. for End-point) device happens multiple times (Verified with a nic card having 4 ports). I have explained it below too. It has been a while and I think we can merge this patch into mainline. I request to give +1 on this. These patch set will help Raptor Engineering team as well. They have done the testing with these patch set. Two of the 3 bugs will be fixed by these patch set. And I am working towards adding the NVME hot-plug support in this driver. At present link retraining does not happen due to missing DPC support. Best Regards, Krishna > > >>>>> We wanted to handle the more generic case and did not want to be confined to >>>>> only one device assumption. We want to fix the current inconsistent behavior >>>>> more generically. >>>> Right, as I said above I don't think handing the more generic case is >>>> actually required if pnv_php is doing its job properly. It doesn't >>>> hurt though. >>>> >>>>> Regarding the fix, the fix is obvious: >>>> really? >>>> >>>>> We have to traverse >>>>> and find the bridge ports from DT and invoke pci_scan_slot() on them. This will >>>>> discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on >>>>> the given bus- 0004:02). There is already an existing function, pci_scan_bridge() >>>>> which is doing invocation of pci_scan_slot () for the devices behind the bridge, >>>>> in this case for SAS device. So eventually, we are doing a scan of all the entities >>>>> behind the slot. >>>> I already read your patch so I'm not sure why you feel the need to >>>> re-describe it in tedious detail. >>>> >>>>> Would you like me to combine the non-bridge and bridge cases into one? I can attempt >>>>> to do this. Hopefully, if we incorporate the iterate sibling logic case correctly, >>>>> we may not need to maintain these two separate cases for bridge and non-bridge. I >>>>> will attempt this, and if it works, I will include it in the next patch. Thanks. >>>> Yes, do that. >> A single call of pci_scan_slot is sufficient to power on the devices in the scenario >> with multiple ports on the same card. However, it is not enough for a switch >> containing multiple ports. If the check is removed and we rely on the logic to >> traverse all the sibling device nodes and invoke pci_scan_slot() on each, in >> this case, device initialization of NIC ports (represented below) in terms of bar region >> and so will occur multiple times. Although this is not a problem and it works fine, we >> have to make a choice whether to proceed with or without the check. >> >> >> Snippet showing multiple port from a single card. This is not on bridge but on same >> card. >> >> +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]-- >> | | +-01.0-[08]--+-00.0 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | | +-00.1 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | | +-00.2 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | | \-00.3 Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe >> | | +-02.0-[09]----00.0 Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 >> | | \-03.0-[0a]----00.0 IBM PCI-E IPR SAS Adapter (ASIC) >> | \-00.1 PMC-Sierra Inc. Device 4052 >> >> >> Best regards, >> Krishna >> >> >>>> Also, do not post HTML emails to linux development lists. It breaks >>>> plain text inline quoting which makes your messages annoying to reply >>>> to. Some linux development lists will also silently drop HTML emails. >>>> Please talk to the other LTC engineers about how to set up your mail >>>> client to send plain text emails to avoid these problems in the >>>> future. >>>> >>>> Oliver ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-06-25 12:11 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-14 13:52 [PATCH v2 0/2] PCI hotplug driver fixes Krishna Kumar 2024-05-14 13:52 ` [PATCH v2 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv Krishna Kumar 2024-05-14 13:52 ` [PATCH v2 2/2] powerpc: hotplug driver bridge support Krishna Kumar 2024-05-17 2:42 ` Oliver O'Halloran 2024-05-17 11:14 ` krishna kumar 2024-05-20 14:59 ` Oliver O'Halloran 2024-05-23 14:52 ` Krishna Kumar 2024-05-31 10:44 ` Krishna Kumar 2024-05-31 13:52 ` Krishna Kumar 2024-06-25 12:10 ` Krishna Kumar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).