* [PATCH v3 0/2] PCI/portdrv: Report inter switch P2P links through sysfs
@ 2024-12-11 7:17 Shivasharan S
2024-12-11 7:17 ` [PATCH v3 1/2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
2024-12-11 7:17 ` [PATCH v3 2/2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
0 siblings, 2 replies; 4+ messages in thread
From: Shivasharan S @ 2024-12-11 7:17 UTC (permalink / raw)
To: linux-pci, bhelgaas, manivannan.sadhasivam, logang,
Jonathan.Cameron
Cc: linux-kernel, sumanesh.samanta, sathya.prakash, sjeaugey,
Shivasharan S
Changes done in v3:
Moved the Inter switch p2p link detection to a separate file that can
be enabled with a config option as suggested by Jonathan.
Fixed review comments from Jonathan.
Changes done in v2:
The previous submission of this series was at [1].
As per the feedback received from Mani, the code is moved to PCI portdrv
to create the sysfs entries instead of having a separate kernel module.
A. Introductory definitions:
Virtual Switch: Broadcom(PLX) switches have a capability where a single
physical switch can be divided up into N number of virtual switches at
start of day. For example, a single physical switch with 64 ports can be
configured to appear to the host as 2 switches with 32 ports each. This is
a static configuration that needs to be done before the switch boots, and
cannot generally be changed on the fly. Now consider a GPU in Virtual
switch 1 and a NIC on Virtual switch 2. The key here is that it's actually
the same switch, and IF P2P is enabled between the two virtual switches,
then that would be almost infinite bandwidth between the GPU and the NIC.
However, today there is no way for the host to know that, and host
applications believe that any data exchange between the GPU and NIC must
go through host root port and thus would be slow.
Note: Any such P2P must follow ACS/IOMMU rules, and has to be enabled in
the Broadcom switches.
Inter Switch Link: While the current use-case is about the virtual switch
config above, this could also extend to physical switch, where the two
physical switches have, say, a x16 PCIe connection between them.
B. Goal/Problem statement:
Goal 1: Summary: Provide user applications a means by which they can
discover two virtual switches to be part of the same physical switch or
when physical switches are physically connected to each other, so that
they can discover optimized data path for HPC/AI applications.
With the rapid progression of High Performance Computing (HPC) and
Artificial Intelligence (AI), it is becoming more and more common to have
complex topologies with multiple GPU, NIC, NVMe devices etc interconnected
using multiple switches. HPC and AI libraries like MPI, UCC, NCCL, RCCL,
HCCL etc analyze this topology to build a topology tree to optimize data
path for collective operations like all-reduce etc.
Example:
Host root bridge
---------------------------------------
| | |
NIC1 --- PCI Switch1 PCI Switch2 PCI Switch3 --- NIC2
| | |
GPU1 ------------- GPU2 ------------- GPU3
SERVER 1
In the simple picture above in Server1, Switch1, Switch2, Switch3
are all connected to the host bridge and each switch has a GPU
connected, and Switch1/3 each has a NIC connected.
In a typical AI setup, there are many such servers, each connected by
an upper level network switch, and "rail optimized", i.e., NIC1 of all
servers is connected to Ethernet Switch1, NIC2 to Ethernet
Switch2, etc. (Ethernet switches are not shown in the picture above.)
The GPUs are connected among themselves by some backend fabric, like
NVLINK (NVIDIA).
Assume that in the above diagram, PCI Switch1 and PCI Switch3 are
virtual switches belonging to the same physical switch and thus a very
high speed data link exists between them, but today host applications
have no knowledge about that.
(This is a very simple example, and modern AI infrastructure can be
way more complex than that.)
Now for collective operations like all-reduce, the HPC/AI libraries
analyze the topology above and typically decide on a data path like
this: NIC1->GPU1->GPU2->GPU3->NIC2, which is suboptimal because
ideally data should go in and out through the same NIC in a
"rail optimized" topology.
Some libraries do this: NIC1->GPU1->GPU2->GPU3->GPU1->NIC1.
The applications do the above because they think data from GPU3 to
NIC1 needs to go through the host root port, which is very
inefficient. What they do not know is that Switch1 and Switch3 are the
same physical entity with virtually infinite bandwidth between them,
and with that, they would have chosen a path like:
NIC1->GPU1->GPU2->GPU3->NIC1, which is the most optimized in the above
example.
Goal 2: Extend Linux P2PDMA distance function pci_p2pdma_distance to
account for Virtual Switch and physical switches connected by inter
switch link. The current implementation of the function has no
knowledge of Virtual switch and inter switch link.
Consider the example below:
-+ Root Port
\+ Switch1 Upstream Port
+-+ Switch1 Downstream Port 0
\- Device A
\+ Switch2 Upstream Port
+-+ Switch2 Downstream Port 0
\- Device B
Suppose Switch1 and Switch2 are virtual switches belonging to the
same physical switch. Today the P2PDMA distance calculation between Device A
and Device B will return PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, as the kernel has
no idea that Switch1 and Switch2 are actually physically connected to
each other. We intend to fix that, so that pci_p2pdma_distance now takes
into account switch connectivity information.
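To make the intended behaviour concrete, here is a minimal userspace sketch of the idea. It is not the kernel implementation: `struct node`, `link_available()` and `toy_distance()` are hypothetical names, and the toy model only mirrors the serial-number comparison and the upstream walk described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of one device in a PCI hierarchy (not a kernel type). */
struct node {
    const struct node *parent; /* upstream bridge, NULL above the root port */
    bool p2p_capable;          /* upstream port that advertises inter-switch P2P */
    uint64_t serial;           /* Device Serial Number, 0 if not present */
};

/* Two ports are treated as linked when both are capable and share a serial. */
bool link_available(const struct node *a, const struct node *b)
{
    if (!a->p2p_capable || !b->p2p_capable)
        return false;
    if (a->serial == 0 || b->serial == 0)
        return false;
    return a->serial == b->serial;
}

/*
 * Walk upstream from a; for each step, walk upstream from b until the two
 * paths meet at a common bridge or at a pair of linked upstream ports.
 * Returns the hop count, or -1 if the paths never meet. *via_link reports
 * whether the meeting point was an inter-switch link.
 */
int toy_distance(const struct node *a, const struct node *b, bool *via_link)
{
    int dist_a = 0;

    assert(a != NULL && b != NULL && via_link != NULL);
    *via_link = false;

    for (const struct node *x = a; x != NULL; x = x->parent, dist_a++) {
        int dist_b = 0;

        for (const struct node *y = b; y != NULL; y = y->parent, dist_b++) {
            if (x == y)
                return dist_a + dist_b;
            if (link_available(x, y)) {
                *via_link = true;
                return dist_a + dist_b;
            }
        }
    }
    return -1;
}
```

In this toy model of the tree above, the walk stops at the two upstream ports (4 hops) when they share a serial number, and otherwise meets at the root port (6 hops).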
C. FAQs
FAQ 1: How does this feature work with ACS/IOMMU?
This feature does NOT add any new connectivity. The inter-switch
/virtual switch connections already follow all ACS/IOMMU rules, and
only if allowed by ACS settings, they allow for data to follow a
shortcut connection between switches and bypass the root port. The
only thing this patch does is provide the switch connection
information to application software and pci_p2pdma_distance clients,
so that they can make intelligent decisions for the data path.
FAQ 2: Is this feature Broadcom specific and will it work for other
vendors?
The current implementation of the patch looks at Broadcom
Vendor specific extensions to determine if switch p2p is enabled.
Thus, the current implementation works only on Broadcom switches. That
being said, other vendors are free to extend/modify the code to
support their switch. The function names, code structure and sysfs path
that expose the PCI switch p2p are made generic, to allow for extension of
support to other vendors. All Broadcom specific functionality is segregated
into a Broadcom specific function.
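As an illustration of what the Broadcom specific check boils down to, the following standalone sketch decodes the two values the patch inspects. The bit positions are taken from p2p_link.h in patch 1/2 (mode field in bits 9:8, inter-switch link mode 0x2, status bit 3 of the serial number); the macro and function names here are hypothetical, and the inputs are assumed to have been read from the device already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout as defined in p2p_link.h of this series; names are local. */
#define SW_P2P_MODE_MASK          0x00000300u /* bits 9:8 of the VSEC dword */
#define SW_P2P_MODE_SHIFT         8
#define SW_P2P_MODE_INTER_SW_LINK 0x2u
#define SW_DSN_P2P_STATUS         (UINT64_C(1) << 3)

/* The mode value must fit inside the masked field. */
static_assert(((SW_P2P_MODE_INTER_SW_LINK << SW_P2P_MODE_SHIFT) &
               ~SW_P2P_MODE_MASK) == 0,
              "mode value does not fit in the mode field");

/*
 * dsn:       Device Serial Number of the upstream port (0 means absent).
 * vsec_data: dword read at offset 0xC of the vendor-specific capability.
 */
bool brcm_p2p_enabled(uint64_t dsn, uint32_t vsec_data)
{
    if (dsn == 0)
        return false;
    if (!(dsn & SW_DSN_P2P_STATUS))
        return false;
    return ((vsec_data & SW_P2P_MODE_MASK) >> SW_P2P_MODE_SHIFT) ==
           SW_P2P_MODE_INTER_SW_LINK;
}
```

A vendor adding support would presumably supply an equivalent predicate for its own registers while the generic sysfs code stays unchanged.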
FAQ 3: Why can't applications read the Broadcom vendor specific
information directly from the config space? Why do we need the sysfs
path?
The vendor specific section of PCIe config space is not readable by
applications running in non-root mode, as such applications can only
read the first few bytes of the config space. Besides, reading the
vendor specific config space will not make the solution generic.
FAQ 4: Will applications still use the standard P2P model of
registering the provider, client etc?
Absolutely. All existing p2p API will work as is. All that this patch
provides is information that a fast connection exists between switches
and/or PCI endpoints. To perform the actual p2p DMA, applications need to
use the existing p2p API and follow existing ACS/IOMMU rules.
FAQ 5: Why can't we only modify the existing pci_p2pdma_distance
function, and expose a p2pdistance to userspace? Why do we need the
new sysfs entries for pci switch connectivity?
The existing HPC/AI libraries like MPI, UCC, NCCL, RCCL, HCCL etc work
not only with PCIe switches, but also with other kinds of connectivity,
like TCP, network switches, InfiniBand and backend inter-GPU
connectivity like NVLINK and AFL. Because of that, the libraries have
matured code that analyzes all the connections and entire topology to
determine the most optimal data path among nodes. Just using
pci_p2pdma_distance does not work for them, because there might be a
shorter path between two nodes using NVLINK or a network switch. In
theory those libraries could be modified to use pci_p2pdma_distance
for PCIe connections and other methods for other connections, but in
practice that is near impossible, as those changes are very intrusive
and those libraries have matured over a long time. Their respective
maintainers are highly reluctant to make such a big change and would rather
get only the missing information, that is, whether two switches are
connected together. Broadcom has received such first hand feedback.
Forcing everyone to use p2pdistance only will defeat the whole purpose
of this patch. However, we do want to support those libraries that
want to use pci_p2pdma_distance, and that is why we are extending
pci_p2pdma_distance function too. Thus, our goal here is to enable
existing libraries to get only the information they need, while having
means for new code or more flexible code to use pci_p2pdma_distance as
needed.
[1] https://lore.kernel.org/linux-pci/1718191656-32714-1-git-send-email-shivasharan.srikanteshwara@broadcom.com/
Shivasharan S (2):
PCI/portdrv: Enable reporting inter-switch P2P links
PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
Documentation/ABI/testing/sysfs-bus-pci | 14 +++
drivers/pci/p2pdma.c | 18 ++-
drivers/pci/pcie/Kconfig | 9 ++
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/p2p_link.c | 161 ++++++++++++++++++++++++
drivers/pci/pcie/p2p_link.h | 32 +++++
drivers/pci/pcie/portdrv.c | 5 +-
7 files changed, 238 insertions(+), 2 deletions(-)
create mode 100644 drivers/pci/pcie/p2p_link.c
create mode 100644 drivers/pci/pcie/p2p_link.h
--
2.43.0
* [PATCH v3 1/2] PCI/portdrv: Enable reporting inter-switch P2P links
2024-12-11 7:17 [PATCH v3 0/2] PCI/portdrv: Report inter switch P2P links through sysfs Shivasharan S
@ 2024-12-11 7:17 ` Shivasharan S
2024-12-11 7:17 ` [PATCH v3 2/2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
1 sibling, 0 replies; 4+ messages in thread
From: Shivasharan S @ 2024-12-11 7:17 UTC (permalink / raw)
To: linux-pci, bhelgaas, manivannan.sadhasivam, logang,
Jonathan.Cameron
Cc: linux-kernel, sumanesh.samanta, sathya.prakash, sjeaugey,
Shivasharan S
Broadcom PCI switches support inter-switch P2P links between two
PCI-to-PCI bridges. This presents an optimal data path for data
movement. The patch exports a new sysfs entry for PCI devices that
support the inter switch P2P links and reports the B:D:F information
of the devices that are connected through this inter switch link as
shown below:
Host root bridge
---------------------------------------
| |
NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2
(2c:00.0) (2a:00.0) (3d:00.0) (40:00.0)
| |
GPU1 GPU2
(2d:00.0) (3f:00.0)
SERVER 1
$ find /sys/ -name "links" | xargs grep .
/sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link/links:0000:3d:00.0
/sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link/links:0000:2a:00.0
Current support is added to report the P2P links available for
Broadcom switches based on the capability that is reported by the
upstream bridges through their vendor-specific capability registers.
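A userspace consumer could read this attribute roughly as follows. This is only a sketch under the assumption that each line of `links` has the `DDDD:BB:DD.F` form shown in the output above; `struct pci_addr`, `parse_bdf()` and `read_links()` are hypothetical helper names, not part of this patch or of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one peer address read from "links". */
struct pci_addr {
    unsigned int domain, bus, dev, fn;
};

/* Parse one "DDDD:BB:DD.F" line. Returns 0 on success, -1 if malformed. */
int parse_bdf(const char *line, struct pci_addr *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%x:%x:%x.%u",
               &out->domain, &out->bus, &out->dev, &out->fn) != 4)
        return -1;
    return 0;
}

/*
 * Read up to max peer addresses from the links file at path.
 * Returns the number of peers parsed, or -1 if the file cannot be opened
 * (for example because the port has no p2p_link group).
 */
int read_links(const char *path, struct pci_addr *peers, int max)
{
    char line[64];
    int n = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (n < max && fgets(line, sizeof(line), f) != NULL) {
        if (parse_bdf(line, &peers[n]) == 0)
            n++;
    }
    fclose(f);
    return n;
}
```

A topology library could call `read_links()` on each upstream port it enumerates and add an edge between the two switches for every peer returned.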
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
---
Changes in v3:
Moved the link detection code to separate file that can be enabled with
config option CONFIG_PCIE_P2P_LINK
Changes in v2:
Integrated the code into PCIe portdrv to create the sysfs entries during
probe, as suggested by Mani.
Documentation/ABI/testing/sysfs-bus-pci | 14 +++
drivers/pci/pcie/Kconfig | 9 ++
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/p2p_link.c | 143 ++++++++++++++++++++++++
drivers/pci/pcie/p2p_link.h | 27 +++++
drivers/pci/pcie/portdrv.c | 5 +-
6 files changed, 198 insertions(+), 1 deletion(-)
create mode 100644 drivers/pci/pcie/p2p_link.c
create mode 100644 drivers/pci/pcie/p2p_link.h
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 5da6a14dc326..200cb3f214bf 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -583,3 +583,17 @@ Description:
enclosure-specific indications "specific0" to "specific7",
hence the corresponding led class devices are unavailable if
the DSM interface is used.
+
+What: /sys/bus/pci/devices/.../p2p_link/links
+Date: September 2024
+Contact: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
+Description:
+ This file appears on PCIe upstream ports which support an
+ internal P2P link.
+ Reading this attribute will provide the list of other upstream
+ ports on the system which have an internal P2P link available
+ between the two ports.
+Users:
+ Userspace applications interested in determining an optimal P2P
+ link for data transfers between devices connected to the PCIe
+ switches.
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 17919b99fa66..9afa9016cdf3 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -155,3 +155,12 @@ config PCIE_EDR
the PCI Firmware Specification r3.2. Enable this if you want to
support hybrid DPC model which uses both firmware and OS to
implement DPC.
+
+config PCIE_P2P_LINK
+ bool "PCI Express P2P link detection support"
+ depends on PCIEPORTBUS
+ help
+ This option enables the PCIe port driver to export sysfs entries
+ for Inter switch P2P links detected on the PCIe upstream ports.
+ This option enables user space libraries to detect optimal paths
+ for data transfers between endpoints connected to PCIe switches.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 53ccab62314d..d1e71698cbd8 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
obj-$(CONFIG_PCIE_PTM) += ptm.o
obj-$(CONFIG_PCIE_EDR) += edr.o
+obj-$(CONFIG_PCIE_P2P_LINK) += p2p_link.o
diff --git a/drivers/pci/pcie/p2p_link.c b/drivers/pci/pcie/p2p_link.c
new file mode 100644
index 000000000000..dec5c4cbcf13
--- /dev/null
+++ b/drivers/pci/pcie/p2p_link.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Purpose: PCI Express P2P link discovery
+ *
+ * Copyright (C) 2024 Broadcom Inc.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/bitops.h>
+
+#include "../pci.h"
+#include "portdrv.h"
+#include "p2p_link.h"
+
+/**
+ * pcie_brcm_is_p2p_supported - Broadcom device specific handler
+ * to check if the upstream port supports inter switch P2P.
+ *
+ * @dev: PCIe upstream port to check
+ *
+ * This function assumes the PCIe upstream port is a Broadcom
+ * PCIe device.
+ */
+static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)
+{
+ u64 dsn;
+ u16 vsec;
+ u32 vsec_data;
+
+ vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
+ PCIE_BRCM_SW_P2P_VSEC_ID);
+ if (!vsec) {
+ pci_dbg(dev, "Failed to get VSEC capability\n");
+ return false;
+ }
+
+ pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
+ &vsec_data);
+
+ dsn = pci_get_dsn(dev);
+ if (!dsn) {
+ pci_dbg(dev, "DSN capability is not present\n");
+ return false;
+ }
+
+ pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
+ dsn, vsec_data);
+
+ /* Check if the PEX switch has a valid P2P support */
+ if (!(dsn & PCIE_BRCM_SW_DSN_P2P_STATUS))
+ return false;
+
+ return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) ==
+ PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK;
+}
+
+/*
+ * Determine if device supports Inter switch P2P links.
+ *
+ * Return value: true if inter switch P2P is supported, return false otherwise.
+ */
+static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
+{
+ /* P2P link attribute is supported on upstream ports only */
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
+ return false;
+
+ /*
+ * Currently Broadcom PEX switches are supported.
+ */
+ if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
+ (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
+ dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
+ return pcie_brcm_is_p2p_supported(dev);
+
+ return false;
+}
+
+/*
+ * Traverse list of all PCI bridges and find devices that support Inter switch P2P
+ * and have the same serial number, then report their BDF over sysfs.
+ */
+static ssize_t links_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
+ size_t len = 0;
+ u64 dsn, dsn_link;
+
+ /*
+ * pdev's DSN has already been verified to be available before creating
+ * the sysfs entry.
+ */
+ dsn = pci_get_dsn(pdev);
+
+ /* Traverse list of PCI bridges to determine any available P2P links */
+ while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link))
+ != NULL) {
+ if (pdev_link == pdev)
+ continue;
+
+ if (!pcie_port_is_p2p_supported(pdev_link))
+ continue;
+
+ dsn_link = pci_get_dsn(pdev_link);
+ if (!dsn_link)
+ continue;
+
+ if (dsn == dsn_link)
+ len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n",
+ pci_domain_nr(pdev_link->bus),
+ pdev_link->bus->number, PCI_SLOT(pdev_link->devfn),
+ PCI_FUNC(pdev_link->devfn));
+ }
+
+ return len;
+}
+
+/* P2P link sysfs attribute. */
+static struct device_attribute dev_attr_links =
+ __ATTR(links, 0444, links_show, NULL);
+
+static struct attribute *pcie_port_p2p_link_attrs[] = {
+ &dev_attr_links.attr,
+ NULL
+};
+
+const struct attribute_group pcie_port_p2p_link_attr_group = {
+ .name = "p2p_link",
+ .attrs = pcie_port_p2p_link_attrs,
+};
+
+void p2p_link_sysfs_update_group(struct pci_dev *pdev)
+{
+ if (!pcie_port_is_p2p_supported(pdev))
+ return;
+
+ sysfs_update_group(&pdev->dev.kobj, &pcie_port_p2p_link_attr_group);
+}
diff --git a/drivers/pci/pcie/p2p_link.h b/drivers/pci/pcie/p2p_link.h
new file mode 100644
index 000000000000..6c4f57841c79
--- /dev/null
+++ b/drivers/pci/pcie/p2p_link.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Purpose: PCI Express P2P link discovery
+ *
+ * Copyright (C) 2024 Broadcom Inc.
+ */
+
+#ifndef _P2P_LINK_H_
+#define _P2P_LINK_H_
+
+/* P2P Link supported device IDs */
+#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC 0xC030
+#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC 0xC034
+
+#define PCIE_BRCM_SW_P2P_VSEC_ID 0x1
+#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET 0xC
+#define PCIE_BRCM_SW_P2P_MODE_MASK GENMASK(9, 8)
+#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK 0x2
+#define PCIE_BRCM_SW_DSN_P2P_STATUS BIT(3)
+
+#ifdef CONFIG_PCIE_P2P_LINK
+void p2p_link_sysfs_update_group(struct pci_dev *pdev);
+
+#else
+static inline void p2p_link_sysfs_update_group(struct pci_dev *pdev) { }
+#endif
+#endif /* _P2P_LINK_H_ */
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 5e10306b6308..f4ddff78e104 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -21,6 +21,7 @@
#include "../pci.h"
#include "portdrv.h"
+#include "p2p_link.h"
/*
* The PCIe Capability Interrupt Message Number (PCIe r3.1, sec 7.8.2) must
@@ -714,7 +715,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
pm_runtime_put_autosuspend(&dev->dev);
pm_runtime_allow(&dev->dev);
}
-
+#ifdef CONFIG_PCIE_P2P_LINK
+ p2p_link_sysfs_update_group(dev);
+#endif
return 0;
}
--
2.43.0
* [PATCH v3 2/2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
2024-12-11 7:17 [PATCH v3 0/2] PCI/portdrv: Report inter switch P2P links through sysfs Shivasharan S
2024-12-11 7:17 ` [PATCH v3 1/2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
@ 2024-12-11 7:17 ` Shivasharan S
2024-12-12 5:04 ` Christoph Hellwig
1 sibling, 1 reply; 4+ messages in thread
From: Shivasharan S @ 2024-12-11 7:17 UTC (permalink / raw)
To: linux-pci, bhelgaas, manivannan.sadhasivam, logang,
Jonathan.Cameron
Cc: linux-kernel, sumanesh.samanta, sathya.prakash, sjeaugey,
Shivasharan S
Update the p2p_dma_distance() to determine inter-switch P2P links existing
between two switches and use this to calculate the DMA distance between
two devices. This requires enabling the PCIE_P2P_LINK config option in
the kernel.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
---
Changes in v3:
Fixed review comments from Jonathan
drivers/pci/p2pdma.c | 18 +++++++++++++++++-
drivers/pci/pcie/p2p_link.c | 18 ++++++++++++++++++
drivers/pci/pcie/p2p_link.h | 5 +++++
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 7abd4f546d3c..9482bf0b1a02 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,6 +20,7 @@
#include <linux/random.h>
#include <linux/seq_buf.h>
#include <linux/xarray.h>
+#include "pcie/p2p_link.h"
struct pci_p2pdma {
struct gen_pool *pool;
@@ -576,7 +577,7 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
int *dist, bool verbose)
{
enum pci_p2pdma_map_type map_type = PCI_P2PDMA_MAP_THRU_HOST_BRIDGE;
- struct pci_dev *a = provider, *b = client, *bb;
+ struct pci_dev *a = provider, *b = client, *bb, *b_p2p_link = NULL;
bool acs_redirects = false;
struct pci_p2pdma *p2pdma;
struct seq_buf acs_list;
@@ -606,6 +607,18 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
if (a == bb)
goto check_b_path_acs;
+#ifdef CONFIG_PCIE_P2P_LINK
+ /*
+ * If both upstream bridges have Inter switch P2P link
+ * available, P2P DMA distance can account for optimized
+ * path.
+ */
+ if (pcie_port_is_p2p_link_available(a, bb)) {
+ b_p2p_link = bb;
+ goto check_b_path_acs;
+ }
+#endif
+
bb = pci_upstream_bridge(bb);
dist_b++;
}
@@ -629,6 +642,9 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
acs_cnt++;
}
+ if (bb == b_p2p_link)
+ break;
+
bb = pci_upstream_bridge(bb);
}
diff --git a/drivers/pci/pcie/p2p_link.c b/drivers/pci/pcie/p2p_link.c
index dec5c4cbcf13..87651dfa1981 100644
--- a/drivers/pci/pcie/p2p_link.c
+++ b/drivers/pci/pcie/p2p_link.c
@@ -141,3 +141,21 @@ void p2p_link_sysfs_update_group(struct pci_dev *pdev)
sysfs_update_group(&pdev->dev.kobj, &pcie_port_p2p_link_attr_group);
}
+
+/*
+ * pcie_port_is_p2p_link_available: Determine if a P2P link is available
+ * between the two upstream bridges. The serial number of the two devices
+ * will be compared and if they are same then it is considered that the P2P
+ * link is available.
+ *
+ * Return value: true if inter switch P2P is available, return false otherwise.
+ */
+bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b)
+{
+ if (!pcie_port_is_p2p_supported(a) || !pcie_port_is_p2p_supported(b))
+ return false;
+
+ /* the above check validates DSN is valid for both devices */
+ return pci_get_dsn(a) == pci_get_dsn(b);
+}
+EXPORT_SYMBOL_GPL(pcie_port_is_p2p_link_available);
diff --git a/drivers/pci/pcie/p2p_link.h b/drivers/pci/pcie/p2p_link.h
index 6c4f57841c79..6677ed66f397 100644
--- a/drivers/pci/pcie/p2p_link.h
+++ b/drivers/pci/pcie/p2p_link.h
@@ -21,7 +21,12 @@
#ifdef CONFIG_PCIE_P2P_LINK
void p2p_link_sysfs_update_group(struct pci_dev *pdev);
+bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b);
#else
static inline void p2p_link_sysfs_update_group(struct pci_dev *pdev) { }
+static inline bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b)
+{
+ return false;
+}
#endif
#endif /* _P2P_LINK_H_ */
--
2.43.0
* Re: [PATCH v3 2/2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
2024-12-11 7:17 ` [PATCH v3 2/2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
@ 2024-12-12 5:04 ` Christoph Hellwig
0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2024-12-12 5:04 UTC (permalink / raw)
To: Shivasharan S
Cc: linux-pci, bhelgaas, manivannan.sadhasivam, logang,
Jonathan.Cameron, linux-kernel, sumanesh.samanta, sathya.prakash,
sjeaugey
On Tue, Dec 10, 2024 at 11:17:48PM -0800, Shivasharan S wrote:
> Update the p2p_dma_distance() to determine inter-switch P2P links existing
> between two switches and use this to calculate the DMA distance between
> two devices. This requires enabling the PCIE_P2P_LINK config option in
> the kernel.
What the heck are "P2P links" supposed to be? And why should Linux
support something non-standard like this?
NAK for these hacks to the core code unless you can get a vendor
independent spec for it.