linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 v2] PCI/portdrv: Report inter switch P2P links through sysfs
@ 2024-09-19  8:13 Shivasharan S
  2024-09-19  8:13 ` [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
  2024-09-19  8:13 ` [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
  0 siblings, 2 replies; 12+ messages in thread
From: Shivasharan S @ 2024-09-19  8:13 UTC (permalink / raw)
  To: linux-pci, bhelgaas, manivannan.sadhasivam, logang
  Cc: linux-kernel, sumanesh.samanta, sathya.prakash, sjeaugey,
	Shivasharan S

Changes done in v2:
The previous submission of this series was at [1].
As per the feedback received from Mani, the code is moved to PCI portdrv
to create the sysfs entries instead of having a separate kernel module.

A. Introductory definitions:

Virtual Switch: Broadcom(PLX) switches have a capability where a single
physical switch can be divided up into N number of virtual switches at
start of day. For example, a single physical switch with 64 ports can be
configured to appear to the host as 2 switches with 32 ports each. This is
a static configuration that needs to be done before the switch boots, and
cannot generally be changed on the fly. Now consider a GPU in Virtual
switch 1 and a NIC on Virtual switch 2. The key here is that it's actually
the same switch, and IF P2P is enabled between the two virtual switches,
then that would be almost infinite bandwidth between the GPU and the NIC.
However, today there is no way for the host to know that, and host
applications believe that any data exchange between the GPU and NIC must
go through host root port and thus would be slow.
Note: Any such P2P must follow ACS/IOMMU rules, and has to be enabled in
the Broadcom switches.

Inter Switch Link: While the current use-case is about the virtual switch
config above, this could also extend to physical switch, where the two
physical switches have, say, a x16 PCIe connection between them.

B: Goal/Problem statement:

Goal 1: Summary: Provide user applications a means by which they can
discover two virtual switches to be part of the same physical switch or
when physical switches are physically connected to each other, so that
they can discover optimized data path for HPC/AI applications.

With the rapid progression of High Performance Computing (HPC) and
Artificial Intelligence (AI), it is becoming more and more common to have
complex topologies with multiple GPU, NIC, NVMe devices etc interconnected
using multiple switches. HPC and AI libraries like MPI, UCC, NCCL, RCCL,
HCCL etc analyze this topology to build a topology tree to optimize data
path for collective operations like all-reduce etc.

Example:

                             Host root bridge
                ---------------------------------------
                |                  |                  |
  NIC1 --- PCI Switch1        PCI Switch2        PCI Switch3 --- NIC2
                |                  |                  |
               GPU1 ------------- GPU2 ------------- GPU3

                               SERVER 1

In the simple picture above in Server1, Switch1, Switch2, Switch3
are all connected to the host bridge and each switch has a GPU
connected, and Switch1/3 each has a NIC connected.
In a typical AI setup, there are many such servers, each connected by
upper level network switch, and "rail optimized", ie, NIC1 of all
servers are connected to Ethernet Switch1, NIC2 connected to Ethernet
Switch2 etc (Ethernet switches are not shown in picture above)
The GPUs are connected among themselves by some backend fabric, like
NVLINK (NVIDIA).
Assume that in the above diagram, PCI Switch1  and PCI Switch3 are
virtual switches belonging to the same physical switch and thus a very
high speed data link exists between them, but today host applications
have no knowledge about that.
(This is a very simple example, and modern AI infrastructure can be
way more complex than that.)

Now for collective operations like all-reduce, the HPC/AI libraries
analyze the topology above and typically decide on a data path like
this: NIC1->GPU1->GPU2->GPU3-> NIC2 which is suboptimal, because
ideally data should come go in and out through the same NIC because of
"rail optimized" topology.
Some libraries do this:NIC1->GPU1->GPU2->GPU3-> GPU1->NIC1.
The applications do the above because they think data from GPU3 to
NIC1  needs to go through the host root port, which is very
inefficient. What they do not know is that Switch1 and Switch3 are the
same physical entity with virtually infinite bandwidth between them,
and with that, they would have chosen a path like:
NIC1->GPU1->GPU2->GPU3->NIC1, which is the most optimized in the above
example.

Goal 2: Extend Linux P2PDMA distance function pci_p2pdma_distance to
account for Virtual Switch and physical switches connected by inter
switch link. The current implementation of the function has no
knowledge of Virtual switch and inter switch link.
Consider the example below:

     -+  Root Port
      \+ Switch1 Upstream Port
         +-+ Switch1 Downstream Port 0
          \- Device A
      \+ Switch2 Upstream Port
         +-+ Switch2 Downstream Port 0
           \- Device B

Suppose Switch1 and Switch2 are virtual switches belonging  to the
same physical switch. Today P2PDMA distance between Device A and
Device B  will return PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, as kernel has
no idea that switch1 and switch2 are actually physically connected to
each other. We intend to fix that, so that pci_p2pdma_distance now takes
into account switch connectivity information.

C. FAQs

FAQ 1:  How does this feature work with ACS/IOMMU?
This feature does NOT add any new connectivity.  The inter-switch
/virtual switch connections already follow all ACS/IOMMU rules, and
only if allowed by ACS settings, they allow for data to follow a
shortcut connection between switches and bypass the root port. The
only thing this patch does is provide the switch connection
information to application software and pci_p2pdma_distance clients,
so that they can make intelligent decisions for the data path.

FAQ 2:  Is this feature Broadcom specific and will it work for other
vendors?
The current implementation of the patch looks at Broadcom
Vendor specific extensions to determine if switch p2p is enabled.
Thus, the current implementation works only on Broadcom switches. That
being said, other vendors are free to extend/modify the code to
support their switch. The function names, code structure and sysfs path
that exposes the PCI switch p2p is made generic, to allow for extension of
support to other vendors. All broadcom specific functionality is segregated
into a Broadcom specific function.

FAQ 3: Why can't applications read the Broadcom vendor specific
information directly from the config space? Why do we need the sysfs
path?
The vendor specific section of PCIe config space is not readable by
applications running in non-root mode, as such applications can only
read the first few bytes of the config space. Besides, reading the
vendor specific config space will not make the solution generic.

FAQ 4: Will applications still use the standard P2P model of
registering the provider, client etc?
Absolutely. All existing p2p API will work as is. All that this patch
provides is information that a fast connection exists between switches
and/or PCI endpoints. To make the actual p2p DMA, application need
use existing p2p API and follow existing ACS/IOMMU rules

FAQ 5: Why can't we only modify the existing pci_p2pdma_distance
function, and expose a p2pdistance to userspace? Why do we need the
new sysfs entries for pci switch connectivity?
The existing HPC/AI libraries like MPI, UCC, NCCL, RCCL, HCCL etc work
not only with PCIe switches, but also with other kind of connectivity,
like TCP, network switches, infiniband and backend inter GPU
connectivity like NVLINK and AFL. Because of that, the libraries have
matured code that analyzes all the connections and entire topology to
determine the most optimal data path among nodes. Just using
pci_p2pdma_distance does not work for them, because there might be a
shorter path between two nodes using NVLINK or a network switch.  In
theory those libraries could be modified to use pci_p2pdma_distance
for PCIe connection and other method for other connection, but in
practice that is near impossible, as those changes are very intrusive
and those libraries have matured for a long time,. Their respective
maintainers are highly reluctant to make such a big change and rather
get only the missing information, that is whether two switches are
connected together. Broadcom has received such first hand feedback.
Forcing everyone to use p2pdistance only will defeat the whole purpose
of this patch. However, we do want to support those libraries that
want to use pci_p2pdma_distance, and that is why we are extending
pci_p2pdma_distance function too. Thus, our goal here is to enable
existing libraries to get only the information they need, while having
means for new code or more flexible code to use pci_p2pdma_distance as
needed.

[1] https://lore.kernel.org/linux-pci/1718191656-32714-1-git-send-email-shivasharan.srikanteshwara@broadcom.com/

Shivasharan S (2):
  PCI/portdrv: Enable reporting inter-switch P2P links
  PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links

 Documentation/ABI/testing/sysfs-bus-pci |  14 ++
 drivers/pci/p2pdma.c                    |  15 ++-
 drivers/pci/pcie/portdrv.c              | 165 ++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h              |  12 ++
 4 files changed, 205 insertions(+), 1 deletion(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-09-19  8:13 [PATCH 0/2 v2] PCI/portdrv: Report inter switch P2P links through sysfs Shivasharan S
@ 2024-09-19  8:13 ` Shivasharan S
  2024-09-20  4:28   ` kernel test robot
  2024-09-24 14:57   ` Jonathan Cameron
  2024-09-19  8:13 ` [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
  1 sibling, 2 replies; 12+ messages in thread
From: Shivasharan S @ 2024-09-19  8:13 UTC (permalink / raw)
  To: linux-pci, bhelgaas, manivannan.sadhasivam, logang
  Cc: linux-kernel, sumanesh.samanta, sathya.prakash, sjeaugey,
	Shivasharan S

Broadcom PCI switches supports inter-switch P2P links between two
PCI-to-PCI bridges. This presents an optimal data path for data
movement. The patch exports a new sysfs entry for PCI devices that
support the inter switch P2P links and reports the B:D:F information
of the devices that are connected through this inter switch link as
shown below:

                             Host root bridge
                ---------------------------------------
                |                                     |
  NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2
(2c:00.0)   (2a:00.0)                             (3d:00.0)   (40:00.0)
                |                                     |
               GPU1                                  GPU2
            (2d:00.0)                             (3f:00.0)
                               SERVER 1

$ find /sys/ -name "p2p_link" | xargs grep .
/sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0
/sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0

Current support is added to report the P2P links available for
Broadcom switches based on the capability that is reported by the
upstream bridges through their vendor-specific capability registers.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
---
Changes in v2:
Integrated the code into PCIe portdrv to create the sysfs entries during
probe, as suggested by Mani.

 Documentation/ABI/testing/sysfs-bus-pci |  14 +++
 drivers/pci/pcie/portdrv.c              | 131 ++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h              |  10 ++
 3 files changed, 155 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ecf47559f495..e5d02f20655f 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -572,3 +572,17 @@ Description:
 		enclosure-specific indications "specific0" to "specific7",
 		hence the corresponding led class devices are unavailable if
 		the DSM interface is used.
+
+What:		/sys/bus/pci/devices/.../p2p_link
+Date:		September 2024
+Contact:	Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
+Description:
+		This file appears on PCIe upstream ports which supports an
+		internal P2P link.
+		Reading this attribute will provide the list of other upstream
+		ports on the system which have an internal P2P link available
+		between the two ports.
+Users:
+		Userspace applications interested in determining a optimal P2P
+		link for data transfers between devices connected to the PCIe
+		switches.
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 6af5e0425872..c940b4b242fd 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/aer.h>
+#include <linux/bitops.h>
 
 #include "../pci.h"
 #include "portdrv.h"
@@ -37,6 +38,134 @@ struct portdrv_service_data {
 	u32 service;
 };
 
+/**
+ * pcie_brcm_is_p2p_supported - Broadcom device specific handler
+ *       to check if the upstream port supports inter switch P2P.
+ *
+ * @dev: PCIe upstream port to check
+ *
+ * This function assumes the PCIe upstream port is a Broadcom
+ * PCIe device.
+ */
+static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)
+{
+	u64 dsn;
+	u16 vsec;
+	u32 vsec_data;
+
+	dsn = pci_get_dsn(dev);
+	if (!dsn) {
+		pci_dbg(dev, "DSN capability is not present\n");
+		return false;
+	}
+
+	vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
+					PCIE_BRCM_SW_P2P_VSEC_ID);
+	if (!vsec) {
+		pci_dbg(dev, "Failed to get VSEC capability\n");
+		return false;
+	}
+
+	pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
+			      &vsec_data);
+
+	pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
+		dsn, vsec_data);
+
+	if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn))
+		return false;
+
+	if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) !=
+		PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK)
+		return false;
+
+	return true;
+}
+
+/**
+ * Determine if device supports Inter switch P2P links.
+ *
+ * Return value: true if inter switch P2P is supported, return false otherwise.
+ */
+static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
+{
+	/* P2P link attribute is supported on upstream ports only */
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
+		return false;
+
+	/*
+	 * Currently Broadcom PEX switches are supported.
+	 */
+	if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
+	    (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
+	     dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
+		return pcie_brcm_is_p2p_supported(dev);
+
+	return false;
+}
+
+/*
+ * Traverse list of all PCI bridges and find devices that support Inter switch P2P
+ * and have the same serial number to create report the BDF over sysfs.
+ */
+static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
+	size_t len = 0;
+	u64 dsn, dsn_link;
+
+	dsn = pci_get_dsn(pdev);
+
+	/* Traverse list of PCI bridges to determine any available P2P links */
+	while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link))
+			!= NULL) {
+		if (pdev_link == pdev)
+			continue;
+
+		if (!pcie_port_is_p2p_supported(pdev_link))
+			continue;
+
+		dsn_link = pci_get_dsn(pdev_link);
+		if (!dsn_link)
+			continue;
+
+		if (dsn == dsn_link)
+			len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n",
+					     pci_domain_nr(pdev_link->bus),
+					     pdev_link->bus->number, PCI_SLOT(pdev_link->devfn),
+					     PCI_FUNC(pdev_link->devfn));
+	}
+
+	return len;
+}
+
+/* P2P link sysfs attribute. */
+static struct device_attribute dev_attr_p2p_link =
+	__ATTR(p2p_link, 0444, p2p_link_show, NULL);
+
+static struct attribute *pcie_port_p2p_link_attrs[] = {
+	&dev_attr_p2p_link.attr,
+	NULL
+};
+
+static umode_t pcie_port_p2p_link_attrs_is_visible(struct kobject *kobj,
+						   struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pcie_port_is_p2p_supported(pdev))
+		return 0;
+
+	return a->mode;
+}
+
+const struct attribute_group pcie_port_p2p_link_attr_group = {
+	.attrs = pcie_port_p2p_link_attrs,
+	.is_visible = pcie_port_p2p_link_attrs_is_visible,
+};
+
 /**
  * release_pcie_device - free PCI Express port service device structure
  * @dev: Port service device to release
@@ -715,6 +844,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 		pm_runtime_allow(&dev->dev);
 	}
 
+	sysfs_update_group(&dev->dev.kobj, &pcie_port_p2p_link_attr_group);
+
 	return 0;
 }
 
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 12c89ea0313b..1be06cb45665 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -25,6 +25,16 @@
 
 #define PCIE_PORT_DEVICE_MAXSERVICES   5
 
+/* P2P Link supported device IDs */
+#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC	0xC030
+#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC	0xC034
+
+#define PCIE_BRCM_SW_P2P_VSEC_ID		0x1
+#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET	0xC
+#define PCIE_BRCM_SW_P2P_MODE_MASK		GENMASK(9, 8)
+#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK	0x2
+#define PCIE_BRCM_SW_IS_SECURE_PART(dsn)	((dsn) & 0x8)
+
 extern bool pcie_ports_dpc_native;
 
 #ifdef CONFIG_PCIEAER
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
  2024-09-19  8:13 [PATCH 0/2 v2] PCI/portdrv: Report inter switch P2P links through sysfs Shivasharan S
  2024-09-19  8:13 ` [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
@ 2024-09-19  8:13 ` Shivasharan S
  2024-09-20  5:51   ` kernel test robot
  2024-09-24 15:08   ` Jonathan Cameron
  1 sibling, 2 replies; 12+ messages in thread
From: Shivasharan S @ 2024-09-19  8:13 UTC (permalink / raw)
  To: linux-pci, bhelgaas, manivannan.sadhasivam, logang
  Cc: linux-kernel, sumanesh.samanta, sathya.prakash, sjeaugey,
	Shivasharan S

Update the p2p_dma_distance() to determine inter-switch P2P links existing
between two switches and use this to calculate the DMA distance between
two devices.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
---
 drivers/pci/p2pdma.c       | 17 ++++++++++++++++-
 drivers/pci/pcie/portdrv.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h |  2 ++
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 4f47a13cb500..eed3b69e7293 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -21,6 +21,8 @@
 #include <linux/seq_buf.h>
 #include <linux/xarray.h>
 
+extern bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b);
+
 struct pci_p2pdma {
 	struct gen_pool *pool;
 	bool p2pmem_published;
@@ -576,7 +578,7 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
 		int *dist, bool verbose)
 {
 	enum pci_p2pdma_map_type map_type = PCI_P2PDMA_MAP_THRU_HOST_BRIDGE;
-	struct pci_dev *a = provider, *b = client, *bb;
+	struct pci_dev *a = provider, *b = client, *bb, *b_p2p_link = NULL;
 	bool acs_redirects = false;
 	struct pci_p2pdma *p2pdma;
 	struct seq_buf acs_list;
@@ -606,6 +608,16 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
 			if (a == bb)
 				goto check_b_path_acs;
 
+			/*
+			 * If both upstream bridges have Inter switch P2P link
+			 * available, P2P DMA distance can account for optimized
+			 * path.
+			 */
+			if (pcie_port_is_p2p_link_available(a, bb)) {
+				b_p2p_link = bb;
+				goto check_b_path_acs;
+			}
+
 			bb = pci_upstream_bridge(bb);
 			dist_b++;
 		}
@@ -629,6 +641,9 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
 			acs_cnt++;
 		}
 
+		if (bb == b_p2p_link)
+			break;
+
 		bb = pci_upstream_bridge(bb);
 	}
 
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index c940b4b242fd..2fe1598fc684 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -104,6 +104,40 @@ static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
 	return false;
 }
 
+/**
+ * pcie_port_is_p2p_link_available: Determine if a P2P link is available
+ * between the two upstream bridges. The serial number of the two devices
+ * will be compared and if they are same then it is considered that the P2P
+ * link is available.
+ *
+ * Return value: true if inter switch P2P is available, return false otherwise.
+ */
+bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b)
+{
+	u64 dsn_a, dsn_b;
+
+	/*
+	 * Check if the devices support Inter switch P2P.
+	 */
+	if (!pcie_port_is_p2p_supported(a) ||
+	    !pcie_port_is_p2p_supported(b))
+		return false;
+
+	dsn_a = pci_get_dsn(a);
+	if (!dsn_a)
+		return false;
+
+	dsn_b = pci_get_dsn(b);
+	if (!dsn_b)
+		return false;
+
+	if (dsn_a == dsn_b)
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(pcie_port_is_p2p_link_available);
+
 /*
  * Traverse list of all PCI bridges and find devices that support Inter switch P2P
  * and have the same serial number to create report the BDF over sysfs.
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 1be06cb45665..b341aad6eb49 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -130,5 +130,7 @@ static inline bool pcie_pme_no_msi(void) { return false; }
 static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
 #endif /* !CONFIG_PCIE_PME */
 
+bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b);
+
 struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
 #endif /* _PORTDRV_H_ */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-09-19  8:13 ` [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
@ 2024-09-20  4:28   ` kernel test robot
  2024-09-24 14:57   ` Jonathan Cameron
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2024-09-20  4:28 UTC (permalink / raw)
  To: Shivasharan S, linux-pci, bhelgaas, manivannan.sadhasivam, logang
  Cc: llvm, oe-kbuild-all, linux-kernel, sumanesh.samanta,
	sathya.prakash, sjeaugey, Shivasharan S

Hi Shivasharan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on pci/next]
[also build test WARNING on next-20240919]
[cannot apply to pci/for-linus linus/master v6.11]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shivasharan-S/PCI-portdrv-Enable-reporting-inter-switch-P2P-links/20240919-162626
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/1726733624-2142-2-git-send-email-shivasharan.srikanteshwara%40broadcom.com
patch subject: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20240920/202409201219.feYAxGor-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240920/202409201219.feYAxGor-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409201219.feYAxGor-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/pci/pcie/portdrv.c:86: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Determine if device supports Inter switch P2P links.


vim +86 drivers/pci/pcie/portdrv.c

    84	
    85	/**
  > 86	 * Determine if device supports Inter switch P2P links.
    87	 *
    88	 * Return value: true if inter switch P2P is supported, return false otherwise.
    89	 */
    90	static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
    91	{
    92		/* P2P link attribute is supported on upstream ports only */
    93		if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
    94			return false;
    95	
    96		/*
    97		 * Currently Broadcom PEX switches are supported.
    98		 */
    99		if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
   100		    (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
   101		     dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
   102			return pcie_brcm_is_p2p_supported(dev);
   103	
   104		return false;
   105	}
   106	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
  2024-09-19  8:13 ` [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
@ 2024-09-20  5:51   ` kernel test robot
  2024-09-24 15:08   ` Jonathan Cameron
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2024-09-20  5:51 UTC (permalink / raw)
  To: Shivasharan S, linux-pci, bhelgaas, manivannan.sadhasivam, logang
  Cc: llvm, oe-kbuild-all, linux-kernel, sumanesh.samanta,
	sathya.prakash, sjeaugey, Shivasharan S

Hi Shivasharan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on pci/next]
[also build test WARNING on next-20240919]
[cannot apply to pci/for-linus linus/master v6.11]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shivasharan-S/PCI-portdrv-Enable-reporting-inter-switch-P2P-links/20240919-162626
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/1726733624-2142-3-git-send-email-shivasharan.srikanteshwara%40broadcom.com
patch subject: [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20240920/202409201333.LCx5k41f-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240920/202409201333.LCx5k41f-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409201333.LCx5k41f-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/pci/pcie/portdrv.c:86: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Determine if device supports Inter switch P2P links.
>> drivers/pci/pcie/portdrv.c:116: warning: Function parameter or struct member 'a' not described in 'pcie_port_is_p2p_link_available'
>> drivers/pci/pcie/portdrv.c:116: warning: Function parameter or struct member 'b' not described in 'pcie_port_is_p2p_link_available'


vim +116 drivers/pci/pcie/portdrv.c

   106	
   107	/**
   108	 * pcie_port_is_p2p_link_available: Determine if a P2P link is available
   109	 * between the two upstream bridges. The serial number of the two devices
   110	 * will be compared and if they are same then it is considered that the P2P
   111	 * link is available.
   112	 *
   113	 * Return value: true if inter switch P2P is available, return false otherwise.
   114	 */
   115	bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b)
 > 116	{
   117		u64 dsn_a, dsn_b;
   118	
   119		/*
   120		 * Check if the devices support Inter switch P2P.
   121		 */
   122		if (!pcie_port_is_p2p_supported(a) ||
   123		    !pcie_port_is_p2p_supported(b))
   124			return false;
   125	
   126		dsn_a = pci_get_dsn(a);
   127		if (!dsn_a)
   128			return false;
   129	
   130		dsn_b = pci_get_dsn(b);
   131		if (!dsn_b)
   132			return false;
   133	
   134		if (dsn_a == dsn_b)
   135			return true;
   136	
   137		return false;
   138	}
   139	EXPORT_SYMBOL_GPL(pcie_port_is_p2p_link_available);
   140	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-09-19  8:13 ` [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
  2024-09-20  4:28   ` kernel test robot
@ 2024-09-24 14:57   ` Jonathan Cameron
  2024-10-03 20:41     ` Sumanesh Samanta
  1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2024-09-24 14:57 UTC (permalink / raw)
  To: Shivasharan S
  Cc: linux-pci, bhelgaas, manivannan.sadhasivam, logang, linux-kernel,
	sumanesh.samanta, sathya.prakash, sjeaugey

On Thu, 19 Sep 2024 01:13:43 -0700
Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote:

> Broadcom PCI switches supports inter-switch P2P links between two
> PCI-to-PCI bridges. This presents an optimal data path for data
> movement. The patch exports a new sysfs entry for PCI devices that
> support the inter switch P2P links and reports the B:D:F information
> of the devices that are connected through this inter switch link as
> shown below:
> 
>                              Host root bridge
>                 ---------------------------------------
>                 |                                     |
>   NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2
> (2c:00.0)   (2a:00.0)                             (3d:00.0)   (40:00.0)
>                 |                                     |
>                GPU1                                  GPU2
>             (2d:00.0)                             (3f:00.0)
>                                SERVER 1
> 
> $ find /sys/ -name "p2p_link" | xargs grep .
> /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0
> /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0
> 
> Current support is added to report the P2P links available for
> Broadcom switches based on the capability that is reported by the
> upstream bridges through their vendor-specific capability registers.
> 
> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
> ---
> Changes in v2:
> Integrated the code into PCIe portdrv to create the sysfs entries during
> probe, as suggested by Mani.

Hmm. So we are trying to rework portdrv in general to support extensibility.
I'm a little nervous about taking in vendor specific code in the meantime
even if it is trivial like this is.  We will be having an extensible
discovery scheme but for now the plan is that will be child device based
for non PCI standard features.

It is a fairly small bit of code, so maybe it is fine - I'm not keen
on adding the implementation of the vendor specific parts to the
main driver though. Push that to an optional c file.

A few general comments inline.

> 
>  Documentation/ABI/testing/sysfs-bus-pci |  14 +++
>  drivers/pci/pcie/portdrv.c              | 131 ++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h              |  10 ++
>  3 files changed, 155 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index ecf47559f495..e5d02f20655f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -572,3 +572,17 @@ Description:
>  		enclosure-specific indications "specific0" to "specific7",
>  		hence the corresponding led class devices are unavailable if
>  		the DSM interface is used.
> +
> +What:		/sys/bus/pci/devices/.../p2p_link
> +Date:		September 2024
> +Contact:	Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> +Description:
> +		This file appears on PCIe upstream ports which supports an
> +		internal P2P link.
> +		Reading this attribute will provide the list of other upstream
> +		ports on the system which have an internal P2P link available
> +		between the two ports.

Given this only applies to 'internal' links and not inter switch physical links
I think you should rename it.  An eventual p2p link between physical switches
is going to be much more complex as will need routing information.
Let us avoid trampling on that space.

> +Users:
> +		Userspace applications interested in determining a optimal P2P
> +		link for data transfers between devices connected to the PCIe
> +		switches.

Need more data that 'there is a link' for this.
I'd like to see some info on bandwidth and latency. I've previously been
in discussions about similar devices that provide a low latency but low
bandwidth direct path.  That gets more likely if we scale up to
multiple physical switches or the software stack is choosing between
multiple p2p targets (e.g. getting nearest path to a multiheaded NIC).

Perhaps best bet is to leave space for that by using a directory
here to cover everything about p2p?  Maybe have links under there to the
other upstream ports? That might be fiddly to manage though.

> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 6af5e0425872..c940b4b242fd 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -18,6 +18,7 @@
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/aer.h>
> +#include <linux/bitops.h>
>  
>  #include "../pci.h"
>  #include "portdrv.h"
> @@ -37,6 +38,134 @@ struct portdrv_service_data {
>  	u32 service;
>  };
>  
> +/**
> + * pcie_brcm_is_p2p_supported - Broadcom device specific handler
> + *       to check if the upstream port supports inter switch P2P.
> + *
> + * @dev: PCIe upstream port to check
> + *
> + * This function assumes the PCIe upstream port is a Broadcom
> + * PCIe device.
> + */
> +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)

Put this in a separate c file + use a config option and some
stubs to make it go away if people don't want to support it.
The layering and separation in portdrv is currently messy but
we shouldn't make it worse :(

> +{
> +	u64 dsn;
> +	u16 vsec;
> +	u32 vsec_data;
> +
> +	dsn = pci_get_dsn(dev);
> +	if (!dsn) {
> +		pci_dbg(dev, "DSN capability is not present\n");
> +		return false;
> +	}

Why get the dsn (which will frequently exist on other devices)
before getting the vsec which won't?  Reorder these first
two checks. For most devices the match on vendor will fail in the
vsec check.

> +
> +	vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
> +					PCIE_BRCM_SW_P2P_VSEC_ID);
> +	if (!vsec) {
> +		pci_dbg(dev, "Failed to get VSEC capability\n");
> +		return false;
> +	}
> +
> +	pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
> +			      &vsec_data);
> +
> +	pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
> +		dsn, vsec_data);
> +
> +	if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn))

Add a comment on this. Not obvious what this is checking as it's picking
a single bit out of a serial number...

> +		return false;
> +
> +	if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) !=
> +		PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK)
> +		return false;
> +
> +	return true;
	return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) ==
	       PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK;
perhaps.

> +}
> +
> +/**
> + * Determine if device supports Inter switch P2P links.
> + *
> + * Return value: true if inter switch P2P is supported, return false otherwise.
> + */
> +static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
> +{
> +	/* P2P link attribute is supported on upstream ports only */
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> +		return false;
> +
> +	/*
> +	 * Currently Broadcom PEX switches are supported.
> +	 */
> +	if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
> +	    (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
> +	     dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
> +		return pcie_brcm_is_p2p_supported(dev);
> +
> +	return false;
> +}
> +
> +/*
> + * Traverse list of all PCI bridges and find devices that support Inter switch P2P
> + * and have the same serial number to create report the BDF over sysfs.
> + */
> +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
> +	size_t len = 0;
> +	u64 dsn, dsn_link;
> +
> +	dsn = pci_get_dsn(pdev);

Maybe add a comment that we don't need to repeat checks that were done
to make the attribute visible. Hence dsn will exist and it will be p2p link capable.

> +
> +	/* Traverse list of PCI bridges to determine any available P2P links */
> +	while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link))

Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev()
that does this, but that is already defined to mean something else...

Bjorn, is this something we should be looking to make more consistent
perhaps with naming to make it clear what is a search of all instances
on any bus and what is a search below a particular bus?

> +			!= NULL) {
> +		if (pdev_link == pdev)
> +			continue;
> +
> +		if (!pcie_port_is_p2p_supported(pdev_link))
> +			continue;
> +
> +		dsn_link = pci_get_dsn(pdev_link);
> +		if (!dsn_link)
> +			continue;
> +
> +		if (dsn == dsn_link)
> +			len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n",
> +					     pci_domain_nr(pdev_link->bus),
> +					     pdev_link->bus->number, PCI_SLOT(pdev_link->devfn),
> +					     PCI_FUNC(pdev_link->devfn));
> +	}
> +
> +	return len;
> +}


> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 12c89ea0313b..1be06cb45665 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -25,6 +25,16 @@
>  
>  #define PCIE_PORT_DEVICE_MAXSERVICES   5
>  
> +/* P2P Link supported device IDs */
> +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC	0xC030
> +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC	0xC034
> +
> +#define PCIE_BRCM_SW_P2P_VSEC_ID		0x1
> +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET	0xC
> +#define PCIE_BRCM_SW_P2P_MODE_MASK		GENMASK(9, 8)
> +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK	0x2
> +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn)	((dsn) & 0x8)
Define the mask, and use FIELD_GET() to get that.
> +
>  extern bool pcie_ports_dpc_native;
>  
>  #ifdef CONFIG_PCIEAER


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
  2024-09-19  8:13 ` [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
  2024-09-20  5:51   ` kernel test robot
@ 2024-09-24 15:08   ` Jonathan Cameron
  2024-10-14  9:44     ` Shivasharan Srikanteshwara
  1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2024-09-24 15:08 UTC (permalink / raw)
  To: Shivasharan S
  Cc: linux-pci, bhelgaas, manivannan.sadhasivam, logang, linux-kernel,
	sumanesh.samanta, sathya.prakash, sjeaugey

On Thu, 19 Sep 2024 01:13:44 -0700
Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote:

> Update the p2p_dma_distance() to determine inter-switch P2P links existing
> between two switches and use this to calculate the DMA distance between
> two devices.
> 
> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> ---
>  drivers/pci/p2pdma.c       | 17 ++++++++++++++++-
>  drivers/pci/pcie/portdrv.c | 34 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h |  2 ++
>  3 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 4f47a13cb500..eed3b69e7293 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -21,6 +21,8 @@
>  #include <linux/seq_buf.h>
>  #include <linux/xarray.h>
>  
> +extern bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b);

That's nasty.  Include the header so you get a clean stub if
this support is not built in etc.

> +

> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index c940b4b242fd..2fe1598fc684 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -104,6 +104,40 @@ static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
>  	return false;
>  }
>  
> +/**
> + * pcie_port_is_p2p_link_available: Determine if a P2P link is available
> + * between the two upstream bridges. The serial number of the two devices
> + * will be compared and if they are same then it is considered that the P2P
> + * link is available.
> + *
> + * Return value: true if inter switch P2P is available, return false otherwise.
> + */
> +bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b)
> +{
> +	u64 dsn_a, dsn_b;
> +
> +	/*
> +	 * Check if the devices support Inter switch P2P.
> +	 */

Single line comment syntax fine here.  However it's kind
of obvious, so I'd just drop the comment.


> +	if (!pcie_port_is_p2p_supported(a) ||
> +	    !pcie_port_is_p2p_supported(b))

Don't wrap this. I think it's under 80 chars anyway.

> +		return false;
> +
> +	dsn_a = pci_get_dsn(a);
> +	if (!dsn_a)
> +		return false;
If we assume that dsn is the only right way to detect this
(I'm fine with that for now) then we know the supported tests
above would only pass if this is true. Hence

return pci_get_dsn(a) == pci_get_dsn(b);

should be fine.

> +
> +	dsn_b = pci_get_dsn(b);
> +	if (!dsn_b)
> +		return false;
> +
> +	if (dsn_a == dsn_b)
> +		return true;

	return dsn_a == dsn_b;

> +
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(pcie_port_is_p2p_link_available);
> +
>  /*
>   * Traverse list of all PCI bridges and find devices that support Inter switch P2P
>   * and have the same serial number to create report the BDF over sysfs.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-09-24 14:57   ` Jonathan Cameron
@ 2024-10-03 20:41     ` Sumanesh Samanta
  2024-10-04 10:39       ` Jonathan Cameron
  0 siblings, 1 reply; 12+ messages in thread
From: Sumanesh Samanta @ 2024-10-03 20:41 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shivasharan S, linux-pci, bhelgaas, manivannan.sadhasivam, logang,
	linux-kernel, sathya.prakash, sjeaugey

[-- Attachment #1: Type: text/plain, Size: 12956 bytes --]

Hi Jonathan,

>> Need more data that 'there is a link' for this.
>>I'd like to see some info on bandwidth and latency.

As you too noted in your comments, for now, we are only addressing p2p
connection between "virtual switches", i.e. switches that look
different to the host, but are actually part of the same physical
hardware.
Given that, I am not sure what we should display for bandwidth and
latency. There is no physical link to traverse between the virtual
switches, and usually we take that as "infinite" bandwidth and "zero"
latency. As such, any number here will make little sense until we
start supporting p2p connection between physical switches. We could,
of course, have some encoding for the time being, like have "INF" for
bandwidth and 0 for latency, but again, those will not be very useful
till the day this scheme is extended to physical switch and we display
real values, like bandwidth and latency for a x16 PCIe link. Thoughts?

sincerely,
Sumanesh


On Tue, Sep 24, 2024 at 8:57 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 19 Sep 2024 01:13:43 -0700
> Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote:
>
> > Broadcom PCI switches supports inter-switch P2P links between two
> > PCI-to-PCI bridges. This presents an optimal data path for data
> > movement. The patch exports a new sysfs entry for PCI devices that
> > support the inter switch P2P links and reports the B:D:F information
> > of the devices that are connected through this inter switch link as
> > shown below:
> >
> >                              Host root bridge
> >                 ---------------------------------------
> >                 |                                     |
> >   NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2
> > (2c:00.0)   (2a:00.0)                             (3d:00.0)   (40:00.0)
> >                 |                                     |
> >                GPU1                                  GPU2
> >             (2d:00.0)                             (3f:00.0)
> >                                SERVER 1
> >
> > $ find /sys/ -name "p2p_link" | xargs grep .
> > /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0
> > /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0
> >
> > Current support is added to report the P2P links available for
> > Broadcom switches based on the capability that is reported by the
> > upstream bridges through their vendor-specific capability registers.
> >
> > Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
> > ---
> > Changes in v2:
> > Integrated the code into PCIe portdrv to create the sysfs entries during
> > probe, as suggested by Mani.
>
> Hmm. So we are trying to rework portdrv in general to support extensibility.
> I'm a little nervous about taking in vendor specific code in the meantime
> even if it is trivial like this is.  We will be having an extensible
> discovery scheme but for now the plan is that will be child device based
> for non PCI standard features.
>
> It is a fairly small bit of code, so maybe it is fine - I'm not keen
> on adding the implementation of the vendor specific parts to the
> main driver though. Push that to an optional c file.
>
> A few general comments inline.
>
> >
> >  Documentation/ABI/testing/sysfs-bus-pci |  14 +++
> >  drivers/pci/pcie/portdrv.c              | 131 ++++++++++++++++++++++++
> >  drivers/pci/pcie/portdrv.h              |  10 ++
> >  3 files changed, 155 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > index ecf47559f495..e5d02f20655f 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -572,3 +572,17 @@ Description:
> >               enclosure-specific indications "specific0" to "specific7",
> >               hence the corresponding led class devices are unavailable if
> >               the DSM interface is used.
> > +
> > +What:                /sys/bus/pci/devices/.../p2p_link
> > +Date:                September 2024
> > +Contact:     Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> > +Description:
> > +             This file appears on PCIe upstream ports which supports an
> > +             internal P2P link.
> > +             Reading this attribute will provide the list of other upstream
> > +             ports on the system which have an internal P2P link available
> > +             between the two ports.
>
> Given this only applies to 'internal' links and not inter switch physical links
> I think you should rename it.  An eventual p2p link between physical switches
> is going to be much more complex as will need routing information.
> Let us avoid trampling on that space.
>
> > +Users:
> > +             Userspace applications interested in determining a optimal P2P
> > +             link for data transfers between devices connected to the PCIe
> > +             switches.
>
> Need more data that 'there is a link' for this.
> I'd like to see some info on bandwidth and latency. I've previously been
> in discussions about similar devices that provide a low latency but low
> bandwidth direct path.  That gets more likely if we scale up to
> multiple physical switches or the software stack is choosing between
> multiple p2p targets (e.g. getting nearest path to a multiheaded NIC).
>
> Perhaps best bet is to leave space for that by using a directory
> here to cover everything about p2p?  Maybe have links under there to the
> other upstream ports? That might be fiddly to manage though.
>
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 6af5e0425872..c940b4b242fd 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -18,6 +18,7 @@
> >  #include <linux/string.h>
> >  #include <linux/slab.h>
> >  #include <linux/aer.h>
> > +#include <linux/bitops.h>
> >
> >  #include "../pci.h"
> >  #include "portdrv.h"
> > @@ -37,6 +38,134 @@ struct portdrv_service_data {
> >       u32 service;
> >  };
> >
> > +/**
> > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler
> > + *       to check if the upstream port supports inter switch P2P.
> > + *
> > + * @dev: PCIe upstream port to check
> > + *
> > + * This function assumes the PCIe upstream port is a Broadcom
> > + * PCIe device.
> > + */
> > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)
>
> Put this in a separate c file + use a config option and some
> stubs to make it go away if people don't want to support it.
> The layering and separation in portdrv is currently messy but
> we shouldn't make it worse :(
>
> > +{
> > +     u64 dsn;
> > +     u16 vsec;
> > +     u32 vsec_data;
> > +
> > +     dsn = pci_get_dsn(dev);
> > +     if (!dsn) {
> > +             pci_dbg(dev, "DSN capability is not present\n");
> > +             return false;
> > +     }
>
> Why get the dsn (which will frequently exist on other devices)
> before getting the vsec which won't?  Reorder these first
> two checks. For most devices the match on vendor will fail in the
> vsec check.
>
> > +
> > +     vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
> > +                                     PCIE_BRCM_SW_P2P_VSEC_ID);
> > +     if (!vsec) {
> > +             pci_dbg(dev, "Failed to get VSEC capability\n");
> > +             return false;
> > +     }
> > +
> > +     pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
> > +                           &vsec_data);
> > +
> > +     pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
> > +             dsn, vsec_data);
> > +
> > +     if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn))
>
> Add a comment on this. Not obvious what this is checking as it's picking
> a single bit out of a serial number...
>
> > +             return false;
> > +
> > +     if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) !=
> > +             PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK)
> > +             return false;
> > +
> > +     return true;
>         return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) ==
>                PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK;
> perhaps.
>
> > +}
> > +
> > +/**
> > + * Determine if device supports Inter switch P2P links.
> > + *
> > + * Return value: true if inter switch P2P is supported, return false otherwise.
> > + */
> > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
> > +{
> > +     /* P2P link attribute is supported on upstream ports only */
> > +     if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> > +             return false;
> > +
> > +     /*
> > +      * Currently Broadcom PEX switches are supported.
> > +      */
> > +     if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
> > +         (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
> > +          dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
> > +             return pcie_brcm_is_p2p_supported(dev);
> > +
> > +     return false;
> > +}
> > +
> > +/*
> > + * Traverse list of all PCI bridges and find devices that support Inter switch P2P
> > + * and have the same serial number to create report the BDF over sysfs.
> > + */
> > +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr,
> > +                          char *buf)
> > +{
> > +     struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
> > +     size_t len = 0;
> > +     u64 dsn, dsn_link;
> > +
> > +     dsn = pci_get_dsn(pdev);
>
> Maybe add a comment that we don't need to repeat checks that were done
> to make the attribute visible. Hence dsn will exist and it will be p2p link capable.
>
> > +
> > +     /* Traverse list of PCI bridges to determine any available P2P links */
> > +     while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link))
>
> Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev()
> that does this, but that is already defined to mean something else...
>
> Bjorn, is this something we should be looking to make more consistent
> perhaps with naming to make it clear what is a search of all instances
> on any bus and what is a search below a particular bus?
>
> > +                     != NULL) {
> > +             if (pdev_link == pdev)
> > +                     continue;
> > +
> > +             if (!pcie_port_is_p2p_supported(pdev_link))
> > +                     continue;
> > +
> > +             dsn_link = pci_get_dsn(pdev_link);
> > +             if (!dsn_link)
> > +                     continue;
> > +
> > +             if (dsn == dsn_link)
> > +                     len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n",
> > +                                          pci_domain_nr(pdev_link->bus),
> > +                                          pdev_link->bus->number, PCI_SLOT(pdev_link->devfn),
> > +                                          PCI_FUNC(pdev_link->devfn));
> > +     }
> > +
> > +     return len;
> > +}
>
>
> > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> > index 12c89ea0313b..1be06cb45665 100644
> > --- a/drivers/pci/pcie/portdrv.h
> > +++ b/drivers/pci/pcie/portdrv.h
> > @@ -25,6 +25,16 @@
> >
> >  #define PCIE_PORT_DEVICE_MAXSERVICES   5
> >
> > +/* P2P Link supported device IDs */
> > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC     0xC030
> > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC     0xC034
> > +
> > +#define PCIE_BRCM_SW_P2P_VSEC_ID             0x1
> > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET    0xC
> > +#define PCIE_BRCM_SW_P2P_MODE_MASK           GENMASK(9, 8)
> > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK  0x2
> > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn)     ((dsn) & 0x8)
> Define the mask, and use FIELD_GET() to get that.
> > +
> >  extern bool pcie_ports_dpc_native;
> >
> >  #ifdef CONFIG_PCIEAER
>

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4221 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-10-03 20:41     ` Sumanesh Samanta
@ 2024-10-04 10:39       ` Jonathan Cameron
  2024-10-14  9:40         ` Shivasharan Srikanteshwara
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2024-10-04 10:39 UTC (permalink / raw)
  To: Sumanesh Samanta
  Cc: Shivasharan S, linux-pci, bhelgaas, manivannan.sadhasivam, logang,
	linux-kernel, sathya.prakash, sjeaugey

On Thu, 3 Oct 2024 14:41:07 -0600
Sumanesh Samanta <sumanesh.samanta@broadcom.com> wrote:

> Hi Jonathan,
> 
> >> Need more data that 'there is a link' for this.
> >>I'd like to see some info on bandwidth and latency.  
> 
> As you too noted in your comments, for now, we are only addressing p2p
> connection between "virtual switches", i.e. switches that look
> different to the host, but are actually part of the same physical
> hardware.
> Given that, I am not sure what we should display for bandwidth and
> latency. There is no physical link to traverse between the virtual
> switches, and usually we take that as "infinite" bandwidth and "zero"
> latency.

For a case where you have no information, not having attributes is
sensible. If there is information (CXL CDAT provides this for switches
for instance) then we should have an interface that provides space for
that information.

> As such, any number here will make little sense until we
> start supporting p2p connection between physical switches.

As above, it makes sense in a switch as well - if the information
is available.

> We could,
> of course, have some encoding for the time being, like have "INF" for
> bandwidth and 0 for latency, but again, those will not be very useful
> till the day this scheme is extended to physical switch and we display
> real values, like bandwidth and latency for a x16 PCIe link. Thoughts?

Hide the sysfs attributes for latency and bandwidth if we simply don't
know.  Software built on top of this can then assume full bandwidth
is available or better still run some measurements to establish the
missing data.

All I really meant by this suggestion is a directory with space for
other info is probably more extensible than a single file.

Jonathan

> 
> sincerely,
> Sumanesh
> 
> 
> On Tue, Sep 24, 2024 at 8:57 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 19 Sep 2024 01:13:43 -0700
> > Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote:
> >  
> > > Broadcom PCI switches supports inter-switch P2P links between two
> > > PCI-to-PCI bridges. This presents an optimal data path for data
> > > movement. The patch exports a new sysfs entry for PCI devices that
> > > support the inter switch P2P links and reports the B:D:F information
> > > of the devices that are connected through this inter switch link as
> > > shown below:
> > >
> > >                              Host root bridge
> > >                 ---------------------------------------
> > >                 |                                     |
> > >   NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2
> > > (2c:00.0)   (2a:00.0)                             (3d:00.0)   (40:00.0)
> > >                 |                                     |
> > >                GPU1                                  GPU2
> > >             (2d:00.0)                             (3f:00.0)
> > >                                SERVER 1
> > >
> > > $ find /sys/ -name "p2p_link" | xargs grep .
> > > /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0
> > > /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0
> > >
> > > Current support is added to report the P2P links available for
> > > Broadcom switches based on the capability that is reported by the
> > > upstream bridges through their vendor-specific capability registers.
> > >
> > > Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> > > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
> > > ---
> > > Changes in v2:
> > > Integrated the code into PCIe portdrv to create the sysfs entries during
> > > probe, as suggested by Mani.  
> >
> > Hmm. So we are trying to rework portdrv in general to support extensibility.
> > I'm a little nervous about taking in vendor specific code in the meantime
> > even if it is trivial like this is.  We will be having an extensible
> > discovery scheme but for now the plan is that will be child device based
> > for non PCI standard features.
> >
> > It is a fairly small bit of code, so maybe it is fine - I'm not keen
> > on adding the implementation of the vendor specific parts to the
> > main driver though. Push that to an optional c file.
> >
> > A few general comments inline.
> >  
> > >
> > >  Documentation/ABI/testing/sysfs-bus-pci |  14 +++
> > >  drivers/pci/pcie/portdrv.c              | 131 ++++++++++++++++++++++++
> > >  drivers/pci/pcie/portdrv.h              |  10 ++
> > >  3 files changed, 155 insertions(+)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > > index ecf47559f495..e5d02f20655f 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > > @@ -572,3 +572,17 @@ Description:
> > >               enclosure-specific indications "specific0" to "specific7",
> > >               hence the corresponding led class devices are unavailable if
> > >               the DSM interface is used.
> > > +
> > > +What:                /sys/bus/pci/devices/.../p2p_link
> > > +Date:                September 2024
> > > +Contact:     Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> > > +Description:
> > > +             This file appears on PCIe upstream ports which supports an
> > > +             internal P2P link.
> > > +             Reading this attribute will provide the list of other upstream
> > > +             ports on the system which have an internal P2P link available
> > > +             between the two ports.  
> >
> > Given this only applies to 'internal' links and not inter switch physical links
> > I think you should rename it.  An eventual p2p link between physical switches
> > is going to be much more complex as will need routing information.
> > Let us avoid trampling on that space.
> >  
> > > +Users:
> > > +             Userspace applications interested in determining a optimal P2P
> > > +             link for data transfers between devices connected to the PCIe
> > > +             switches.  
> >
> > Need more data that 'there is a link' for this.
> > I'd like to see some info on bandwidth and latency. I've previously been
> > in discussions about similar devices that provide a low latency but low
> > bandwidth direct path.  That gets more likely if we scale up to
> > multiple physical switches or the software stack is choosing between
> > multiple p2p targets (e.g. getting nearest path to a multiheaded NIC).
> >
> > Perhaps best bet is to leave space for that by using a directory
> > here to cover everything about p2p?  Maybe have links under there to the
> > other upstream ports? That might be fiddly to manage though.
> >  
> > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > > index 6af5e0425872..c940b4b242fd 100644
> > > --- a/drivers/pci/pcie/portdrv.c
> > > +++ b/drivers/pci/pcie/portdrv.c
> > > @@ -18,6 +18,7 @@
> > >  #include <linux/string.h>
> > >  #include <linux/slab.h>
> > >  #include <linux/aer.h>
> > > +#include <linux/bitops.h>
> > >
> > >  #include "../pci.h"
> > >  #include "portdrv.h"
> > > @@ -37,6 +38,134 @@ struct portdrv_service_data {
> > >       u32 service;
> > >  };
> > >
> > > +/**
> > > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler
> > > + *       to check if the upstream port supports inter switch P2P.
> > > + *
> > > + * @dev: PCIe upstream port to check
> > > + *
> > > + * This function assumes the PCIe upstream port is a Broadcom
> > > + * PCIe device.
> > > + */
> > > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)  
> >
> > Put this in a separate c file + use a config option and some
> > stubs to make it go away if people don't want to support it.
> > The layering and separation in portdrv is currently messy but
> > we shouldn't make it worse :(
> >  
> > > +{
> > > +     u64 dsn;
> > > +     u16 vsec;
> > > +     u32 vsec_data;
> > > +
> > > +     dsn = pci_get_dsn(dev);
> > > +     if (!dsn) {
> > > +             pci_dbg(dev, "DSN capability is not present\n");
> > > +             return false;
> > > +     }  
> >
> > Why get the dsn (which will frequently exist on other devices)
> > before getting the vsec which won't?  Reorder these first
> > two checks. For most devices the match on vendor will fail in the
> > vsec check.
> >  
> > > +
> > > +     vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
> > > +                                     PCIE_BRCM_SW_P2P_VSEC_ID);
> > > +     if (!vsec) {
> > > +             pci_dbg(dev, "Failed to get VSEC capability\n");
> > > +             return false;
> > > +     }
> > > +
> > > +     pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
> > > +                           &vsec_data);
> > > +
> > > +     pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
> > > +             dsn, vsec_data);
> > > +
> > > +     if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn))  
> >
> > Add a comment on this. Not obvious what this is checking as it's picking
> > a single bit out of a serial number...
> >  
> > > +             return false;
> > > +
> > > +     if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) !=
> > > +             PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK)
> > > +             return false;
> > > +
> > > +     return true;  
> >         return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) ==
> >                PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK;
> > perhaps.
> >  
> > > +}
> > > +
> > > +/**
> > > + * Determine if device supports Inter switch P2P links.
> > > + *
> > > + * Return value: true if inter switch P2P is supported, return false otherwise.
> > > + */
> > > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
> > > +{
> > > +     /* P2P link attribute is supported on upstream ports only */
> > > +     if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> > > +             return false;
> > > +
> > > +     /*
> > > +      * Currently Broadcom PEX switches are supported.
> > > +      */
> > > +     if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
> > > +         (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
> > > +          dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
> > > +             return pcie_brcm_is_p2p_supported(dev);
> > > +
> > > +     return false;
> > > +}
> > > +
> > > +/*
> > > + * Traverse list of all PCI bridges and find devices that support Inter switch P2P
> > > + * and have the same serial number to create report the BDF over sysfs.
> > > + */
> > > +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr,
> > > +                          char *buf)
> > > +{
> > > +     struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
> > > +     size_t len = 0;
> > > +     u64 dsn, dsn_link;
> > > +
> > > +     dsn = pci_get_dsn(pdev);  
> >
> > Maybe add a comment that we don't need to repeat checks that were done
> > to make the attribute visible. Hence dsn will exist and it will be p2p link capable.
> >  
> > > +
> > > +     /* Traverse list of PCI bridges to determine any available P2P links */
> > > +     while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link))  
> >
> > Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev()
> > that does this, but that is already defined to mean something else...
> >
> > Bjorn, is this something we should be looking to make more consistent
> > perhaps with naming to make it clear what is a search of all instances
> > on any bus and what is a search below a particular bus?
> >  
> > > +                     != NULL) {
> > > +             if (pdev_link == pdev)
> > > +                     continue;
> > > +
> > > +             if (!pcie_port_is_p2p_supported(pdev_link))
> > > +                     continue;
> > > +
> > > +             dsn_link = pci_get_dsn(pdev_link);
> > > +             if (!dsn_link)
> > > +                     continue;
> > > +
> > > +             if (dsn == dsn_link)
> > > +                     len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n",
> > > +                                          pci_domain_nr(pdev_link->bus),
> > > +                                          pdev_link->bus->number, PCI_SLOT(pdev_link->devfn),
> > > +                                          PCI_FUNC(pdev_link->devfn));
> > > +     }
> > > +
> > > +     return len;
> > > +}  
> >
> >  
> > > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> > > index 12c89ea0313b..1be06cb45665 100644
> > > --- a/drivers/pci/pcie/portdrv.h
> > > +++ b/drivers/pci/pcie/portdrv.h
> > > @@ -25,6 +25,16 @@
> > >
> > >  #define PCIE_PORT_DEVICE_MAXSERVICES   5
> > >
> > > +/* P2P Link supported device IDs */
> > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC     0xC030
> > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC     0xC034
> > > +
> > > +#define PCIE_BRCM_SW_P2P_VSEC_ID             0x1
> > > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET    0xC
> > > +#define PCIE_BRCM_SW_P2P_MODE_MASK           GENMASK(9, 8)
> > > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK  0x2
> > > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn)     ((dsn) & 0x8)  
> > Define the mask, and use FIELD_GET() to get that.  
> > > +
> > >  extern bool pcie_ports_dpc_native;
> > >
> > >  #ifdef CONFIG_PCIEAER  
> >  
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-10-04 10:39       ` Jonathan Cameron
@ 2024-10-14  9:40         ` Shivasharan Srikanteshwara
  2024-10-16 13:25           ` Jonathan Cameron
  0 siblings, 1 reply; 12+ messages in thread
From: Shivasharan Srikanteshwara @ 2024-10-14  9:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Sumanesh Samanta, linux-pci, bhelgaas, manivannan.sadhasivam,
	logang, linux-kernel, sathya.prakash, sjeaugey


[-- Attachment #1.1: Type: text/plain, Size: 14962 bytes --]

On Fri, Oct 4, 2024 at 4:09 PM Jonathan Cameron <Jonathan.Cameron@huawei.com>
wrote:
>
> On Thu, 3 Oct 2024 14:41:07 -0600
> Sumanesh Samanta <sumanesh.samanta@broadcom.com> wrote:
>
> > Hi Jonathan,
> >
> > >> Need more data that 'there is a link' for this.
> > >>I'd like to see some info on bandwidth and latency.
> >
> > As you too noted in your comments, for now, we are only addressing p2p
> > connection between "virtual switches", i.e. switches that look
> > different to the host, but are actually part of the same physical
> > hardware.
> > Given that, I am not sure what we should display for bandwidth and
> > latency. There is no physical link to traverse between the virtual
> > switches, and usually we take that as "infinite" bandwidth and "zero"
> > latency.
>
> For a case where you have no information, not having attributes is
> sensible. If there is information (CXL CDAT provides this for switches
> for instance) then we should have an interface that provides space for
> that information.
>
> > As such, any number here will make little sense until we
> > start supporting p2p connection between physical switches.
>
> As above, it makes sense in a switch as well - if the information
> is available.
>
> > We could,
> > of course, have some encoding for the time being, like have "INF" for
> > bandwidth and 0 for latency, but again, those will not be very useful
> > till the day this scheme is extended to physical switch and we display
> > real values, like bandwidth and latency for a x16 PCIe link. Thoughts?
>
> Hide the sysfs attributes for latency and bandwidth if we simply don't
> know.  Software built on top of this can then assume full bandwidth
> is available or better still run some measurements to establish the
> missing data.
>
> All I really meant by this suggestion is a directory with space for
> other info is probably more extensible than a single file.

Hi Jonathan,
We will make the changes to add a directory for p2p_link related information
to be exposed to user. We will only populate the information related to the
inter-switch P2P links. Rest of the attributes can be added for devices that
report them at a later stage.
Please check if the directory structure makes sense to you:
/sys/devices/.../B:D:F/p2p_link/links -> Reading this file will return the
same
information that is returned currently by the p2p_link file.

>
> Jonathan
>
> >
> > sincerely,
> > Sumanesh
> >
> >
> > On Tue, Sep 24, 2024 at 8:57 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Thu, 19 Sep 2024 01:13:43 -0700
> > > Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote:
> > >
> > > > Broadcom PCI switches supports inter-switch P2P links between two
> > > > PCI-to-PCI bridges. This presents an optimal data path for data
> > > > movement. The patch exports a new sysfs entry for PCI devices that
> > > > support the inter switch P2P links and reports the B:D:F information
> > > > of the devices that are connected through this inter switch link as
> > > > shown below:
> > > >
> > > >                              Host root bridge
> > > >                 ---------------------------------------
> > > >                 |                                     |
> > > >   NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 ---
NIC2
> > > > (2c:00.0)   (2a:00.0)                             (3d:00.0)
(40:00.0)
> > > >                 |                                     |
> > > >                GPU1                                  GPU2
> > > >             (2d:00.0)                             (3f:00.0)
> > > >                                SERVER 1
> > > >
> > > > $ find /sys/ -name "p2p_link" | xargs grep .
> > > >
/sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0
> > > >
/sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0
> > > >
> > > > Current support is added to report the P2P links available for
> > > > Broadcom switches based on the capability that is reported by the
> > > > upstream bridges through their vendor-specific capability registers.
> > > >
> > > > Signed-off-by: Shivasharan S <
shivasharan.srikanteshwara@broadcom.com>
> > > > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
> > > > ---
> > > > Changes in v2:
> > > > Integrated the code into PCIe portdrv to create the sysfs entries
during
> > > > probe, as suggested by Mani.
> > >
> > > Hmm. So we are trying to rework portdrv in general to support
extensibility.
> > > I'm a little nervous about taking in vendor specific code in the
meantime
> > > even if it is trivial like this is.  We will be having an extensible
> > > discovery scheme but for now the plan is that will be child device
based
> > > for non PCI standard features.
> > >
> > > It is a fairly small bit of code, so maybe it is fine - I'm not keen
> > > on adding the implementation of the vendor specific parts to the
> > > main driver though. Push that to an optional c file.
> > >
> > > A few general comments inline.
> > >
> > > >
> > > >  Documentation/ABI/testing/sysfs-bus-pci |  14 +++
> > > >  drivers/pci/pcie/portdrv.c              | 131
++++++++++++++++++++++++
> > > >  drivers/pci/pcie/portdrv.h              |  10 ++
> > > >  3 files changed, 155 insertions(+)
> > > >
> > > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci
b/Documentation/ABI/testing/sysfs-bus-pci
> > > > index ecf47559f495..e5d02f20655f 100644
> > > > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > > > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > > > @@ -572,3 +572,17 @@ Description:
> > > >               enclosure-specific indications "specific0" to
"specific7",
> > > >               hence the corresponding led class devices are
unavailable if
> > > >               the DSM interface is used.
> > > > +
> > > > +What:                /sys/bus/pci/devices/.../p2p_link
> > > > +Date:                September 2024
> > > > +Contact:     Shivasharan S <shivasharan.srikanteshwara@broadcom.com
>
> > > > +Description:
> > > > +             This file appears on PCIe upstream ports which
supports an
> > > > +             internal P2P link.
> > > > +             Reading this attribute will provide the list of other
upstream
> > > > +             ports on the system which have an internal P2P link
available
> > > > +             between the two ports.
> > >
> > > Given this only applies to 'internal' links and not inter switch
physical links
> > > I think you should rename it.  An eventual p2p link between physical
switches
> > > is going to be much more complex as will need routing information.
> > > Let us avoid trampling on that space.
> > >
> > > > +Users:
> > > > +             Userspace applications interested in determining a
optimal P2P
> > > > +             link for data transfers between devices connected to
the PCIe
> > > > +             switches.
> > >
> > > Need more data that 'there is a link' for this.
> > > I'd like to see some info on bandwidth and latency. I've previously
been
> > > in discussions about similar devices that provide a low latency but
low
> > > bandwidth direct path.  That gets more likely if we scale up to
> > > multiple physical switches or the software stack is choosing between
> > > multiple p2p targets (e.g. getting nearest path to a multiheaded NIC).
> > >
> > > Perhaps best bet is to leave space for that by using a directory
> > > here to cover everything about p2p?  Maybe have links under there to
the
> > > other upstream ports? That might be fiddly to manage though.
> > >
> > > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > > > index 6af5e0425872..c940b4b242fd 100644
> > > > --- a/drivers/pci/pcie/portdrv.c
> > > > +++ b/drivers/pci/pcie/portdrv.c
> > > > @@ -18,6 +18,7 @@
> > > >  #include <linux/string.h>
> > > >  #include <linux/slab.h>
> > > >  #include <linux/aer.h>
> > > > +#include <linux/bitops.h>
> > > >
> > > >  #include "../pci.h"
> > > >  #include "portdrv.h"
> > > > @@ -37,6 +38,134 @@ struct portdrv_service_data {
> > > >       u32 service;
> > > >  };
> > > >
> > > > +/**
> > > > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler
> > > > + *       to check if the upstream port supports inter switch P2P.
> > > > + *
> > > > + * @dev: PCIe upstream port to check
> > > > + *
> > > > + * This function assumes the PCIe upstream port is a Broadcom
> > > > + * PCIe device.
> > > > + */
> > > > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)
> > >
> > > Put this in a separate c file + use a config option and some
> > > stubs to make it go away if people don't want to support it.
> > > The layering and separation in portdrv is currently messy but
> > > we shouldn't make it worse :(
> > >
Understood. We will move the p2p_link creation code to a separate .c/.h file
.
> > > > +{
> > > > +     u64 dsn;
> > > > +     u16 vsec;
> > > > +     u32 vsec_data;
> > > > +
> > > > +     dsn = pci_get_dsn(dev);
> > > > +     if (!dsn) {
> > > > +             pci_dbg(dev, "DSN capability is not present\n");
> > > > +             return false;
> > > > +     }
> > >
> > > Why get the dsn (which will frequently exist on other devices)
> > > before getting the vsec which won't?  Reorder these first
> > > two checks. For most devices the match on vendor will fail in the
> > > vsec check.
> > >
This will be fixed in the next version of this patch.

> > > > +
> > > > +     vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
> > > > +                                     PCIE_BRCM_SW_P2P_VSEC_ID);
> > > > +     if (!vsec) {
> > > > +             pci_dbg(dev, "Failed to get VSEC capability\n");
> > > > +             return false;
> > > > +     }
> > > > +
> > > > +     pci_read_config_dword(dev, vsec +
PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
> > > > +                           &vsec_data);
> > > > +
> > > > +     pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
> > > > +             dsn, vsec_data);
> > > > +
> > > > +     if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn))
> > >
> > > Add a comment on this. Not obvious what this is checking as it's
picking
> > > a single bit out of a serial number...
> > >
Will do.
> > > > +             return false;
> > > > +
> > > > +     if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) !=
> > > > +             PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK)
> > > > +             return false;
> > > > +
> > > > +     return true;
> > >         return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) ==
> > >                PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK;
> > > perhaps.
> > >
Will take care of this.
> > > > +}
> > > > +
> > > > +/**
> > > > + * Determine if device supports Inter switch P2P links.
> > > > + *
> > > > + * Return value: true if inter switch P2P is supported, return
false otherwise.
> > > > + */
> > > > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
> > > > +{
> > > > +     /* P2P link attribute is supported on upstream ports only */
> > > > +     if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> > > > +             return false;
> > > > +
> > > > +     /*
> > > > +      * Currently Broadcom PEX switches are supported.
> > > > +      */
> > > > +     if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
> > > > +         (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
> > > > +          dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
> > > > +             return pcie_brcm_is_p2p_supported(dev);
> > > > +
> > > > +     return false;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Traverse list of all PCI bridges and find devices that support
Inter switch P2P
> > > > + * and have the same serial number to create report the BDF over
sysfs.
> > > > + */
> > > > +static ssize_t p2p_link_show(struct device *dev, struct
device_attribute *attr,
> > > > +                          char *buf)
> > > > +{
> > > > +     struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
> > > > +     size_t len = 0;
> > > > +     u64 dsn, dsn_link;
> > > > +
> > > > +     dsn = pci_get_dsn(pdev);
> > >
> > > Maybe add a comment that we don't need to repeat checks that were done
> > > to make the attribute visible. Hence dsn will exist and it will be
p2p link capable.
> > >
Will take care of this.

> > > > +
> > > > +     /* Traverse list of PCI bridges to determine any available
P2P links */
> > > > +     while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8,
pdev_link))
> > >
> > > Feels like we should have a for_each_pci_bridge() similar to
for_each_pci_dev()
> > > that does this, but that is already defined to mean something else...
> > >
> > > Bjorn, is this something we should be looking to make more consistent
> > > perhaps with naming to make it clear what is a search of all instances
> > > on any bus and what is a search below a particular bus?
> > >
> > > > +                     != NULL) {
> > > > +             if (pdev_link == pdev)
> > > > +                     continue;
> > > > +
> > > > +             if (!pcie_port_is_p2p_supported(pdev_link))
> > > > +                     continue;
> > > > +
> > > > +             dsn_link = pci_get_dsn(pdev_link);
> > > > +             if (!dsn_link)
> > > > +                     continue;
> > > > +
> > > > +             if (dsn == dsn_link)
> > > > +                     len += sysfs_emit_at(buf, len,
"%04x:%02x:%02x.%d\n",
> > > > +
 pci_domain_nr(pdev_link->bus),
> > > > +                                          pdev_link->bus->number,
PCI_SLOT(pdev_link->devfn),
> > > > +
 PCI_FUNC(pdev_link->devfn));
> > > > +     }
> > > > +
> > > > +     return len;
> > > > +}
> > >
> > >
> > > > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> > > > index 12c89ea0313b..1be06cb45665 100644
> > > > --- a/drivers/pci/pcie/portdrv.h
> > > > +++ b/drivers/pci/pcie/portdrv.h
> > > > @@ -25,6 +25,16 @@
> > > >
> > > >  #define PCIE_PORT_DEVICE_MAXSERVICES   5
> > > >
> > > > +/* P2P Link supported device IDs */
> > > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC     0xC030
> > > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC     0xC034
> > > > +
> > > > +#define PCIE_BRCM_SW_P2P_VSEC_ID             0x1
> > > > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET    0xC
> > > > +#define PCIE_BRCM_SW_P2P_MODE_MASK           GENMASK(9, 8)
> > > > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK  0x2
> > > > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn)     ((dsn) & 0x8)
> > > Define the mask, and use FIELD_GET() to get that.
Will take care of this.

Best Regards,
Shivasharan

> > > > +
> > > >  extern bool pcie_ports_dpc_native;
> > > >
> > > >  #ifdef CONFIG_PCIEAER
> > >
> >
>

[-- Attachment #1.2: Type: text/html, Size: 22459 bytes --]

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4251 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect P2P links
  2024-09-24 15:08   ` Jonathan Cameron
@ 2024-10-14  9:44     ` Shivasharan Srikanteshwara
  0 siblings, 0 replies; 12+ messages in thread
From: Shivasharan Srikanteshwara @ 2024-10-14  9:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-pci, bhelgaas, manivannan.sadhasivam, logang, linux-kernel,
	sumanesh.samanta, sathya.prakash, sjeaugey

[-- Attachment #1: Type: text/plain, Size: 3432 bytes --]

On Tue, Sep 24, 2024 at 8:38 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 19 Sep 2024 01:13:44 -0700
> Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote:
>
> > Update the p2p_dma_distance() to determine inter-switch P2P links existing
> > between two switches and use this to calculate the DMA distance between
> > two devices.
> >
> > Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
> > ---
> >  drivers/pci/p2pdma.c       | 17 ++++++++++++++++-
> >  drivers/pci/pcie/portdrv.c | 34 ++++++++++++++++++++++++++++++++++
> >  drivers/pci/pcie/portdrv.h |  2 ++
> >  3 files changed, 52 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> > index 4f47a13cb500..eed3b69e7293 100644
> > --- a/drivers/pci/p2pdma.c
> > +++ b/drivers/pci/p2pdma.c
> > @@ -21,6 +21,8 @@
> >  #include <linux/seq_buf.h>
> >  #include <linux/xarray.h>
> >
> > +extern bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b);
>
> That's nasty.  Include the header so you get a clean stub if
> this support is not built in etc.
>
Will move this to the new header file that will be added.
> > +
>
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index c940b4b242fd..2fe1598fc684 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -104,6 +104,40 @@ static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
> >       return false;
> >  }
> >
> > +/**
> > + * pcie_port_is_p2p_link_available: Determine if a P2P link is available
> > + * between the two upstream bridges. The serial number of the two devices
> > + * will be compared and if they are same then it is considered that the P2P
> > + * link is available.
> > + *
> > + * Return value: true if inter switch P2P is available, return false otherwise.
> > + */
> > +bool pcie_port_is_p2p_link_available(struct pci_dev *a, struct pci_dev *b)
> > +{
> > +     u64 dsn_a, dsn_b;
> > +
> > +     /*
> > +      * Check if the devices support Inter switch P2P.
> > +      */
>
> Single line comment syntax fine here.  However it's kind
> of obvious, so I'd just drop the comment.
>
>
Will do.

> > +     if (!pcie_port_is_p2p_supported(a) ||
> > +         !pcie_port_is_p2p_supported(b))
>
> Don't wrap this. I think it's under 80 chars anyway.
>
Will do.

> > +             return false;
> > +
> > +     dsn_a = pci_get_dsn(a);
> > +     if (!dsn_a)
> > +             return false;
> If we assume that dsn is the only right way to detect this
> (I'm fine with that for now) then we know the supported tests
> above would only pass if this is true. Hence
>
> return pci_get_dsn(a) == pci_get_dsn(b);
>
> should be fine.
>
Agreed. Will rework this as suggested and update in the next
patch.

> > +
> > +     dsn_b = pci_get_dsn(b);
> > +     if (!dsn_b)
> > +             return false;
> > +
> > +     if (dsn_a == dsn_b)
> > +             return true;
>
>         return dsn_a == dsn_b;
>
Above changes will take care of this as well.

> > +
> > +     return false;
> > +}
> > +EXPORT_SYMBOL_GPL(pcie_port_is_p2p_link_available);
> > +
> >  /*
> >   * Traverse list of all PCI bridges and find devices that support Inter switch P2P
> >   * and have the same serial number to create report the BDF over sysfs.
>
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4251 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links
  2024-10-14  9:40         ` Shivasharan Srikanteshwara
@ 2024-10-16 13:25           ` Jonathan Cameron
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2024-10-16 13:25 UTC (permalink / raw)
  To: Shivasharan Srikanteshwara
  Cc: Sumanesh Samanta, linux-pci, bhelgaas, manivannan.sadhasivam,
	logang, linux-kernel, sathya.prakash, sjeaugey

On Mon, 14 Oct 2024 15:10:57 +0530
Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> wrote:

> On Fri, Oct 4, 2024 at 4:09 PM Jonathan Cameron <Jonathan.Cameron@huawei.com>
> wrote:
> >
> > On Thu, 3 Oct 2024 14:41:07 -0600
> > Sumanesh Samanta <sumanesh.samanta@broadcom.com> wrote:
> >  
> > > Hi Jonathan,
> > >  
> > > >> Need more data that 'there is a link' for this.
> > > >>I'd like to see some info on bandwidth and latency.  
> > >
> > > As you too noted in your comments, for now, we are only addressing p2p
> > > connection between "virtual switches", i.e. switches that look
> > > different to the host, but are actually part of the same physical
> > > hardware.
> > > Given that, I am not sure what we should display for bandwidth and
> > > latency. There is no physical link to traverse between the virtual
> > > switches, and usually we take that as "infinite" bandwidth and "zero"
> > > latency.  
> >
> > For a case where you have no information, not having attributes is
> > sensible. If there is information (CXL CDAT provides this for switches
> > for instance) then we should have an interface that provides space for
> > that information.
> >  
> > > As such, any number here will make little sense until we
> > > start supporting p2p connection between physical switches.  
> >
> > As above, it makes sense in a switch as well - if the information
> > is available.
> >  
> > > We could,
> > > of course, have some encoding for the time being, like have "INF" for
> > > bandwidth and 0 for latency, but again, those will not be very useful
> > > till the day this scheme is extended to physical switch and we display
> > > real values, like bandwidth and latency for a x16 PCIe link. Thoughts?  
> >
> > Hide the sysfs attributes for latency and bandwidth if we simply don't
> > know.  Software built on top of this can then assume full bandwidth
> > is available or better still run some measurements to establish the
> > missing data.
> >
> > All I really meant by this suggestion is a directory with space for
> > other info is probably more extensible than a single file.  
> 
> Hi Jonathan,
> We will make the changes to add a directory for p2p_link related information
> to be exposed to user. We will only populate the information related to the
> inter-switch P2P links. Rest of the attributes can be added for devices that
> report them at a later stage.
> Please check if the directory structure makes sense to you:
> /sys/devices/.../B:D:F/p2p_link/links -> Reading this file will return the
> same
> information that is returned currently by the p2p_link file.
Sounds good to me.

Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-10-16 13:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-19  8:13 [PATCH 0/2 v2] PCI/portdrv: Report inter switch P2P links through sysfs Shivasharan S
2024-09-19  8:13 ` [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links Shivasharan S
2024-09-20  4:28   ` kernel test robot
2024-09-24 14:57   ` Jonathan Cameron
2024-10-03 20:41     ` Sumanesh Samanta
2024-10-04 10:39       ` Jonathan Cameron
2024-10-14  9:40         ` Shivasharan Srikanteshwara
2024-10-16 13:25           ` Jonathan Cameron
2024-09-19  8:13 ` [PATCH 2/2 v2] PCI/P2PDMA: Modify p2p_dma_distance to detect " Shivasharan S
2024-09-20  5:51   ` kernel test robot
2024-09-24 15:08   ` Jonathan Cameron
2024-10-14  9:44     ` Shivasharan Srikanteshwara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).