Linux CXL
* [PATCH 1/3] cxl/region: Calculate performance data for a region
@ 2023-12-07 23:30 Dave Jiang
  2023-12-07 23:34 ` Dave Jiang
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Jiang @ 2023-12-07 23:30 UTC
  To: linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

Calculate and store the performance data for a CXL region. Find the worst
read and write latency across all the included ranges from each of the
devices that contribute to the region and use that as the region's latency.
Sum the read and write bandwidth of each device's range to get the total
bandwidth for the region.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/core/region.c |   94 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |    1 
 2 files changed, 95 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 56e575c79bb4..d879f5702cf2 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2934,6 +2934,98 @@ static int is_system_ram(struct resource *res, void *arg)
 	return 1;
 }
 
+static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled;
+	unsigned int rd_bw = 0, rd_lat = 0;
+	unsigned int wr_bw = 0, wr_lat = 0;
+	struct access_coordinate *coord;
+	struct list_head *perf_list;
+	int rc = 0, i;
+
+	lockdep_assert_held(&cxl_region_rwsem);
+
+	/* No need to proceed if hmem attributes are already present */
+	if (cxlr->coord)
+		return 0;
+
+	coord = devm_kzalloc(&cxlr->dev, sizeof(*coord), GFP_KERNEL);
+	if (!coord)
+		return -ENOMEM;
+
+	cxled = p->targets[0];
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct range dpa = {
+			.start = cxled->dpa_res->start,
+			.end = cxled->dpa_res->end,
+		};
+		struct cxl_memdev_state *mds;
+		struct perf_prop_entry *perf;
+		struct cxl_dev_state *cxlds;
+		struct cxl_memdev *cxlmd;
+		bool found = false;
+
+		cxled = p->targets[i];
+		cxlmd = cxled_to_memdev(cxled);
+		cxlds = cxlmd->cxlds;
+		mds = to_cxl_memdev_state(cxlds);
+
+		switch (cxlr->mode) {
+		case CXL_DECODER_RAM:
+			perf_list = &mds->ram_perf_list;
+			break;
+		case CXL_DECODER_PMEM:
+			perf_list = &mds->pmem_perf_list;
+			break;
+		default:
+			rc = -EINVAL;
+			goto err;
+		}
+
+		if (list_empty(perf_list)) {
+			rc = -ENOENT;
+			goto err;
+		}
+
+		list_for_each_entry(perf, perf_list, list) {
+			if (range_contains(&perf->dpa_range, &dpa)) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			rc = -ENOENT;
+			goto err;
+		}
+
+		/* Get total bandwidth and the worst latency for the cxl region */
+		rd_lat = max_t(unsigned int, rd_lat,
+			       perf->coord.read_latency);
+		rd_bw += perf->coord.read_bandwidth;
+		wr_lat = max_t(unsigned int, wr_lat,
+			       perf->coord.write_latency);
+		wr_bw += perf->coord.write_bandwidth;
+	}
+
+	*coord = (struct access_coordinate) {
+		.read_latency = rd_lat,
+		.read_bandwidth = rd_bw,
+		.write_latency = wr_lat,
+		.write_bandwidth = wr_bw,
+	};
+
+	cxlr->coord = coord;
+
+	return 0;
+
+err:
+	devm_kfree(&cxlr->dev, coord);
+	return rc;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -2959,6 +3051,8 @@ static int cxl_region_probe(struct device *dev)
 		goto out;
 	}
 
+	cxl_region_perf_data_calculate(cxlr);
+
 	/*
 	 * From this point on any path that changes the region's state away from
 	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 004534cf0361..265da412c5bd 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -529,6 +529,7 @@ struct cxl_region {
 	struct cxl_pmem_region *cxlr_pmem;
 	unsigned long flags;
 	struct cxl_region_params params;
+	struct access_coordinate *coord;
 };
 
 struct cxl_nvdimm_bridge {




* [PATCH 0/3] cxl: Add support to report region access coordinates to numa nodes
@ 2023-12-07 23:31 Dave Jiang
  2023-12-07 23:31 ` [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-07 23:31 UTC
  To: linux-cxl
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, dan.j.williams, ira.weiny,
	vishal.l.verma, alison.schofield, jonathan.cameron, dave

This series adds support for computing the performance data of a CXL region
and for propagating that data to the region's NUMA node. The series depends
on the posted QTG ID support series [1].

CXL memory devices that are already attached before boot are enumerated by
the BIOS, and the SRAT and HMAT tables are set up to include the memory
regions enumerated from those devices. For regions that the BIOS did not
program, or for a hot-plugged CXL memory device, the BIOS does not have the
relevant information, so the performance data has to be calculated by the
driver after region assembly.

Recall from [1] that the performance data for the ranges of a CXL memory device
is computed and cached. A CXL memory region can be backed by one or more
devices. The region's performance data is therefore the aggregated bandwidth
of all devices backing the region and the worst latency among those devices.

[1]: https://lore.kernel.org/linux-cxl/170198976423.3522351.8359845516235306693.stgit@djiang5-mobl3/T/#t

---

Dave Jiang (3):
      cxl/region: Calculate performance data for a region
      cxl/region: Add sysfs attribute for locality attributes of CXL regions
      cxl: Add memory hotplug notifier for cxl region


 Documentation/ABI/testing/sysfs-bus-cxl |  40 ++++++
 drivers/base/node.c                     |   1 +
 drivers/cxl/core/region.c               | 162 ++++++++++++++++++++++++
 drivers/cxl/cxl.h                       |   3 +
 4 files changed, 206 insertions(+)
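
The aggregation rule described in the cover letter (sum the bandwidths, take
the worst latency) can be sketched in plain userspace C. This is only an
illustration under stated assumptions: the struct below is a stand-in for the
kernel's struct access_coordinate, and region_coord_aggregate() and
max_uint() are hypothetical helper names, not functions from the series.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct access_coordinate. */
struct access_coordinate {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/* Hypothetical helper: larger of two unsigned values. */
static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper: combine per-device coordinates into one region
 * coordinate. Bandwidth is summed across devices; latency is the worst
 * (largest) value seen on any device.
 */
static struct access_coordinate
region_coord_aggregate(const struct access_coordinate *devs, size_t nr_devs)
{
	struct access_coordinate region = { 0 };
	size_t i;

	for (i = 0; i < nr_devs; i++) {
		region.read_bandwidth += devs[i].read_bandwidth;
		region.write_bandwidth += devs[i].write_bandwidth;
		region.read_latency = max_uint(region.read_latency,
					       devs[i].read_latency);
		region.write_latency = max_uint(region.write_latency,
						devs[i].write_latency);
	}

	return region;
}
```

With two devices, the region's bandwidth is the sum of both while its latency
is whichever device is slower, so a single slow device bounds the latency of
the whole region.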




* [PATCH 1/3] cxl/region: Calculate performance data for a region
  2023-12-07 23:31 [PATCH 0/3] cxl: Add support to report region access coordinates to numa nodes Dave Jiang
@ 2023-12-07 23:31 ` Dave Jiang
  2023-12-11 17:44   ` fan
  2023-12-12  0:19   ` Dan Williams
  2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
  2023-12-07 23:32 ` [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region Dave Jiang
  2 siblings, 2 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-07 23:31 UTC
  To: linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

Calculate and store the performance data for a CXL region. Find the worst
read and write latency across all the included ranges from each of the
devices that contribute to the region and use that as the region's latency.
Sum the read and write bandwidth of each device's range to get the total
bandwidth for the region.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/core/region.c |   94 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |    1 
 2 files changed, 95 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 56e575c79bb4..d879f5702cf2 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2934,6 +2934,98 @@ static int is_system_ram(struct resource *res, void *arg)
 	return 1;
 }
 
+static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled;
+	unsigned int rd_bw = 0, rd_lat = 0;
+	unsigned int wr_bw = 0, wr_lat = 0;
+	struct access_coordinate *coord;
+	struct list_head *perf_list;
+	int rc = 0, i;
+
+	lockdep_assert_held(&cxl_region_rwsem);
+
+	/* No need to proceed if hmem attributes are already present */
+	if (cxlr->coord)
+		return 0;
+
+	coord = devm_kzalloc(&cxlr->dev, sizeof(*coord), GFP_KERNEL);
+	if (!coord)
+		return -ENOMEM;
+
+	cxled = p->targets[0];
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct range dpa = {
+			.start = cxled->dpa_res->start,
+			.end = cxled->dpa_res->end,
+		};
+		struct cxl_memdev_state *mds;
+		struct perf_prop_entry *perf;
+		struct cxl_dev_state *cxlds;
+		struct cxl_memdev *cxlmd;
+		bool found = false;
+
+		cxled = p->targets[i];
+		cxlmd = cxled_to_memdev(cxled);
+		cxlds = cxlmd->cxlds;
+		mds = to_cxl_memdev_state(cxlds);
+
+		switch (cxlr->mode) {
+		case CXL_DECODER_RAM:
+			perf_list = &mds->ram_perf_list;
+			break;
+		case CXL_DECODER_PMEM:
+			perf_list = &mds->pmem_perf_list;
+			break;
+		default:
+			rc = -EINVAL;
+			goto err;
+		}
+
+		if (list_empty(perf_list)) {
+			rc = -ENOENT;
+			goto err;
+		}
+
+		list_for_each_entry(perf, perf_list, list) {
+			if (range_contains(&perf->dpa_range, &dpa)) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			rc = -ENOENT;
+			goto err;
+		}
+
+		/* Get total bandwidth and the worst latency for the cxl region */
+		rd_lat = max_t(unsigned int, rd_lat,
+			       perf->coord.read_latency);
+		rd_bw += perf->coord.read_bandwidth;
+		wr_lat = max_t(unsigned int, wr_lat,
+			       perf->coord.write_latency);
+		wr_bw += perf->coord.write_bandwidth;
+	}
+
+	*coord = (struct access_coordinate) {
+		.read_latency = rd_lat,
+		.read_bandwidth = rd_bw,
+		.write_latency = wr_lat,
+		.write_bandwidth = wr_bw,
+	};
+
+	cxlr->coord = coord;
+
+	return 0;
+
+err:
+	devm_kfree(&cxlr->dev, coord);
+	return rc;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -2959,6 +3051,8 @@ static int cxl_region_probe(struct device *dev)
 		goto out;
 	}
 
+	cxl_region_perf_data_calculate(cxlr);
+
 	/*
 	 * From this point on any path that changes the region's state away from
 	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 004534cf0361..265da412c5bd 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -529,6 +529,7 @@ struct cxl_region {
 	struct cxl_pmem_region *cxlr_pmem;
 	unsigned long flags;
 	struct cxl_region_params params;
+	struct access_coordinate *coord;
 };
 
 struct cxl_nvdimm_bridge {
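
For each target, the patch above searches the device's cached performance
entries for one whose DPA range fully contains that target's DPA range. A
minimal userspace sketch of that lookup follows. The struct range and
range_contains() definitions mirror the kernel helpers of the same name as
far as I know them; struct perf_entry and perf_entry_find() are hypothetical
names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, modelled on the kernel's struct range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if @inner lies entirely within @outer (both ends inclusive). */
static bool range_contains(const struct range *outer,
			   const struct range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* Hypothetical stand-in for a cached per-range performance entry. */
struct perf_entry {
	struct range dpa_range;
	unsigned int read_latency;
};

/*
 * Hypothetical helper: return the first entry whose DPA range covers
 * @dpa, or NULL if none does (the patch returns -ENOENT in that case).
 */
static const struct perf_entry *
perf_entry_find(const struct perf_entry *entries, size_t nr_entries,
		const struct range *dpa)
{
	size_t i;

	for (i = 0; i < nr_entries; i++) {
		if (range_contains(&entries[i].dpa_range, dpa))
			return &entries[i];
	}

	return NULL;
}
```

The lookup is meant to be called once per target with that target's own DPA
range, so that each device contributes the entry matching the capacity it
actually maps into the region.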




* [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-07 23:31 [PATCH 0/3] cxl: Add support to report region access coordinates to numa nodes Dave Jiang
  2023-12-07 23:31 ` [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
@ 2023-12-07 23:31 ` Dave Jiang
  2023-12-11  9:06   ` Brice Goglin
                     ` (3 more replies)
  2023-12-07 23:32 ` [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region Dave Jiang
  2 siblings, 4 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-07 23:31 UTC
  To: linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

Add read/write latency and bandwidth sysfs attributes for the enabled CXL
region. The bandwidth is the aggregated bandwidth of all devices that
contribute to the CXL region. The latency is the worst latency among all
the devices that contribute to the CXL region.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
 drivers/cxl/core/region.c               |   24 +++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index fff2581b8033..e96f172eb6a6 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -552,3 +552,43 @@ Description:
 		attribute is only visible for devices supporting the
 		capability. The retrieved errors are logged as kernel
 		events when cxl_poison event tracing is enabled.
+
+
+What:		/sys/bus/cxl/devices/regionZ/read_bandwidth
+Date:		Apr, 2023
+KernelVersion:	v6.8
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The aggregated read bandwidth of the region. The number is
+		the accumulated read bandwidth of all CXL memory devices that
+		contribute to the region.
+
+
+What:		/sys/bus/cxl/devices/regionZ/write_bandwidth
+Date:		Apr, 2023
+KernelVersion:	v6.8
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The aggregated write bandwidth of the region. The number is
+		the accumulated write bandwidth of all CXL memory devices that
+		contribute to the region.
+
+
+What:		/sys/bus/cxl/devices/regionZ/read_latency
+Date:		Apr, 2023
+KernelVersion:	v6.8
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The read latency of the region. The number is
+		the worst read latency of all CXL memory devices that
+		contribute to the region.
+
+
+What:		/sys/bus/cxl/devices/regionZ/write_latency
+Date:		Apr, 2023
+KernelVersion:	v6.8
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The write latency of the region. The number is
+		the worst write latency of all CXL memory devices that
+		contribute to the region.
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d879f5702cf2..72c47f624d63 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(size);
 
+#define ACCESS_ATTR(attrib)					\
+static ssize_t attrib##_show(struct device *dev,		\
+			   struct device_attribute *attr,	\
+			   char *buf)				\
+{								\
+	struct cxl_region *cxlr = to_cxl_region(dev);		\
+								\
+	if (!cxlr->coord)					\
+		return 0;					\
+								\
+	return sysfs_emit(buf, "%u\n",				\
+			  cxlr->coord->attrib);			\
+}								\
+static DEVICE_ATTR_RO(attrib)
+
+ACCESS_ATTR(read_bandwidth);
+ACCESS_ATTR(read_latency);
+ACCESS_ATTR(write_bandwidth);
+ACCESS_ATTR(write_latency);
+
 static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_uuid.attr,
 	&dev_attr_commit.attr,
@@ -653,6 +673,10 @@ static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
 	&dev_attr_mode.attr,
+	&dev_attr_read_bandwidth.attr,
+	&dev_attr_write_bandwidth.attr,
+	&dev_attr_read_latency.attr,
+	&dev_attr_write_latency.attr,
 	NULL,
 };
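
From userspace, the attributes added above would be read like any other
single-value sysfs file. The sketch below assumes only what the patch shows:
each file holds one unsigned decimal number, and the show routine emits
nothing when no performance data was computed. read_uint_attr() is a
hypothetical helper name, and the region name in the usage note is a
placeholder.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned decimal value from a
 * sysfs-style attribute file.
 *
 * Returns 0 on success, or -1 if the file cannot be opened or does not
 * contain a number (for example because the read returned no data).
 */
static int read_uint_attr(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	parsed = fscanf(f, "%u", val);
	fclose(f);

	return parsed == 1 ? 0 : -1;
}
```

A caller might pass a path such as
/sys/bus/cxl/devices/region0/read_bandwidth, where region0 is only an example
name, and should treat a -1 return as "no data available" rather than as a
value of zero.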
 




* [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region
  2023-12-07 23:31 [PATCH 0/3] cxl: Add support to report region access coordinates to numa nodes Dave Jiang
  2023-12-07 23:31 ` [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
  2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
@ 2023-12-07 23:32 ` Dave Jiang
  2023-12-08  3:35   ` Huang, Ying
  2023-12-12  0:30   ` Dan Williams
  2 siblings, 2 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-07 23:32 UTC
  To: linux-cxl
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, dan.j.williams, ira.weiny,
	vishal.l.verma, alison.schofield, jonathan.cameron, dave

When the CXL region is formed, the driver computes the performance data
for the region. However, this data is not present in the node data
collection that was populated from the HMAT during kernel initialization.
Add a memory hotplug notifier that updates the node hmem_attrs with the
newly calculated region performance data. The CXL region is created under
a specific CFMWS. The node for the CFMWS is created during SRAT parsing by
acpi_parse_cfmws(). The notifier will run once only and turn itself off
after the initial run. Additional regions may overwrite the initial data,
but since this is for the same proximity domain it does not matter for now.

node_set_perf_attrs() is exported to allow updating the perf attributes of
a node. Given that only CXL is using this, export it only to the CXL
namespace.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/base/node.c       |    1 +
 drivers/cxl/core/region.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |    2 ++
 3 files changed, 47 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index cb2b6cc7f6e6..f5b5a3f11894 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
 		}
 	}
 }
+EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL);
 
 /**
  * struct node_cache_info - Internal tracking for memory node caches
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 72c47f624d63..3794e91e12b1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -4,6 +4,7 @@
 #include <linux/genalloc.h>
 #include <linux/device.h>
 #include <linux/module.h>
+#include <linux/memory.h>
 #include <linux/slab.h>
 #include <linux/uuid.h>
 #include <linux/sort.h>
@@ -2958,6 +2959,37 @@ static int is_system_ram(struct resource *res, void *arg)
 	return 1;
 }
 
+static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
+					  unsigned long action, void *arg)
+{
+	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
+					       memory_notifier);
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled = p->targets[0];
+	struct cxl_decoder *cxld = &cxled->cxld;
+	struct memory_notify *mnb = arg;
+	int nid = mnb->status_change_nid;
+	struct access_coordinate coord;
+	int region_nid;
+
+	if (nid == NUMA_NO_NODE || action != MEM_ONLINE || !cxlr->coord)
+		return NOTIFY_STOP;
+
+	region_nid = phys_to_target_node(cxld->hpa_range.start);
+	if (nid != region_nid)
+		return NOTIFY_STOP;
+
+	/* Adjust latencies from psec to nsec to be consistent with HMAT targets */
+	coord = *cxlr->coord;
+	coord.read_latency = DIV_ROUND_UP(coord.read_latency, 1000);
+	coord.write_latency = DIV_ROUND_UP(coord.write_latency, 1000);
+
+	node_set_perf_attrs(nid, &coord, 0);
+	node_set_perf_attrs(nid, &coord, 1);
+
+	return NOTIFY_STOP;
+}
+
 static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
 {
 	struct cxl_region_params *p = &cxlr->params;
@@ -3077,6 +3109,10 @@ static int cxl_region_probe(struct device *dev)
 
 	cxl_region_perf_data_calculate(cxlr);
 
+	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI;
+	register_memory_notifier(&cxlr->memory_notifier);
+
 	/*
 	 * From this point on any path that changes the region's state away from
 	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
@@ -3108,9 +3144,17 @@ static int cxl_region_probe(struct device *dev)
 	}
 }
 
+static void cxl_region_remove(struct device *dev)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	unregister_memory_notifier(&cxlr->memory_notifier);
+}
+
 static struct cxl_driver cxl_region_driver = {
 	.name = "cxl_region",
 	.probe = cxl_region_probe,
+	.remove = cxl_region_remove,
 	.id = CXL_DEVICE_REGION,
 };
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 265da412c5bd..c326ee8956ec 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -6,6 +6,7 @@
 
 #include <linux/libnvdimm.h>
 #include <linux/bitfield.h>
+#include <linux/notifier.h>
 #include <linux/bitops.h>
 #include <linux/node.h>
 #include <linux/log2.h>
@@ -530,6 +531,7 @@ struct cxl_region {
 	unsigned long flags;
 	struct cxl_region_params params;
 	struct access_coordinate *coord;
+	struct notifier_block memory_notifier;
 };
 
 struct cxl_nvdimm_bridge {
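
The notifier above converts the cached latencies from picoseconds to
nanoseconds, rounding up, before handing them to node_set_perf_attrs(). The
arithmetic can be checked in isolation with the sketch below. The macro
repeats the usual definition of the kernel's DIV_ROUND_UP(); psec_to_nsec()
is a hypothetical wrapper name used only here.

```c
/* Same arithmetic as the kernel's DIV_ROUND_UP() for a positive divisor. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical wrapper: convert a latency in picoseconds to nanoseconds,
 * rounding any fractional nanosecond up so a non-zero latency never
 * becomes zero. Inputs within 999 of UINT_MAX would wrap in the addition.
 */
static unsigned int psec_to_nsec(unsigned int psec)
{
	return DIV_ROUND_UP(psec, 1000u);
}
```

Rounding up rather than truncating means the value reported for the node
never understates the latency that was computed for the region.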




* Re: [PATCH 1/3] cxl/region: Calculate performance data for a region
  2023-12-07 23:30 [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
@ 2023-12-07 23:34 ` Dave Jiang
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-07 23:34 UTC
  To: linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave



On 12/7/23 16:30, Dave Jiang wrote:
> Calculate and store the performance data for a CXL region. Find the worst
> read and write latency across all the included ranges from each of the
> devices that contribute to the region and use that as the region's latency.
> Sum the read and write bandwidth of each device's range to get the total
> bandwidth for the region.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Please ignore this posting and see the series instead. It was sent by accident.



* Re: [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region
  2023-12-07 23:32 ` [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region Dave Jiang
@ 2023-12-08  3:35   ` Huang, Ying
  2023-12-12  0:30   ` Dan Williams
  1 sibling, 0 replies; 21+ messages in thread
From: Huang, Ying @ 2023-12-08  3:35 UTC
  To: Dave Jiang
  Cc: linux-cxl, Greg Kroah-Hartman, Rafael J. Wysocki, dan.j.williams,
	ira.weiny, vishal.l.verma, alison.schofield, jonathan.cameron,
	dave

Dave Jiang <dave.jiang@intel.com> writes:

> When the CXL region is formed, the driver computes the performance data
> for the region. However, this data is not present in the node data
> collection that was populated from the HMAT during kernel initialization.
> Add a memory hotplug notifier that updates the node hmem_attrs with the
> newly calculated region performance data. The CXL region is created under
> a specific CFMWS. The node for the CFMWS is created during SRAT parsing by
> acpi_parse_cfmws(). The notifier will run once only and turn itself off
> after the initial run. Additional regions may overwrite the initial data,
> but since this is for the same proximity domain it does not matter for now.
>
> node_set_perf_attrs() is exported to allow updating the perf attributes of
> a node. Given that only CXL is using this, export it only to the CXL
> namespace.
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/base/node.c       |    1 +
>  drivers/cxl/core/region.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h         |    2 ++
>  3 files changed, 47 insertions(+)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index cb2b6cc7f6e6..f5b5a3f11894 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
>  		}
>  	}
>  }
> +EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL);
>  
>  /**
>   * struct node_cache_info - Internal tracking for memory node caches
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 72c47f624d63..3794e91e12b1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -4,6 +4,7 @@
>  #include <linux/genalloc.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
> +#include <linux/memory.h>
>  #include <linux/slab.h>
>  #include <linux/uuid.h>
>  #include <linux/sort.h>
> @@ -2958,6 +2959,37 @@ static int is_system_ram(struct resource *res, void *arg)
>  	return 1;
>  }
>  
> +static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
> +					  unsigned long action, void *arg)
> +{
> +	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
> +					       memory_notifier);
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_endpoint_decoder *cxled = p->targets[0];
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct memory_notify *mnb = arg;
> +	int nid = mnb->status_change_nid;
> +	struct access_coordinate coord;
> +	int region_nid;
> +
> +	if (nid == NUMA_NO_NODE || action != MEM_ONLINE || !cxlr->coord)
> +		return NOTIFY_STOP;
> +
> +	region_nid = phys_to_target_node(cxld->hpa_range.start);
> +	if (nid != region_nid)
> +		return NOTIFY_STOP;
> +
> +	/* Adjust latencies from psec to nsec to be consistent with HMAT targets */
> +	coord = *cxlr->coord;
> +	coord.read_latency = DIV_ROUND_UP(coord.read_latency, 1000);
> +	coord.write_latency = DIV_ROUND_UP(coord.write_latency, 1000);
> +
> +	node_set_perf_attrs(nid, &coord, 0);
> +	node_set_perf_attrs(nid, &coord, 1);
> +
> +	return NOTIFY_STOP;
> +}

It's unfortunate that we still need to add another callback for the
abstract distance calculation, because the abstract distance needs to be
calculated before the node is hot-added.  But this provides all the
information needed to do that, and we can add another callback for it.

The patch itself looks good to me.  Feel free to add,

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

>  static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
>  {
>  	struct cxl_region_params *p = &cxlr->params;
> @@ -3077,6 +3109,10 @@ static int cxl_region_probe(struct device *dev)
>  
>  	cxl_region_perf_data_calculate(cxlr);
>  
> +	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
> +	cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI;
> +	register_memory_notifier(&cxlr->memory_notifier);
> +
>  	/*
>  	 * From this point on any path that changes the region's state away from
>  	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
> @@ -3108,9 +3144,17 @@ static int cxl_region_probe(struct device *dev)
>  	}
>  }
>  
> +static void cxl_region_remove(struct device *dev)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +	unregister_memory_notifier(&cxlr->memory_notifier);
> +}
> +
>  static struct cxl_driver cxl_region_driver = {
>  	.name = "cxl_region",
>  	.probe = cxl_region_probe,
> +	.remove = cxl_region_remove,
>  	.id = CXL_DEVICE_REGION,
>  };
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 265da412c5bd..c326ee8956ec 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -6,6 +6,7 @@
>  
>  #include <linux/libnvdimm.h>
>  #include <linux/bitfield.h>
> +#include <linux/notifier.h>
>  #include <linux/bitops.h>
>  #include <linux/node.h>
>  #include <linux/log2.h>
> @@ -530,6 +531,7 @@ struct cxl_region {
>  	unsigned long flags;
>  	struct cxl_region_params params;
>  	struct access_coordinate *coord;
> +	struct notifier_block memory_notifier;
>  };
>  
>  struct cxl_nvdimm_bridge {

--
Best Regards,
Huang, Ying


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
@ 2023-12-11  9:06   ` Brice Goglin
  2023-12-12 19:30     ` Dave Jiang
  2023-12-11 18:03   ` fan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Brice Goglin @ 2023-12-11  9:06 UTC
  To: Dave Jiang, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

On 08/12/2023 at 00:31, Dave Jiang wrote:
> Add read/write latency and bandwidth sysfs attributes for the enabled CXL
> region. The bandwidth is the aggregated bandwidth of all devices that
> contribute to the CXL region. The latency is the worst latency among all
> the devices that contribute to the CXL region.


Hello

Which initiator do these bandwidths/latencies refer to? Local CPUs near 
the root port? This should be specified in the doc.

Brice



>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>   Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
>   drivers/cxl/core/region.c               |   24 +++++++++++++++++++
>   2 files changed, 64 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index fff2581b8033..e96f172eb6a6 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -552,3 +552,43 @@ Description:
>   		attribute is only visible for devices supporting the
>   		capability. The retrieved errors are logged as kernel
>   		events when cxl_poison event tracing is enabled.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/read_bandwidth
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The aggregated read bandwidth of the region. The number is
> +		the accumulated read bandwidth of all CXL memory devices that
> +		contributes to the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/write_bandwidth
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The aggregated write bandwidth of the region. The number is
> +		the accumulated write bandwidth of all CXL memory devices that
> +		contributes to the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/read_latency
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The read latency of the region. The number is
> +		the worst read latency of all CXL memory devices that
> +		contributes to the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/write_latency
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The write latency of the region. The number is
> +		the worst write latency of all CXL memory devices that
> +		contributes to the region.
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d879f5702cf2..72c47f624d63 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>   }
>   static DEVICE_ATTR_RW(size);
>   
> +#define ACCESS_ATTR(attrib)					\
> +static ssize_t attrib##_show(struct device *dev,		\
> +			   struct device_attribute *attr,	\
> +			   char *buf)				\
> +{								\
> +	struct cxl_region *cxlr = to_cxl_region(dev);		\
> +								\
> +	if (!cxlr->coord)					\
> +		return 0;					\
> +								\
> +	return sysfs_emit(buf, "%u\n",				\
> +			  cxlr->coord->attrib);			\
> +}								\
> +static DEVICE_ATTR_RO(attrib)
> +
> +ACCESS_ATTR(read_bandwidth);
> +ACCESS_ATTR(read_latency);
> +ACCESS_ATTR(write_bandwidth);
> +ACCESS_ATTR(write_latency);
> +
>   static struct attribute *cxl_region_attrs[] = {
>   	&dev_attr_uuid.attr,
>   	&dev_attr_commit.attr,
> @@ -653,6 +673,10 @@ static struct attribute *cxl_region_attrs[] = {
>   	&dev_attr_resource.attr,
>   	&dev_attr_size.attr,
>   	&dev_attr_mode.attr,
> +	&dev_attr_read_bandwidth.attr,
> +	&dev_attr_write_bandwidth.attr,
> +	&dev_attr_read_latency.attr,
> +	&dev_attr_write_latency.attr,
>   	NULL,
>   };
>   
>
>
>


* Re: [PATCH 1/3] cxl/region: Calculate performance data for a region
  2023-12-07 23:31 ` [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
@ 2023-12-11 17:44   ` fan
  2023-12-12  0:19   ` Dan Williams
  1 sibling, 0 replies; 21+ messages in thread
From: fan @ 2023-12-11 17:44 UTC (permalink / raw)
  To: Dave Jiang
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma,
	alison.schofield, jonathan.cameron, dave

On Thu, Dec 07, 2023 at 04:31:49PM -0700, Dave Jiang wrote:
> Calculate and store the performance data for a CXL region. Find the worst
> read and write latency for all the included ranges from each of the devices
> that attributes to the region and designate that as the latency data. Sum
> all the read and write bandwidth data for each of the device region and
> that is the total bandwidth for the region.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/core/region.c |   94 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h         |    1 
>  2 files changed, 95 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 56e575c79bb4..d879f5702cf2 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2934,6 +2934,98 @@ static int is_system_ram(struct resource *res, void *arg)
>  	return 1;
>  }
>  
> +static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_endpoint_decoder *cxled;
> +	unsigned int rd_bw = 0, rd_lat = 0;
> +	unsigned int wr_bw = 0, wr_lat = 0;
> +	struct access_coordinate *coord;
> +	struct list_head *perf_list;
> +	int rc = 0, i;
> +
> +	lockdep_assert_held(&cxl_region_rwsem);
> +
> +	/* No need to proceed if hmem attributes are already present */
> +	if (cxlr->coord)
> +		return 0;
> +
> +	coord = devm_kzalloc(&cxlr->dev, sizeof(*coord), GFP_KERNEL);
> +	if (!coord)
> +		return -ENOMEM;
> +
> +	cxled = p->targets[0];

cxled is only used in the for loop below; maybe we can move it into the loop.
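
A minimal userspace sketch of that loop shape, under stated assumptions:
struct target and region_perf() are hypothetical stand-ins, not the driver's
types. Each target is fetched at the top of its iteration, before anything
derived from it is read, and the result keeps the worst latency and the
summed bandwidth as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Same four fields as the kernel's struct access_coordinate. */
struct access_coordinate {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/* Hypothetical stand-in for an endpoint decoder plus its matched perf entry. */
struct target {
	struct access_coordinate perf;
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Worst-case latency and summed bandwidth across all targets. */
static struct access_coordinate region_perf(const struct target *const *targets,
					     size_t nr_targets)
{
	struct access_coordinate sum = { 0 };

	for (size_t i = 0; i < nr_targets; i++) {
		/* Fetch target i first so everything below refers to it. */
		const struct target *t = targets[i];

		sum.read_latency = max_uint(sum.read_latency,
					    t->perf.read_latency);
		sum.write_latency = max_uint(sum.write_latency,
					     t->perf.write_latency);
		sum.read_bandwidth += t->perf.read_bandwidth;
		sum.write_bandwidth += t->perf.write_bandwidth;
	}

	return sum;
}
```

With the pointer scoped to the loop body, no value from a previous
iteration can leak into the current one.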

Fan

> +
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct range dpa = {
> +			.start = cxled->dpa_res->start,
> +			.end = cxled->dpa_res->end,
> +		};
> +		struct cxl_memdev_state *mds;
> +		struct perf_prop_entry *perf;
> +		struct cxl_dev_state *cxlds;
> +		struct cxl_memdev *cxlmd;
> +		bool found = false;
> +
> +		cxled = p->targets[i];
> +		cxlmd = cxled_to_memdev(cxled);
> +		cxlds = cxlmd->cxlds;
> +		mds = to_cxl_memdev_state(cxlds);
> +
> +		switch (cxlr->mode) {
> +		case CXL_DECODER_RAM:
> +			perf_list = &mds->ram_perf_list;
> +			break;
> +		case CXL_DECODER_PMEM:
> +			perf_list = &mds->pmem_perf_list;
> +			break;
> +		default:
> +			rc = -EINVAL;
> +			goto err;
> +		}
> +
> +		if (list_empty(perf_list)) {
> +			rc = -ENOENT;
> +			goto err;
> +		}
> +
> +		list_for_each_entry(perf, perf_list, list) {
> +			if (range_contains(&perf->dpa_range, &dpa)) {
> +				found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!found) {
> +			rc = -ENOENT;
> +			goto err;
> +		}
> +
> +		/* Get total bandwidth and the worst latency for the cxl region */
> +		rd_lat = max_t(unsigned int, rd_lat,
> +			       perf->coord.read_latency);
> +		rd_bw += perf->coord.read_bandwidth;
> +		wr_lat = max_t(unsigned int, wr_lat,
> +			       perf->coord.write_latency);
> +		wr_bw += perf->coord.write_bandwidth;
> +	}
> +
> +	*coord = (struct access_coordinate) {
> +		.read_latency = rd_lat,
> +		.read_bandwidth = rd_bw,
> +		.write_latency = wr_lat,
> +		.write_bandwidth = wr_bw,
> +	};
> +
> +	cxlr->coord = coord;
> +
> +	return 0;
> +
> +err:
> +	devm_kfree(&cxlr->dev, coord);
> +	return rc;
> +}
> +
>  static int cxl_region_probe(struct device *dev)
>  {
>  	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -2959,6 +3051,8 @@ static int cxl_region_probe(struct device *dev)
>  		goto out;
>  	}
>  
> +	cxl_region_perf_data_calculate(cxlr);
> +
>  	/*
>  	 * From this point on any path that changes the region's state away from
>  	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 004534cf0361..265da412c5bd 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -529,6 +529,7 @@ struct cxl_region {
>  	struct cxl_pmem_region *cxlr_pmem;
>  	unsigned long flags;
>  	struct cxl_region_params params;
> +	struct access_coordinate *coord;
>  };
>  
>  struct cxl_nvdimm_bridge {
> 
> 


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
  2023-12-11  9:06   ` Brice Goglin
@ 2023-12-11 18:03   ` fan
  2023-12-11 18:13     ` Dave Jiang
  2023-12-12  0:23   ` Dan Williams
  2023-12-12 13:46   ` Brice Goglin
  3 siblings, 1 reply; 21+ messages in thread
From: fan @ 2023-12-11 18:03 UTC (permalink / raw)
  To: Dave Jiang
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma,
	alison.schofield, jonathan.cameron, dave

On Thu, Dec 07, 2023 at 04:31:56PM -0700, Dave Jiang wrote:
> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
> region. The bandwidth is the aggregated bandwidth of all devices that
> contributes to the CXL region. The latency is the worst latency of the
> device amongst all the devices that contributes to the CXL region.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
>  drivers/cxl/core/region.c               |   24 +++++++++++++++++++
>  2 files changed, 64 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index fff2581b8033..e96f172eb6a6 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -552,3 +552,43 @@ Description:
>  		attribute is only visible for devices supporting the
>  		capability. The retrieved errors are logged as kernel
>  		events when cxl_poison event tracing is enabled.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/read_bandwidth
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The aggregated read bandwidth of the region. The number is
> +		the accumulated read bandwidth of all CXL memory devices that
> +		contributes to the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/write_bandwidth
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The aggregated write bandwidth of the region. The number is
> +		the accumulated write bandwidth of all CXL memory devices that
> +		contributes to the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/read_latency
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The read latency of the region. The number is
> +		the worst read latency of all CXL memory devices that
> +		contributes to the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/write_latency
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The write latency of the region. The number is
> +		the worst write latency of all CXL memory devices that
> +		contributes to the region.
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d879f5702cf2..72c47f624d63 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RW(size);
>  
> +#define ACCESS_ATTR(attrib)					\
> +static ssize_t attrib##_show(struct device *dev,		\
> +			   struct device_attribute *attr,	\
> +			   char *buf)				\
> +{								\
> +	struct cxl_region *cxlr = to_cxl_region(dev);		\
> +								\
> +	if (!cxlr->coord)					\
> +		return 0;					\
> +								\
> +	return sysfs_emit(buf, "%u\n",				\
> +			  cxlr->coord->attrib);			\
> +}								\
> +static DEVICE_ATTR_RO(attrib)
> +
> +ACCESS_ATTR(read_bandwidth);
> +ACCESS_ATTR(read_latency);
> +ACCESS_ATTR(write_bandwidth);
> +ACCESS_ATTR(write_latency);
> +
>  static struct attribute *cxl_region_attrs[] = {
>  	&dev_attr_uuid.attr,
>  	&dev_attr_commit.attr,
> @@ -653,6 +673,10 @@ static struct attribute *cxl_region_attrs[] = {
>  	&dev_attr_resource.attr,
>  	&dev_attr_size.attr,
>  	&dev_attr_mode.attr,
> +	&dev_attr_read_bandwidth.attr,
> +	&dev_attr_write_bandwidth.attr,
> +	&dev_attr_read_latency.attr,
> +	&dev_attr_write_latency.attr,
>  	NULL,
>  };
>  
The way the latency and bandwidth ABI is defined does not seem to be
consistent with what we have now.

For example, for other attributes like interleave_ways, we only define
one attribute "interleave_ways", and use two separate functions
interleave_ways_show and interleave_ways_store for read/write operations.
For the ABI interface, only one is provided:
"/sys/bus/cxl/devices/regionZ/interleave_granularity"

For latency and bandwidth, should we provide two interfaces like below
instead?
/sys/bus/cxl/devices/regionZ/bandwidth --rw
/sys/bus/cxl/devices/regionZ/latency --rw

Fan
> 
> 


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-11 18:03   ` fan
@ 2023-12-11 18:13     ` Dave Jiang
  2023-12-11 18:27       ` fan
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Jiang @ 2023-12-11 18:13 UTC (permalink / raw)
  To: fan
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma,
	alison.schofield, jonathan.cameron, dave



On 12/11/23 11:03, fan wrote:
> On Thu, Dec 07, 2023 at 04:31:56PM -0700, Dave Jiang wrote:
>> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
>> region. The bandwidth is the aggregated bandwidth of all devices that
>> contributes to the CXL region. The latency is the worst latency of the
>> device amongst all the devices that contributes to the CXL region.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>  Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
>>  drivers/cxl/core/region.c               |   24 +++++++++++++++++++
>>  2 files changed, 64 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
>> index fff2581b8033..e96f172eb6a6 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-cxl
>> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
>> @@ -552,3 +552,43 @@ Description:
>>  		attribute is only visible for devices supporting the
>>  		capability. The retrieved errors are logged as kernel
>>  		events when cxl_poison event tracing is enabled.
>> +
>> +
>> +What:		/sys/bus/cxl/devices/regionZ/read_bandwidth
>> +Date:		Apr, 2023
>> +KernelVersion:	v6.8
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(RO) The aggregated read bandwidth of the region. The number is
>> +		the accumulated read bandwidth of all CXL memory devices that
>> +		contributes to the region.
>> +
>> +
>> +What:		/sys/bus/cxl/devices/regionZ/write_bandwidth
>> +Date:		Apr, 2023
>> +KernelVersion:	v6.8
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(RO) The aggregated write bandwidth of the region. The number is
>> +		the accumulated write bandwidth of all CXL memory devices that
>> +		contributes to the region.
>> +
>> +
>> +What:		/sys/bus/cxl/devices/regionZ/read_latency
>> +Date:		Apr, 2023
>> +KernelVersion:	v6.8
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(RO) The read latency of the region. The number is
>> +		the worst read latency of all CXL memory devices that
>> +		contributes to the region.
>> +
>> +
>> +What:		/sys/bus/cxl/devices/regionZ/write_latency
>> +Date:		Apr, 2023
>> +KernelVersion:	v6.8
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(RO) The write latency of the region. The number is
>> +		the worst write latency of all CXL memory devices that
>> +		contributes to the region.
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index d879f5702cf2..72c47f624d63 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>>  }
>>  static DEVICE_ATTR_RW(size);
>>  
>> +#define ACCESS_ATTR(attrib)					\
>> +static ssize_t attrib##_show(struct device *dev,		\
>> +			   struct device_attribute *attr,	\
>> +			   char *buf)				\
>> +{								\
>> +	struct cxl_region *cxlr = to_cxl_region(dev);		\
>> +								\
>> +	if (!cxlr->coord)					\
>> +		return 0;					\
>> +								\
>> +	return sysfs_emit(buf, "%u\n",				\
>> +			  cxlr->coord->attrib);			\
>> +}								\
>> +static DEVICE_ATTR_RO(attrib)
>> +
>> +ACCESS_ATTR(read_bandwidth);
>> +ACCESS_ATTR(read_latency);
>> +ACCESS_ATTR(write_bandwidth);
>> +ACCESS_ATTR(write_latency);
>> +
>>  static struct attribute *cxl_region_attrs[] = {
>>  	&dev_attr_uuid.attr,
>>  	&dev_attr_commit.attr,
>> @@ -653,6 +673,10 @@ static struct attribute *cxl_region_attrs[] = {
>>  	&dev_attr_resource.attr,
>>  	&dev_attr_size.attr,
>>  	&dev_attr_mode.attr,
>> +	&dev_attr_read_bandwidth.attr,
>> +	&dev_attr_write_bandwidth.attr,
>> +	&dev_attr_read_latency.attr,
>> +	&dev_attr_write_latency.attr,
>>  	NULL,
>>  };
>>  
> This way latency and bandwidth ABI are defined seems not to be consistent
> with what we have now.
> 
> For example, for other attributes like interleave_ways, we only define
> one attribute "interleave_ways", and use two separate functions
> interleave_ways_show and interleave_ways_store for read/write operation.
> for ABI interface, only one is provided
> "/sys/bus/cxl/devices/regionZ/interleave_granularity"
> 
> for latency and bandwidth, should we provide two interfaces like below
> instead?
> /sys/bus/cxl/devices/regionZ/bandwidth --rw
> /sys/bus/cxl/devices/regionZ/latency --rw

Hi Fan. I think there's a misunderstanding here. This is not about reading
and writing the bandwidth and latency attributes. These are four separate
properties of the path: read latency, write latency, read bandwidth, and
write bandwidth, i.e. the performance of upstream and downstream traffic.


> 
> Fan
>>
>>


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-11 18:13     ` Dave Jiang
@ 2023-12-11 18:27       ` fan
  0 siblings, 0 replies; 21+ messages in thread
From: fan @ 2023-12-11 18:27 UTC (permalink / raw)
  To: Dave Jiang
  Cc: fan, linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma,
	alison.schofield, jonathan.cameron, dave

On Mon, Dec 11, 2023 at 11:13:42AM -0700, Dave Jiang wrote:
> 
> 
> On 12/11/23 11:03, fan wrote:
> > On Thu, Dec 07, 2023 at 04:31:56PM -0700, Dave Jiang wrote:
> >> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
> >> region. The bandwidth is the aggregated bandwidth of all devices that
> >> contributes to the CXL region. The latency is the worst latency of the
s/contributes/contribute/
> >> device amongst all the devices that contributes to the CXL region.
s/contributes/contribute/
> >>
> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> >> ---
> >>  Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
> >>  drivers/cxl/core/region.c               |   24 +++++++++++++++++++
> >>  2 files changed, 64 insertions(+)
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> >> index fff2581b8033..e96f172eb6a6 100644
> >> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> >> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> >> @@ -552,3 +552,43 @@ Description:
> >>  		attribute is only visible for devices supporting the
> >>  		capability. The retrieved errors are logged as kernel
> >>  		events when cxl_poison event tracing is enabled.
> >> +
> >> +
> >> +What:		/sys/bus/cxl/devices/regionZ/read_bandwidth
> >> +Date:		Apr, 2023
> >> +KernelVersion:	v6.8
> >> +Contact:	linux-cxl@vger.kernel.org
> >> +Description:
> >> +		(RO) The aggregated read bandwidth of the region. The number is
> >> +		the accumulated read bandwidth of all CXL memory devices that
> >> +		contributes to the region.
> >> +
> >> +
> >> +What:		/sys/bus/cxl/devices/regionZ/write_bandwidth
> >> +Date:		Apr, 2023
> >> +KernelVersion:	v6.8
> >> +Contact:	linux-cxl@vger.kernel.org
> >> +Description:
> >> +		(RO) The aggregated write bandwidth of the region. The number is
> >> +		the accumulated write bandwidth of all CXL memory devices that
> >> +		contributes to the region.
> >> +
> >> +
> >> +What:		/sys/bus/cxl/devices/regionZ/read_latency
> >> +Date:		Apr, 2023
> >> +KernelVersion:	v6.8
> >> +Contact:	linux-cxl@vger.kernel.org
> >> +Description:
> >> +		(RO) The read latency of the region. The number is
> >> +		the worst read latency of all CXL memory devices that
> >> +		contributes to the region.
> >> +
> >> +
> >> +What:		/sys/bus/cxl/devices/regionZ/write_latency
> >> +Date:		Apr, 2023
> >> +KernelVersion:	v6.8
> >> +Contact:	linux-cxl@vger.kernel.org
> >> +Description:
> >> +		(RO) The write latency of the region. The number is
> >> +		the worst write latency of all CXL memory devices that
> >> +		contributes to the region.
> >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> >> index d879f5702cf2..72c47f624d63 100644
> >> --- a/drivers/cxl/core/region.c
> >> +++ b/drivers/cxl/core/region.c
> >> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> >>  }
> >>  static DEVICE_ATTR_RW(size);
> >>  
> >> +#define ACCESS_ATTR(attrib)					\
> >> +static ssize_t attrib##_show(struct device *dev,		\
> >> +			   struct device_attribute *attr,	\
> >> +			   char *buf)				\
> >> +{								\
> >> +	struct cxl_region *cxlr = to_cxl_region(dev);		\
> >> +								\
> >> +	if (!cxlr->coord)					\
> >> +		return 0;					\
> >> +								\
> >> +	return sysfs_emit(buf, "%u\n",				\
> >> +			  cxlr->coord->attrib);			\
> >> +}								\
> >> +static DEVICE_ATTR_RO(attrib)
> >> +
> >> +ACCESS_ATTR(read_bandwidth);
> >> +ACCESS_ATTR(read_latency);
> >> +ACCESS_ATTR(write_bandwidth);
> >> +ACCESS_ATTR(write_latency);
> >> +
> >>  static struct attribute *cxl_region_attrs[] = {
> >>  	&dev_attr_uuid.attr,
> >>  	&dev_attr_commit.attr,
> >> @@ -653,6 +673,10 @@ static struct attribute *cxl_region_attrs[] = {
> >>  	&dev_attr_resource.attr,
> >>  	&dev_attr_size.attr,
> >>  	&dev_attr_mode.attr,
> >> +	&dev_attr_read_bandwidth.attr,
> >> +	&dev_attr_write_bandwidth.attr,
> >> +	&dev_attr_read_latency.attr,
> >> +	&dev_attr_write_latency.attr,
> >>  	NULL,
> >>  };
> >>  
> > This way latency and bandwidth ABI are defined seems not to be consistent
> > with what we have now.
> > 
> > For example, for other attributes like interleave_ways, we only define
> > one attribute "interleave_ways", and use two separate functions
> > interleave_ways_show and interleave_ways_store for read/write operation.
> > for ABI interface, only one is provided
> > "/sys/bus/cxl/devices/regionZ/interleave_granularity"
> > 
> > for latency and bandwidth, should we provide two interfaces like below
> > instead?
> > /sys/bus/cxl/devices/regionZ/bandwidth --rw
> > /sys/bus/cxl/devices/regionZ/latency --rw
> 
> Hi Fan. I think there's a misunderstanding here. This is not reading and writing of bandwidth and latency. This is read latency and write latency and read bandwidth and write bandwidth. They are separate and unique properties of the path. i.e. upstream and downstream direction traffic performances. 
> 
Oh, I totally misunderstood. Thanks for the clarification. Then the patch
looks good to me except for a minor typo in the commit message, as shown
above.

Fan
> 
> > 
> > Fan
> >>
> >>


* RE: [PATCH 1/3] cxl/region: Calculate performance data for a region
  2023-12-07 23:31 ` [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
  2023-12-11 17:44   ` fan
@ 2023-12-12  0:19   ` Dan Williams
  1 sibling, 0 replies; 21+ messages in thread
From: Dan Williams @ 2023-12-12  0:19 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

Dave Jiang wrote:
> Calculate and store the performance data for a CXL region. Find the worst
> read and write latency for all the included ranges from each of the devices
> that attributes to the region and designate that as the latency data. Sum
> all the read and write bandwidth data for each of the device region and
> that is the total bandwidth for the region.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/core/region.c |   94 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h         |    1 
>  2 files changed, 95 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 56e575c79bb4..d879f5702cf2 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2934,6 +2934,98 @@ static int is_system_ram(struct resource *res, void *arg)
>  	return 1;
>  }
>  
> +static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_endpoint_decoder *cxled;
> +	unsigned int rd_bw = 0, rd_lat = 0;
> +	unsigned int wr_bw = 0, wr_lat = 0;
> +	struct access_coordinate *coord;
> +	struct list_head *perf_list;
> +	int rc = 0, i;
> +
> +	lockdep_assert_held(&cxl_region_rwsem);
> +
> +	/* No need to proceed if hmem attributes are already present */
> +	if (cxlr->coord)
> +		return 0;
> +
> +	coord = devm_kzalloc(&cxlr->dev, sizeof(*coord), GFP_KERNEL);
> +	if (!coord)
> +		return -ENOMEM;

Why does this need to be dynamically allocated? It's only a few fields
that all regions will likely have, just include a 'struct
access_coordinate' instance in 'struct cxl_region' and check if the
values are non-zero (memcmp()) to see if it is initialized.

Saves a devm_kfree() error case.
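
A minimal userspace sketch of that suggestion, not kernel code: struct
region and region_coord_valid() are hypothetical names used only for
illustration, and the coordinate struct simply mirrors the four fields the
patch fills in. The coordinates live inside the region by value, and an
all-zero struct means "not calculated yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same four fields as the kernel's struct access_coordinate. */
struct access_coordinate {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/* Hypothetical, trimmed-down region that embeds the coordinates by value. */
struct region {
	struct access_coordinate coord;
};

/* True once any field is non-zero; all-zero means "not initialized". */
static bool region_coord_valid(const struct region *r)
{
	/* Static storage is zero-initialized, so this is the all-zero value. */
	static const struct access_coordinate zero;

	return memcmp(&r->coord, &zero, sizeof(zero)) != 0;
}
```

Because nothing is allocated, the calculation has no allocation-failure
path and no error path that needs to free anything.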

> +
> +	cxled = p->targets[0];
> +
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct range dpa = {
> +			.start = cxled->dpa_res->start,
> +			.end = cxled->dpa_res->end,
> +		};
> +		struct cxl_memdev_state *mds;
> +		struct perf_prop_entry *perf;
> +		struct cxl_dev_state *cxlds;
> +		struct cxl_memdev *cxlmd;
> +		bool found = false;
> +
> +		cxled = p->targets[i];
> +		cxlmd = cxled_to_memdev(cxled);
> +		cxlds = cxlmd->cxlds;
> +		mds = to_cxl_memdev_state(cxlds);
> +
> +		switch (cxlr->mode) {
> +		case CXL_DECODER_RAM:
> +			perf_list = &mds->ram_perf_list;
> +			break;
> +		case CXL_DECODER_PMEM:
> +			perf_list = &mds->pmem_perf_list;
> +			break;
> +		default:
> +			rc = -EINVAL;
> +			goto err;
> +		}
> +
> +		if (list_empty(perf_list)) {
> +			rc = -ENOENT;
> +			goto err;
> +		}

No need for this check since list_for_each_entry() will already be a nop
and not set @found.
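
A small userspace sketch of why the explicit emptiness check is redundant.
It uses a hand-rolled circular list rather than the kernel's list.h, and
find_perf() and struct perf_entry are hypothetical names. On an empty list
the head points back to itself, so the loop body never runs and the lookup
reports "not found" on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

struct range {
	unsigned long long start, end;
};

/* Hypothetical stand-in for one performance entry on a memdev list. */
struct perf_entry {
	struct list_head list;
	struct range dpa_range;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* True when @inner lies entirely within @outer. */
static bool range_contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* Returns the first entry covering @dpa, or NULL (also for an empty list). */
static const struct perf_entry *find_perf(const struct list_head *head,
					  const struct range *dpa)
{
	const struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		const struct perf_entry *perf = (const struct perf_entry *)
			((const char *)pos - offsetof(struct perf_entry, list));

		if (range_contains(&perf->dpa_range, dpa))
			return perf;
	}

	return NULL;
}
```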

> +
> +		list_for_each_entry(perf, perf_list, list) {

This looks like a potential race / problem as region-sysfs and
auto-discovered regions are allowed to start running before
cxl_qos_class_verify() runs. cxl_qos_class_verify() needs to be
done with its work before any region might see it. It needs to be the
case that these unlocked walks of memdev lists are ok because they are
known to be stable for the lifetime of any region that might read them.

Somewhat glad that we never went back and added support for "immediate"
(CXL_SET_PARTITION_IMMEDIATE_FLAG) DPA partition changes, that would
make all of this that much more complicated.

> +			if (range_contains(&perf->dpa_range, &dpa)) {
> +				found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!found) {
> +			rc = -ENOENT;
> +			goto err;
> +		}
> +
> +		/* Get total bandwidth and the worst latency for the cxl region */
> +		rd_lat = max_t(unsigned int, rd_lat,
> +			       perf->coord.read_latency);
> +		rd_bw += perf->coord.read_bandwidth;
> +		wr_lat = max_t(unsigned int, wr_lat,
> +			       perf->coord.write_latency);
> +		wr_bw += perf->coord.write_bandwidth;
> +	}
> +
> +	*coord = (struct access_coordinate) {
> +		.read_latency = rd_lat,
> +		.read_bandwidth = rd_bw,
> +		.write_latency = wr_lat,
> +		.write_bandwidth = wr_bw,
> +	};
> +
> +	cxlr->coord = coord;
> +
> +	return 0;
> +
> +err:
> +	devm_kfree(&cxlr->dev, coord);

Another reason I do not like open-coded devm_kfree() is because it is
not supported by cleanup.h and still needs "goto".


* RE: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
  2023-12-11  9:06   ` Brice Goglin
  2023-12-11 18:03   ` fan
@ 2023-12-12  0:23   ` Dan Williams
  2023-12-12 13:46   ` Brice Goglin
  3 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2023-12-12  0:23 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

Dave Jiang wrote:
> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
> region. The bandwidth is the aggregated bandwidth of all devices that
> contributes to the CXL region. The latency is the worst latency of the
> device amongst all the devices that contributes to the CXL region.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
>  drivers/cxl/core/region.c               |   24 +++++++++++++++++++
>  2 files changed, 64 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index fff2581b8033..e96f172eb6a6 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -552,3 +552,43 @@ Description:
>  		attribute is only visible for devices supporting the
>  		capability. The retrieved errors are logged as kernel
>  		events when cxl_poison event tracing is enabled.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/read_bandwidth
> +Date:		Apr, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The aggregated read bandwidth of the region. The number is
> +		the accumulated read bandwidth of all CXL memory devices that
> +		contributes to the region.

Units?

...ditto for the rest. Ideally these descriptions match the units of the
same named fields in Documentation/ABI/stable/sysfs-devices-node for the
HMEM_REPORTING attributes. You might even link that.
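
A short sketch of why the units matter, assuming the region latencies are
kept in picoseconds, as patch 3/3 of this series implies when it converts
psec to nsec before calling node_set_perf_attrs(). psec_to_nsec() is a
hypothetical helper name, and DIV_ROUND_UP is redefined here only so the
example builds in userspace.

```c
#include <assert.h>

/* Userspace copy of the kernel macro: integer division rounding up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Convert a latency in picoseconds to nanoseconds, rounding up so that a
 * small non-zero latency never collapses to 0. Inputs within 999 of
 * UINT_MAX would wrap in the addition; this sketch does not guard that.
 */
static unsigned int psec_to_nsec(unsigned int psec)
{
	return DIV_ROUND_UP(psec, 1000);
}
```

If the region attributes and the node attributes used different units, the
same device would report values a factor of 1000 apart depending on which
file was read, which is why the documentation should state the unit.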


* RE: [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region
  2023-12-07 23:32 ` [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region Dave Jiang
  2023-12-08  3:35   ` Huang, Ying
@ 2023-12-12  0:30   ` Dan Williams
  1 sibling, 0 replies; 21+ messages in thread
From: Dan Williams @ 2023-12-12  0:30 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, dan.j.williams, ira.weiny,
	vishal.l.verma, alison.schofield, jonathan.cameron, dave

Dave Jiang wrote:
> When the CXL region is formed, the driver would computed the performance
> data for the region. However this data is not available at the node data
> collection that has been populated by the HMAT during kernel
> initialization. Add a memory hotplug notifier to update the performance
> data to the node hmem_attrs to expose the newly calculated region
> performance data. The CXL region is created under specific CFMWS. The
> node for the CFMWS is created during SRAT parsing by acpi_parse_cfmws().
> The notifier will run once only and turn itself off after the initial
> run. Additional regions may overwrite the initial data, but since this is
> for the same poximity domain it's a don't care for now.
> 
> node_set_perf_attrs() is exported to allow update of perf attribs for a
> node. Given that only CXL is using this, export only to CXL namespace.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/base/node.c       |    1 +
>  drivers/cxl/core/region.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h         |    2 ++
>  3 files changed, 47 insertions(+)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index cb2b6cc7f6e6..f5b5a3f11894 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
>  		}
>  	}
>  }
> +EXPORT_SYMBOL_NS_GPL(node_set_perf_attrs, CXL);
>  
>  /**
>   * struct node_cache_info - Internal tracking for memory node caches
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 72c47f624d63..3794e91e12b1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -4,6 +4,7 @@
>  #include <linux/genalloc.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
> +#include <linux/memory.h>
>  #include <linux/slab.h>
>  #include <linux/uuid.h>
>  #include <linux/sort.h>
> @@ -2958,6 +2959,37 @@ static int is_system_ram(struct resource *res, void *arg)
>  	return 1;
>  }
>  
> +static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
> +					  unsigned long action, void *arg)
> +{
> +	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
> +					       memory_notifier);
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_endpoint_decoder *cxled = p->targets[0];
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct memory_notify *mnb = arg;
> +	int nid = mnb->status_change_nid;
> +	struct access_coordinate coord;
> +	int region_nid;
> +
> +	if (nid == NUMA_NO_NODE || action != MEM_ONLINE || !cxlr->coord)
> +		return NOTIFY_STOP;

Shouldn't this be NOTIFY_DONE? NOTIFY_STOP prevents other callbacks on
the same chain from getting notified.
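
For illustration, a minimal userspace sketch of the chain semantics this
comment relies on. The NOTIFY_* values match include/linux/notifier.h;
everything prefixed toy_ is hypothetical and only models the walk, it is
not the kernel's notifier implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes, with the same values as include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000  /* don't care */
#define NOTIFY_OK        0x0001  /* suits me */
#define NOTIFY_STOP_MASK 0x8000  /* don't call further */
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical callback type; the real one takes a struct notifier_block. */
typedef int (*toy_notifier_fn)(unsigned long action, int *calls);

/* Walk the chain in order, stopping once a callback sets NOTIFY_STOP_MASK. */
static int toy_call_chain(toy_notifier_fn *chain, size_t n,
			  unsigned long action, int *calls)
{
	int ret = NOTIFY_DONE;
	size_t i;

	assert(chain != NULL && calls != NULL);
	for (i = 0; i < n; i++) {
		ret = chain[i](action, calls);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Models the posted callback: it ends the walk for everyone behind it. */
static int toy_returns_stop(unsigned long action, int *calls)
{
	(void)action;
	(*calls)++;
	return NOTIFY_STOP;
}

/* Models an uninterested callback: later callbacks still run. */
static int toy_returns_done(unsigned long action, int *calls)
{
	(void)action;
	(*calls)++;
	return NOTIFY_DONE;
}

/* Count how many callbacks run when 'first' sits ahead of another one. */
static int toy_callbacks_run(toy_notifier_fn first)
{
	toy_notifier_fn chain[] = { first, toy_returns_done };
	int calls = 0;

	toy_call_chain(chain, 2, 0, &calls);
	return calls;
}
```

In this model toy_callbacks_run(toy_returns_stop) is 1 and
toy_callbacks_run(toy_returns_done) is 2, i.e. only the NOTIFY_DONE
variant lets the second callback see the event.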

> +
> +	region_nid = phys_to_target_node(cxld->hpa_range.start);
> +	if (nid != region_nid)
> +		return NOTIFY_STOP;

NOTIFY_DONE?

> +
> +	/* Adjust latencies from psec to nsec to be consistent with HMAT targets */
> +	coord = *cxlr->coord;
> +	coord.read_latency = DIV_ROUND_UP(coord.read_latency, 1000);
> +	coord.write_latency = DIV_ROUND_UP(coord.write_latency, 1000);
> +
> +	node_set_perf_attrs(nid, &coord, 0);
> +	node_set_perf_attrs(nid, &coord, 1);
> +
> +	return NOTIFY_STOP;

This should be NOTIFY_OK, right?

> +}
> +
>  static int cxl_region_perf_data_calculate(struct cxl_region *cxlr)
>  {
>  	struct cxl_region_params *p = &cxlr->params;
> @@ -3077,6 +3109,10 @@ static int cxl_region_probe(struct device *dev)
>  
>  	cxl_region_perf_data_calculate(cxlr);
>  
> +	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
> +	cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI;
> +	register_memory_notifier(&cxlr->memory_notifier);
> +
>  	/*
>  	 * From this point on any path that changes the region's state away from
>  	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
> @@ -3108,9 +3144,17 @@ static int cxl_region_probe(struct device *dev)
>  	}
>  }
>  
> +static void cxl_region_remove(struct device *dev)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +	unregister_memory_notifier(&cxlr->memory_notifier);
> +}
> +
>  static struct cxl_driver cxl_region_driver = {
>  	.name = "cxl_region",
>  	.probe = cxl_region_probe,
> +	.remove = cxl_region_remove,

Keep the region driver from needing a remove() callback by using a
devm_add_action_or_reset() after the notifier registration.
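
A rough userspace model of that pattern, assuming the action simply undoes
the notifier registration. All toy_ names are hypothetical stand-ins; the
real calls would be register_memory_notifier(),
unregister_memory_notifier() and devm_add_action_or_reset(), whose
behaviour (run the action immediately if it cannot be recorded, otherwise
run it at unbind) is what the sketch imitates.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

/* Hypothetical device that remembers its cleanup actions. */
struct toy_device {
	void (*action[TOY_MAX_ACTIONS])(void *);
	void *data[TOY_MAX_ACTIONS];
	size_t nr_actions;
};

/* Hypothetical region; the flag stands in for a registered notifier. */
struct toy_region {
	int notifier_registered;
};

/*
 * Record a cleanup action. If it cannot be recorded, run it right away
 * and report failure, so the caller never has to unwind by hand.
 */
static int toy_add_action_or_reset(struct toy_device *dev,
				   void (*action)(void *), void *data)
{
	assert(dev != NULL && action != NULL);
	if (dev->nr_actions == TOY_MAX_ACTIONS) {
		action(data);
		return -1;
	}
	dev->action[dev->nr_actions] = action;
	dev->data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Run at unbind: actions fire in reverse order of registration. */
static void toy_release_all(struct toy_device *dev)
{
	while (dev->nr_actions) {
		dev->nr_actions--;
		dev->action[dev->nr_actions](dev->data[dev->nr_actions]);
	}
}

/* The cleanup action: stands in for unregister_memory_notifier(). */
static void toy_unregister_notifier(void *data)
{
	struct toy_region *region = data;

	region->notifier_registered = 0;
}

/* Probe registers the notifier and queues its teardown; no remove(). */
static int toy_region_probe(struct toy_device *dev, struct toy_region *region)
{
	region->notifier_registered = 1;
	return toy_add_action_or_reset(dev, toy_unregister_notifier, region);
}
```

The point of the design is that probe owns both halves: the driver needs
no remove() callback, and a failure while queuing the action leaves the
notifier unregistered.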


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
                     ` (2 preceding siblings ...)
  2023-12-12  0:23   ` Dan Williams
@ 2023-12-12 13:46   ` Brice Goglin
  2023-12-12 16:00     ` Dave Jiang
  3 siblings, 1 reply; 21+ messages in thread
From: Brice Goglin @ 2023-12-12 13:46 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave

Le 08/12/2023 à 00:31, Dave Jiang a écrit :

> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d879f5702cf2..72c47f624d63 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>   }
>   static DEVICE_ATTR_RW(size);
>   
> +#define ACCESS_ATTR(attrib)					\
> +static ssize_t attrib##_show(struct device *dev,		\
> +			   struct device_attribute *attr,	\
> +			   char *buf)				\
> +{								\
> +	struct cxl_region *cxlr = to_cxl_region(dev);		\
> +								\
> +	if (!cxlr->coord)					\
> +		return 0;					\
> +								\
> +	return sysfs_emit(buf, "%u\n",				\
> +			  cxlr->coord->attrib);			\
> +}								\

Hello

Latencies are off by a factor of 1000 here (I see 586/686 for r/w attributes
for NUMA nodes in Qemu but 1000x higher for region attributes). For NUMA node attributes,
you're dividing latencies by a factor of 1000 in cxl_region_perf_attrs_callback():
  
         /* Adjust latencies from psec to nsec to be consistent with HMAT targets */
         coord = *cxlr->coord;
         coord.read_latency = DIV_ROUND_UP(coord.read_latency, 1000);
         coord.write_latency = DIV_ROUND_UP(coord.write_latency, 1000);

For region attributes, I think you're missing the same?
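
As a small worked example of the mismatch, assuming the region stores
586000/686000 psec (1000x the node values above). DIV_ROUND_UP() is written
out as the kernel defines it; struct toy_coord and the toy_ helpers are
hypothetical stand-ins for struct access_coordinate and are not taken from
the patch.

```c
#include <assert.h>
#include <limits.h>

/* Same definition as the kernel's DIV_ROUND_UP() helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the latency fields of struct access_coordinate. */
struct toy_coord {
	unsigned int read_latency;
	unsigned int write_latency;
};

/* Convert one latency from picoseconds to nanoseconds, rounding up. */
static unsigned int toy_psec_to_nsec(unsigned int psec)
{
	/* Guard the "+ 999" inside DIV_ROUND_UP() against wrapping. */
	assert(psec <= UINT_MAX - 999);
	return DIV_ROUND_UP(psec, 1000);
}

/* Apply the same conversion the notifier callback applies to a copy. */
static struct toy_coord toy_coord_to_nsec(struct toy_coord psec)
{
	struct toy_coord nsec = {
		.read_latency = toy_psec_to_nsec(psec.read_latency),
		.write_latency = toy_psec_to_nsec(psec.write_latency),
	};

	return nsec;
}
```

Feeding { 586000, 686000 } through toy_coord_to_nsec() gives { 586, 686 },
the node values; printing the unconverted struct from the region _show()
path gives the 1000x larger numbers.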

Brice



* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-12 13:46   ` Brice Goglin
@ 2023-12-12 16:00     ` Dave Jiang
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-12 16:00 UTC (permalink / raw)
  To: Brice Goglin, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave



On 12/12/23 06:46, Brice Goglin wrote:
> Le 08/12/2023 à 00:31, Dave Jiang a écrit :
> 
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index d879f5702cf2..72c47f624d63 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>>   }
>>   static DEVICE_ATTR_RW(size);
>>   +#define ACCESS_ATTR(attrib)                    \
>> +static ssize_t attrib##_show(struct device *dev,        \
>> +               struct device_attribute *attr,    \
>> +               char *buf)                \
>> +{                                \
>> +    struct cxl_region *cxlr = to_cxl_region(dev);        \
>> +                                \
>> +    if (!cxlr->coord)                    \
>> +        return 0;                    \
>> +                                \
>> +    return sysfs_emit(buf, "%u\n",                \
>> +              cxlr->coord->attrib);            \
>> +}                                \
> 
> Hello
> 
> Latencies are off by a factor of 1000 here (I see 586/686 for r/w attributes
> for NUMA nodes in Qemu but 1000x higher for region attributes). For NUMA node attributes,
> you're dividing latencies by a factor of 1000 in cxl_region_perf_attrs_callback():
>  
>         /* Adjust latencies from psec to nsec to be consistent with HMAT targets */
>         coord = *cxlr->coord;
>         coord.read_latency = DIV_ROUND_UP(coord.read_latency, 1000);
>         coord.write_latency = DIV_ROUND_UP(coord.write_latency, 1000);
> 
> For region attributes, I think you're missing the same?

Yes. I was keeping the originally computed raw data in picoseconds. Maybe I should just convert it to nanoseconds here, since that's what Linux uses to begin with, to be consistent everywhere.
> 
> Brice
> 


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-11  9:06   ` Brice Goglin
@ 2023-12-12 19:30     ` Dave Jiang
  2023-12-19 16:44       ` Jonathan Cameron
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Jiang @ 2023-12-12 19:30 UTC (permalink / raw)
  To: Brice Goglin, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	jonathan.cameron, dave



On 12/11/23 02:06, Brice Goglin wrote:
> Le 08/12/2023 à 00:31, Dave Jiang a écrit :
>> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
>> region. The bandwidth is the aggregated bandwidth of all devices that
>> contributes to the CXL region. The latency is the worst latency of the
>> device amongst all the devices that contributes to the CXL region.
> 
> 
> Hello
> 
> Which initiator do these bandwidths/latencies refer to? Local CPUs near the root port? This should be specified in the doc.

Currently I'm only storing the numbers from initiator pxm 0 to these targets in the hmat target list. I'm open to suggestions on whether a different way would be better.

> 
> Brice
> 
> 
> 
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>   Documentation/ABI/testing/sysfs-bus-cxl |   40 +++++++++++++++++++++++++++++++
>>   drivers/cxl/core/region.c               |   24 +++++++++++++++++++
>>   2 files changed, 64 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
>> index fff2581b8033..e96f172eb6a6 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-cxl
>> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
>> @@ -552,3 +552,43 @@ Description:
>>           attribute is only visible for devices supporting the
>>           capability. The retrieved errors are logged as kernel
>>           events when cxl_poison event tracing is enabled.
>> +
>> +
>> +What:        /sys/bus/cxl/devices/regionZ/read_bandwidth
>> +Date:        Apr, 2023
>> +KernelVersion:    v6.8
>> +Contact:    linux-cxl@vger.kernel.org
>> +Description:
>> +        (RO) The aggregated read bandwidth of the region. The number is
>> +        the accumulated read bandwidth of all CXL memory devices that
>> +        contributes to the region.
>> +
>> +
>> +What:        /sys/bus/cxl/devices/regionZ/write_bandwidth
>> +Date:        Apr, 2023
>> +KernelVersion:    v6.8
>> +Contact:    linux-cxl@vger.kernel.org
>> +Description:
>> +        (RO) The aggregated write bandwidth of the region. The number is
>> +        the accumulated write bandwidth of all CXL memory devices that
>> +        contributes to the region.
>> +
>> +
>> +What:        /sys/bus/cxl/devices/regionZ/read_latency
>> +Date:        Apr, 2023
>> +KernelVersion:    v6.8
>> +Contact:    linux-cxl@vger.kernel.org
>> +Description:
>> +        (RO) The read latency of the region. The number is
>> +        the worst read latency of all CXL memory devices that
>> +        contributes to the region.
>> +
>> +
>> +What:        /sys/bus/cxl/devices/regionZ/write_latency
>> +Date:        Apr, 2023
>> +KernelVersion:    v6.8
>> +Contact:    linux-cxl@vger.kernel.org
>> +Description:
>> +        (RO) The write latency of the region. The number is
>> +        the worst write latency of all CXL memory devices that
>> +        contributes to the region.
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index d879f5702cf2..72c47f624d63 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -645,6 +645,26 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>>   }
>>   static DEVICE_ATTR_RW(size);
>>   +#define ACCESS_ATTR(attrib)                    \
>> +static ssize_t attrib##_show(struct device *dev,        \
>> +               struct device_attribute *attr,    \
>> +               char *buf)                \
>> +{                                \
>> +    struct cxl_region *cxlr = to_cxl_region(dev);        \
>> +                                \
>> +    if (!cxlr->coord)                    \
>> +        return 0;                    \
>> +                                \
>> +    return sysfs_emit(buf, "%u\n",                \
>> +              cxlr->coord->attrib);            \
>> +}                                \
>> +static DEVICE_ATTR_RO(attrib)
>> +
>> +ACCESS_ATTR(read_bandwidth);
>> +ACCESS_ATTR(read_latency);
>> +ACCESS_ATTR(write_bandwidth);
>> +ACCESS_ATTR(write_latency);
>> +
>>   static struct attribute *cxl_region_attrs[] = {
>>       &dev_attr_uuid.attr,
>>       &dev_attr_commit.attr,
>> @@ -653,6 +673,10 @@ static struct attribute *cxl_region_attrs[] = {
>>       &dev_attr_resource.attr,
>>       &dev_attr_size.attr,
>>       &dev_attr_mode.attr,
>> +    &dev_attr_read_bandwidth.attr,
>> +    &dev_attr_write_bandwidth.attr,
>> +    &dev_attr_read_latency.attr,
>> +    &dev_attr_write_latency.attr,
>>       NULL,
>>   };
>>  
>>
>>


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-12 19:30     ` Dave Jiang
@ 2023-12-19 16:44       ` Jonathan Cameron
  2023-12-20 20:26         ` Dave Jiang
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2023-12-19 16:44 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Brice Goglin, linux-cxl, dan.j.williams, ira.weiny,
	vishal.l.verma, alison.schofield, dave

On Tue, 12 Dec 2023 12:30:47 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 12/11/23 02:06, Brice Goglin wrote:
> > Le 08/12/2023 à 00:31, Dave Jiang a écrit :  
> >> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
> >> region. The bandwidth is the aggregated bandwidth of all devices that
> >> contributes to the CXL region. The latency is the worst latency of the
> >> device amongst all the devices that contributes to the CXL region.  
> > 
> > 
> > Hello
> > 
> > Which initiator do these bandwidths/latencies refer to? Local CPUs near the root port? This should be specified in the doc.  
> 
> Currently I'm only storing the numbers from initiator pxm 0 to these targets in the hmat target list. I'm open to suggestions as to if a different way would be better.
Nearest node with CPUs would be better I think.

Jonathan



* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-19 16:44       ` Jonathan Cameron
@ 2023-12-20 20:26         ` Dave Jiang
  2023-12-21 18:23           ` Dave Jiang
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Jiang @ 2023-12-20 20:26 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Brice Goglin, linux-cxl, dan.j.williams, ira.weiny,
	vishal.l.verma, alison.schofield, dave



On 12/19/23 09:44, Jonathan Cameron wrote:
> On Tue, 12 Dec 2023 12:30:47 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> On 12/11/23 02:06, Brice Goglin wrote:
>>> Le 08/12/2023 à 00:31, Dave Jiang a écrit :  
>>>> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
>>>> region. The bandwidth is the aggregated bandwidth of all devices that
>>>> contributes to the CXL region. The latency is the worst latency of the
>>>> device amongst all the devices that contributes to the CXL region.  
>>>
>>>
>>> Hello
>>>
>>> Which initiator do these bandwidths/latencies refer to? Local CPUs near the root port? This should be specified in the doc.  
>>
>> Currently I'm only storing the numbers from initiator pxm 0 to these targets in the hmat target list. I'm open to suggestions as to if a different way would be better.
> Nearest node with CPUs would be better I think.

Is it possible to discover that? The SRAT MPDAS structure indicates the association of initiator and memory target domains. But it does not seem to make such associations for generic targets.

> 
> Jonathan
> 


* Re: [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions
  2023-12-20 20:26         ` Dave Jiang
@ 2023-12-21 18:23           ` Dave Jiang
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2023-12-21 18:23 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Brice Goglin, linux-cxl, dan.j.williams, ira.weiny,
	vishal.l.verma, alison.schofield, dave



On 12/20/23 13:26, Dave Jiang wrote:
> 
> 
> On 12/19/23 09:44, Jonathan Cameron wrote:
>> On Tue, 12 Dec 2023 12:30:47 -0700
>> Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>> On 12/11/23 02:06, Brice Goglin wrote:
>>>> Le 08/12/2023 à 00:31, Dave Jiang a écrit :  
>>>>> Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
>>>>> region. The bandwidth is the aggregated bandwidth of all devices that
>>>>> contributes to the CXL region. The latency is the worst latency of the
>>>>> device amongst all the devices that contributes to the CXL region.  
>>>>
>>>>
>>>> Hello
>>>>
>>>> Which initiator do these bandwidths/latencies refer to? Local CPUs near the root port? This should be specified in the doc.  
>>>
>>> Currently I'm only storing the numbers from initiator pxm 0 to these targets in the hmat target list. I'm open to suggestions as to if a different way would be better.
>> Nearest node with CPUs would be better I think.
> 
> Is it possible to discover that? The SRAT MPDAS structure indicates the association of initiator and memory target domains. But it does not seem to make such associations for generic targets.

Answering my own question: it looks like we need to go through hmat_update_target_attrs() and retrieve the best perf numbers via that function. I will make some modifications to do that and post v15. This seems to be how the Linux HMAT handling code does it for a normal initiator/target setup.
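
A hedged sketch of one reading of "best perf numbers": for each attribute,
keep the best value seen across the candidate initiators (lowest latency,
highest bandwidth). This is an assumption about the intent, not a copy of
hmat_update_target_attrs(); struct toy_coord, toy_best_coord() and the
sample numbers in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct access_coordinate. */
struct toy_coord {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/*
 * Fold the per-initiator numbers for one target into a single set,
 * choosing each attribute independently: higher bandwidth is better,
 * lower latency is better.
 */
static struct toy_coord toy_best_coord(const struct toy_coord *per_initiator,
				       size_t nr_initiators)
{
	struct toy_coord best;
	size_t i;

	assert(per_initiator != NULL && nr_initiators > 0);
	best = per_initiator[0];
	for (i = 1; i < nr_initiators; i++) {
		const struct toy_coord *c = &per_initiator[i];

		if (c->read_bandwidth > best.read_bandwidth)
			best.read_bandwidth = c->read_bandwidth;
		if (c->write_bandwidth > best.write_bandwidth)
			best.write_bandwidth = c->write_bandwidth;
		if (c->read_latency < best.read_latency)
			best.read_latency = c->read_latency;
		if (c->write_latency < best.write_latency)
			best.write_latency = c->write_latency;
	}
	return best;
}
```

With two made-up initiators { 1000, 1000, 586, 686 } and
{ 2000, 1500, 300, 900 } the result is { 2000, 1500, 300, 686 }. Note that
the values may then come from different initiators, which differs from
always reporting the numbers of initiator pxm 0.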
> 
>>
>> Jonathan
>>
> 


Thread overview: 21+ messages
2023-12-07 23:31 [PATCH 0/3] cxl: Add support to report region access coordinates to numa nodes Dave Jiang
2023-12-07 23:31 ` [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
2023-12-11 17:44   ` fan
2023-12-12  0:19   ` Dan Williams
2023-12-07 23:31 ` [PATCH 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
2023-12-11  9:06   ` Brice Goglin
2023-12-12 19:30     ` Dave Jiang
2023-12-19 16:44       ` Jonathan Cameron
2023-12-20 20:26         ` Dave Jiang
2023-12-21 18:23           ` Dave Jiang
2023-12-11 18:03   ` fan
2023-12-11 18:13     ` Dave Jiang
2023-12-11 18:27       ` fan
2023-12-12  0:23   ` Dan Williams
2023-12-12 13:46   ` Brice Goglin
2023-12-12 16:00     ` Dave Jiang
2023-12-07 23:32 ` [PATCH 3/3] cxl: Add memory hotplug notifier for cxl region Dave Jiang
2023-12-08  3:35   ` Huang, Ying
2023-12-12  0:30   ` Dan Williams
  -- strict thread matches above, loose matches on Subject: below --
2023-12-07 23:30 [PATCH 1/3] cxl/region: Calculate performance data for a region Dave Jiang
2023-12-07 23:34 ` Dave Jiang
