NVDIMM Device and Persistent Memory development
 help / color / mirror / Atom feed
From: ira.weiny@intel.com
To: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	 Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	 Navneet Singh <navneet.singh@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>,
	 Davidlohr Bueso <dave@stgolabs.net>,
	 Alison Schofield <alison.schofield@intel.com>,
	 Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	 linux-cxl@vger.kernel.org, linux-doc@vger.kernel.org,
	 nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH v7 24/27] cxl/region: Read existing extents on region creation
Date: Thu, 07 Nov 2024 14:58:42 -0600	[thread overview]
Message-ID: <20241107-dcd-type2-upstream-v7-24-56a84e66bc36@intel.com> (raw)
In-Reply-To: <20241107-dcd-type2-upstream-v7-0-56a84e66bc36@intel.com>

From: Navneet Singh <navneet.singh@intel.com>

Dynamic capacity device extents may be left in an accepted state on a
device due to an unexpected host crash.  In this case it is expected
that the creation of a new region on top of a DC partition can read
those extents and surface them for continued use.

Once all endpoint decoders are part of a region and the region is being
realized, a read of the 'devices extent list' can reveal these
previously accepted extents.

CXL r3.1 specifies the mailbox call Get Dynamic Capacity Extent List for
this purpose.  The call returns all the extents for all dynamic capacity
partitions.  If the fabric manager is adding extents to any DCD
partition, the extent list for the recovered region may change.  In this
case the query must retry.  Upon retry the query could encounter extents
which were accepted on a previous list query.  Adding such extents is
ignored without error because they are entirely within a previous
accepted extent.  Instead warn on this case to allow for differentiating
bad devices from this normal condition.

Latch any errors to be bubbled up to ensure notification to the user
even if individual errors are rate limited or otherwise ignored.

The scan for existing extents races with the dax_cxl driver.  This is
synchronized through the region device lock.  Extents which are found
after the driver has loaded will surface through the normal notification
path while extents seen prior to the driver are read during driver load.

Signed-off-by: Navneet Singh <navneet.singh@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/core/core.h   |   1 +
 drivers/cxl/core/mbox.c   | 108 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/region.c |  25 +++++++++++
 drivers/cxl/cxlmem.h      |  21 +++++++++
 4 files changed, 155 insertions(+)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index c5951018f8ff590627676eeb7a430b6acbf516d8..feb00bbe98c9b59d2c7fceb3ad5fed3885b59753 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -21,6 +21,7 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
 	return container_of(cxlds, struct cxl_memdev_state, cxlds);
 }
 
+int cxl_process_extent_list(struct cxl_endpoint_decoder *cxled);
 int cxl_region_invalidate_memregion(struct cxl_region *cxlr);
 
 #ifdef CONFIG_CXL_REGION
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 61d406d19bf777a3921dbb3cc1213d341d8dea4c..e0030166ea185f8ad9194f597906d61497897654 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1699,6 +1699,114 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
 
+/* Return -EAGAIN if the extent list changes while reading */
+static int __cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+	u32 current_index, total_read, total_expected, initial_gen_num;
+	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_mbox_cmd mbox_cmd;
+	u32 max_extent_count;
+	int latched_rc = 0;
+	bool first = true;
+
+	struct cxl_mbox_get_extent_out *extents __free(kvfree) =
+				kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+	if (!extents)
+		return -ENOMEM;
+
+	total_read = 0;
+	current_index = 0;
+	total_expected = 0;
+	max_extent_count = (cxl_mbox->payload_size - sizeof(*extents)) /
+				sizeof(struct cxl_extent);
+	do {
+		struct cxl_mbox_get_extent_in get_extent;
+		u32 nr_returned, current_total, current_gen_num;
+		int rc;
+
+		get_extent = (struct cxl_mbox_get_extent_in) {
+			.extent_cnt = max(max_extent_count,
+					  total_expected - current_index),
+			.start_extent_index = cpu_to_le32(current_index),
+		};
+
+		mbox_cmd = (struct cxl_mbox_cmd) {
+			.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
+			.payload_in = &get_extent,
+			.size_in = sizeof(get_extent),
+			.size_out = cxl_mbox->payload_size,
+			.payload_out = extents,
+			.min_out = 1,
+		};
+
+		rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+		if (rc < 0)
+			return rc;
+
+		/* Save initial data */
+		if (first) {
+			total_expected = le32_to_cpu(extents->total_extent_count);
+			initial_gen_num = le32_to_cpu(extents->generation_num);
+			first = false;
+		}
+
+		nr_returned = le32_to_cpu(extents->returned_extent_count);
+		total_read += nr_returned;
+		current_total = le32_to_cpu(extents->total_extent_count);
+		current_gen_num = le32_to_cpu(extents->generation_num);
+
+		dev_dbg(dev, "Got extent list %d-%d of %d generation Num:%d\n",
+			current_index, total_read - 1, current_total, current_gen_num);
+
+		if (current_gen_num != initial_gen_num || total_expected != current_total) {
+			dev_warn(dev, "Extent list change detected; gen %u != %u : cnt %u != %u\n",
+				 current_gen_num, initial_gen_num,
+				 total_expected, current_total);
+			return -EAGAIN;
+		}
+
+		for (int i = 0; i < nr_returned ; i++) {
+			struct cxl_extent *extent = &extents->extent[i];
+
+			dev_dbg(dev, "Processing extent %d/%d\n",
+				current_index + i, total_expected);
+
+			rc = validate_add_extent(mds, extent);
+			if (rc)
+				latched_rc = rc;
+		}
+
+		current_index += nr_returned;
+	} while (total_expected > total_read);
+
+	return latched_rc;
+}
+
+/**
+ * cxl_process_extent_list() - Read existing extents
+ * @cxled: Endpoint decoder which is part of a region
+ *
+ * Issue the Get Dynamic Capacity Extent List command to the device
+ * and add existing extents if found.
+ *
+ * A retry of 10 is somewhat arbitrary, however, extent changes should be
+ * relatively rare while bringing up a region.  So 10 should be plenty.
+ */
+#define CXL_READ_EXTENT_LIST_RETRY 10
+int cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+	int retry = CXL_READ_EXTENT_LIST_RETRY;
+	int rc;
+
+	do {
+		rc = __cxl_process_extent_list(cxled);
+	} while (rc == -EAGAIN && retry--);
+
+	return rc;
+}
+
 static int add_dpa_res(struct device *dev, struct resource *parent,
 		       struct resource *res, resource_size_t start,
 		       resource_size_t size, const char *type)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 7de9d7bd85a3d45567885874cc1d61cb10b816a5..b160f8a95cd7d4415c6252b3a9f36b06490ddf45 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3190,6 +3190,26 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
 	return rc;
 }
 
+static int cxlr_add_existing_extents(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i, latched_rc = 0;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct device *dev = &p->targets[i]->cxld.dev;
+		int rc;
+
+		rc = cxl_process_extent_list(p->targets[i]);
+		if (rc) {
+			dev_err(dev, "Existing extent processing failed %d\n",
+				rc);
+			latched_rc = rc;
+		}
+	}
+
+	return latched_rc;
+}
+
 static void cxlr_dax_unregister(void *_cxlr_dax)
 {
 	struct cxl_dax_region *cxlr_dax = _cxlr_dax;
@@ -3224,6 +3244,11 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
 	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
 		dev_name(dev));
 
+	if (cxlr->mode == CXL_REGION_DC)
+		if (cxlr_add_existing_extents(cxlr))
+			dev_err(&cxlr->dev, "Existing extent processing failed %d\n",
+				rc);
+
 	return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister,
 					cxlr_dax);
 err:
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 73dee28bbd803a8f78686e833f8ef3492ca94e66..e7b9bd5bb4a96b0cdeb4bcf9c3b7ca1499d1cddd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -627,6 +627,27 @@ struct cxl_mbox_dc_response {
 	} __packed extent_list[];
 } __packed;
 
+/*
+ * Get Dynamic Capacity Extent List; Input Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-166
+ */
+struct cxl_mbox_get_extent_in {
+	__le32 extent_cnt;
+	__le32 start_extent_index;
+} __packed;
+
+/*
+ * Get Dynamic Capacity Extent List; Output Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-167
+ */
+struct cxl_mbox_get_extent_out {
+	__le32 returned_extent_count;
+	__le32 total_extent_count;
+	__le32 generation_num;
+	u8 rsvd[4];
+	struct cxl_extent extent[];
+} __packed;
+
 struct cxl_mbox_get_supported_logs {
 	__le16 entries;
 	u8 rsvd[6];

-- 
2.47.0


  parent reply	other threads:[~2024-11-07 20:59 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-07 20:58 [PATCH v7 00/27] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-11-07 20:58 ` [PATCH v7 01/27] range: Add range_overlaps() Ira Weiny
2024-11-07 20:58 ` [PATCH v7 02/27] ACPI/CDAT: Add CDAT/DSMAS shared and read only flag values Ira Weiny
2024-11-07 20:58 ` [PATCH v7 03/27] dax: Document struct dev_dax_range Ira Weiny
2024-11-07 20:58 ` [PATCH v7 04/27] cxl/pci: Delay event buffer allocation Ira Weiny
2024-11-07 20:58 ` [PATCH v7 05/27] cxl/hdm: Use guard() in cxl_dpa_set_mode() Ira Weiny
2024-11-07 20:58 ` [PATCH v7 06/27] cxl/region: Refactor common create region code Ira Weiny
2024-11-07 20:58 ` [PATCH v7 07/27] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-11-07 20:58 ` [PATCH v7 08/27] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-11-07 20:58 ` [PATCH v7 09/27] cxl/core: Separate region mode from decoder mode ira.weiny
2024-11-07 20:58 ` [PATCH v7 10/27] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-11-07 20:58 ` [PATCH v7 11/27] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-11-07 20:58 ` [PATCH v7 12/27] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
2024-11-07 20:58 ` [PATCH v7 13/27] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-11-07 20:58 ` [PATCH v7 14/27] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-11-07 20:58 ` [PATCH v7 15/27] cxl/region: Add sparse DAX region support ira.weiny
2024-11-07 20:58 ` [PATCH v7 16/27] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-11-07 20:58 ` [PATCH v7 17/27] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-11-07 20:58 ` [PATCH v7 18/27] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-11-07 20:58 ` [PATCH v7 19/27] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-11-07 20:58 ` [PATCH v7 20/27] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-11-07 20:58 ` [PATCH v7 21/27] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-11-07 20:58 ` [PATCH v7 22/27] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-11-07 20:58 ` [PATCH v7 23/27] dax/region: Create resources on sparse DAX regions ira.weiny
2024-11-07 20:58 ` ira.weiny [this message]
2024-11-07 20:58 ` [PATCH v7 25/27] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-11-07 20:58 ` [PATCH v7 26/27] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-11-07 20:58 ` [PATCH v7 27/27] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-11-08 17:27 ` [PATCH v7 00/27] DCD: Add support for Dynamic Capacity Devices (DCD) Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241107-dcd-type2-upstream-v7-24-56a84e66bc36@intel.com \
    --to=ira.weiny@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=fan.ni@samsung.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=navneet.singh@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox