linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fan Ni <nifan.cxl@gmail.com>
To: ira.weiny@intel.com
Cc: Dave Jiang <dave.jiang@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Navneet Singh <navneet.singh@intel.com>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	nvdimm@lists.linux.dev
Subject: Re: [PATCH v3 22/25] cxl/region: Read existing extents on region creation
Date: Thu, 5 Sep 2024 12:35:50 -0700	[thread overview]
Message-ID: <ZtoIFjI9gXp3l9OO@leg> (raw)
In-Reply-To: <20240816-dcd-type2-upstream-v3-22-7c9b96cba6d7@intel.com>

On Fri, Aug 16, 2024 at 09:44:30AM -0500, ira.weiny@intel.com wrote:
> From: Navneet Singh <navneet.singh@intel.com>
> 
> Dynamic capacity device extents may be left in an accepted state on a
> device due to an unexpected host crash.  In this case it is expected
> that the creation of a new region on top of a DC partition can read
> those extents and surface them for continued use.
> 
> Once all endpoint decoders are part of a region and the region is being
> realized a read of the devices extent list can reveal these previously
> accepted extents.
> 
> CXL r3.1 specifies the mailbox call Get Dynamic Capacity Extent List for
> this purpose.  The call returns all the extents for all dynamic capacity
> partitions.  If the fabric manager is adding extents to any DCD
> partition, the extent list for the recovered region may change.  In this
> case the query must retry.  Upon retry the query could encounter extents
> which were accepted on a previous list query.  Adding such extents is
> ignored without error because they are entirely within a previous
> accepted extent.
> 
> The scan for existing extents races with the dax_cxl driver.  This is
> synchronized through the region device lock.  Extents which are found
> after the driver has loaded will surface through the normal notification
> path while extents seen prior to the driver are read during driver load.
> 
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

With the minor things fixed mentioned by Jonathan and Dave,

Reviewed-by: Fan Ni <fan.ni@samsung.com>

> 
> ---
> Changes:
> [iweiny: Leverage the new add path from the event processing code such
> 	 that the adding and surfacing of extents flows through the same
> 	 code path for both event processing and existing extents.
> 	 While this does validate existing extents again on start up
> 	 this is an error recovery case / new boot scenario and should
> 	 not cause any major issues while making the code more
> 	 straight forward and maintainable.]
> 
> [iweiny: use %par]
> [iweiny: rebase]
> [iweiny: Move this patch later in the series such that the realization
>          of extents can go through the same path as an add event]
> [Fan: Issue a retry if the gen number changes]
> [djiang: s/uint64_t/u64/]
> [djiang: update function names]
> [Jørgen/djbw: read the generation and total count on first iteration of
>               the Get Extent List call]
> [djbw: s/cxl_mbox_get_dc_extent_in/cxl_mbox_get_extent_in/]
> [djbw: s/cxl_mbox_get_dc_extent_out/cxl_mbox_get_extent_out/]
> [djbw/iweiny: s/cxl_read_dc_extents/cxl_read_extent_list]
> ---
>  drivers/cxl/core/core.h   |   2 +
>  drivers/cxl/core/mbox.c   | 100 ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/core/region.c |  12 ++++++
>  drivers/cxl/cxlmem.h      |  21 ++++++++++
>  4 files changed, 135 insertions(+)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 8dfc97b2e0a4..9e54064a6f48 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -21,6 +21,8 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
>  	return container_of(cxlds, struct cxl_memdev_state, cxlds);
>  }
>  
> +void cxl_read_extent_list(struct cxl_endpoint_decoder *cxled);
> +
>  #ifdef CONFIG_CXL_REGION
>  extern struct device_attribute dev_attr_create_pmem_region;
>  extern struct device_attribute dev_attr_create_ram_region;
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index f629ad7488ac..d43ac8eabf56 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1670,6 +1670,106 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
>  
> +/* Return -EAGAIN if the extent list changes while reading */
> +static int __cxl_read_extent_list(struct cxl_endpoint_decoder *cxled)
> +{
> +	u32 current_index, total_read, total_expected, initial_gen_num;
> +	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
> +	struct device *dev = mds->cxlds.dev;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u32 max_extent_count;
> +	bool first = true;
> +
> +	struct cxl_mbox_get_extent_out *extents __free(kfree) =
> +				kvmalloc(mds->payload_size, GFP_KERNEL);
> +	if (!extents)
> +		return -ENOMEM;
> +
> +	total_read = 0;
> +	current_index = 0;
> +	total_expected = 0;
> +	max_extent_count = (mds->payload_size - sizeof(*extents)) /
> +				sizeof(struct cxl_extent);
> +	do {
> +		struct cxl_mbox_get_extent_in get_extent;
> +		u32 nr_returned, current_total, current_gen_num;
> +		int rc;
> +
> +		get_extent = (struct cxl_mbox_get_extent_in) {
> +			.extent_cnt = max(max_extent_count,
> +					  total_expected - current_index),
> +			.start_extent_index = cpu_to_le32(current_index),
> +		};
> +
> +		mbox_cmd = (struct cxl_mbox_cmd) {
> +			.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
> +			.payload_in = &get_extent,
> +			.size_in = sizeof(get_extent),
> +			.size_out = mds->payload_size,
> +			.payload_out = extents,
> +			.min_out = 1,
> +		};
> +
> +		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> +		if (rc < 0)
> +			return rc;
> +
> +		/* Save initial data */
> +		if (first) {
> +			total_expected = le32_to_cpu(extents->total_extent_count);
> +			initial_gen_num = le32_to_cpu(extents->generation_num);
> +			first = false;
> +		}
> +
> +		nr_returned = le32_to_cpu(extents->returned_extent_count);
> +		total_read += nr_returned;
> +		current_total = le32_to_cpu(extents->total_extent_count);
> +		current_gen_num = le32_to_cpu(extents->generation_num);
> +
> +		dev_dbg(dev, "Got extent list %d-%d of %d generation Num:%d\n",
> +			current_index, total_read - 1, current_total, current_gen_num);
> +
> +		if (current_gen_num != initial_gen_num || total_expected != current_total) {
> +			dev_dbg(dev, "Extent list change detected; gen %u != %u : cnt %u != %u\n",
> +				current_gen_num, initial_gen_num,
> +				total_expected, current_total);
> +			return -EAGAIN;
> +		}
> +
> +		for (int i = 0; i < nr_returned ; i++) {
> +			struct cxl_extent *extent = &extents->extent[i];
> +
> +			dev_dbg(dev, "Processing extent %d/%d\n",
> +				current_index + i, total_expected);
> +
> +			rc = validate_add_extent(mds, extent);
> +			if (rc)
> +				continue;
> +		}
> +
> +		current_index += nr_returned;
> +	} while (total_expected > total_read);
> +
> +	return 0;
> +}
> +
> +/**
> + * cxl_read_extent_list() - Read existing extents
> + * @cxled: Endpoint decoder which is part of a region
> + *
> + * Issue the Get Dynamic Capacity Extent List command to the device
> + * and add existing extents if found.
> + */
> +void cxl_read_extent_list(struct cxl_endpoint_decoder *cxled)
> +{
> +	int retry = 10;
> +	int rc;
> +
> +	do {
> +		rc = __cxl_read_extent_list(cxled);
> +	} while (rc == -EAGAIN && retry--);
> +}
> +
>  static int add_dpa_res(struct device *dev, struct resource *parent,
>  		       struct resource *res, resource_size_t start,
>  		       resource_size_t size, const char *type)
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 8c9171f914fb..885fb3004784 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3190,6 +3190,15 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
>  	return rc;
>  }
>  
> +static void cxlr_add_existing_extents(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	int i;
> +
> +	for (i = 0; i < p->nr_targets; i++)
> +		cxl_read_extent_list(p->targets[i]);
> +}
> +
>  static void cxlr_dax_unregister(void *_cxlr_dax)
>  {
>  	struct cxl_dax_region *cxlr_dax = _cxlr_dax;
> @@ -3227,6 +3236,9 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
>  	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
>  		dev_name(dev));
>  
> +	if (cxlr->mode == CXL_REGION_DC)
> +		cxlr_add_existing_extents(cxlr);
> +
>  	return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister,
>  					cxlr_dax);
>  err:
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 3a40fe1f0be7..11c03637488d 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -624,6 +624,27 @@ struct cxl_mbox_dc_response {
>  	} __packed extent_list[];
>  } __packed;
>  
> +/*
> + * Get Dynamic Capacity Extent List; Input Payload
> + * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-166
> + */
> +struct cxl_mbox_get_extent_in {
> +	__le32 extent_cnt;
> +	__le32 start_extent_index;
> +} __packed;
> +
> +/*
> + * Get Dynamic Capacity Extent List; Output Payload
> + * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-167
> + */
> +struct cxl_mbox_get_extent_out {
> +	__le32 returned_extent_count;
> +	__le32 total_extent_count;
> +	__le32 generation_num;
> +	u8 rsvd[4];
> +	struct cxl_extent extent[];
> +} __packed;
> +
>  struct cxl_mbox_get_supported_logs {
>  	__le16 entries;
>  	u8 rsvd[6];
> 
> -- 
> 2.45.2
> 

  parent reply	other threads:[~2024-09-05 19:35 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-16 14:44 [PATCH v3 00/25] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-08-16 14:44 ` [PATCH v3 01/25] range: Add range_overlaps() Ira Weiny
2024-08-16 14:44 ` [PATCH v3 02/25] printk: Add print format (%par) for struct range Ira Weiny
2024-08-20 14:08   ` Petr Mladek
2024-08-22 17:53     ` Ira Weiny
2024-08-22 18:10       ` Andy Shevchenko
2024-08-26 13:23         ` Petr Mladek
2024-08-26 17:23           ` Andy Shevchenko
2024-08-26 21:17             ` Ira Weiny
2024-08-27  7:43               ` Petr Mladek
2024-08-27 13:21                 ` Andy Shevchenko
2024-08-27 21:44                 ` Ira Weiny
2024-08-27 13:17               ` Andy Shevchenko
2024-08-28  4:12                 ` Ira Weiny
2024-08-28 13:50                   ` Andy Shevchenko
2024-08-26 13:17       ` Petr Mladek
2024-08-26 13:24         ` Andy Shevchenko
2024-08-16 14:44 ` [PATCH v3 03/25] dax: Document dax dev range tuple Ira Weiny
2024-08-16 20:58   ` Dave Jiang
2024-08-23 15:29   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 04/25] cxl/pci: Delay event buffer allocation Ira Weiny
2024-09-03  6:49   ` Li, Ming4
2024-09-05 19:44   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-09-03  6:50   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 06/25] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-08-16 21:45   ` Dave Jiang
2024-08-20 17:01     ` Fan Ni
2024-08-23  2:01       ` Ira Weiny
2024-08-23  2:02       ` Ira Weiny
2024-08-23 15:45   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 07/25] cxl/core: Separate region mode from decoder mode ira.weiny
2024-08-16 22:11   ` Dave Jiang
2024-08-23 15:47   ` Jonathan Cameron
2024-09-03  6:56   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 08/25] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-08-16 22:14   ` Dave Jiang
2024-09-03  6:57   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 09/25] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-08-16 23:08   ` Dave Jiang
2024-08-23  2:26     ` Ira Weiny
2024-08-23 16:09   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 10/25] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-08-16 23:17   ` Dave Jiang
2024-08-23 16:12   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 11/25] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-08-16 23:42   ` Dave Jiang
2024-08-23  2:28     ` Ira Weiny
2024-08-23 14:58       ` Dave Jiang
2024-08-23 16:14       ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 12/25] cxl/region: Refactor common create region code Ira Weiny
2024-08-16 23:43   ` Dave Jiang
2024-08-22 18:51   ` Fan Ni
2024-08-23 16:17   ` Jonathan Cameron
2024-09-03  7:04   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 13/25] cxl/region: Add sparse DAX region support ira.weiny
2024-08-16 23:51   ` Dave Jiang
2024-08-22 18:50   ` Fan Ni
2024-08-23 16:59   ` Jonathan Cameron
2024-09-03  2:15   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 14/25] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-08-16 23:57   ` Dave Jiang
2024-08-22 21:39   ` Fan Ni
2024-08-23 17:01   ` Jonathan Cameron
2024-09-03  7:06   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 15/25] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-08-22 21:41   ` Fan Ni
2024-09-03  7:07   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 16/25] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-08-17  0:02   ` Dave Jiang
2024-08-23 17:08   ` Jonathan Cameron
2024-09-03  7:09   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 17/25] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-08-19 16:35   ` Dave Jiang
2024-08-23 17:12   ` Jonathan Cameron
2024-09-03  7:10   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-08-19 18:51   ` Dave Jiang
2024-08-23  2:53     ` Ira Weiny
2024-08-23 21:32   ` Fan Ni
2024-08-27 12:08     ` Jonathan Cameron
2024-08-27 16:02       ` Fan Ni
2024-08-27 13:18   ` Jonathan Cameron
2024-08-29 21:16     ` Ira Weiny
2024-08-30  9:21       ` Jonathan Cameron
2024-09-03  6:37   ` Li, Ming4
2024-09-05 19:30   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 19/25] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-08-19 19:05   ` Dave Jiang
2024-08-23  2:58     ` Ira Weiny
2024-08-23 17:17       ` Jonathan Cameron
2024-08-23 17:19   ` Jonathan Cameron
2024-08-28 17:44   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 20/25] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-08-19 22:35   ` Dave Jiang
2024-08-27 13:26   ` Jonathan Cameron
2024-08-29 21:36     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 21/25] dax/region: Create resources on sparse DAX regions ira.weiny
2024-08-18 11:38   ` Markus Elfring
2024-08-19 23:30   ` Dave Jiang
2024-08-23 14:28     ` Ira Weiny
2024-08-27 14:12   ` Jonathan Cameron
2024-08-29 21:54     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 22/25] cxl/region: Read existing extents on region creation ira.weiny
2024-08-20  0:06   ` Dave Jiang
2024-08-23 21:31     ` Ira Weiny
2024-08-27 14:19   ` Jonathan Cameron
2024-09-05 19:35   ` Fan Ni [this message]
2024-08-16 14:44 ` [PATCH v3 23/25] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-08-20 22:54   ` Dave Jiang
2024-08-26 18:02     ` Ira Weiny
2024-08-27 14:20   ` Jonathan Cameron
2024-09-05 19:38   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 24/25] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-08-20 23:30   ` Dave Jiang
2024-08-27 14:32   ` Jonathan Cameron
2024-09-09 13:57     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 25/25] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-08-27 14:39   ` Jonathan Cameron
2024-09-09 14:08     ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZtoIFjI9gXp3l9OO@leg \
    --to=nifan.cxl@gmail.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=dsterba@suse.com \
    --cc=ira.weiny@intel.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=navneet.singh@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).