Re: [PATCH RFC v2 06/18] cxl/port: Add Dynamic Capacity size support to endpoint decoders

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Navneet Singh <navneet.singh@intel.com>,
	Fan Ni <fan.ni@samsung.com>, Davidlohr Bueso <dave@stgolabs.net>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC v2 06/18] cxl/port: Add Dynamic Capacity size support to endpoint decoders
Date: Tue, 29 Aug 2023 16:09:19 +0100	[thread overview]
Message-ID: <20230829160919.00007f69@Huawei.com> (raw)
In-Reply-To: <20230604-dcd-type2-upstream-v2-6-f740c47e7916@intel.com>

On Mon, 28 Aug 2023 22:20:57 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> To support Dynamic Capacity Devices (DCD) endpoint decoders will need to
> map DC Regions (partitions).  Part of this is assigning the size of the
> DC Region DPA to the decoder in addition to any skip value from the
> previous decoder which exists.  This must be done within a continuous
> DPA space.  Two complications arise with Dynamic Capacity regions which
> did not exist with Ram and PMEM partitions.  First, gaps in the DPA
> space can exist between and around the DC Regions.  Second, the Linux
> resource tree does not allow a resource to be marked across existing
> nodes within a tree.
> 
> For clarity, below is an example of an 60GB device with 10GB of RAM,
> 10GB of PMEM and 10GB for each of 2 DC Regions.  The desired CXL mapping
> is 5GB of RAM, 5GB of PMEM, and all 10GB of DC1.
> 
>      DPA RANGE
>      (dpa_res)
> 0GB        10GB       20GB       30GB       40GB       50GB       60GB
> |----------|----------|----------|----------|----------|----------|
> 
> RAM         PMEM                  DC0                   DC1
>  (ram_res)  (pmem_res)            (dc_res[0])           (dc_res[1])
> |----------|----------|   <gap>  |----------|   <gap>  |----------|
> 
>  RAM        PMEM                                        DC1
> |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
> 0GB   5GB  10GB  15GB 20GB       30GB       40GB       50GB       60GB
> 
> The previous skip resource between RAM and PMEM was always a child of
> the RAM resource and fit nicely (see X below).  Because of this
> simplicity this skip resource reference was not stored in any CXL state.
> On release the skip range could be calculated based on the endpoint
> decoders stored values.
> 
> Now when DC1 is being mapped 4 skip resources must be created as
> children.  One of the PMEM resource (A), two of the parent DPA resource
> (B,D), and one more child of the DC0 resource (C).
> 
> 0GB        10GB       20GB       30GB       40GB       50GB       60GB
> |----------|----------|----------|----------|----------|----------|
>                            |                     |
> |----------|----------|    |     |----------|    |     |----------|
>         |          |       |          |          |
>        (X)        (A)     (B)        (C)        (D)
> 	v          v       v          v          v
> |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
>        skip       skip  skip        skip      skip
> 
> Expand the calculation of DPA freespace and enhance the logic to support
> mapping/unmapping DC DPA space.  To track the potential of multiple skip
> resources an xarray is attached to the endpoint decoder.  The existing
> algorithm is consolidated with the new one to store a single skip
> resource in the same way as multiple skip resources.
> 
> Co-developed-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Various minor things noticed inline.

Jonathan

> 
> ---
> An alternative of using reserve_region_with_split() was considered.
> The advantage of that would be keeping all the resource information
> stored solely in the resource tree rather than having separate
> references to them.  However, it would best be implemented with a call
> such as release_split_region() [name TBD?] which could find all the leaf
> resources in the range and release them.  Furthermore, it is not clear
> if reserve_region_with_split() is really intended for anything outside
> of init code.  In the end this algorithm seems straight forward enough.
> 
> Changes for v2:
> [iweiny: write commit message]
> [iweiny: remove unneeded changes]
> [iweiny: split from region creation patch]
> [iweiny: Alter skip algorithm to use 'anonymous regions']
> [iweiny: enhance debug messages]
> [iweiny: consolidate skip resource creation]
> [iweiny: ensure xa_destroy() is called]
> [iweiny: consolidate region requests further]
> [iweiny: ensure resource is released on xa_insert]
> ---
>  drivers/cxl/core/hdm.c  | 188 +++++++++++++++++++++++++++++++++++++++++++-----
>  drivers/cxl/core/port.c |   2 +
>  drivers/cxl/cxl.h       |   2 +
>  3 files changed, 176 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3f4af1f5fac8..3cd048677816 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c


> +
> +static int cxl_reserve_dpa_skip(struct cxl_endpoint_decoder *cxled,
> +				resource_size_t base, resource_size_t skipped)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_port *port = cxled_to_port(cxled);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	resource_size_t skip_base = base - skipped;
> +	resource_size_t size, skip_len = 0;
> +	struct device *dev = &port->dev;
> +	int rc, index;
> +
> +	size = resource_size(&cxlds->ram_res);
> +	if (size && skip_base <= cxlds->ram_res.end) {

This size only used in this if statement I'd just put it inline.
 
> +		skip_len = cxlds->ram_res.end - skip_base + 1;
> +		rc = cxl_request_skip(cxled, skip_base, skip_len);
> +		if (rc)
> +			return rc;
> +		skip_base += skip_len;
> +	}
> +
> +	if (skip_base == base) {
> +		dev_dbg(dev, "skip done!\n");

Not sure that dbg is much help as other places below where skip also done...

> +		return 0;
> +	}
> +
> +	size = resource_size(&cxlds->pmem_res);
> +	if (size && skip_base <= cxlds->pmem_res.end) {

size only used in this if statement. I'd just put
the resource_size() bit inline.

> +		skip_len = cxlds->pmem_res.end - skip_base + 1;
> +		rc = cxl_request_skip(cxled, skip_base, skip_len);
> +		if (rc)
> +			return rc;
> +		skip_base += skip_len;
> +	}
> +
> +	index = dc_mode_to_region_index(cxled->mode);
> +	for (int i = 0; i <= index; i++) {
> +		struct resource *dcr = &cxlds->dc_res[i];
> +
> +		if (skip_base < dcr->start) {
> +			skip_len = dcr->start - skip_base;
> +			rc = cxl_request_skip(cxled, skip_base, skip_len);
> +			if (rc)
> +				return rc;
> +			skip_base += skip_len;
> +		}
> +
> +		if (skip_base == base) {
> +			dev_dbg(dev, "skip done!\n");

As above - perhaps some more info?

> +			break;
> +		}
> +
> +		if (resource_size(dcr) && skip_base <= dcr->end) {
> +			if (skip_base > base)
> +				dev_err(dev, "Skip error\n");

Not return ?  If there is a reason to carry on, I'd like a comment to say what it is.

> +
> +			skip_len = dcr->end - skip_base + 1;
> +			rc = cxl_request_skip(cxled, skip_base, skip_len);
> +			if (rc)
> +				return rc;
> +			skip_base += skip_len;
> +		}
> +	}
> +
> +	return 0;
> +}
> +


> @@ -492,11 +607,13 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled,
>  					 resource_size_t *start_out,
>  					 resource_size_t *skip_out)
>  {
> +	resource_size_t free_ram_start, free_pmem_start, free_dc_start;
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> -	resource_size_t free_ram_start, free_pmem_start;
>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct device *dev = &cxled->cxld.dev;

There is one existing (I think) call to dev_dbg(cxled_dev(cxled) ...
in this function.  So both should use that here, and should convert that one
case to using dev.

>  	resource_size_t start, avail, skip;
>  	struct resource *p, *last;
> +	int index;
>  
>  	lockdep_assert_held(&cxl_dpa_rwsem);
>  
> @@ -514,6 +631,20 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled,
>  	else
>  		free_pmem_start = cxlds->pmem_res.start;
>  
> +	/*
> +	 * Limit each decoder to a single DC region to map memory with
> +	 * different DSMAS entry.
> +	 */
> +	index = dc_mode_to_region_index(cxled->mode);
> +	if (index >= 0) {
> +		if (cxlds->dc_res[index].child) {
> +			dev_err(dev, "Cannot allocate DPA from DC Region: %d\n",
> +				index);
> +			return -EINVAL;
> +		}
> +		free_dc_start = cxlds->dc_res[index].start;
> +	}
> +
>  	if (cxled->mode == CXL_DECODER_RAM) {
>  		start = free_ram_start;
>  		avail = cxlds->ram_res.end - start + 1;
> @@ -535,6 +666,29 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled,
>  		else
>  			skip_end = start - 1;
>  		skip = skip_end - skip_start + 1;
> +	} else if (cxl_decoder_mode_is_dc(cxled->mode)) {
> +		resource_size_t skip_start, skip_end;
> +
> +		start = free_dc_start;
> +		avail = cxlds->dc_res[index].end - start + 1;
> +		if ((resource_size(&cxlds->pmem_res) == 0) || !cxlds->pmem_res.child)

Previous patch used !resource_size()
I prefer compare with 0 like you have here, but which ever is chosen, things should
be consistent.

...

next prev parent reply	other threads:[~2023-08-29 15:10 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-29  5:20 [PATCH RFC v2 00/18] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2023-08-29  5:20 ` [PATCH RFC v2 01/18] cxl/hdm: Debug, use decoder name function Ira Weiny
2023-08-29 14:03   ` Jonathan Cameron
2023-08-29 21:48     ` Fan Ni
2023-09-03  2:55     ` Ira Weiny
2023-08-30 20:32   ` Dave Jiang
2023-08-29  5:20 ` [PATCH RFC v2 02/18] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) Ira Weiny
2023-08-29 14:07   ` Jonathan Cameron
2023-09-03  3:38     ` Ira Weiny
2023-08-29 21:49   ` Fan Ni
2023-08-30 20:33   ` Dave Jiang
2023-10-24 16:16   ` Jonathan Cameron
2023-08-29  5:20 ` [PATCH RFC v2 03/18] cxl/mem: Read Dynamic capacity configuration from the device ira.weiny
2023-08-29 14:37   ` Jonathan Cameron
2023-09-03 23:36     ` Ira Weiny
2023-08-30 21:01   ` Dave Jiang
2023-09-05  0:14     ` Ira Weiny
2023-09-08 20:23     ` Ira Weiny
2023-08-30 21:44   ` Fan Ni
2023-09-08 22:52     ` Ira Weiny
2023-09-12 21:32       ` Fan Ni
2023-09-07 15:46   ` Alison Schofield
2023-09-12  1:18     ` Ira Weiny
2023-09-08 12:46   ` Jørgen Hansen
2023-09-11 20:26     ` Ira Weiny
2023-08-29  5:20 ` [PATCH RFC v2 04/18] cxl/region: Add Dynamic Capacity decoder and region modes Ira Weiny
2023-08-29 14:39   ` Jonathan Cameron
2023-08-30 21:13   ` Dave Jiang
2023-08-31 17:00   ` Fan Ni
2023-08-29  5:20 ` [PATCH RFC v2 05/18] cxl/port: Add Dynamic Capacity mode support to endpoint decoders Ira Weiny
2023-08-29 14:49   ` Jonathan Cameron
2023-09-05  0:05     ` Ira Weiny
2023-08-31 17:25   ` Fan Ni
2023-09-08 23:26     ` Ira Weiny
2023-08-29  5:20 ` [PATCH RFC v2 06/18] cxl/port: Add Dynamic Capacity size " Ira Weiny
2023-08-29 15:09   ` Jonathan Cameron [this message]
2023-09-05  4:32     ` Ira Weiny
2023-08-29  5:20 ` [PATCH RFC v2 07/18] cxl/mem: Expose device dynamic capacity configuration ira.weiny
2023-08-29 15:14   ` Jonathan Cameron
2023-09-05 17:55     ` Fan Ni
2023-09-05 20:45     ` Ira Weiny
2023-08-30 22:46   ` Dave Jiang
2023-09-08 23:22     ` Ira Weiny
2023-08-29  5:20 ` [PATCH RFC v2 08/18] cxl/region: Add Dynamic Capacity CXL region support Ira Weiny
2023-08-29 15:19   ` Jonathan Cameron
2023-08-30 23:27   ` Dave Jiang
2023-09-06  4:36     ` Ira Weiny
2023-09-05 21:09   ` Fan Ni
2023-08-29  5:21 ` [PATCH RFC v2 09/18] cxl/mem: Read extents on memory device discovery Ira Weiny
2023-08-29 15:26   ` Jonathan Cameron
2023-08-30  0:16     ` Ira Weiny
2023-09-05 21:41     ` Ira Weiny
2023-08-29  5:21 ` [PATCH RFC v2 10/18] cxl/mem: Handle DCD add and release capacity events Ira Weiny
2023-08-29 15:59   ` Jonathan Cameron
2023-09-05 23:49     ` Ira Weiny
2023-08-31 17:28   ` Dave Jiang
2023-09-08 15:35     ` Ira Weiny
2023-08-29  5:21 ` [PATCH RFC v2 11/18] cxl/region: Expose DC extents on region driver load Ira Weiny
2023-08-29 16:20   ` Jonathan Cameron
2023-09-06  3:36     ` Ira Weiny
2023-08-31 18:38   ` Dave Jiang
2023-09-08 23:57     ` Ira Weiny
2023-08-29  5:21 ` [PATCH RFC v2 12/18] cxl/region: Notify regions of DC changes Ira Weiny
2023-08-29 16:40   ` Jonathan Cameron
2023-09-06  4:00     ` Ira Weiny
2023-09-18 13:56   ` Jørgen Hansen
2023-09-18 17:45     ` Ira Weiny
2023-08-29  5:21 ` [PATCH RFC v2 13/18] dax/bus: Factor out dev dax resize logic Ira Weiny
2023-08-30 11:27   ` Jonathan Cameron
2023-09-06  4:12     ` Ira Weiny
2023-08-31 21:48   ` Dave Jiang
2023-08-29  5:21 ` [PATCH RFC v2 14/18] dax/region: Support DAX device creation on dynamic DAX regions Ira Weiny
2023-08-30 11:50   ` Jonathan Cameron
2023-09-06  4:35     ` Ira Weiny
2023-09-12 16:49       ` Jonathan Cameron
2023-09-12 22:08         ` Ira Weiny
2023-09-12 22:35           ` Dan Williams
2023-09-13 17:30             ` Ira Weiny
2023-09-13 17:59               ` Dan Williams
2023-09-13 19:26                 ` Ira Weiny
2023-09-14 10:32                   ` Jonathan Cameron
2023-08-29  5:21 ` [PATCH RFC v2 15/18] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-08-29 16:46   ` Jonathan Cameron
2023-09-06  4:07     ` Ira Weiny
2023-08-29  5:21 ` [PATCH RFC v2 16/18] tools/testing/cxl: Make event logs dynamic Ira Weiny
2023-08-30 12:11   ` Jonathan Cameron
2023-09-06 21:15     ` Ira Weiny
2023-08-29  5:21 ` [PATCH RFC v2 17/18] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2023-08-30 12:20   ` Jonathan Cameron
2023-09-06 21:18     ` Ira Weiny
2023-08-31 23:19   ` Dave Jiang
2023-08-29  5:21 ` [PATCH RFC v2 18/18] tools/testing/cxl: Add Dynamic Capacity events Ira Weiny
2023-08-30 12:23   ` Jonathan Cameron
2023-09-06 21:39     ` Ira Weiny
2023-08-31 23:20   ` Dave Jiang
2023-09-07 21:01 ` [PATCH RFC v2 00/18] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni
2023-09-12  1:44   ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230829160919.00007f69@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=fan.ni@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=navneet.singh@intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox