Re: [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support.

Linux CXL
 help / color / mirror / Atom feed

From: Navneet Singh <navneet.singh@intel.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>,
	<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support.
Date: Thu, 15 Jun 2023 14:56:33 -0400	[thread overview]
Message-ID: <ZIte4QozSm+n2zI3@fedora> (raw)
In-Reply-To: <648b548db05f5_1c7ab42944a@iweiny-mobl.notmuch>

On Thu, Jun 15, 2023 at 11:12:29AM -0700, Ira Weiny wrote:
> Dave Jiang wrote:
> > 
> > 
> > On 6/14/23 12:16, ira.weiny@intel.com wrote:
> > > From: Navneet Singh <navneet.singh@intel.com>
> > > 
> > > CXL devices optionally support dynamic capacity. CXL Regions must be
> > > created to access this capacity.
> > > 
> > > Add sysfs entries to create dynamic capacity cxl regions. Provide a new
> > > Dynamic Capacity decoder mode which targets dynamic capacity on devices
> > > which are added to that region.
> > > 
> > > Below are the steps to create and delete dynamic capacity region0
> > > (example).
> > > 
> > >      region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > >      echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > >      echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > >      echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> > > 
> > >      echo "dc0" >/sys/bus/cxl/devices/decoder1.0/mode
> > >      echo 0x400000000 >/sys/bus/cxl/devices/decoder1.0/dpa_size
> > > 
> > >      echo 0x400000000 > /sys/bus/cxl/devices/$region/size
> > >      echo  "decoder1.0" > /sys/bus/cxl/devices/$region/target0
> > >      echo 1 > /sys/bus/cxl/devices/$region/commit
> > >      echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> > > 
> > >      echo $region> /sys/bus/cxl/devices/decoder0.0/delete_region
> > > 
> > > Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> > > 
> > > ---
> > > [iweiny: fixups]
> > > [iweiny: remove unused CXL_DC_REGION_MODE macro]
> > > [iweiny: Make dc_mode_to_region_index static]
> > > [iweiny: simplify <sysfs>/create_dc_region]
> > > [iweiny: introduce decoder_mode_is_dc]
> > > [djbw: fixups, no sign-off: preview only]
> > > ---
> > >   drivers/cxl/Kconfig       |  11 +++
> > >   drivers/cxl/core/core.h   |   7 ++
> > >   drivers/cxl/core/hdm.c    | 234 ++++++++++++++++++++++++++++++++++++++++++----
> > >   drivers/cxl/core/port.c   |  18 ++++
> > >   drivers/cxl/core/region.c | 135 ++++++++++++++++++++++++--
> > >   drivers/cxl/cxl.h         |  28 ++++++
> > >   drivers/dax/cxl.c         |   4 +
> > >   7 files changed, 409 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > index ff4e78117b31..df034889d053 100644
> > > --- a/drivers/cxl/Kconfig
> > > +++ b/drivers/cxl/Kconfig
> > > @@ -121,6 +121,17 @@ config CXL_REGION
> > >   
> > >   	  If unsure say 'y'
> > >   
> > > +config CXL_DCD
> > > +	bool "CXL: DCD Support"
> > > +	default CXL_BUS
> > > +	depends on CXL_REGION
> > > +	help
> > > +	  Enable the CXL core to provision CXL DCD regions.
> > > +	  CXL devices optionally support dynamic capacity and DCD region
> > > +	  maps the dynamic capacity regions DPA's into Host HPA ranges.
> > > +
> > > +	  If unsure say 'y'
> > > +
> > >   config CXL_REGION_INVALIDATION_TEST
> > >   	bool "CXL: Region Cache Management Bypass (TEST)"
> > >   	depends on CXL_REGION
> > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > > index 27f0968449de..725700ab5973 100644
> > > --- a/drivers/cxl/core/core.h
> > > +++ b/drivers/cxl/core/core.h
> > > @@ -9,6 +9,13 @@ extern const struct device_type cxl_nvdimm_type;
> > >   
> > >   extern struct attribute_group cxl_base_attribute_group;
> > >   
> > > +#ifdef CONFIG_CXL_DCD
> > > +extern struct device_attribute dev_attr_create_dc_region;
> > > +#define SET_CXL_DC_REGION_ATTR(x) (&dev_attr_##x.attr),
> > > +#else
> > > +#define SET_CXL_DC_REGION_ATTR(x)
> > > +#endif
> > > +
> > >   #ifdef CONFIG_CXL_REGION
> > >   extern struct device_attribute dev_attr_create_pmem_region;
> > >   extern struct device_attribute dev_attr_create_ram_region;
> > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > index 514d30131d92..29649b47d177 100644
> > > --- a/drivers/cxl/core/hdm.c
> > > +++ b/drivers/cxl/core/hdm.c
> > > @@ -233,14 +233,23 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > >   	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > >   	struct resource *res = cxled->dpa_res;
> > >   	resource_size_t skip_start;
> > > +	resource_size_t skipped = cxled->skip;
> > >   
> > >   	lockdep_assert_held_write(&cxl_dpa_rwsem);
> > >   
> > >   	/* save @skip_start, before @res is released */
> > > -	skip_start = res->start - cxled->skip;
> > > +	skip_start = res->start - skipped;
> > >   	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > > -	if (cxled->skip)
> > > -		__release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> > > +	if (cxled->skip != 0) {
> > > +		while (skipped != 0) {
> > > +			res = xa_load(&cxled->skip_res, skip_start);
> > > +			__release_region(&cxlds->dpa_res, skip_start,
> > > +							resource_size(res));
> > > +			xa_erase(&cxled->skip_res, skip_start);
> > > +			skip_start += resource_size(res);
> > > +			skipped -= resource_size(res);
> > > +			}
> > > +	}
> > >   	cxled->skip = 0;
> > >   	cxled->dpa_res = NULL;
> > >   	put_device(&cxled->cxld.dev);
> > > @@ -267,6 +276,19 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > >   	__cxl_dpa_release(cxled);
> > >   }
> > >   
> > > +static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
> > > +{
> > > +	int index = 0;
> > > +
> > > +	for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
> > > +		if (mode == i)
> > > +			return index;
> > > +		index++;
> > > +	}
> > > +
> > > +	return -EINVAL;
> > > +}
> > > +
> > >   static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > >   			     resource_size_t base, resource_size_t len,
> > >   			     resource_size_t skipped)
> > > @@ -275,7 +297,11 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > >   	struct cxl_port *port = cxled_to_port(cxled);
> > >   	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > >   	struct device *dev = &port->dev;
> > > +	struct device *ed_dev = &cxled->cxld.dev;
> > > +	struct resource *dpa_res = &cxlds->dpa_res;
> > > +	resource_size_t skip_len = 0;
> > >   	struct resource *res;
> > > +	int rc, index;
> > >   
> > >   	lockdep_assert_held_write(&cxl_dpa_rwsem);
> > >   
> > > @@ -304,28 +330,119 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > >   	}
> > >   
> > >   	if (skipped) {
> > > -		res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
> > > -				       dev_name(&cxled->cxld.dev), 0);
> > > -		if (!res) {
> > > -			dev_dbg(dev,
> > > -				"decoder%d.%d: failed to reserve skipped space\n",
> > > -				port->id, cxled->cxld.id);
> > > -			return -EBUSY;
> > > +		resource_size_t skip_base = base - skipped;
> > > +
> > > +		if (decoder_mode_is_dc(cxled->mode)) {
> > 
> > Maybe move this entire block to a helper function to reduce the size of 
> > the current function and reduce indent levels and improve readability?
> 
> :-/
> 
> I'll work on breaking it out more.  The logic here is getting kind of
> crazy.
> 
> > 
> > > +			if (resource_size(&cxlds->ram_res) &&
> > > +					skip_base <= cxlds->ram_res.end) {
> > > +				skip_len = cxlds->ram_res.end - skip_base + 1;
> > > +				res = __request_region(dpa_res, skip_base,
> > > +						skip_len, dev_name(ed_dev), 0);
> > > +				if (!res)
> > > +					goto error;
> > > +
> > > +				rc = xa_insert(&cxled->skip_res, skip_base, res,
> > > +								GFP_KERNEL);
> > > +				skip_base += skip_len;
> > > +			}
> > > +
> > > +			if (resource_size(&cxlds->ram_res) &&
>                                                   ^^^^^^^
> 						  pmem_res?
> 
> > > +					skip_base <= cxlds->pmem_res.end) {
> 
> The 2 if statements here are almost exactly the same.  To the point I
> wonder if there is a bug.
> 
> Navneet,
> 
> Why does the code check ram_res the second time but go on to use pmem_res
> in the block?

Navneet- Thanks for pointing out it should be pmem_res instead of
ram_res. I will fix it.
> 
> > > +				skip_len = cxlds->pmem_res.end - skip_base + 1;
> > > +				res = __request_region(dpa_res, skip_base,
> > > +						skip_len, dev_name(ed_dev), 0);
> > > +				if (!res)
> > > +					goto error;
> > > +
> > > +				rc = xa_insert(&cxled->skip_res, skip_base, res,
> > > +								GFP_KERNEL);
> > > +				skip_base += skip_len;
> > > +			}
> > > +
> > > +			index = dc_mode_to_region_index(cxled->mode);
> > > +			for (int i = 0; i <= index; i++) {
> > > +				struct resource *dcr = &cxlds->dc_res[i];
> > > +
> > > +				if (skip_base < dcr->start) {
> > > +					skip_len = dcr->start - skip_base;
> > > +					res = __request_region(dpa_res,
> > > +							skip_base, skip_len,
> > > +							dev_name(ed_dev), 0);
> > > +					if (!res)
> > > +						goto error;
> > > +
> > > +					rc = xa_insert(&cxled->skip_res, skip_base,
> > > +							res, GFP_KERNEL);
> > > +					skip_base += skip_len;
> > > +				}
> > > +
> > > +				if (skip_base == base) {
> > > +					dev_dbg(dev, "skip done!\n");
> > > +					break;
> > > +				}
> > > +
> > > +				if (resource_size(dcr) &&
> > > +						skip_base <= dcr->end) {
> > > +					if (skip_base > base)
> > > +						dev_err(dev, "Skip error\n");
> > > +
> > > +					skip_len = dcr->end - skip_base + 1;
> > > +					res = __request_region(dpa_res, skip_base,
> > > +							skip_len,
> > > +							dev_name(ed_dev), 0);
> > > +					if (!res)
> > > +						goto error;
> > > +
> > > +					rc = xa_insert(&cxled->skip_res, skip_base,
> > > +							res, GFP_KERNEL);
> > > +					skip_base += skip_len;
> > > +				}
> > > +			}
> > > +		} else	{
> > > +			res = __request_region(dpa_res, base - skipped, skipped,
> > > +							dev_name(ed_dev), 0);
> > > +			if (!res)
> > > +				goto error;
> > > +
> > > +			rc = xa_insert(&cxled->skip_res, skip_base, res,
> > > +								GFP_KERNEL);
> > >   		}
> > >   	}
> > > -	res = __request_region(&cxlds->dpa_res, base, len,
> > > -			       dev_name(&cxled->cxld.dev), 0);
> > > +
> > > +	res = __request_region(dpa_res, base, len, dev_name(ed_dev), 0);
> > >   	if (!res) {
> > >   		dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
> > > -			port->id, cxled->cxld.id);
> > > -		if (skipped)
> > > -			__release_region(&cxlds->dpa_res, base - skipped,
> > > -					 skipped);
> > > +				port->id, cxled->cxld.id);
> > > +		if (skipped) {
> > > +			resource_size_t skip_base = base - skipped;
> > > +
> > > +			while (skipped != 0) {
> > > +				if (skip_base > base)
> > > +					dev_err(dev, "Skip error\n");
> > > +
> > > +				res = xa_load(&cxled->skip_res, skip_base);
> > > +				__release_region(dpa_res, skip_base,
> > > +							resource_size(res));
> > > +				xa_erase(&cxled->skip_res, skip_base);
> > > +				skip_base += resource_size(res);
> > > +				skipped -= resource_size(res);
> > > +			}
> > > +		}
> > >   		return -EBUSY;
> > >   	}
> > >   	cxled->dpa_res = res;
> > >   	cxled->skip = skipped;
> > >   
> > > +	for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) {
> > > +		int index = dc_mode_to_region_index(mode);
> > > +
> > > +		if (resource_contains(&cxlds->dc_res[index], res)) {
> > > +			cxled->mode = mode;
> > > +			dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id,
> > > +				cxled->cxld.id, cxled->dpa_res, cxled->mode);
> > > +			goto success > +		}
> > > +	}
> > 
> > This block should only happen if decoder_mode_is_dc() right? If that's 
> > the case, you might be able to refactor it so the 'goto success' isn't 
> > necessary.
> 
> I'll check.  I looked through this code a couple of times in my review
> before posting because I'm not 100% sure I want to see 8 different modes
> DC decoders and regions.
> 
> I think the 'mode' should be 'DC' with an index in the endpoint decoder to
> map DC region that decoder is mapping.  But that change was much bigger to
> Navneets code and I wanted to see how others felt about having DC0 - DC7
> modes.  My compromise was creating decoder_mode_is_dc().
> 
> > 
> > >   	if (resource_contains(&cxlds->pmem_res, res))
> > >   		cxled->mode = CXL_DECODER_PMEM;
> > >   	else if (resource_contains(&cxlds->ram_res, res))
> > > @@ -336,9 +453,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > >   		cxled->mode = CXL_DECODER_MIXED;
> > >   	}
> > >   
> > > +success:
> > >   	port->hdm_end++;
> > >   	get_device(&cxled->cxld.dev);
> > >   	return 0;
> > > +
> > > +error:
> > > +	dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space\n",
> > > +			port->id, cxled->cxld.id);
> > > +	return -EBUSY;
> > > +
> > >   }
> > >   
> > >   int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > > @@ -429,6 +553,14 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > >   	switch (mode) {
> > >   	case CXL_DECODER_RAM:
> > >   	case CXL_DECODER_PMEM:
> > > +	case CXL_DECODER_DC0:
> > > +	case CXL_DECODER_DC1:
> > > +	case CXL_DECODER_DC2:
> > > +	case CXL_DECODER_DC3:
> > > +	case CXL_DECODER_DC4:
> > > +	case CXL_DECODER_DC5:
> > > +	case CXL_DECODER_DC6:
> > > +	case CXL_DECODER_DC7:
> 
> For example this seems very hacky...
> 
> [snip]
> 
> > >   
> > > +/*
> > > + * The region can not be manged by CXL if any portion of
> > > + * it is already online as 'System RAM'
> > > + */
> > > +static bool region_is_system_ram(struct cxl_region *cxlr,
> > > +				 struct cxl_region_params *p)
> > > +{
> > > +	return (walk_iomem_res_desc(IORES_DESC_NONE,
> > > +				    IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> > > +				    p->res->start, p->res->end, cxlr,
> > > +				    is_system_ram) > 0);
> > > +}
> > > +
> > >   static int cxl_region_probe(struct device *dev)
> > >   {
> > >   	struct cxl_region *cxlr = to_cxl_region(dev);
> > > @@ -3174,14 +3287,7 @@ static int cxl_region_probe(struct device *dev)
> > >   	case CXL_DECODER_PMEM:
> > >   		return devm_cxl_add_pmem_region(cxlr);
> > >   	case CXL_DECODER_RAM:
> > > -		/*
> > > -		 * The region can not be manged by CXL if any portion of
> > > -		 * it is already online as 'System RAM'
> > > -		 */
> > > -		if (walk_iomem_res_desc(IORES_DESC_NONE,
> > > -					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> > > -					p->res->start, p->res->end, cxlr,
> > > -					is_system_ram) > 0)
> > > +		if (region_is_system_ram(cxlr, p))
> > 
> > Maybe split this change out as a prep patch before the current patch.
> 
> That seems reasonable.  But the patch is not so large and the
> justification for creating a helper is that we need this same check for DC
> regions.  So it seemed ok to leave it like this.  Let me see about
> splitting it out.
> 
> [snip]
> 
> > > diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> > > index ccdf8de85bd5..eb5eb81bfbd7 100644
> > > --- a/drivers/dax/cxl.c
> > > +++ b/drivers/dax/cxl.c
> > > @@ -23,11 +23,15 @@ static int cxl_dax_region_probe(struct device *dev)
> > >   	if (!dax_region)
> > >   		return -ENOMEM;
> > >   
> > > +	if (decoder_mode_is_dc(cxlr->mode))
> > > +		return 0;
> > > +
> > >   	data = (struct dev_dax_data) {
> > >   		.dax_region = dax_region,
> > >   		.id = -1,
> > >   		.size = range_len(&cxlr_dax->hpa_range),
> > >   	};
> > > +
> > 
> > Stray blank line?
> 
> Opps!  Fixed!
> 
> Ira

next prev parent reply	other threads:[~2023-06-15 18:57 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-14 19:16 [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2023-06-14 19:16 ` [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device ira.weiny
2023-06-14 22:53   ` Dave Jiang
2023-06-15 15:04     ` Ira Weiny
2023-06-14 23:49   ` Alison Schofield
2023-06-15 22:46     ` Ira Weiny
2023-06-15 18:30   ` Fan Ni
2023-06-15 19:17     ` Navneet Singh
2023-06-15 21:41   ` Fan Ni
2023-06-22 15:58   ` Jonathan Cameron
2023-06-24 13:08     ` Ira Weiny
2023-07-03  2:29       ` Jonathan Cameron
2023-06-14 19:16 ` [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support ira.weiny
2023-06-14 23:37   ` Dave Jiang
2023-06-15 18:12     ` Ira Weiny
2023-06-15 18:28       ` Dave Jiang
2023-06-16  3:52         ` Navneet Singh
2023-06-15 18:56       ` Navneet Singh [this message]
2023-06-15  0:21   ` Alison Schofield
2023-06-16  2:06     ` Ira Weiny
2023-06-16 15:56       ` Alison Schofield
2023-06-16 16:51   ` Alison Schofield
2023-06-21  2:44     ` Ira Weiny
2023-06-20 17:55   ` Fan Ni
2023-06-20 20:33     ` Ira Weiny
2023-06-21  3:13     ` Navneet Singh
2023-06-21 17:20   ` Fan Ni
2023-06-23 18:02     ` Ira Weiny
2023-06-22 16:34   ` Jonathan Cameron
2023-07-05 14:49   ` Davidlohr Bueso
2023-06-14 19:16 ` [PATCH 3/5] cxl/mem : Expose dynamic capacity configuration to userspace ira.weiny
2023-06-15  0:40   ` Alison Schofield
2023-06-16  2:47     ` Ira Weiny
2023-06-16 15:58       ` Dave Jiang
2023-06-20 16:23         ` Ira Weiny
2023-06-20 16:48           ` Dave Jiang
2023-06-15 15:41   ` Dave Jiang
2023-06-14 19:16 ` [PATCH 4/5] cxl/mem: Add support to handle DCD add and release capacity events ira.weiny
2023-06-15  2:19   ` Alison Schofield
2023-06-16  4:11     ` Ira Weiny
2023-06-27 18:20       ` Fan Ni
2023-06-15 16:58   ` Dave Jiang
2023-06-22 17:01   ` Jonathan Cameron
2023-06-29 15:19     ` Ira Weiny
2023-06-27 18:17   ` Fan Ni
2023-07-13 12:55   ` Jørgen Hansen
2023-06-14 19:16 ` [PATCH 5/5] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-06-15 17:08   ` Dave Jiang
2023-06-15  0:56 ` [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) Alison Schofield
2023-06-16  2:57   ` Ira Weiny
2023-06-15 14:51 ` Ira Weiny
2023-06-22 15:07   ` Jonathan Cameron
2023-06-22 16:37     ` Jonathan Cameron
2023-06-27 14:59     ` Ira Weiny
2023-06-29 15:30 ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZIte4QozSm+n2zI3@fedora \
    --to=navneet.singh@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=fan.ni@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox