All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	"Navneet Singh" <navneet.singh@intel.com>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<nvdimm@lists.linux.dev>
Subject: Re: [PATCH v3 21/25] dax/region: Create resources on sparse DAX regions
Date: Tue, 27 Aug 2024 15:12:37 +0100	[thread overview]
Message-ID: <20240827151237.00000f2b@Huawei.com> (raw)
In-Reply-To: <20240816-dcd-type2-upstream-v3-21-7c9b96cba6d7@intel.com>

On Fri, 16 Aug 2024 09:44:29 -0500
ira.weiny@intel.com wrote:

> From: Navneet Singh <navneet.singh@intel.com>
> 
> DAX regions which map dynamic capacity partitions require that memory be
> allowed to come and go.  Recall sparse regions were created for this
> purpose.  Now that extents can be realized within DAX regions the DAX
> region driver can start tracking sub-resource information.
> 
> The tight relationship between DAX region operations and extent
> operations require memory changes to be controlled synchronously with
> the user of the region.  Synchronize through the dax_region_rwsem and by
> having the region driver drive both the region device as well as the
> extent sub-devices.
> 
> Recall requests to remove extents can happen at any time and that a host
> is not obligated to release the memory until it is not being used.  If
> an extent is not used allow a release response.
> 
> The DAX layer has no need for the details of the CXL memory extent
> devices.  Expose extents to the DAX layer as device children of the DAX
> region device.  A single callback from the driver aids the DAX layer to
> determine if the child device is an extent.  The DAX layer also
> registers a devres function to automatically clean up when the device is
> removed from the region.
> 
> There is a race between extents being surfaced and the dax_cxl driver
> being loaded.  The driver must therefore scan for any existing extents
> while still under the device lock.
> 
> Respond to extent notifications.  Manage the DAX region resource tree
> based on the extents lifetime.  Return the status of remove
> notifications to lower layers such that it can manage the hardware
> appropriately.
> 
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
A few minor comments inline.

Jonathan

> 
> ---
> Changes:
> [iweiny: patch reorder]
> [iweiny: move hunks from other patches to clarify code changes and
>          add/release flows WRT dax regions]
> [iweiny: use %par]
> [iweiny: clean up variable names]
> [iweiny: Simplify sparse_ops]
> [Fan: avoid open coding range_len()]
> [djbw: s/reg_ext/region_extent]
> ---
>  drivers/cxl/core/extent.c |  76 +++++++++++++--
>  drivers/cxl/cxl.h         |   6 ++
>  drivers/dax/bus.c         | 243 +++++++++++++++++++++++++++++++++++++++++-----
>  drivers/dax/bus.h         |   3 +-
>  drivers/dax/cxl.c         |  63 +++++++++++-
>  drivers/dax/dax-private.h |  34 +++++++
>  drivers/dax/hmem/hmem.c   |   2 +-
>  drivers/dax/pmem.c        |   2 +-
>  8 files changed, 391 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
> index d7d526a51e2b..103b0bec3a4a 100644
> --- a/drivers/cxl/core/extent.c
> +++ b/drivers/cxl/core/extent.c
> @@ -271,20 +271,67 @@ static void calc_hpa_range(struct cxl_endpoint_decoder *cxled,
>  	hpa_range->end = hpa_range->start + range_len(dpa_range) - 1;
>  }
>  
> +static int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
> +			      struct region_extent *region_extent)
> +{
> +	struct cxl_dax_region *cxlr_dax;
> +	struct device *dev;
> +	int rc = 0;
> +
> +	cxlr_dax = cxlr->cxlr_dax;
> +	dev = &cxlr_dax->dev;
> +	dev_dbg(dev, "Trying notify: type %d HPA %par\n",
> +		event, &region_extent->hpa_range);
> +
> +	/*
> +	 * NOTE the lack of a driver indicates a notification has failed.  No
> +	 * user space coordiantion was possible.
> +	 */
> +	device_lock(dev);

I'd use guard() for this as then can just return the notify result
and drop local variable rc.


> +	if (dev->driver) {
> +		struct cxl_driver *driver = to_cxl_drv(dev->driver);
> +		struct cxl_notify_data notify_data = (struct cxl_notify_data) {
> +			.event = event,
> +			.region_extent = region_extent,
> +		};
> +
> +		if (driver->notify) {
> +			dev_dbg(dev, "Notify: type %d HPA %par\n",
> +				event, &region_extent->hpa_range);
> +			rc = driver->notify(dev, &notify_data);
> +		}
> +	}
> +	device_unlock(dev);
> +	return rc;
> +}
>
> @@ -338,8 +390,20 @@ static int cxlr_add_extent(struct cxl_dax_region *cxlr_dax,
>  		return rc;
>  	}
>  
> -	/* device model handles freeing region_extent */
> -	return online_region_extent(region_extent);
> +	rc = online_region_extent(region_extent);
> +	/* device model handled freeing region_extent */
> +	if (rc)
> +		return rc;
> +
> +	rc = cxlr_notify_extent(cxlr_dax->cxlr, DCD_ADD_CAPACITY, region_extent);
> +	/*
> +	 * The region device was breifly live but DAX layer ensures it was not

briefly

> +	 * used
> +	 */
> +	if (rc)
> +		region_rm_extent(region_extent);	
> +
> +	return rc;
>  }

> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 975860371d9f..f14b0cfa7edd 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c

> +EXPORT_SYMBOL_GPL(dax_region_add_resource);
> +
> +int dax_region_rm_resource(struct dax_region *dax_region,
> +			   struct device *dev)
> +{
> +	struct dax_resource *dax_resource;
> +
> +	guard(rwsem_write)(&dax_region_rwsem);
> +
> +	dax_resource = dev_get_drvdata(dev);
> +	if (!dax_resource)
> +		return 0;
> +
> +	if (dax_resource->use_cnt)
> +		return -EBUSY;
> +
> +	/* avoid races with users trying to use the extent */

Not obvious to me from local code, why does releasing the resource
here avoid a race?  Perhaps the comment needs expanding.

> +	__dax_release_resource(dax_resource);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_region_rm_resource);
> +


> +static ssize_t dev_dax_resize_sparse(struct dax_region *dax_region,
> +				     struct dev_dax *dev_dax,
> +				     resource_size_t to_alloc)
> +{
> +	struct dax_resource *dax_resource;
> +	resource_size_t available_size;
> +	struct device *extent_dev;
> +	ssize_t alloc;
> +
> +	extent_dev = device_find_child(dax_region->dev, dax_region,
> +				       find_free_extent);

There is a __free for put device and it will tidy this up a tiny bit.

> +	if (!extent_dev)
> +		return 0;
> +
> +	dax_resource = dev_get_drvdata(extent_dev);
> +	if (!dax_resource)
> +		return 0;
> +
> +	available_size = dax_avail_size(dax_resource->res);
> +	to_alloc = min(available_size, to_alloc);
I'd put those two inline and skip the local variables unless
they have more use in later patches.

	alloc = __dev_dax_resize(dax_resources->res, dev_dax,
				 min(dax_avail_size(dax_resources->res), to_alloc),
				 dax_resource);
	
				
> +	alloc = __dev_dax_resize(dax_resource->res, dev_dax, to_alloc, dax_resource);
> +	if (alloc > 0)
> +		dax_resource->use_cnt++;
> +	put_device(extent_dev);
> +	return alloc;
> +}
> +

> @@ -1494,8 +1679,14 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
>  	device_initialize(dev);
>  	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
>  
> +	if (is_sparse(dax_region) && data->size) {
> +		dev_err(parent, "Sparse DAX region devices are created initially with 0 size");
must be created initially with 0 size.

Otherwise this error message says that they are, so why is it an error?

> +		rc = -EINVAL;
> +		goto err_id;
> +	}
> +
>  	rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start,
> -				 data->size);
> +				 data->size, NULL);
>  	if (rc)
>  		goto err_range;
>  

> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 367e86b1c22a..bf3b82b0120d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -5,6 +5,60 @@

...

> +static int cxl_dax_region_notify(struct device *dev,
> +				 struct cxl_notify_data *notify_data)
> +{
> +	struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev);
> +	struct dax_region *dax_region = dev_get_drvdata(dev);
> +	struct region_extent *region_extent = notify_data->region_extent;
> +
> +	switch (notify_data->event) {
> +	case DCD_ADD_CAPACITY:
> +		return __cxl_dax_add_resource(dax_region, region_extent);
> +	case DCD_RELEASE_CAPACITY:
> +		return dax_region_rm_resource(dax_region, &region_extent->dev);
> +	case DCD_FORCED_CAPACITY_RELEASE:
> +	default:
> +		dev_err(&cxlr_dax->dev, "Unknown DC event %d\n",
> +			notify_data->event);
> +		break;
Might as well return here and not below.
Makes it really really obvious this is the error path and currently the only
one that hits the return statement.
> +	}
> +
> +	return -ENXIO;
> +}

>  static int cxl_dax_region_probe(struct device *dev)
>  {
> @@ -24,14 +78,16 @@ static int cxl_dax_region_probe(struct device *dev)
>  		flags |= IORESOURCE_DAX_SPARSE_CAP;
>  
>  	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
> -				      PMD_SIZE, flags);
> +				      PMD_SIZE, flags, &sparse_ops);
>  	if (!dax_region)
>  		return -ENOMEM;
>  
> -	if (cxlr->mode == CXL_REGION_DC)
> +	if (cxlr->mode == CXL_REGION_DC) {
> +		device_for_each_child(&cxlr_dax->dev, dax_region,
> +				      cxl_dax_add_resource);
>  		/* Add empty seed dax device */
>  		dev_size = 0;
> -	else
> +	} else

Coding style says that you need brackets for all branches if
one needs them (as multiline).  Just above:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#spaces


>  		dev_size = range_len(&cxlr_dax->hpa_range);
>  
>  	data = (struct dev_dax_data) {


  parent reply	other threads:[~2024-08-27 14:12 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-16 14:44 [PATCH v3 00/25] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-08-16 14:44 ` [PATCH v3 01/25] range: Add range_overlaps() Ira Weiny
2024-08-16 14:44 ` [PATCH v3 02/25] printk: Add print format (%par) for struct range Ira Weiny
2024-08-20 14:08   ` Petr Mladek
2024-08-22 17:53     ` Ira Weiny
2024-08-22 18:10       ` Andy Shevchenko
2024-08-26 13:23         ` Petr Mladek
2024-08-26 17:23           ` Andy Shevchenko
2024-08-26 21:17             ` Ira Weiny
2024-08-27  7:43               ` Petr Mladek
2024-08-27 13:21                 ` Andy Shevchenko
2024-08-27 21:44                 ` Ira Weiny
2024-08-27 13:17               ` Andy Shevchenko
2024-08-28  4:12                 ` Ira Weiny
2024-08-28 13:50                   ` Andy Shevchenko
2024-08-26 13:17       ` Petr Mladek
2024-08-26 13:24         ` Andy Shevchenko
2024-08-16 14:44 ` [PATCH v3 03/25] dax: Document dax dev range tuple Ira Weiny
2024-08-16 20:58   ` Dave Jiang
2024-08-23 15:29   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 04/25] cxl/pci: Delay event buffer allocation Ira Weiny
2024-09-03  6:49   ` Li, Ming4
2024-09-05 19:44   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-09-03  6:50   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 06/25] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-08-16 21:45   ` Dave Jiang
2024-08-20 17:01     ` Fan Ni
2024-08-23  2:01       ` Ira Weiny
2024-08-23  2:02       ` Ira Weiny
2024-08-23 15:45   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 07/25] cxl/core: Separate region mode from decoder mode ira.weiny
2024-08-16 22:11   ` Dave Jiang
2024-08-23 15:47   ` Jonathan Cameron
2024-09-03  6:56   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 08/25] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-08-16 22:14   ` Dave Jiang
2024-09-03  6:57   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 09/25] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-08-16 23:08   ` Dave Jiang
2024-08-23  2:26     ` Ira Weiny
2024-08-23 16:09   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 10/25] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-08-16 23:17   ` Dave Jiang
2024-08-23 16:12   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 11/25] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-08-16 23:42   ` Dave Jiang
2024-08-23  2:28     ` Ira Weiny
2024-08-23 14:58       ` Dave Jiang
2024-08-23 16:14       ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 12/25] cxl/region: Refactor common create region code Ira Weiny
2024-08-16 23:43   ` Dave Jiang
2024-08-22 18:51   ` Fan Ni
2024-08-23 16:17   ` Jonathan Cameron
2024-09-03  7:04   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 13/25] cxl/region: Add sparse DAX region support ira.weiny
2024-08-16 23:51   ` Dave Jiang
2024-08-22 18:50   ` Fan Ni
2024-08-23 16:59   ` Jonathan Cameron
2024-09-03  2:15   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 14/25] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-08-16 23:57   ` Dave Jiang
2024-08-22 21:39   ` Fan Ni
2024-08-23 17:01   ` Jonathan Cameron
2024-09-03  7:06   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 15/25] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-08-22 21:41   ` Fan Ni
2024-09-03  7:07   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 16/25] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-08-17  0:02   ` Dave Jiang
2024-08-23 17:08   ` Jonathan Cameron
2024-09-03  7:09   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 17/25] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-08-19 16:35   ` Dave Jiang
2024-08-23 17:12   ` Jonathan Cameron
2024-09-03  7:10   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-08-19 18:51   ` Dave Jiang
2024-08-23  2:53     ` Ira Weiny
2024-08-23 21:32   ` Fan Ni
2024-08-27 12:08     ` Jonathan Cameron
2024-08-27 16:02       ` Fan Ni
2024-08-27 13:18   ` Jonathan Cameron
2024-08-29 21:16     ` Ira Weiny
2024-08-30  9:21       ` Jonathan Cameron
2024-09-03  6:37   ` Li, Ming4
2024-09-05 19:30   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 19/25] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-08-19 19:05   ` Dave Jiang
2024-08-23  2:58     ` Ira Weiny
2024-08-23 17:17       ` Jonathan Cameron
2024-08-23 17:19   ` Jonathan Cameron
2024-08-28 17:44   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 20/25] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-08-19 22:35   ` Dave Jiang
2024-08-27 13:26   ` Jonathan Cameron
2024-08-29 21:36     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 21/25] dax/region: Create resources on sparse DAX regions ira.weiny
2024-08-18 11:38   ` Markus Elfring
2024-08-19 23:30   ` Dave Jiang
2024-08-23 14:28     ` Ira Weiny
2024-08-27 14:12   ` Jonathan Cameron [this message]
2024-08-29 21:54     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 22/25] cxl/region: Read existing extents on region creation ira.weiny
2024-08-20  0:06   ` Dave Jiang
2024-08-23 21:31     ` Ira Weiny
2024-08-27 14:19   ` Jonathan Cameron
2024-09-05 19:35   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 23/25] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-08-20 22:54   ` Dave Jiang
2024-08-26 18:02     ` Ira Weiny
2024-08-27 14:20   ` Jonathan Cameron
2024-09-05 19:38   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 24/25] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-08-20 23:30   ` Dave Jiang
2024-08-27 14:32   ` Jonathan Cameron
2024-09-09 13:57     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 25/25] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-08-27 14:39   ` Jonathan Cameron
2024-09-09 14:08     ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240827151237.00000f2b@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=dsterba@suse.com \
    --cc=fan.ni@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=navneet.singh@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.