All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	"Navneet Singh" <navneet.singh@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	"Alison Schofield" <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <nvdimm@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 24/28] dax/region: Create resources on sparse DAX regions
Date: Thu, 10 Oct 2024 16:27:45 +0100	[thread overview]
Message-ID: <20241010162745.00007b31@Huawei.com> (raw)
In-Reply-To: <20241007-dcd-type2-upstream-v4-24-c261ee6eeded@intel.com>

On Mon, 07 Oct 2024 18:16:30 -0500
ira.weiny@intel.com wrote:

> From: Navneet Singh <navneet.singh@intel.com>
> 
> DAX regions which map dynamic capacity partitions require that memory be
> allowed to come and go.  Recall sparse regions were created for this
> purpose.  Now that extents can be realized within DAX regions the DAX
> region driver can start tracking sub-resource information.
> 
> The tight relationship between DAX region operations and extent
> operations require memory changes to be controlled synchronously with
> the user of the region.  Synchronize through the dax_region_rwsem and by
> having the region driver drive both the region device as well as the
> extent sub-devices.
> 
> Recall requests to remove extents can happen at any time and that a host
> is not obligated to release the memory until it is not being used.  If
> an extent is not used allow a release response.
> 
> The DAX layer has no need for the details of the CXL memory extent
> devices.  Expose extents to the DAX layer as device children of the DAX
> region device.  A single callback from the driver aids the DAX layer to
> determine if the child device is an extent.  The DAX layer also
> registers a devres function to automatically clean up when the device is
> removed from the region.
> 
> There is a race between extents being surfaced and the dax_cxl driver
> being loaded.  The driver must therefore scan for any existing extents
> while still under the device lock.
> 
> Respond to extent notifications.  Manage the DAX region resource tree
> based on the extents lifetime.  Return the status of remove
> notifications to lower layers such that it can manage the hardware
> appropriately.
> 
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
More somewhat superficial review from me.
Needs DAX expert reviewers.

Jonathan

> ---
>  drivers/cxl/core/extent.c |  74 ++++++++++++--
>  drivers/cxl/cxl.h         |   6 ++
>  drivers/dax/bus.c         | 243 +++++++++++++++++++++++++++++++++++++++++-----
>  drivers/dax/bus.h         |   3 +-
>  drivers/dax/cxl.c         |  62 +++++++++++-
>  drivers/dax/dax-private.h |  42 ++++++++
>  drivers/dax/hmem/hmem.c   |   2 +-
>  drivers/dax/pmem.c        |   2 +-
>  8 files changed, 396 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
> index a1eb6e8e4f1a..75fb73ce2185 100644
> --- a/drivers/cxl/core/extent.c
> +++ b/drivers/cxl/core/extent.c
> @@ -270,20 +270,65 @@ static void calc_hpa_range(struct cxl_endpoint_decoder *cxled,
>  	hpa_range->end = hpa_range->start + range_len(dpa_range) - 1;
>  }
>  
> +static int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
> +			      struct region_extent *region_extent)
> +{
> +	struct device *dev = &cxlr->cxlr_dax->dev;
> +	struct cxl_notify_data notify_data;
> +	struct cxl_driver *driver;
> +
> +	dev_dbg(dev, "Trying notify: type %d HPA %pra\n",
> +		event, &region_extent->hpa_range);
> +
> +	guard(device)(dev);
> +
> +	/*
> +	 * The lack of a driver indicates a notification has failed.  No user
> +	 * space coordiantion was possible.
spell check.
coordination

> +	 */
> +	if (!dev->driver)
> +		return 0;
> +	driver = to_cxl_drv(dev->driver);
> +	if (!driver->notify)
> +		return 0;
> +
> +	notify_data = (struct cxl_notify_data) {
> +		.event = event,
> +		.region_extent = region_extent,
> +	};
> +
> +	dev_dbg(dev, "Notify: type %d HPA %pra\n",
> +		event, &region_extent->hpa_range);
> +	return driver->notify(dev, &notify_data);
> +}

> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index f0e3f8c787df..4e19d18369de 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -183,6 +183,86 @@ static bool is_sparse(struct dax_region *dax_region)
>  	return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0;
>  }

> +
> +int dax_region_add_resource(struct dax_region *dax_region,
> +			    struct device *device,
> +			    resource_size_t start, resource_size_t length)
> +{
> +	struct resource *new_resource;
> +	int rc;
> +
> +	struct dax_resource *dax_resource __free(kfree) =
> +				kzalloc(sizeof(*dax_resource), GFP_KERNEL);
> +	if (!dax_resource)
> +		return -ENOMEM;
> +
> +	guard(rwsem_write)(&dax_region_rwsem);
> +
> +	dev_dbg(dax_region->dev, "DAX region resource %pr\n", &dax_region->res);
> +	new_resource = __request_region(&dax_region->res, start, length, "extent", 0);
> +	if (!new_resource) {
> +		dev_err(dax_region->dev, "Failed to add region s:%pa l:%pa\n",
> +			&start, &length);
> +		return -ENOSPC;
> +	}
> +
> +	dev_dbg(dax_region->dev, "add resource %pr\n", new_resource);
> +	dax_resource->region = dax_region;
> +	dax_resource->res = new_resource;
> +	dev_set_drvdata(device, dax_resource);
> +	rc = devm_add_action_or_reset(device, dax_release_resource,
> +				      no_free_ptr(dax_resource));
> +	/*  On error; ensure driver data is cleared under semaphore */

It's not used in the dax_release_resource callback (that I can
immediately spot) so could you just not set it until after
this has succeeded?

> +	if (rc)
> +		dev_set_drvdata(device, NULL);
i.e. move
	dev_set_drvdata(device, dax_resource);
to here.

> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(dax_region_add_resource);
Adding quite a few exports. Is it time to namespace DAX exports?
Perhaps a follow up series.



>  bool static_dev_dax(struct dev_dax *dev_dax)
>  {
>  	return is_static(dev_dax->region);
> @@ -296,19 +376,44 @@ static ssize_t region_align_show(struct device *dev,
>  static struct device_attribute dev_attr_region_align =
>  		__ATTR(align, 0400, region_align_show, NULL);
>  
> +#define for_each_child_resource(extent, res) \
> +	for (res = (extent)->child; res; res = res->sibling)
> +
Extent naming in here is a little off for a general sounding macro.
Maybe for_each_child_resource(parent, res) or something like that?

Seem generally useful. Maybe move to resource.h?

> @@ -1494,8 +1679,14 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
>  	device_initialize(dev);
>  	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
>  
> +	if (is_sparse(dax_region) && data->size) {
> +		dev_err(parent, "Sparse DAX region devices must be created initially with 0 size");
> +		rc = -EINVAL;
> +		goto err_id;

Right label?  This code doesn't have side effects and the next error path is goto err_range
Looks like you fail to reverse the alloc_dev_dax_id() in this error path.

> +	}
> +
>  	rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start,
> -				 data->size);
> +				 data->size, NULL);
>  	if (rc)
>  		goto err_range;
>  
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 783bfeef42cc..ae5029ea6047 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -9,6 +9,7 @@ struct dev_dax;
>  struct resource;
>  struct dax_device;
>  struct dax_region;
> +struct dax_sparse_ops;
>  
>  /* dax bus specific ioresource flags */
>  #define IORESOURCE_DAX_STATIC BIT(0)
> @@ -17,7 +18,7 @@ struct dax_region;
>  
>  struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		struct range *range, int target_node, unsigned int align,
> -		unsigned long flags);
> +		unsigned long flags, struct dax_sparse_ops *sparse_ops);
>  
>  struct dev_dax_data {
>  	struct dax_region *dax_region;
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 367e86b1c22a..df979ea2cb59 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -5,6 +5,58 @@
>  
>  #include "../cxl/cxl.h"
>  #include "bus.h"
> +#include "dax-private.h"
> +
> +static int __cxl_dax_add_resource(struct dax_region *dax_region,
> +				  struct region_extent *region_extent)
> +{
> +	resource_size_t start, length;
> +	struct device *dev;
> +
> +	dev = &region_extent->dev;
Might as well do
	struct device *dev = &region_extent->dev;


> +	start = dax_region->res.start + region_extent->hpa_range.start;
> +	length = range_len(&region_extent->hpa_range);
> +	return dax_region_add_resource(dax_region, dev, start, length);
> +}


> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index ccde98c3d4e2..e3866115243e 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
...

> +/*
> + * Similar to run_dax() dax_region_{add,rm}_resource() and dax_avail_size() are
> + * exported but are not intended to be generic operations outside the dax
> + * subsystem.  They are only generic between the dax layer and the dax drivers.
> + */
> +int dax_region_add_resource(struct dax_region *dax_region, struct device *dev,
> +			    resource_size_t start, resource_size_t length);
> +int dax_region_rm_resource(struct dax_region *dax_region,
> +			   struct device *dev);
> +resource_size_t dax_avail_size(struct resource *dax_resource);
> +
> +typedef int (*match_cb)(struct device *dev, resource_size_t *size_avail);
Why is this here?


  reply	other threads:[~2024-10-10 15:27 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-07 23:16 [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-10-07 23:16 ` [PATCH v4 01/28] test printk: Add very basic struct resource tests Ira Weiny
2024-10-08 16:35   ` Andy Shevchenko
2024-10-09 12:24   ` Jonathan Cameron
2024-10-09 17:09   ` Fan Ni
2024-10-10 14:59   ` Petr Mladek
2024-10-11 14:49     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 02/28] printk: Add print format (%pra) for struct range Ira Weiny
2024-10-08 16:56   ` Andy Shevchenko
2024-10-09 12:27     ` Jonathan Cameron
2024-10-09 14:42       ` Andy Shevchenko
2024-10-09 13:30   ` Rasmus Villemoes
2024-10-09 14:41     ` Andy Shevchenko
2024-10-14  0:08       ` Ira Weiny
2024-10-11 16:54     ` Ira Weiny
2024-10-09 17:33   ` Fan Ni
2024-10-11  2:09   ` Bagas Sanjaya
2024-10-17 20:57     ` Ira Weiny
2024-10-25 12:42       ` Bagas Sanjaya
2024-10-07 23:16 ` [PATCH v4 03/28] cxl/cdat: Use %pra for dpa range outputs Ira Weiny
2024-10-09 12:33   ` Jonathan Cameron
2024-10-09 17:34   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 04/28] range: Add range_overlaps() Ira Weiny
2024-10-08 16:10   ` David Sterba
2024-10-09 14:45     ` Andy Shevchenko
2024-10-09 14:46       ` Andy Shevchenko
2024-10-14  0:12         ` Ira Weiny
2024-10-09 15:36       ` David Sterba
2024-10-09 16:04         ` Andy Shevchenko
2024-10-10 15:24     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 05/28] dax: Document dax dev range tuple Ira Weiny
2024-10-09 12:42   ` Jonathan Cameron
2024-10-11 20:40     ` Ira Weiny
2024-10-16 15:48       ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 06/28] cxl/pci: Delay event buffer allocation Ira Weiny
2024-10-09 17:47   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 07/28] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-10-07 23:16 ` [PATCH v4 08/28] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-10-09 12:49   ` Jonathan Cameron
2024-10-14  0:05     ` Ira Weiny
2024-10-16 15:54       ` Jonathan Cameron
2024-10-16 16:59         ` Kees Cook
2024-10-07 23:16 ` [PATCH v4 09/28] cxl/core: Separate region mode from decoder mode ira.weiny
2024-10-09 12:51   ` Jonathan Cameron
2024-10-09 18:06   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 10/28] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-10-07 23:16 ` [PATCH v4 11/28] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-10-10 12:45   ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
2024-10-09 14:42   ` Rafael J. Wysocki
2024-10-11 20:38     ` Ira Weiny
2024-10-14 20:52       ` Wysocki, Rafael J
2024-10-09 18:16   ` Fan Ni
2024-10-14  1:16     ` Ira Weiny
2024-10-10 12:51   ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 13/28] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-10-09 20:46   ` Fan Ni
2024-10-14  1:34     ` Ira Weiny
2024-10-10 13:04   ` Jonathan Cameron
2024-10-16 21:34     ` Ira Weiny
2024-10-11  2:15   ` Bagas Sanjaya
2024-10-07 23:16 ` [PATCH v4 14/28] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-10-10 13:14   ` Jonathan Cameron
2024-10-17 17:51     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 15/28] cxl/region: Refactor common create region code Ira Weiny
2024-10-10 13:18   ` Jonathan Cameron
2024-10-17 20:29     ` Ira Weiny
2024-10-10 16:27   ` Fan Ni
2024-10-24  2:17   ` Alison Schofield
2024-10-07 23:16 ` [PATCH v4 16/28] cxl/region: Add sparse DAX region support ira.weiny
2024-10-10 13:46   ` Jonathan Cameron
2024-10-10 17:41   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 17/28] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-10-10 13:49   ` Jonathan Cameron
2024-10-10 17:58   ` Fan Ni
2024-10-24  2:33     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 18/28] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-10-10 18:07   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 19/28] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-10-10 14:15   ` Jonathan Cameron
2024-10-10 18:25   ` Fan Ni
2024-10-24  3:09     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 20/28] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-10-10 14:21   ` Jonathan Cameron
2024-10-10 18:29   ` Fan Ni
2024-10-24  2:30   ` Alison Schofield
2024-10-07 23:16 ` [PATCH v4 21/28] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-10-09  1:56   ` Li, Ming4
2024-10-09 19:49     ` Ira Weiny
2024-10-10  3:06       ` Li, Ming4
2024-10-14  2:05         ` Ira Weiny
2024-10-10 14:50       ` Jonathan Cameron
2024-10-11 19:14         ` Fan Ni
2024-10-17 21:15         ` Ira Weiny
2024-10-18  9:03           ` Jonathan Cameron
2024-10-21 14:04             ` Ira Weiny
2024-10-21 14:47               ` Jonathan Cameron
2024-10-10 14:58   ` Jonathan Cameron
2024-10-17 21:39     ` Ira Weiny
2024-10-18  9:09       ` Jonathan Cameron
2024-10-21 18:45         ` Ira Weiny
2024-10-22 17:01           ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 22/28] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-10-10 15:01   ` Jonathan Cameron
2024-10-18 18:26     ` Ira Weiny
2024-10-21  9:37       ` Jonathan Cameron
2024-10-14 16:08   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 23/28] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-10-10 15:06   ` Jonathan Cameron
2024-10-21 21:16     ` Ira Weiny
2024-10-14 16:56   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 24/28] dax/region: Create resources on sparse DAX regions ira.weiny
2024-10-10 15:27   ` Jonathan Cameron [this message]
2024-10-23  1:20     ` Ira Weiny
2024-10-23 11:22       ` Jonathan Cameron
2024-10-24  3:50         ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 25/28] cxl/region: Read existing extents on region creation ira.weiny
2024-10-10 15:33   ` Jonathan Cameron
2024-10-24  1:41     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 26/28] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-10-10 15:41   ` Jonathan Cameron
2024-10-24  1:52     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 27/28] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-10-10 15:49   ` Jonathan Cameron
2024-10-24  1:59     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 28/28] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-10-10 15:58   ` Jonathan Cameron
2024-10-24  2:23     ` Ira Weiny
2024-10-08 22:57 ` [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni
2024-10-08 23:06   ` Fan Ni
2024-10-10 15:30     ` Ira Weiny
2024-10-10 15:31     ` Ira Weiny
2024-10-21 16:47 ` Fan Ni
2024-10-22 17:05   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241010162745.00007b31@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=fan.ni@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=navneet.singh@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.