Linux CXL
 help / color / mirror / Atom feed
From: Ben Widawsky <ben.widawsky@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org,
	Alison Schofield <alison.schofield@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>
Subject: Re: [PATCH 1/6] cxl/region: Add region creation ABI
Date: Fri, 18 Jun 2021 08:07:49 -0700	[thread overview]
Message-ID: <20210618150749.eo433cjs4rpilayy@intel.com> (raw)
In-Reply-To: <20210618101311.00005f78@Huawei.com>

On 21-06-18 10:13:11, Jonathan Cameron wrote:
> On Thu, 17 Jun 2021 10:36:50 -0700
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> > Regions are created as a child of the decoder that encompasses an
> > address space with constraints. Regions only exist for persistent
> > capacities.
> > 
> > When regions are created, the number of desired interleave ways must be
> > known. To enable this, the sysfs attribute will take the desired ways as
> > input. This interface intentionally allows creation of
> > impossible-to-enable regions based on interleave constraints in the
> > topology. The reasoning is to create new regions through the kernel
> > interfaces which may become possible on reboot under a variety of
> > circumstances.
> > 
> > As an example, creating a x1 region with:
> > echo 1 > /sys/bus/cxl/devices/decoder1.0/create_region
> > 
> > Will yield /sys/bus/cxl/devices/decoder1.0/region1.0:0
> > 
> > That region may then be deleted with:
> > echo region1.0:0 > /sys/bus/cxl/devices/decoder1.0/delete_region
> > 
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> 
> Hi Ben,
> 
> Some comments inline. The sysfs interface is getting a little
> clever for my liking....

Thanks for the feedback.

> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl       |  21 +++
> >  .../driver-api/cxl/memory-devices.rst         |  11 ++
> >  drivers/cxl/Makefile                          |   3 +-
> >  drivers/cxl/core.c                            |  71 +++++++++
> >  drivers/cxl/cxl.h                             |  11 ++
> >  drivers/cxl/region.c                          | 147 ++++++++++++++++++
> >  drivers/cxl/region.h                          |  43 +++++
> >  7 files changed, 306 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/cxl/region.c
> >  create mode 100644 drivers/cxl/region.h
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 0b6a2e6e8fbb..115a25d2899d 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -127,3 +127,24 @@ Description:
> >  		memory (type-3). The 'target_type' attribute indicates the
> >  		current setting which may dynamically change based on what
> >  		memory regions are activated in this decode hierarchy.
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/create_region
> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		Creates a new CXL region of N interleaved ways. Writing a value
> > +		of '2' will create a new uninitialized region with 2x interleave
> > +		that will be mapped by the CXL decoderX.Y. Reading from this
> > +		node will return the last created region. Regions must be
> > +		subsequently configured and bound to a region driver before they
> > +		can be used.
> 
> I don't like sysfs attributes that return entirely different looking (and seemingly
> unrelated) values from what you write to them.
> I could sort of get behind writing 1 and returning the region (as in the RFC)
> on basis that felt like 'create' and 'what did I create'.
> Doing '2' and getting back anything other than '2' feels wrong
> 
> Maybe, would feel better if the name made it more explicit that the thing took
> interleave values?
> 
> create_region_with_interleave perhaps?

I could argue both ways. I agree the asymmetry isn't ideal. The expectation is
that userspace tooling is going to handle most of this anyway, and it's "well
documented" in Documentation/.../ABI. So I think if the interface has some
warts, it's not a huge deal, but perhaps designing it that way isn't ideal.
More below...

> 
> Not ideal.  Sysfs always a bit clunky for this stuff and configfs would mean
> a split interface which is never ideal.

Overall, the programming model follows NVDIMM. With my DRM background, my
default would have been IOCTLs - those have a long legacy. I've never touched
configfs. I'm not advocating one way or another. Maybe it would be good to
discuss pros/cons to different options?

> 
> It's a bit horrible, but maybe a more intuitive interface would be to make the
> targetX visibility dynamic and hence make interleave an attribute of the region?
> 
> sysfs_update_group() is rarely used, but I think the intent is to support
> this sort of case.
> 
> I've not fully thought through the effects of that though so might
> well be missing something.

I really would have liked something dynamic like you described. I searched for
other drivers that this without much luck. I'm relatively unfamiliar with a lot
of the device model machinery in Linux. Dan, are you familiar with this, is it
feasible? If it sounds reasonable I can spend some more time figuring out how to
do it.

> 
> Also, add to the docs what can be expected if there is no 'last created'
> or it's been removed again.
> 
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		Deletes the named region. A region must be unbound from the
> > +		region driver before being deleted. The attributes expects a
> > +		region in the form "regionX.Y:Z".
> > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > index 487ce4f41d77..c7f59a8c94db 100644
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -39,6 +39,17 @@ CXL Core
> >  .. kernel-doc:: drivers/cxl/core.c
> >     :doc: cxl core
> >  
> > +CXL Regions
> > +-----------
> > +.. kernel-doc:: drivers/cxl/region.c
> > +   :doc: cxl region
> > +
> > +.. kernel-doc:: drivers/cxl/region.h
> > +   :identifiers:
> > +
> > +.. kernel-doc:: drivers/cxl/region.c
> > +   :identifiers:
> > +
> >  External Interfaces
> >  ===================
> >  
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index 32954059b37b..c3151198c041 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -1,6 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  obj-$(CONFIG_CXL_BUS) += cxl_core.o
> > -obj-$(CONFIG_CXL_MEM) += cxl_pci.o
> > +obj-$(CONFIG_CXL_MEM) += cxl_pci.o cxl_region.o
> >  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> >  obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> >  
> > @@ -9,3 +9,4 @@ cxl_core-y := core.o
> >  cxl_pci-y := pci.o
> >  cxl_acpi-y := acpi.o
> >  cxl_pmem-y := pmem.o
> > +cxl_region-y := region.o
> > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c
> > index a2e4d54fc7bc..d8d7ca85e110 100644
> > --- a/drivers/cxl/core.c
> > +++ b/drivers/cxl/core.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/pci.h>
> >  #include <linux/slab.h>
> >  #include <linux/idr.h>
> > +#include "region.h"
> >  #include "cxl.h"
> >  #include "mem.h"
> >  
> > @@ -120,7 +121,68 @@ static ssize_t target_list_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RO(target_list);
> >  
> > +static ssize_t create_region_show(struct device *dev,
> > +				  struct device_attribute *attr, char *buf)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +	int rc;
> > +
> > +	device_lock(dev);
> > +	rc = sprintf(buf, "%s\n",
> > +		     cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
> 
> An alternative is to be cynical and adjust the interface a touch.
> Lets say it always returns the name of the last created region.  If that's
> true, just copy the name at creation time.  Then no need to care if the
> pointer is live or not.
> 

Sounds fine to me.

> > +	device_unlock(dev);
> > +
> > +	return rc;
> > +}
> > +
> > +static ssize_t create_region_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +	struct cxl_region *region;
> > +	ssize_t rc;
> > +	int val;
> > +
> > +	rc = kstrtoint(buf, 0, &val);
> > +	if (rc)
> > +		return rc;
> > +	if (val < 0 || val > 16)
> > +		return -EINVAL;
> > +
> > +	region = cxl_alloc_region(cxld, val);
> > +	if (IS_ERR(region))
> > +		return PTR_ERR(region);
> > +
> > +	rc = cxl_add_region(cxld, region);
> > +	if (rc) {
> > +		cxl_free_region(cxld, region);
> > +		return rc;
> > +	}
> > +
> > +	cxld->youngest = region;
> > +	return len;
> > +}
> > +static DEVICE_ATTR_RW(create_region);
> > +
> > +static ssize_t delete_region_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +	int rc;
> > +
> > +	rc = cxl_delete_region(cxld, buf);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(delete_region);
> > +
> >  static struct attribute *cxl_decoder_base_attrs[] = {
> > +	&dev_attr_create_region.attr,
> > +	&dev_attr_delete_region.attr,
> >  	&dev_attr_start.attr,
> >  	&dev_attr_size.attr,
> >  	&dev_attr_locked.attr,
> > @@ -171,7 +233,13 @@ static void cxl_decoder_release(struct device *dev)
> >  {
> >  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> >  	struct cxl_port *port = to_cxl_port(dev->parent);
> > +	struct cxl_region *region;
> >  
> > +	list_for_each_entry(region, &cxld->regions, list)
> > +		cxl_delete_region(cxld, dev_name(&region->dev));
> > +
> > +	dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
> > +		      "Lost track of a region");
> >  	ida_free(&port->decoder_ida, cxld->id);
> >  	kfree(cxld);
> >  }
> > @@ -483,8 +551,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
> >  		.interleave_ways = interleave_ways,
> >  		.interleave_granularity = interleave_granularity,
> >  		.target_type = type,
> > +		.regions = LIST_HEAD_INIT(cxld->regions),
> >  	};
> >  
> > +	ida_init(&cxld->region_ida);
> > +
> >  	/* handle implied target_list */
> >  	if (interleave_ways == 1)
> >  		cxld->target[0] =
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b6bda39a59e3..8b27a07d7d0f 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -190,6 +190,9 @@ enum cxl_decoder_type {
> >   * @interleave_granularity: data stride per dport
> >   * @target_type: accelerator vs expander (type2 vs type3) selector
> >   * @flags: memory type capabilities and locking
> > + * @region_ida: allocator for region ids.
> > + * @regions: List of regions mapped (may be disabled) by this decoder.
> > + * @youngest: Last region created for this decoder.
> >   * @target: active ordered target list in current decoder configuration
> >   */
> >  struct cxl_decoder {
> > @@ -200,6 +203,9 @@ struct cxl_decoder {
> >  	int interleave_granularity;
> >  	enum cxl_decoder_type target_type;
> >  	unsigned long flags;
> > +	struct ida region_ida;
> > +	struct list_head regions;
> > +	struct cxl_region *youngest;
> >  	struct cxl_dport *target[];
> >  };
> >  
> > @@ -262,6 +268,11 @@ struct cxl_dport {
> >  	struct list_head list;
> >  };
> >  
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> > +				    int interleave_ways);
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
> >  struct cxl_port *to_cxl_port(struct device *dev);
> >  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> >  				   resource_size_t component_reg_phys,
> > diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> > new file mode 100644
> > index 000000000000..391467e864a2
> > --- /dev/null
> > +++ b/drivers/cxl/region.c
> > @@ -0,0 +1,147 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/idr.h>
> > +#include "region.h"
> > +#include "cxl.h"
> > +#include "mem.h"
> > +
> > +/**
> > + * DOC: cxl region
> > + *
> > + * A CXL region encompasses a chunk of host physical address space that may be
> > + * consumed by a single device (x1 interleave aka linear) or across multiple
> > + * devices (xN interleaved). A region is a child device of a &struct
> > + * cxl_decoder. There may be multiple active regions under a single &struct
> > + * cxl_decoder. The common case for multiple regions would be several linear,
> > + * contiguous regions under a single decoder. Generally, there will be a 1:1
> > + * relationship between decoder and region when the region is interleaved.
> > + */
> > +
> > +static void cxl_region_release(struct device *dev);
> > +
> > +const struct device_type cxl_region_type = {
> > +	.name = "cxl_region",
> > +	.release = cxl_region_release,
> > +};
> > +
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > +	ida_free(&cxld->region_ida, region->id);
> > +	kfree(region);
> > +}
> > +
> > +static void cxl_region_release(struct device *dev)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +
> > +	cxl_free_region(cxld, region);
> > +}
> > +
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> > +				    int interleave_ways)
> > +{
> > +	struct cxl_region *region;
> > +	int rc;
> > +
> > +	region = kzalloc(struct_size(region, targets, interleave_ways),
> > +			 GFP_KERNEL);
> > +	if (!region)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	region->eniw = interleave_ways;
> > +
> > +	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > +	if (rc < 0) {
> > +		dev_err(&cxld->dev, "Couldn't get a new id\n");
> > +		kfree(region);
> > +		return ERR_PTR(rc);
> > +	}
> > +	region->id = rc;
> > +
> > +	return region;
> > +}
> > +
> > +/**
> > + * cxl_add_region - Adds a region to a decoder
> > + * @cxld: Parent decoder.
> > + * @region: Region to be added to the decoder.
> > + *
> > + * This is the second step of region initialization. Regions exist within an
> > + * address space which is mapped by a @cxld, and that @cxld enforces constraints
> > + * upon the region as it is configured. Regions may be added to a @cxld but not
> > + * activated and therefore it is possible to have more regions in a @cxld than
> > + * there are interleave ways in the @cxld. Regions exist only for persistent
> > + * capacities.
> > + *
> > + * Return: zero if the region was added to the @cxld, else returns negative
> > + * error code.
> > + */
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +	struct device *dev = &region->dev;
> > +	int rc;
> > +
> > +	device_initialize(dev);
> > +	dev->parent = &cxld->dev;
> > +	device_set_pm_not_required(dev);
> > +	dev->bus = &cxl_bus_type;
> > +	dev->type = &cxl_region_type;
> > +	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = device_add(dev);
> > +	if (rc)
> > +		goto err;
> > +
> > +	dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
> > +
> > +	return 0;
> > +
> > +err:
> > +	put_device(dev);
> > +	return rc;
> > +}
> > +
> > +static struct cxl_region *
> > +cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
> > +{
> > +	struct device *region_dev;
> > +
> > +	region_dev = device_find_child_by_name(&cxld->dev, name);
> > +	if (!region_dev)
> > +		return ERR_PTR(-ENOENT);
> > +
> > +	return to_cxl_region(region_dev);
> > +}
> > +
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
> > +{
> > +	struct cxl_region *region;
> > +
> > +	device_lock(&cxld->dev);
> > +
> > +	region = cxl_find_region_by_name(cxld, region_name);
> > +	if (IS_ERR(region)) {
> > +		device_unlock(&cxld->dev);
> > +		return PTR_ERR(region);
> > +	}
> > +
> > +	dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
> > +		dev_name(&region->dev), dev_name(&cxld->dev));
> > +
> > +	cmpxchg(&cxld->youngest, region, NULL);
> 
> Why does this need to be atomic?  I think the other side of anything
> that might make use of this isn't atomic, except because you are holding
> the device_lock for cxld->dev

I attempted a version that was lockless before this (failed) and this remained.
It doesn't need to be atomic.

> 
> > +
> > +	device_unregister(&region->dev);
> > +	device_unlock(&cxld->dev);
> > +
> > +	put_device(&region->dev);
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> > new file mode 100644
> > index 000000000000..7a87d229e38a
> > --- /dev/null
> > +++ b/drivers/cxl/region.h
> > @@ -0,0 +1,43 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright(c) 2021 Intel Corporation. */
> > +#ifndef __CXL_REGION_H__
> > +#define __CXL_REGION_H__
> > +
> > +#include <linux/uuid.h>
> > +
> > +extern const struct device_type cxl_region_type;
> > +
> > +/**
> > + * struct cxl_region - CXL region
> > + * @dev: This region's device.
> > + * @id: This regions id. Id is globally unique across all regions.
> > + * @res: Address space consumed by this region.
> > + * @requested_size: Size of the region determined from LSA or userspace.
> > + * @uuid: The UUID for this region.
> > + * @list: Node in decoders region list.
> > + * @eniw: Number of interleave ways this region is configured for.
> > + * @targets: The memory devices comprising the region.
> > + */
> > +struct cxl_region {
> > +	struct device dev;
> > +	int id;
> > +	struct resource *res;
> > +	u64 requested_size;
> > +	uuid_t uuid;
> > +	struct list_head list;
> > +	int eniw;
> > +	struct cxl_memdev *targets[];
> > +};
> > +
> > +static inline struct cxl_region *to_cxl_region(struct device *dev)
> > +{
> > +	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > +			  "not a cxl_region device\n"))
> > +		return NULL;
> > +
> > +	return container_of(dev, struct cxl_region, dev);
> > +}
> > +
> > +bool cxl_is_region_configured(struct cxl_region *region);
> > +
> > +#endif
> 

  reply	other threads:[~2021-06-18 15:08 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-17 17:36 [PATCH 0/6] Region creation Ben Widawsky
2021-06-17 17:36 ` [PATCH 1/6] cxl/region: Add region creation ABI Ben Widawsky
2021-06-17 21:11   ` [PATCH v2 " Ben Widawsky
2021-06-18  9:13   ` [PATCH " Jonathan Cameron
2021-06-18 15:07     ` Ben Widawsky [this message]
2021-06-18 16:39       ` Dan Williams
2021-06-17 17:36 ` [PATCH 2/6] cxl: Move cxl_memdev conversion helper to mem.h Ben Widawsky
2021-06-18  9:13   ` Jonathan Cameron
2021-06-18 15:00   ` Dan Williams
2021-06-17 17:36 ` [PATCH 3/6] cxl/region: Introduce concept of region configuration Ben Widawsky
2021-06-18 11:22   ` Jonathan Cameron
2021-06-18 15:25     ` Ben Widawsky
2021-06-18 15:44       ` Jonathan Cameron
2021-06-17 17:36 ` [PATCH 4/6] cxl/region: Introduce a cxl_region driver Ben Widawsky
2021-06-17 21:13   ` [PATCH v2 " Ben Widawsky
2021-06-18 11:49     ` Jonathan Cameron
2021-06-17 17:36 ` [PATCH 5/6] cxl/core: Convert decoder range to resource Ben Widawsky
2021-06-18 11:52   ` Jonathan Cameron
2021-06-17 17:36 ` [PATCH 6/6] cxl/region: Handle region's address space allocation Ben Widawsky
2021-06-18 13:35   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210618150749.eo433cjs4rpilayy@intel.com \
    --to=ben.widawsky@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox