From: Ben Widawsky <ben.widawsky@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org,
Alison Schofield <alison.schofield@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>
Subject: Re: [PATCH 1/6] cxl/region: Add region creation ABI
Date: Fri, 18 Jun 2021 08:07:49 -0700
Message-ID: <20210618150749.eo433cjs4rpilayy@intel.com>
In-Reply-To: <20210618101311.00005f78@Huawei.com>
On 21-06-18 10:13:11, Jonathan Cameron wrote:
> On Thu, 17 Jun 2021 10:36:50 -0700
> Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> > Regions are created as a child of the decoder that encompasses an
> > address space with constraints. Regions only exist for persistent
> > capacities.
> >
> > When regions are created, the number of desired interleave ways must be
> > known. To enable this, the sysfs attribute will take the desired ways as
> > input. This interface intentionally allows creation of
> > impossible-to-enable regions based on interleave constraints in the
> > topology. The reasoning is to create new regions through the kernel
> > interfaces which may become possible on reboot under a variety of
> > circumstances.
> >
> > As an example, creating a x1 region with:
> > echo 1 > /sys/bus/cxl/devices/decoder1.0/create_region
> >
> > Will yield /sys/bus/cxl/devices/decoder1.0/region1.0:0
> >
> > That region may then be deleted with:
> > echo region1.0:0 > /sys/bus/cxl/devices/decoder1.0/delete_region
> >
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
>
> Hi Ben,
>
> Some comments inline. The sysfs interface is getting a little
> clever for my liking....
Thanks for the feedback.
>
> > ---
> > Documentation/ABI/testing/sysfs-bus-cxl | 21 +++
> > .../driver-api/cxl/memory-devices.rst | 11 ++
> > drivers/cxl/Makefile | 3 +-
> > drivers/cxl/core.c | 71 +++++++++
> > drivers/cxl/cxl.h | 11 ++
> > drivers/cxl/region.c | 147 ++++++++++++++++++
> > drivers/cxl/region.h | 43 +++++
> > 7 files changed, 306 insertions(+), 1 deletion(-)
> > create mode 100644 drivers/cxl/region.c
> > create mode 100644 drivers/cxl/region.h
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 0b6a2e6e8fbb..115a25d2899d 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -127,3 +127,24 @@ Description:
> > memory (type-3). The 'target_type' attribute indicates the
> > current setting which may dynamically change based on what
> > memory regions are activated in this decode hierarchy.
> > +
> > +What: /sys/bus/cxl/devices/decoderX.Y/create_region
> > +Date: June, 2021
> > +KernelVersion: v5.14
> > +Contact: linux-cxl@vger.kernel.org
> > +Description:
> > + Creates a new CXL region of N interleaved ways. Writing a value
> > + of '2' will create a new uninitialized region with 2x interleave
> > + that will be mapped by the CXL decoderX.Y. Reading from this
> > + node will return the last created region. Regions must be
> > + subsequently configured and bound to a region driver before they
> > + can be used.
>
> I don't like sysfs attributes that return entirely different looking (and seemingly
> unrelated) values from what you write to them.
> I could sort of get behind writing 1 and returning the region (as in the RFC)
> on basis that felt like 'create' and 'what did I create'.
> Doing '2' and getting back anything other than '2' feels wrong
>
> Maybe, would feel better if the name made it more explicit that the thing took
> interleave values?
>
> create_region_with_interleave perhaps?
I could argue both ways. I agree the asymmetry isn't ideal. The expectation is
that userspace tooling is going to handle most of this anyway, and it's "well
documented" in Documentation/.../ABI. So I think if the interface has some
warts, it's not a huge deal, but perhaps designing it that way isn't ideal.
More below...
>
> Not ideal. Sysfs always a bit clunky for this stuff and configfs would mean
> a split interface which is never ideal.
Overall, the programming model follows NVDIMM. With my DRM background, my
default would have been IOCTLs - those have a long legacy. I've never touched
configfs. I'm not advocating one way or another. Maybe it would be good to
discuss the pros/cons of the different options?
>
> It's a bit horrible, but maybe a more intuitive interface would be to make the
> targetX visibility dynamic and hence make interleave an attribute of the region?
>
> sysfs_update_group() is rarely used, but I think the intent is to support
> this sort of case.
>
> I've not fully thought through the effects of that though so might
> well be missing something.
I really would have liked something dynamic like you described. I searched for
other drivers that do this, without much luck. I'm relatively unfamiliar with a
lot of the device model machinery in Linux. Dan, are you familiar with this? Is
it feasible? If it sounds reasonable I can spend some more time figuring out how
to do it.
>
> Also, add to the docs what can be expected if there is no 'last created'
> or it's been removed again.
>
> > +
> > +What: /sys/bus/cxl/devices/decoderX.Y/delete_region
> > +Date: June, 2021
> > +KernelVersion: v5.14
> > +Contact: linux-cxl@vger.kernel.org
> > +Description:
> > + Deletes the named region. A region must be unbound from the
> > + region driver before being deleted. The attribute expects a
> > + region in the form "regionX.Y:Z".
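For tooling that has to validate or split a name of the form "regionX.Y:Z"
before writing it to delete_region, a minimal userspace sketch could look like
the following. The helper name parse_region_name() is hypothetical and not part
of this series; the mapping of X, Y and Z to port id, decoder id and region id
is taken from the dev_set_name() call in cxl_add_region(). The parsing is
deliberately loose (sscanf's %d accepts signs and leading whitespace).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: split "regionX.Y:Z" into port id (X), decoder id (Y)
 * and region id (Z). Returns 0 on success, -1 if the name does not match.
 */
static int parse_region_name(const char *name, int *port, int *decoder, int *id)
{
	int consumed = 0;

	assert(name && port && decoder && id);

	/* %n records how many characters were consumed; it is not counted. */
	if (sscanf(name, "region%d.%d:%d%n", port, decoder, id, &consumed) != 3)
		return -1;

	/* Tolerate a single trailing newline, as produced by echo. */
	if (name[consumed] == '\n')
		consumed++;

	/* Anything left over means the name had trailing junk. */
	return name[consumed] == '\0' ? 0 : -1;
}
```

With this, "region1.0:0" splits into port 1, decoder 0, region 0, while a name
such as "region1.0" is rejected.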
> > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > index 487ce4f41d77..c7f59a8c94db 100644
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -39,6 +39,17 @@ CXL Core
> > .. kernel-doc:: drivers/cxl/core.c
> > :doc: cxl core
> >
> > +CXL Regions
> > +-----------
> > +.. kernel-doc:: drivers/cxl/region.c
> > + :doc: cxl region
> > +
> > +.. kernel-doc:: drivers/cxl/region.h
> > + :identifiers:
> > +
> > +.. kernel-doc:: drivers/cxl/region.c
> > + :identifiers:
> > +
> > External Interfaces
> > ===================
> >
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index 32954059b37b..c3151198c041 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -1,6 +1,6 @@
> > # SPDX-License-Identifier: GPL-2.0
> > obj-$(CONFIG_CXL_BUS) += cxl_core.o
> > -obj-$(CONFIG_CXL_MEM) += cxl_pci.o
> > +obj-$(CONFIG_CXL_MEM) += cxl_pci.o cxl_region.o
> > obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> > obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> >
> > @@ -9,3 +9,4 @@ cxl_core-y := core.o
> > cxl_pci-y := pci.o
> > cxl_acpi-y := acpi.o
> > cxl_pmem-y := pmem.o
> > +cxl_region-y := region.o
> > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c
> > index a2e4d54fc7bc..d8d7ca85e110 100644
> > --- a/drivers/cxl/core.c
> > +++ b/drivers/cxl/core.c
> > @@ -6,6 +6,7 @@
> > #include <linux/pci.h>
> > #include <linux/slab.h>
> > #include <linux/idr.h>
> > +#include "region.h"
> > #include "cxl.h"
> > #include "mem.h"
> >
> > @@ -120,7 +121,68 @@ static ssize_t target_list_show(struct device *dev,
> > }
> > static DEVICE_ATTR_RO(target_list);
> >
> > +static ssize_t create_region_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > + int rc;
> > +
> > + device_lock(dev);
> > + rc = sprintf(buf, "%s\n",
> > + cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
>
> An alternative is to be cynical and adjust the interface a touch.
> Let's say it always returns the name of the last created region. If that's
> true, just copy the name at creation time. Then no need to care if the
> pointer is live or not.
>
Sounds fine to me.
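
To make sure I understand the suggestion, here is a rough userspace model of
it, not the actual driver change. The struct and function names
(decoder_model, record_created(), show_last()) and the buffer size are made up
for illustration. The point is that the decoder keeps a copy of the name rather
than a pointer, so the show side never has to care whether the region is still
alive.

```c
#include <assert.h>
#include <stdio.h>

#define REGION_NAME_MAX 32	/* assumed size, large enough for the sketch */

/* Hypothetical stand-in for the decoder; only the relevant field is kept. */
struct decoder_model {
	char last_region[REGION_NAME_MAX];	/* "" until a region is created */
};

/* Copy the name at creation time instead of storing a region pointer. */
static void record_created(struct decoder_model *d, int port, int dec, int id)
{
	assert(d);
	snprintf(d->last_region, sizeof(d->last_region), "region%d.%d:%d",
		 port, dec, id);
}

/* Model of the show() side: just print whatever name was recorded. */
static int show_last(const struct decoder_model *d, char *buf, size_t len)
{
	assert(d && buf);
	return snprintf(buf, len, "%s\n", d->last_region);
}
```

In this model a later delete of the region leaves the recorded name untouched,
which matches "always returns the name of the last created region".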
> > + device_unlock(dev);
> > +
> > + return rc;
> > +}
> > +
> > +static ssize_t create_region_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t len)
> > +{
> > + struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > + struct cxl_region *region;
> > + ssize_t rc;
> > + int val;
> > +
> > + rc = kstrtoint(buf, 0, &val);
> > + if (rc)
> > + return rc;
> > + if (val < 0 || val > 16)
> > + return -EINVAL;
> > +
> > + region = cxl_alloc_region(cxld, val);
> > + if (IS_ERR(region))
> > + return PTR_ERR(region);
> > +
> > + rc = cxl_add_region(cxld, region);
> > + if (rc) {
> > + cxl_free_region(cxld, region);
> > + return rc;
> > + }
> > +
> > + cxld->youngest = region;
> > + return len;
> > +}
> > +static DEVICE_ATTR_RW(create_region);
> > +
> > +static ssize_t delete_region_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t len)
> > +{
> > + struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > + int rc;
> > +
> > + rc = cxl_delete_region(cxld, buf);
> > + if (rc)
> > + return rc;
> > +
> > + return len;
> > +}
> > +static DEVICE_ATTR_WO(delete_region);
> > +
> > static struct attribute *cxl_decoder_base_attrs[] = {
> > + &dev_attr_create_region.attr,
> > + &dev_attr_delete_region.attr,
> > &dev_attr_start.attr,
> > &dev_attr_size.attr,
> > &dev_attr_locked.attr,
> > @@ -171,7 +233,13 @@ static void cxl_decoder_release(struct device *dev)
> > {
> > struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > struct cxl_port *port = to_cxl_port(dev->parent);
> > + struct cxl_region *region;
> >
> > + list_for_each_entry(region, &cxld->regions, list)
> > + cxl_delete_region(cxld, dev_name(&region->dev));
> > +
> > + dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
> > + "Lost track of a region");
> > ida_free(&port->decoder_ida, cxld->id);
> > kfree(cxld);
> > }
> > @@ -483,8 +551,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
> > .interleave_ways = interleave_ways,
> > .interleave_granularity = interleave_granularity,
> > .target_type = type,
> > + .regions = LIST_HEAD_INIT(cxld->regions),
> > };
> >
> > + ida_init(&cxld->region_ida);
> > +
> > /* handle implied target_list */
> > if (interleave_ways == 1)
> > cxld->target[0] =
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b6bda39a59e3..8b27a07d7d0f 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -190,6 +190,9 @@ enum cxl_decoder_type {
> > * @interleave_granularity: data stride per dport
> > * @target_type: accelerator vs expander (type2 vs type3) selector
> > * @flags: memory type capabilities and locking
> > + * @region_ida: allocator for region ids.
> > + * @regions: List of regions mapped (may be disabled) by this decoder.
> > + * @youngest: Last region created for this decoder.
> > * @target: active ordered target list in current decoder configuration
> > */
> > struct cxl_decoder {
> > @@ -200,6 +203,9 @@ struct cxl_decoder {
> > int interleave_granularity;
> > enum cxl_decoder_type target_type;
> > unsigned long flags;
> > + struct ida region_ida;
> > + struct list_head regions;
> > + struct cxl_region *youngest;
> > struct cxl_dport *target[];
> > };
> >
> > @@ -262,6 +268,11 @@ struct cxl_dport {
> > struct list_head list;
> > };
> >
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> > + int interleave_ways);
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
> > struct cxl_port *to_cxl_port(struct device *dev);
> > struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> > resource_size_t component_reg_phys,
> > diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> > new file mode 100644
> > index 000000000000..391467e864a2
> > --- /dev/null
> > +++ b/drivers/cxl/region.c
> > @@ -0,0 +1,147 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/idr.h>
> > +#include "region.h"
> > +#include "cxl.h"
> > +#include "mem.h"
> > +
> > +/**
> > + * DOC: cxl region
> > + *
> > + * A CXL region encompasses a chunk of host physical address space that may be
> > + * consumed by a single device (x1 interleave aka linear) or across multiple
> > + * devices (xN interleaved). A region is a child device of a &struct
> > + * cxl_decoder. There may be multiple active regions under a single &struct
> > + * cxl_decoder. The common case for multiple regions would be several linear,
> > + * contiguous regions under a single decoder. Generally, there will be a 1:1
> > + * relationship between decoder and region when the region is interleaved.
> > + */
> > +
> > +static void cxl_region_release(struct device *dev);
> > +
> > +const struct device_type cxl_region_type = {
> > + .name = "cxl_region",
> > + .release = cxl_region_release,
> > +};
> > +
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > + ida_free(&cxld->region_ida, region->id);
> > + kfree(region);
> > +}
> > +
> > +static void cxl_region_release(struct device *dev)
> > +{
> > + struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > + struct cxl_region *region = to_cxl_region(dev);
> > +
> > + cxl_free_region(cxld, region);
> > +}
> > +
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld,
> > + int interleave_ways)
> > +{
> > + struct cxl_region *region;
> > + int rc;
> > +
> > + region = kzalloc(struct_size(region, targets, interleave_ways),
> > + GFP_KERNEL);
> > + if (!region)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + region->eniw = interleave_ways;
> > +
> > + rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > + if (rc < 0) {
> > + dev_err(&cxld->dev, "Couldn't get a new id\n");
> > + kfree(region);
> > + return ERR_PTR(rc);
> > + }
> > + region->id = rc;
> > +
> > + return region;
> > +}
> > +
> > +/**
> > + * cxl_add_region - Adds a region to a decoder
> > + * @cxld: Parent decoder.
> > + * @region: Region to be added to the decoder.
> > + *
> > + * This is the second step of region initialization. Regions exist within an
> > + * address space which is mapped by a @cxld, and that @cxld enforces constraints
> > + * upon the region as it is configured. Regions may be added to a @cxld but not
> > + * activated and therefore it is possible to have more regions in a @cxld than
> > + * there are interleave ways in the @cxld. Regions exist only for persistent
> > + * capacities.
> > + *
> > + * Return: zero if the region was added to the @cxld, else returns negative
> > + * error code.
> > + */
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > + struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > + struct device *dev = &region->dev;
> > + int rc;
> > +
> > + device_initialize(dev);
> > + dev->parent = &cxld->dev;
> > + device_set_pm_not_required(dev);
> > + dev->bus = &cxl_bus_type;
> > + dev->type = &cxl_region_type;
> > + rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
> > + if (rc)
> > + goto err;
> > +
> > + rc = device_add(dev);
> > + if (rc)
> > + goto err;
> > +
> > + dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
> > +
> > + return 0;
> > +
> > +err:
> > + put_device(dev);
> > + return rc;
> > +}
> > +
> > +static struct cxl_region *
> > +cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
> > +{
> > + struct device *region_dev;
> > +
> > + region_dev = device_find_child_by_name(&cxld->dev, name);
> > + if (!region_dev)
> > + return ERR_PTR(-ENOENT);
> > +
> > + return to_cxl_region(region_dev);
> > +}
> > +
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
> > +{
> > + struct cxl_region *region;
> > +
> > + device_lock(&cxld->dev);
> > +
> > + region = cxl_find_region_by_name(cxld, region_name);
> > + if (IS_ERR(region)) {
> > + device_unlock(&cxld->dev);
> > + return PTR_ERR(region);
> > + }
> > +
> > + dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
> > + dev_name(&region->dev), dev_name(&cxld->dev));
> > +
> > + cmpxchg(&cxld->youngest, region, NULL);
>
> Why does this need to be atomic? I think the other side of anything
> that might make use of this isn't atomic, except because you are holding
> the device_lock for cxld->dev
I attempted a lockless version before this (it failed), and the cmpxchg() is a
leftover from that. It doesn't need to be atomic.
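
As a sketch of what the non-atomic form amounts to (a userspace model with
made-up names, not the actual patch): with the caller already holding the lock
that serializes every access to the pointer, a plain compare and assignment has
the same effect as the cmpxchg().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the region and decoder structures. */
struct region_sketch {
	int id;
};

struct decoder_sketch {
	struct region_sketch *youngest;	/* last created region, or NULL */
};

/*
 * Assumption: the caller holds the lock protecting ->youngest (device_lock()
 * on the decoder in the patch), so no atomic operation is required here.
 */
static void clear_if_youngest(struct decoder_sketch *d, struct region_sketch *r)
{
	assert(d);
	if (d->youngest == r)
		d->youngest = NULL;
}
```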
>
> > +
> > + device_unregister(&region->dev);
> > + device_unlock(&cxld->dev);
> > +
> > + put_device(&region->dev);
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> > new file mode 100644
> > index 000000000000..7a87d229e38a
> > --- /dev/null
> > +++ b/drivers/cxl/region.h
> > @@ -0,0 +1,43 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright(c) 2021 Intel Corporation. */
> > +#ifndef __CXL_REGION_H__
> > +#define __CXL_REGION_H__
> > +
> > +#include <linux/uuid.h>
> > +
> > +extern const struct device_type cxl_region_type;
> > +
> > +/**
> > + * struct cxl_region - CXL region
> > + * @dev: This region's device.
> > + * @id: This regions id. Id is globally unique across all regions.
> > + * @res: Address space consumed by this region.
> > + * @requested_size: Size of the region determined from LSA or userspace.
> > + * @uuid: The UUID for this region.
> > + * @list: Node in decoders region list.
> > + * @eniw: Number of interleave ways this region is configured for.
> > + * @targets: The memory devices comprising the region.
> > + */
> > +struct cxl_region {
> > + struct device dev;
> > + int id;
> > + struct resource *res;
> > + u64 requested_size;
> > + uuid_t uuid;
> > + struct list_head list;
> > + int eniw;
> > + struct cxl_memdev *targets[];
> > +};
> > +
> > +static inline struct cxl_region *to_cxl_region(struct device *dev)
> > +{
> > + if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > + "not a cxl_region device\n"))
> > + return NULL;
> > +
> > + return container_of(dev, struct cxl_region, dev);
> > +}
> > +
> > +bool cxl_is_region_configured(struct cxl_region *region);
> > +
> > +#endif
>