From: Fan Ni <fan.ni@gmx.us>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
Navneet Singh <navneet.singh@intel.com>,
Fan Ni <fan.ni@samsung.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v2 08/18] cxl/region: Add Dynamic Capacity CXL region support
Date: Tue, 5 Sep 2023 14:09:48 -0700 [thread overview]
Message-ID: <ZPeZHIPef2otzVNU@debian> (raw)
In-Reply-To: <20230604-dcd-type2-upstream-v2-8-f740c47e7916@intel.com>
On Mon, Aug 28, 2023 at 10:20:59PM -0700, Ira Weiny wrote:
> CXL devices optionally support dynamic capacity. CXL Regions must be
> configured correctly to access this capacity. Similar to ram and pmem
> partitions, DC Regions represent different partitions of the DPA space.
>
> Interleaving is deferred due to the complexity of managing extents on
> multiple devices at the same time. However, there is nothing which
> directly prevents interleave support at this time. The check allows
> for early rejection.
>
> To maintain backwards compatibility with older software, CXL regions
> need a default DAX device to hold the reference for the region until it
> is deleted.
>
> Add create_dc_region sysfs entry to create DC regions. Share the logic
> of devm_cxl_add_dax_region() and region_is_system_ram(). Special case
> DC capable CXL regions to create a 0 sized seed DAX device until others
> can be created on dynamic space later.
>
> Flag dax_regions to indicate 0 capacity available until dax_region
> extents are supported by the region.
>
> Co-developed-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> ---
> changes for v2:
> [iweiny: flag empty dax regions]
> [iweiny: Split out anything not directly related to creating a DC CXL
> region]
> [iweiny: Separate out dev dax stuff]
> [iweiny/navneet: create 0 sized DAX device by default]
> [iweiny: use new DC region mode]
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 20 +++++-----
> drivers/cxl/core/core.h | 1 +
> drivers/cxl/core/port.c | 1 +
> drivers/cxl/core/region.c | 71 ++++++++++++++++++++++++++++-----
> drivers/dax/bus.c | 8 ++++
> drivers/dax/bus.h | 1 +
> drivers/dax/cxl.c | 15 ++++++-
> 7 files changed, 96 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index aa65dc5b4e13..a0562938ecac 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -351,20 +351,20 @@ Description:
> interleave_granularity).
>
>
> -What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> +What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region
> Date: May, 2022, January, 2023
> -KernelVersion: v6.0 (pmem), v6.3 (ram)
> +KernelVersion: v6.0 (pmem), v6.3 (ram), v6.6 (dc)
> Contact: linux-cxl@vger.kernel.org
> Description:
> (RW) Write a string in the form 'regionZ' to start the process
> - of defining a new persistent, or volatile memory region
> - (interleave-set) within the decode range bounded by root decoder
> - 'decoderX.Y'. The value written must match the current value
> - returned from reading this attribute. An atomic compare exchange
> - operation is done on write to assign the requested id to a
> - region and allocate the region-id for the next creation attempt.
> - EBUSY is returned if the region name written does not match the
> - current cached value.
> + of defining a new persistent, volatile, or Dynamic Capacity
> + (DC) memory region (interleave-set) within the decode range
> + bounded by root decoder 'decoderX.Y'. The value written must
> + match the current value returned from reading this attribute.
> + An atomic compare exchange operation is done on write to assign
> + the requested id to a region and allocate the region-id for the
> + next creation attempt. EBUSY is returned if the region name
> + written does not match the current cached value.
>
>
> What: /sys/bus/cxl/devices/decoderX.Y/delete_region
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 45e7e044cf4a..cf3cf01cb95d 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -13,6 +13,7 @@ extern struct attribute_group cxl_base_attribute_group;
> #ifdef CONFIG_CXL_REGION
> extern struct device_attribute dev_attr_create_pmem_region;
> extern struct device_attribute dev_attr_create_ram_region;
> +extern struct device_attribute dev_attr_create_dc_region;
> extern struct device_attribute dev_attr_delete_region;
> extern struct device_attribute dev_attr_region;
> extern const struct device_type cxl_pmem_region_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index a5db710a63bc..608901bb7d91 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -314,6 +314,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
> &dev_attr_target_list.attr,
> SET_CXL_REGION_ATTR(create_pmem_region)
> SET_CXL_REGION_ATTR(create_ram_region)
> + SET_CXL_REGION_ATTR(create_dc_region)
> SET_CXL_REGION_ATTR(delete_region)
> NULL,
> };
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 69af1354bc5b..fc8dee469244 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2271,6 +2271,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> switch (mode) {
> case CXL_REGION_RAM:
> case CXL_REGION_PMEM:
> + case CXL_REGION_DC:
> break;
> default:
> dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
> @@ -2383,6 +2384,33 @@ static ssize_t create_ram_region_store(struct device *dev,
> }
> DEVICE_ATTR_RW(create_ram_region);
>
> +static ssize_t create_dc_region_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static ssize_t create_dc_region_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> + struct cxl_region *cxlr;
> + int rc, id;
> +
> + rc = sscanf(buf, "region%d\n", &id);
> + if (rc != 1)
> + return -EINVAL;
> +
> + cxlr = __create_region(cxlrd, id, CXL_REGION_DC,
> + CXL_DECODER_HOSTONLYMEM);
> + if (IS_ERR(cxlr))
> + return PTR_ERR(cxlr);
> +
> + return len;
> +}
> +DEVICE_ATTR_RW(create_dc_region);
> +
> static ssize_t region_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> @@ -2834,7 +2862,7 @@ static void cxlr_dax_unregister(void *_cxlr_dax)
> device_unregister(&cxlr_dax->dev);
> }
>
> -static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> +static int __devm_cxl_add_dax_region(struct cxl_region *cxlr)
> {
> struct cxl_dax_region *cxlr_dax;
> struct device *dev;
> @@ -2863,6 +2891,21 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> return rc;
> }
>
> +static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
> +{
> + return __devm_cxl_add_dax_region(cxlr);
> +}
> +
> +static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr)
> +{
> + if (cxlr->params.interleave_ways != 1) {
> + dev_err(&cxlr->dev, "Interleaving DC not supported\n");
> + return -EINVAL;
> + }
> +
> + return __devm_cxl_add_dax_region(cxlr);
> +}
> +
> static int match_decoder_by_range(struct device *dev, void *data)
> {
> struct range *r1, *r2 = data;
> @@ -3203,6 +3246,19 @@ static int is_system_ram(struct resource *res, void *arg)
> return 1;
> }
>
> +/*
> + * The region can not be manged by CXL if any portion of
> + * it is already online as 'System RAM'
> + */
> +static bool region_is_system_ram(struct cxl_region *cxlr,
> + struct cxl_region_params *p)
> +{
> + return (walk_iomem_res_desc(IORES_DESC_NONE,
> + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> + p->res->start, p->res->end, cxlr,
> + is_system_ram) > 0);
> +}
> +
> static int cxl_region_probe(struct device *dev)
> {
> struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -3242,14 +3298,7 @@ static int cxl_region_probe(struct device *dev)
> case CXL_REGION_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
> case CXL_REGION_RAM:
> - /*
> - * The region can not be manged by CXL if any portion of
> - * it is already online as 'System RAM'
> - */
> - if (walk_iomem_res_desc(IORES_DESC_NONE,
> - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> - p->res->start, p->res->end, cxlr,
> - is_system_ram) > 0)
> + if (region_is_system_ram(cxlr, p))
> return 0;
>
> /*
> @@ -3261,6 +3310,10 @@ static int cxl_region_probe(struct device *dev)
>
> /* HDM-H routes to device-dax */
> return devm_cxl_add_dax_region(cxlr);
> + case CXL_REGION_DC:
> + if (region_is_system_ram(cxlr, p))
> + return 0;
> + return devm_cxl_add_dc_dax_region(cxlr);
> default:
> dev_dbg(&cxlr->dev, "unsupported region mode: %s\n",
> cxl_region_mode_name(cxlr->mode));
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 0ee96e6fc426..b76e49813a39 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -169,6 +169,11 @@ static bool is_static(struct dax_region *dax_region)
> return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
> }
>
> +static bool is_dynamic(struct dax_region *dax_region)
> +{
> + return (dax_region->res.flags & IORESOURCE_DAX_DYNAMIC_CAP) != 0;
> +}
> +
> bool static_dev_dax(struct dev_dax *dev_dax)
> {
> return is_static(dev_dax->region);
> @@ -285,6 +290,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
>
> device_lock_assert(dax_region->dev);
>
> + if (is_dynamic(dax_region))
> + return 0;
> +
> for_each_dax_region_resource(dax_region, res)
> size -= resource_size(res);
> return size;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 1ccd23360124..74d8fe4a5532 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -13,6 +13,7 @@ struct dax_region;
> /* dax bus specific ioresource flags */
> #define IORESOURCE_DAX_STATIC BIT(0)
> #define IORESOURCE_DAX_KMEM BIT(1)
> +#define IORESOURCE_DAX_DYNAMIC_CAP BIT(2)
>
> struct dax_region *alloc_dax_region(struct device *parent, int region_id,
> struct range *range, int target_node, unsigned int align,
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 8bc9d04034d6..147c8c69782b 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -13,19 +13,30 @@ static int cxl_dax_region_probe(struct device *dev)
> struct cxl_region *cxlr = cxlr_dax->cxlr;
> struct dax_region *dax_region;
> struct dev_dax_data data;
> + resource_size_t dev_size;
> + unsigned long flags;
>
> if (nid == NUMA_NO_NODE)
> nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
>
> + dev_size = range_len(&cxlr_dax->hpa_range);
> +
> + flags = IORESOURCE_DAX_KMEM;
> + if (cxlr->mode == CXL_REGION_DC) {
> + /* Add empty seed dax device */
> + dev_size = 0;
> + flags |= IORESOURCE_DAX_DYNAMIC_CAP;
> + }
> +
> dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
> - PMD_SIZE, IORESOURCE_DAX_KMEM);
> + PMD_SIZE, flags);
> if (!dax_region)
> return -ENOMEM;
>
> data = (struct dev_dax_data) {
> .dax_region = dax_region,
> .id = -1,
> - .size = range_len(&cxlr_dax->hpa_range),
> + .size = dev_size,
> };
>
> return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
>
> --
> 2.41.0
>
next prev parent reply other threads:[~2023-09-05 21:10 UTC|newest]
Thread overview: 97+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-29 5:20 [PATCH RFC v2 00/18] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2023-08-29 5:20 ` [PATCH RFC v2 01/18] cxl/hdm: Debug, use decoder name function Ira Weiny
2023-08-29 14:03 ` Jonathan Cameron
2023-08-29 21:48 ` Fan Ni
2023-09-03 2:55 ` Ira Weiny
2023-08-30 20:32 ` Dave Jiang
2023-08-29 5:20 ` [PATCH RFC v2 02/18] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) Ira Weiny
2023-08-29 14:07 ` Jonathan Cameron
2023-09-03 3:38 ` Ira Weiny
2023-08-29 21:49 ` Fan Ni
2023-08-30 20:33 ` Dave Jiang
2023-10-24 16:16 ` Jonathan Cameron
2023-08-29 5:20 ` [PATCH RFC v2 03/18] cxl/mem: Read Dynamic capacity configuration from the device ira.weiny
2023-08-29 14:37 ` Jonathan Cameron
2023-09-03 23:36 ` Ira Weiny
2023-08-30 21:01 ` Dave Jiang
2023-09-05 0:14 ` Ira Weiny
2023-09-08 20:23 ` Ira Weiny
2023-08-30 21:44 ` Fan Ni
2023-09-08 22:52 ` Ira Weiny
2023-09-12 21:32 ` Fan Ni
2023-09-07 15:46 ` Alison Schofield
2023-09-12 1:18 ` Ira Weiny
2023-09-08 12:46 ` Jørgen Hansen
2023-09-11 20:26 ` Ira Weiny
2023-08-29 5:20 ` [PATCH RFC v2 04/18] cxl/region: Add Dynamic Capacity decoder and region modes Ira Weiny
2023-08-29 14:39 ` Jonathan Cameron
2023-08-30 21:13 ` Dave Jiang
2023-08-31 17:00 ` Fan Ni
2023-08-29 5:20 ` [PATCH RFC v2 05/18] cxl/port: Add Dynamic Capacity mode support to endpoint decoders Ira Weiny
2023-08-29 14:49 ` Jonathan Cameron
2023-09-05 0:05 ` Ira Weiny
2023-08-31 17:25 ` Fan Ni
2023-09-08 23:26 ` Ira Weiny
2023-08-29 5:20 ` [PATCH RFC v2 06/18] cxl/port: Add Dynamic Capacity size " Ira Weiny
2023-08-29 15:09 ` Jonathan Cameron
2023-09-05 4:32 ` Ira Weiny
2023-08-29 5:20 ` [PATCH RFC v2 07/18] cxl/mem: Expose device dynamic capacity configuration ira.weiny
2023-08-29 15:14 ` Jonathan Cameron
2023-09-05 17:55 ` Fan Ni
2023-09-05 20:45 ` Ira Weiny
2023-08-30 22:46 ` Dave Jiang
2023-09-08 23:22 ` Ira Weiny
2023-08-29 5:20 ` [PATCH RFC v2 08/18] cxl/region: Add Dynamic Capacity CXL region support Ira Weiny
2023-08-29 15:19 ` Jonathan Cameron
2023-08-30 23:27 ` Dave Jiang
2023-09-06 4:36 ` Ira Weiny
2023-09-05 21:09 ` Fan Ni [this message]
2023-08-29 5:21 ` [PATCH RFC v2 09/18] cxl/mem: Read extents on memory device discovery Ira Weiny
2023-08-29 15:26 ` Jonathan Cameron
2023-08-30 0:16 ` Ira Weiny
2023-09-05 21:41 ` Ira Weiny
2023-08-29 5:21 ` [PATCH RFC v2 10/18] cxl/mem: Handle DCD add and release capacity events Ira Weiny
2023-08-29 15:59 ` Jonathan Cameron
2023-09-05 23:49 ` Ira Weiny
2023-08-31 17:28 ` Dave Jiang
2023-09-08 15:35 ` Ira Weiny
2023-08-29 5:21 ` [PATCH RFC v2 11/18] cxl/region: Expose DC extents on region driver load Ira Weiny
2023-08-29 16:20 ` Jonathan Cameron
2023-09-06 3:36 ` Ira Weiny
2023-08-31 18:38 ` Dave Jiang
2023-09-08 23:57 ` Ira Weiny
2023-08-29 5:21 ` [PATCH RFC v2 12/18] cxl/region: Notify regions of DC changes Ira Weiny
2023-08-29 16:40 ` Jonathan Cameron
2023-09-06 4:00 ` Ira Weiny
2023-09-18 13:56 ` Jørgen Hansen
2023-09-18 17:45 ` Ira Weiny
2023-08-29 5:21 ` [PATCH RFC v2 13/18] dax/bus: Factor out dev dax resize logic Ira Weiny
2023-08-30 11:27 ` Jonathan Cameron
2023-09-06 4:12 ` Ira Weiny
2023-08-31 21:48 ` Dave Jiang
2023-08-29 5:21 ` [PATCH RFC v2 14/18] dax/region: Support DAX device creation on dynamic DAX regions Ira Weiny
2023-08-30 11:50 ` Jonathan Cameron
2023-09-06 4:35 ` Ira Weiny
2023-09-12 16:49 ` Jonathan Cameron
2023-09-12 22:08 ` Ira Weiny
2023-09-12 22:35 ` Dan Williams
2023-09-13 17:30 ` Ira Weiny
2023-09-13 17:59 ` Dan Williams
2023-09-13 19:26 ` Ira Weiny
2023-09-14 10:32 ` Jonathan Cameron
2023-08-29 5:21 ` [PATCH RFC v2 15/18] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-08-29 16:46 ` Jonathan Cameron
2023-09-06 4:07 ` Ira Weiny
2023-08-29 5:21 ` [PATCH RFC v2 16/18] tools/testing/cxl: Make event logs dynamic Ira Weiny
2023-08-30 12:11 ` Jonathan Cameron
2023-09-06 21:15 ` Ira Weiny
2023-08-29 5:21 ` [PATCH RFC v2 17/18] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2023-08-30 12:20 ` Jonathan Cameron
2023-09-06 21:18 ` Ira Weiny
2023-08-31 23:19 ` Dave Jiang
2023-08-29 5:21 ` [PATCH RFC v2 18/18] tools/testing/cxl: Add Dynamic Capacity events Ira Weiny
2023-08-30 12:23 ` Jonathan Cameron
2023-09-06 21:39 ` Ira Weiny
2023-08-31 23:20 ` Dave Jiang
2023-09-07 21:01 ` [PATCH RFC v2 00/18] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni
2023-09-12 1:44 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZPeZHIPef2otzVNU@debian \
--to=fan.ni@gmx.us \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=fan.ni@samsung.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=navneet.singh@intel.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox