From: Ira Weiny <ira.weiny@intel.com>
To: Dave Jiang <dave.jiang@intel.com>, <ira.weiny@intel.com>,
Navneet Singh <navneet.singh@intel.com>,
Fan Ni <fan.ni@samsung.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support.
Date: Thu, 15 Jun 2023 11:12:29 -0700 [thread overview]
Message-ID: <648b548db05f5_1c7ab42944a@iweiny-mobl.notmuch> (raw)
In-Reply-To: <f6cd4851-b024-c8fa-8f6e-fcc27843f120@intel.com>
Dave Jiang wrote:
>
>
> On 6/14/23 12:16, ira.weiny@intel.com wrote:
> > From: Navneet Singh <navneet.singh@intel.com>
> >
> > CXL devices optionally support dynamic capacity. CXL Regions must be
> > created to access this capacity.
> >
> > Add sysfs entries to create dynamic capacity cxl regions. Provide a new
> > Dynamic Capacity decoder mode which targets dynamic capacity on devices
> > which are added to that region.
> >
> > Below are the steps to create and delete dynamic capacity region0
> > (example).
> >
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> >
> > echo "dc0" >/sys/bus/cxl/devices/decoder1.0/mode
> > echo 0x400000000 >/sys/bus/cxl/devices/decoder1.0/dpa_size
> >
> > echo 0x400000000 > /sys/bus/cxl/devices/$region/size
> > echo "decoder1.0" > /sys/bus/cxl/devices/$region/target0
> > echo 1 > /sys/bus/cxl/devices/$region/commit
> > echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> >
> > echo $region> /sys/bus/cxl/devices/decoder0.0/delete_region
> >
> > Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> >
> > ---
> > [iweiny: fixups]
> > [iweiny: remove unused CXL_DC_REGION_MODE macro]
> > [iweiny: Make dc_mode_to_region_index static]
> > [iweiny: simplify <sysfs>/create_dc_region]
> > [iweiny: introduce decoder_mode_is_dc]
> > [djbw: fixups, no sign-off: preview only]
> > ---
> > drivers/cxl/Kconfig | 11 +++
> > drivers/cxl/core/core.h | 7 ++
> > drivers/cxl/core/hdm.c | 234 ++++++++++++++++++++++++++++++++++++++++++----
> > drivers/cxl/core/port.c | 18 ++++
> > drivers/cxl/core/region.c | 135 ++++++++++++++++++++++++--
> > drivers/cxl/cxl.h | 28 ++++++
> > drivers/dax/cxl.c | 4 +
> > 7 files changed, 409 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index ff4e78117b31..df034889d053 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -121,6 +121,17 @@ config CXL_REGION
> >
> > If unsure say 'y'
> >
> > +config CXL_DCD
> > + bool "CXL: DCD Support"
> > + default CXL_BUS
> > + depends on CXL_REGION
> > + help
> > + Enable the CXL core to provision CXL DCD regions.
> > + CXL devices optionally support dynamic capacity and DCD region
> > + maps the dynamic capacity regions DPA's into Host HPA ranges.
> > +
> > + If unsure say 'y'
> > +
> > config CXL_REGION_INVALIDATION_TEST
> > bool "CXL: Region Cache Management Bypass (TEST)"
> > depends on CXL_REGION
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 27f0968449de..725700ab5973 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -9,6 +9,13 @@ extern const struct device_type cxl_nvdimm_type;
> >
> > extern struct attribute_group cxl_base_attribute_group;
> >
> > +#ifdef CONFIG_CXL_DCD
> > +extern struct device_attribute dev_attr_create_dc_region;
> > +#define SET_CXL_DC_REGION_ATTR(x) (&dev_attr_##x.attr),
> > +#else
> > +#define SET_CXL_DC_REGION_ATTR(x)
> > +#endif
> > +
> > #ifdef CONFIG_CXL_REGION
> > extern struct device_attribute dev_attr_create_pmem_region;
> > extern struct device_attribute dev_attr_create_ram_region;
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 514d30131d92..29649b47d177 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -233,14 +233,23 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > struct resource *res = cxled->dpa_res;
> > resource_size_t skip_start;
> > + resource_size_t skipped = cxled->skip;
> >
> > lockdep_assert_held_write(&cxl_dpa_rwsem);
> >
> > /* save @skip_start, before @res is released */
> > - skip_start = res->start - cxled->skip;
> > + skip_start = res->start - skipped;
> > __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > - if (cxled->skip)
> > - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> > + if (cxled->skip != 0) {
> > + while (skipped != 0) {
> > + res = xa_load(&cxled->skip_res, skip_start);
> > + __release_region(&cxlds->dpa_res, skip_start,
> > + resource_size(res));
> > + xa_erase(&cxled->skip_res, skip_start);
> > + skip_start += resource_size(res);
> > + skipped -= resource_size(res);
> > + }
> > + }
> > cxled->skip = 0;
> > cxled->dpa_res = NULL;
> > put_device(&cxled->cxld.dev);
> > @@ -267,6 +276,19 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > __cxl_dpa_release(cxled);
> > }
> >
> > +static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
> > +{
> > + int index = 0;
> > +
> > + for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
> > + if (mode == i)
> > + return index;
> > + index++;
> > + }
> > +
> > + return -EINVAL;
> > +}
> > +
> > static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > resource_size_t base, resource_size_t len,
> > resource_size_t skipped)
> > @@ -275,7 +297,11 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > struct cxl_port *port = cxled_to_port(cxled);
> > struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > struct device *dev = &port->dev;
> > + struct device *ed_dev = &cxled->cxld.dev;
> > + struct resource *dpa_res = &cxlds->dpa_res;
> > + resource_size_t skip_len = 0;
> > struct resource *res;
> > + int rc, index;
> >
> > lockdep_assert_held_write(&cxl_dpa_rwsem);
> >
> > @@ -304,28 +330,119 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > }
> >
> > if (skipped) {
> > - res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
> > - dev_name(&cxled->cxld.dev), 0);
> > - if (!res) {
> > - dev_dbg(dev,
> > - "decoder%d.%d: failed to reserve skipped space\n",
> > - port->id, cxled->cxld.id);
> > - return -EBUSY;
> > + resource_size_t skip_base = base - skipped;
> > +
> > + if (decoder_mode_is_dc(cxled->mode)) {
>
> Maybe move this entire block to a helper function to reduce the size of
> the current function and reduce indent levels and improve readability?
:-/
I'll work on breaking it out more. The logic here is getting kind of
crazy.
>
> > + if (resource_size(&cxlds->ram_res) &&
> > + skip_base <= cxlds->ram_res.end) {
> > + skip_len = cxlds->ram_res.end - skip_base + 1;
> > + res = __request_region(dpa_res, skip_base,
> > + skip_len, dev_name(ed_dev), 0);
> > + if (!res)
> > + goto error;
> > +
> > + rc = xa_insert(&cxled->skip_res, skip_base, res,
> > + GFP_KERNEL);
> > + skip_base += skip_len;
> > + }
> > +
> > + if (resource_size(&cxlds->ram_res) &&
^^^^^^^
pmem_res?
> > + skip_base <= cxlds->pmem_res.end) {
The 2 if statements here are almost exactly the same. To the point I
wonder if there is a bug.
Navneet,
Why does the code check ram_res the second time but go on to use pmem_res
in the block?
> > + skip_len = cxlds->pmem_res.end - skip_base + 1;
> > + res = __request_region(dpa_res, skip_base,
> > + skip_len, dev_name(ed_dev), 0);
> > + if (!res)
> > + goto error;
> > +
> > + rc = xa_insert(&cxled->skip_res, skip_base, res,
> > + GFP_KERNEL);
> > + skip_base += skip_len;
> > + }
> > +
> > + index = dc_mode_to_region_index(cxled->mode);
> > + for (int i = 0; i <= index; i++) {
> > + struct resource *dcr = &cxlds->dc_res[i];
> > +
> > + if (skip_base < dcr->start) {
> > + skip_len = dcr->start - skip_base;
> > + res = __request_region(dpa_res,
> > + skip_base, skip_len,
> > + dev_name(ed_dev), 0);
> > + if (!res)
> > + goto error;
> > +
> > + rc = xa_insert(&cxled->skip_res, skip_base,
> > + res, GFP_KERNEL);
> > + skip_base += skip_len;
> > + }
> > +
> > + if (skip_base == base) {
> > + dev_dbg(dev, "skip done!\n");
> > + break;
> > + }
> > +
> > + if (resource_size(dcr) &&
> > + skip_base <= dcr->end) {
> > + if (skip_base > base)
> > + dev_err(dev, "Skip error\n");
> > +
> > + skip_len = dcr->end - skip_base + 1;
> > + res = __request_region(dpa_res, skip_base,
> > + skip_len,
> > + dev_name(ed_dev), 0);
> > + if (!res)
> > + goto error;
> > +
> > + rc = xa_insert(&cxled->skip_res, skip_base,
> > + res, GFP_KERNEL);
> > + skip_base += skip_len;
> > + }
> > + }
> > + } else {
> > + res = __request_region(dpa_res, base - skipped, skipped,
> > + dev_name(ed_dev), 0);
> > + if (!res)
> > + goto error;
> > +
> > + rc = xa_insert(&cxled->skip_res, skip_base, res,
> > + GFP_KERNEL);
> > }
> > }
> > - res = __request_region(&cxlds->dpa_res, base, len,
> > - dev_name(&cxled->cxld.dev), 0);
> > +
> > + res = __request_region(dpa_res, base, len, dev_name(ed_dev), 0);
> > if (!res) {
> > dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
> > - port->id, cxled->cxld.id);
> > - if (skipped)
> > - __release_region(&cxlds->dpa_res, base - skipped,
> > - skipped);
> > + port->id, cxled->cxld.id);
> > + if (skipped) {
> > + resource_size_t skip_base = base - skipped;
> > +
> > + while (skipped != 0) {
> > + if (skip_base > base)
> > + dev_err(dev, "Skip error\n");
> > +
> > + res = xa_load(&cxled->skip_res, skip_base);
> > + __release_region(dpa_res, skip_base,
> > + resource_size(res));
> > + xa_erase(&cxled->skip_res, skip_base);
> > + skip_base += resource_size(res);
> > + skipped -= resource_size(res);
> > + }
> > + }
> > return -EBUSY;
> > }
> > cxled->dpa_res = res;
> > cxled->skip = skipped;
> >
> > + for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) {
> > + int index = dc_mode_to_region_index(mode);
> > +
> > + if (resource_contains(&cxlds->dc_res[index], res)) {
> > + cxled->mode = mode;
> > + dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id,
> > + cxled->cxld.id, cxled->dpa_res, cxled->mode);
> > + goto success > + }
> > + }
>
> This block should only happen if decoder_mode_is_dc() right? If that's
> the case, you might be able to refactor it so the 'goto success' isn't
> necessary.
I'll check. I looked through this code a couple of times in my review
before posting because I'm not 100% sure I want to see 8 different modes
DC decoders and regions.
I think the 'mode' should be 'DC' with an index in the endpoint decoder to
map DC region that decoder is mapping. But that change was much bigger to
Navneets code and I wanted to see how others felt about having DC0 - DC7
modes. My compromise was creating decoder_mode_is_dc().
>
> > if (resource_contains(&cxlds->pmem_res, res))
> > cxled->mode = CXL_DECODER_PMEM;
> > else if (resource_contains(&cxlds->ram_res, res))
> > @@ -336,9 +453,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > cxled->mode = CXL_DECODER_MIXED;
> > }
> >
> > +success:
> > port->hdm_end++;
> > get_device(&cxled->cxld.dev);
> > return 0;
> > +
> > +error:
> > + dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space\n",
> > + port->id, cxled->cxld.id);
> > + return -EBUSY;
> > +
> > }
> >
> > int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > @@ -429,6 +553,14 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > switch (mode) {
> > case CXL_DECODER_RAM:
> > case CXL_DECODER_PMEM:
> > + case CXL_DECODER_DC0:
> > + case CXL_DECODER_DC1:
> > + case CXL_DECODER_DC2:
> > + case CXL_DECODER_DC3:
> > + case CXL_DECODER_DC4:
> > + case CXL_DECODER_DC5:
> > + case CXL_DECODER_DC6:
> > + case CXL_DECODER_DC7:
For example this seems very hacky...
[snip]
> >
> > +/*
> > + * The region can not be manged by CXL if any portion of
> > + * it is already online as 'System RAM'
> > + */
> > +static bool region_is_system_ram(struct cxl_region *cxlr,
> > + struct cxl_region_params *p)
> > +{
> > + return (walk_iomem_res_desc(IORES_DESC_NONE,
> > + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> > + p->res->start, p->res->end, cxlr,
> > + is_system_ram) > 0);
> > +}
> > +
> > static int cxl_region_probe(struct device *dev)
> > {
> > struct cxl_region *cxlr = to_cxl_region(dev);
> > @@ -3174,14 +3287,7 @@ static int cxl_region_probe(struct device *dev)
> > case CXL_DECODER_PMEM:
> > return devm_cxl_add_pmem_region(cxlr);
> > case CXL_DECODER_RAM:
> > - /*
> > - * The region can not be manged by CXL if any portion of
> > - * it is already online as 'System RAM'
> > - */
> > - if (walk_iomem_res_desc(IORES_DESC_NONE,
> > - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> > - p->res->start, p->res->end, cxlr,
> > - is_system_ram) > 0)
> > + if (region_is_system_ram(cxlr, p))
>
> Maybe split this change out as a prep patch before the current patch.
That seems reasonable. But the patch is not so large and the
justification for creating a helper is that we need this same check for DC
regions. So it seemed ok to leave it like this. Let me see about
splitting it out.
[snip]
> > diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> > index ccdf8de85bd5..eb5eb81bfbd7 100644
> > --- a/drivers/dax/cxl.c
> > +++ b/drivers/dax/cxl.c
> > @@ -23,11 +23,15 @@ static int cxl_dax_region_probe(struct device *dev)
> > if (!dax_region)
> > return -ENOMEM;
> >
> > + if (decoder_mode_is_dc(cxlr->mode))
> > + return 0;
> > +
> > data = (struct dev_dax_data) {
> > .dax_region = dax_region,
> > .id = -1,
> > .size = range_len(&cxlr_dax->hpa_range),
> > };
> > +
>
> Stray blank line?
Opps! Fixed!
Ira
next prev parent reply other threads:[~2023-06-15 18:12 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-14 19:16 [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2023-06-14 19:16 ` [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device ira.weiny
2023-06-14 22:53 ` Dave Jiang
2023-06-15 15:04 ` Ira Weiny
2023-06-14 23:49 ` Alison Schofield
2023-06-15 22:46 ` Ira Weiny
2023-06-15 18:30 ` Fan Ni
2023-06-15 19:17 ` Navneet Singh
2023-06-15 21:41 ` Fan Ni
2023-06-22 15:58 ` Jonathan Cameron
2023-06-24 13:08 ` Ira Weiny
2023-07-03 2:29 ` Jonathan Cameron
2023-06-14 19:16 ` [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support ira.weiny
2023-06-14 23:37 ` Dave Jiang
2023-06-15 18:12 ` Ira Weiny [this message]
2023-06-15 18:28 ` Dave Jiang
2023-06-16 3:52 ` Navneet Singh
2023-06-15 18:56 ` Navneet Singh
2023-06-15 0:21 ` Alison Schofield
2023-06-16 2:06 ` Ira Weiny
2023-06-16 15:56 ` Alison Schofield
2023-06-16 16:51 ` Alison Schofield
2023-06-21 2:44 ` Ira Weiny
2023-06-20 17:55 ` Fan Ni
2023-06-20 20:33 ` Ira Weiny
2023-06-21 3:13 ` Navneet Singh
2023-06-21 17:20 ` Fan Ni
2023-06-23 18:02 ` Ira Weiny
2023-06-22 16:34 ` Jonathan Cameron
2023-07-05 14:49 ` Davidlohr Bueso
2023-06-14 19:16 ` [PATCH 3/5] cxl/mem : Expose dynamic capacity configuration to userspace ira.weiny
2023-06-15 0:40 ` Alison Schofield
2023-06-16 2:47 ` Ira Weiny
2023-06-16 15:58 ` Dave Jiang
2023-06-20 16:23 ` Ira Weiny
2023-06-20 16:48 ` Dave Jiang
2023-06-15 15:41 ` Dave Jiang
2023-06-14 19:16 ` [PATCH 4/5] cxl/mem: Add support to handle DCD add and release capacity events ira.weiny
2023-06-15 2:19 ` Alison Schofield
2023-06-16 4:11 ` Ira Weiny
2023-06-27 18:20 ` Fan Ni
2023-06-15 16:58 ` Dave Jiang
2023-06-22 17:01 ` Jonathan Cameron
2023-06-29 15:19 ` Ira Weiny
2023-06-27 18:17 ` Fan Ni
2023-07-13 12:55 ` Jørgen Hansen
2023-06-14 19:16 ` [PATCH 5/5] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-06-15 17:08 ` Dave Jiang
2023-06-15 0:56 ` [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) Alison Schofield
2023-06-16 2:57 ` Ira Weiny
2023-06-15 14:51 ` Ira Weiny
2023-06-22 15:07 ` Jonathan Cameron
2023-06-22 16:37 ` Jonathan Cameron
2023-06-27 14:59 ` Ira Weiny
2023-06-29 15:30 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=648b548db05f5_1c7ab42944a@iweiny-mobl.notmuch \
--to=ira.weiny@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=fan.ni@samsung.com \
--cc=linux-cxl@vger.kernel.org \
--cc=navneet.singh@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox