From: Dave Jiang <dave.jiang@intel.com>
To: Ira Weiny <ira.weiny@intel.com>,
Navneet Singh <navneet.singh@intel.com>,
Fan Ni <fan.ni@samsung.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support.
Date: Thu, 15 Jun 2023 11:28:26 -0700 [thread overview]
Message-ID: <06bcf919-1abb-33d2-105b-b7e7a7c73b96@intel.com> (raw)
In-Reply-To: <648b548db05f5_1c7ab42944a@iweiny-mobl.notmuch>
On 6/15/23 11:12, Ira Weiny wrote:
> Dave Jiang wrote:
>>
>>
>> On 6/14/23 12:16, ira.weiny@intel.com wrote:
>>> From: Navneet Singh <navneet.singh@intel.com>
>>>
>>> CXL devices optionally support dynamic capacity. CXL Regions must be
>>> created to access this capacity.
>>>
>>> Add sysfs entries to create dynamic capacity cxl regions. Provide a new
>>> Dynamic Capacity decoder mode which targets dynamic capacity on devices
>>> which are added to that region.
>>>
>>> Below are the steps to create and delete dynamic capacity region0
>>> (example).
>>>
>>> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
>>> echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
>>> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
>>> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
>>>
>>> echo "dc0" >/sys/bus/cxl/devices/decoder1.0/mode
>>> echo 0x400000000 >/sys/bus/cxl/devices/decoder1.0/dpa_size
>>>
>>> echo 0x400000000 > /sys/bus/cxl/devices/$region/size
>>> echo "decoder1.0" > /sys/bus/cxl/devices/$region/target0
>>> echo 1 > /sys/bus/cxl/devices/$region/commit
>>> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
>>>
>>> echo $region> /sys/bus/cxl/devices/decoder0.0/delete_region
>>>
>>> Signed-off-by: Navneet Singh <navneet.singh@intel.com>
>>>
>>> ---
>>> [iweiny: fixups]
>>> [iweiny: remove unused CXL_DC_REGION_MODE macro]
>>> [iweiny: Make dc_mode_to_region_index static]
>>> [iweiny: simplify <sysfs>/create_dc_region]
>>> [iweiny: introduce decoder_mode_is_dc]
>>> [djbw: fixups, no sign-off: preview only]
>>> ---
>>> drivers/cxl/Kconfig | 11 +++
>>> drivers/cxl/core/core.h | 7 ++
>>> drivers/cxl/core/hdm.c | 234 ++++++++++++++++++++++++++++++++++++++++++----
>>> drivers/cxl/core/port.c | 18 ++++
>>> drivers/cxl/core/region.c | 135 ++++++++++++++++++++++++--
>>> drivers/cxl/cxl.h | 28 ++++++
>>> drivers/dax/cxl.c | 4 +
>>> 7 files changed, 409 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>>> index ff4e78117b31..df034889d053 100644
>>> --- a/drivers/cxl/Kconfig
>>> +++ b/drivers/cxl/Kconfig
>>> @@ -121,6 +121,17 @@ config CXL_REGION
>>>
>>> If unsure say 'y'
>>>
>>> +config CXL_DCD
>>> + bool "CXL: DCD Support"
>>> + default CXL_BUS
>>> + depends on CXL_REGION
>>> + help
>>> + Enable the CXL core to provision CXL DCD regions.
>>> + CXL devices optionally support dynamic capacity and DCD region
>>> + maps the dynamic capacity regions DPA's into Host HPA ranges.
>>> +
>>> + If unsure say 'y'
>>> +
>>> config CXL_REGION_INVALIDATION_TEST
>>> bool "CXL: Region Cache Management Bypass (TEST)"
>>> depends on CXL_REGION
>>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
>>> index 27f0968449de..725700ab5973 100644
>>> --- a/drivers/cxl/core/core.h
>>> +++ b/drivers/cxl/core/core.h
>>> @@ -9,6 +9,13 @@ extern const struct device_type cxl_nvdimm_type;
>>>
>>> extern struct attribute_group cxl_base_attribute_group;
>>>
>>> +#ifdef CONFIG_CXL_DCD
>>> +extern struct device_attribute dev_attr_create_dc_region;
>>> +#define SET_CXL_DC_REGION_ATTR(x) (&dev_attr_##x.attr),
>>> +#else
>>> +#define SET_CXL_DC_REGION_ATTR(x)
>>> +#endif
>>> +
>>> #ifdef CONFIG_CXL_REGION
>>> extern struct device_attribute dev_attr_create_pmem_region;
>>> extern struct device_attribute dev_attr_create_ram_region;
>>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>>> index 514d30131d92..29649b47d177 100644
>>> --- a/drivers/cxl/core/hdm.c
>>> +++ b/drivers/cxl/core/hdm.c
>>> @@ -233,14 +233,23 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
>>> struct cxl_dev_state *cxlds = cxlmd->cxlds;
>>> struct resource *res = cxled->dpa_res;
>>> resource_size_t skip_start;
>>> + resource_size_t skipped = cxled->skip;
>>>
>>> lockdep_assert_held_write(&cxl_dpa_rwsem);
>>>
>>> /* save @skip_start, before @res is released */
>>> - skip_start = res->start - cxled->skip;
>>> + skip_start = res->start - skipped;
>>> __release_region(&cxlds->dpa_res, res->start, resource_size(res));
>>> - if (cxled->skip)
>>> - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
>>> + if (cxled->skip != 0) {
>>> + while (skipped != 0) {
>>> + res = xa_load(&cxled->skip_res, skip_start);
>>> + __release_region(&cxlds->dpa_res, skip_start,
>>> + resource_size(res));
>>> + xa_erase(&cxled->skip_res, skip_start);
>>> + skip_start += resource_size(res);
>>> + skipped -= resource_size(res);
>>> + }
>>> + }
>>> cxled->skip = 0;
>>> cxled->dpa_res = NULL;
>>> put_device(&cxled->cxld.dev);
>>> @@ -267,6 +276,19 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
>>> __cxl_dpa_release(cxled);
>>> }
>>>
>>> +static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
>>> +{
>>> + int index = 0;
>>> +
>>> + for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
>>> + if (mode == i)
>>> + return index;
>>> + index++;
>>> + }
>>> +
>>> + return -EINVAL;
>>> +}
>>> +
>>> static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>>> resource_size_t base, resource_size_t len,
>>> resource_size_t skipped)
>>> @@ -275,7 +297,11 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>>> struct cxl_port *port = cxled_to_port(cxled);
>>> struct cxl_dev_state *cxlds = cxlmd->cxlds;
>>> struct device *dev = &port->dev;
>>> + struct device *ed_dev = &cxled->cxld.dev;
>>> + struct resource *dpa_res = &cxlds->dpa_res;
>>> + resource_size_t skip_len = 0;
>>> struct resource *res;
>>> + int rc, index;
>>>
>>> lockdep_assert_held_write(&cxl_dpa_rwsem);
>>>
>>> @@ -304,28 +330,119 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>>> }
>>>
>>> if (skipped) {
>>> - res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
>>> - dev_name(&cxled->cxld.dev), 0);
>>> - if (!res) {
>>> - dev_dbg(dev,
>>> - "decoder%d.%d: failed to reserve skipped space\n",
>>> - port->id, cxled->cxld.id);
>>> - return -EBUSY;
>>> + resource_size_t skip_base = base - skipped;
>>> +
>>> + if (decoder_mode_is_dc(cxled->mode)) {
>>
>> Maybe move this entire block to a helper function to reduce the size of
>> the current function and reduce indent levels and improve readability?
>
> :-/
>
> I'll work on breaking it out more. The logic here is getting kind of
> crazy.
>
>>
>>> + if (resource_size(&cxlds->ram_res) &&
>>> + skip_base <= cxlds->ram_res.end) {
>>> + skip_len = cxlds->ram_res.end - skip_base + 1;
>>> + res = __request_region(dpa_res, skip_base,
>>> + skip_len, dev_name(ed_dev), 0);
>>> + if (!res)
>>> + goto error;
>>> +
>>> + rc = xa_insert(&cxled->skip_res, skip_base, res,
>>> + GFP_KERNEL);
>>> + skip_base += skip_len;
>>> + }
>>> +
>>> + if (resource_size(&cxlds->ram_res) &&
> ^^^^^^^
> pmem_res?
>
>>> + skip_base <= cxlds->pmem_res.end) {
>
> The 2 if statements here are almost exactly the same. To the point I
> wonder if there is a bug.
>
> Navneet,
>
> Why does the code check ram_res the second time but go on to use pmem_res
> in the block?
>
>>> + skip_len = cxlds->pmem_res.end - skip_base + 1;
>>> + res = __request_region(dpa_res, skip_base,
>>> + skip_len, dev_name(ed_dev), 0);
>>> + if (!res)
>>> + goto error;
>>> +
>>> + rc = xa_insert(&cxled->skip_res, skip_base, res,
>>> + GFP_KERNEL);
>>> + skip_base += skip_len;
>>> + }
>>> +
>>> + index = dc_mode_to_region_index(cxled->mode);
>>> + for (int i = 0; i <= index; i++) {
>>> + struct resource *dcr = &cxlds->dc_res[i];
>>> +
>>> + if (skip_base < dcr->start) {
>>> + skip_len = dcr->start - skip_base;
>>> + res = __request_region(dpa_res,
>>> + skip_base, skip_len,
>>> + dev_name(ed_dev), 0);
>>> + if (!res)
>>> + goto error;
>>> +
>>> + rc = xa_insert(&cxled->skip_res, skip_base,
>>> + res, GFP_KERNEL);
>>> + skip_base += skip_len;
>>> + }
>>> +
>>> + if (skip_base == base) {
>>> + dev_dbg(dev, "skip done!\n");
>>> + break;
>>> + }
>>> +
>>> + if (resource_size(dcr) &&
>>> + skip_base <= dcr->end) {
>>> + if (skip_base > base)
>>> + dev_err(dev, "Skip error\n");
>>> +
>>> + skip_len = dcr->end - skip_base + 1;
>>> + res = __request_region(dpa_res, skip_base,
>>> + skip_len,
>>> + dev_name(ed_dev), 0);
>>> + if (!res)
>>> + goto error;
>>> +
>>> + rc = xa_insert(&cxled->skip_res, skip_base,
>>> + res, GFP_KERNEL);
>>> + skip_base += skip_len;
>>> + }
>>> + }
>>> + } else {
>>> + res = __request_region(dpa_res, base - skipped, skipped,
>>> + dev_name(ed_dev), 0);
>>> + if (!res)
>>> + goto error;
>>> +
>>> + rc = xa_insert(&cxled->skip_res, skip_base, res,
>>> + GFP_KERNEL);
>>> }
>>> }
>>> - res = __request_region(&cxlds->dpa_res, base, len,
>>> - dev_name(&cxled->cxld.dev), 0);
>>> +
>>> + res = __request_region(dpa_res, base, len, dev_name(ed_dev), 0);
>>> if (!res) {
>>> dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
>>> - port->id, cxled->cxld.id);
>>> - if (skipped)
>>> - __release_region(&cxlds->dpa_res, base - skipped,
>>> - skipped);
>>> + port->id, cxled->cxld.id);
>>> + if (skipped) {
>>> + resource_size_t skip_base = base - skipped;
>>> +
>>> + while (skipped != 0) {
>>> + if (skip_base > base)
>>> + dev_err(dev, "Skip error\n");
>>> +
>>> + res = xa_load(&cxled->skip_res, skip_base);
>>> + __release_region(dpa_res, skip_base,
>>> + resource_size(res));
>>> + xa_erase(&cxled->skip_res, skip_base);
>>> + skip_base += resource_size(res);
>>> + skipped -= resource_size(res);
>>> + }
>>> + }
>>> return -EBUSY;
>>> }
>>> cxled->dpa_res = res;
>>> cxled->skip = skipped;
>>>
>>> + for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) {
>>> + int index = dc_mode_to_region_index(mode);
>>> +
>>> + if (resource_contains(&cxlds->dc_res[index], res)) {
>>> + cxled->mode = mode;
>>> + dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id,
>>> + cxled->cxld.id, cxled->dpa_res, cxled->mode);
>>> + goto success > + }
>>> + }
>>
>> This block should only happen if decoder_mode_is_dc() right? If that's
>> the case, you might be able to refactor it so the 'goto success' isn't
>> necessary.
>
> I'll check. I looked through this code a couple of times in my review
> before posting because I'm not 100% sure I want to see 8 different modes
> DC decoders and regions.
>
> I think the 'mode' should be 'DC' with an index in the endpoint decoder to
> map DC region that decoder is mapping. But that change was much bigger to
> Navneets code and I wanted to see how others felt about having DC0 - DC7
> modes. My compromise was creating decoder_mode_is_dc().
>
>>
>>> if (resource_contains(&cxlds->pmem_res, res))
>>> cxled->mode = CXL_DECODER_PMEM;
>>> else if (resource_contains(&cxlds->ram_res, res))
>>> @@ -336,9 +453,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>>> cxled->mode = CXL_DECODER_MIXED;
>>> }
>>>
>>> +success:
>>> port->hdm_end++;
>>> get_device(&cxled->cxld.dev);
>>> return 0;
>>> +
>>> +error:
>>> + dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space\n",
>>> + port->id, cxled->cxld.id);
>>> + return -EBUSY;
>>> +
>>> }
>>>
>>> int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>>> @@ -429,6 +553,14 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
>>> switch (mode) {
>>> case CXL_DECODER_RAM:
>>> case CXL_DECODER_PMEM:
>>> + case CXL_DECODER_DC0:
>>> + case CXL_DECODER_DC1:
>>> + case CXL_DECODER_DC2:
>>> + case CXL_DECODER_DC3:
>>> + case CXL_DECODER_DC4:
>>> + case CXL_DECODER_DC5:
>>> + case CXL_DECODER_DC6:
>>> + case CXL_DECODER_DC7:
>
> For example this seems very hacky...
Not sure if it helps, but you can always do:
case CXL_DECODER_DC0 ... CXL_DECODER_DC7:
DJ
>
> [snip]
>
>>>
>>> +/*
>>> + * The region can not be manged by CXL if any portion of
>>> + * it is already online as 'System RAM'
>>> + */
>>> +static bool region_is_system_ram(struct cxl_region *cxlr,
>>> + struct cxl_region_params *p)
>>> +{
>>> + return (walk_iomem_res_desc(IORES_DESC_NONE,
>>> + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
>>> + p->res->start, p->res->end, cxlr,
>>> + is_system_ram) > 0);
>>> +}
>>> +
>>> static int cxl_region_probe(struct device *dev)
>>> {
>>> struct cxl_region *cxlr = to_cxl_region(dev);
>>> @@ -3174,14 +3287,7 @@ static int cxl_region_probe(struct device *dev)
>>> case CXL_DECODER_PMEM:
>>> return devm_cxl_add_pmem_region(cxlr);
>>> case CXL_DECODER_RAM:
>>> - /*
>>> - * The region can not be manged by CXL if any portion of
>>> - * it is already online as 'System RAM'
>>> - */
>>> - if (walk_iomem_res_desc(IORES_DESC_NONE,
>>> - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
>>> - p->res->start, p->res->end, cxlr,
>>> - is_system_ram) > 0)
>>> + if (region_is_system_ram(cxlr, p))
>>
>> Maybe split this change out as a prep patch before the current patch.
>
> That seems reasonable. But the patch is not so large and the
> justification for creating a helper is that we need this same check for DC
> regions. So it seemed ok to leave it like this. Let me see about
> splitting it out.
>
> [snip]
>
>>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>>> index ccdf8de85bd5..eb5eb81bfbd7 100644
>>> --- a/drivers/dax/cxl.c
>>> +++ b/drivers/dax/cxl.c
>>> @@ -23,11 +23,15 @@ static int cxl_dax_region_probe(struct device *dev)
>>> if (!dax_region)
>>> return -ENOMEM;
>>>
>>> + if (decoder_mode_is_dc(cxlr->mode))
>>> + return 0;
>>> +
>>> data = (struct dev_dax_data) {
>>> .dax_region = dax_region,
>>> .id = -1,
>>> .size = range_len(&cxlr_dax->hpa_range),
>>> };
>>> +
>>
>> Stray blank line?
>
> Opps! Fixed!
>
> Ira
next prev parent reply other threads:[~2023-06-15 18:28 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-14 19:16 [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2023-06-14 19:16 ` [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device ira.weiny
2023-06-14 22:53 ` Dave Jiang
2023-06-15 15:04 ` Ira Weiny
2023-06-14 23:49 ` Alison Schofield
2023-06-15 22:46 ` Ira Weiny
2023-06-15 18:30 ` Fan Ni
2023-06-15 19:17 ` Navneet Singh
2023-06-15 21:41 ` Fan Ni
2023-06-22 15:58 ` Jonathan Cameron
2023-06-24 13:08 ` Ira Weiny
2023-07-03 2:29 ` Jonathan Cameron
2023-06-14 19:16 ` [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support ira.weiny
2023-06-14 23:37 ` Dave Jiang
2023-06-15 18:12 ` Ira Weiny
2023-06-15 18:28 ` Dave Jiang [this message]
2023-06-16 3:52 ` Navneet Singh
2023-06-15 18:56 ` Navneet Singh
2023-06-15 0:21 ` Alison Schofield
2023-06-16 2:06 ` Ira Weiny
2023-06-16 15:56 ` Alison Schofield
2023-06-16 16:51 ` Alison Schofield
2023-06-21 2:44 ` Ira Weiny
2023-06-20 17:55 ` Fan Ni
2023-06-20 20:33 ` Ira Weiny
2023-06-21 3:13 ` Navneet Singh
2023-06-21 17:20 ` Fan Ni
2023-06-23 18:02 ` Ira Weiny
2023-06-22 16:34 ` Jonathan Cameron
2023-07-05 14:49 ` Davidlohr Bueso
2023-06-14 19:16 ` [PATCH 3/5] cxl/mem : Expose dynamic capacity configuration to userspace ira.weiny
2023-06-15 0:40 ` Alison Schofield
2023-06-16 2:47 ` Ira Weiny
2023-06-16 15:58 ` Dave Jiang
2023-06-20 16:23 ` Ira Weiny
2023-06-20 16:48 ` Dave Jiang
2023-06-15 15:41 ` Dave Jiang
2023-06-14 19:16 ` [PATCH 4/5] cxl/mem: Add support to handle DCD add and release capacity events ira.weiny
2023-06-15 2:19 ` Alison Schofield
2023-06-16 4:11 ` Ira Weiny
2023-06-27 18:20 ` Fan Ni
2023-06-15 16:58 ` Dave Jiang
2023-06-22 17:01 ` Jonathan Cameron
2023-06-29 15:19 ` Ira Weiny
2023-06-27 18:17 ` Fan Ni
2023-07-13 12:55 ` Jørgen Hansen
2023-06-14 19:16 ` [PATCH 5/5] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2023-06-15 17:08 ` Dave Jiang
2023-06-15 0:56 ` [PATCH 0/5] cxl/dcd: Add support for Dynamic Capacity Devices (DCD) Alison Schofield
2023-06-16 2:57 ` Ira Weiny
2023-06-15 14:51 ` Ira Weiny
2023-06-22 15:07 ` Jonathan Cameron
2023-06-22 16:37 ` Jonathan Cameron
2023-06-27 14:59 ` Ira Weiny
2023-06-29 15:30 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=06bcf919-1abb-33d2-105b-b7e7a7c73b96@intel.com \
--to=dave.jiang@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=dan.j.williams@intel.com \
--cc=fan.ni@samsung.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=navneet.singh@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox