From: Ira Weiny <ira.weiny@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>, <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
"Navneet Singh" <navneet.singh@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 16/26] cxl/extent: Realize extent devices
Date: Mon, 29 Apr 2024 20:23:10 -0700 [thread overview]
Message-ID: <6630641e5238e_e1f5829431@iweiny-mobl.notmuch> (raw)
In-Reply-To: <20240404173241.00003a6f@Huawei.com>
Jonathan Cameron wrote:
> On Sun, 24 Mar 2024 16:18:19 -0700
> ira.weiny@intel.com wrote:
>
> > From: Navneet Singh <navneet.singh@intel.com>
> >
> > Once all extents of an interleave set are present a region must
> > surface an extent to the region.
> >
> > Without interleaving; endpoint decoder and region extents have a 1:1
> > relationship. Future support for IW > 1 will maintain a N:1
> > relationship between the device extents and region extents.
> >
> > Create a region extent device for every device extent found. Release of
> > the extent device triggers a response to the underlying hardware extent.
> >
> > There is no strong use case to support the addition of extents which
> > overlap previously accepted extent ranges. Reject such new extents
> > until such time as a good use case emerges.
> >
> > Expose the necessary details of region extents by creating the following
> > sysfs entries.
> >
> > /sys/bus/cxl/devices/dax_regionX/extentY
> > /sys/bus/cxl/devices/dax_regionX/extentY/offset
> > /sys/bus/cxl/devices/dax_regionX/extentY/length
> > /sys/bus/cxl/devices/dax_regionX/extentY/label
>
> Docs?
That is a good idea.
> The label in particular worries me a little as I'm not sure what
> is in it.
I envisioned a pass through of the tag.
>
> If it's the tag one possible format is a uuid (not a coincidence
> that it is the same length) and interpreting that as characters isn't
> going to get us far. I wonder if we have to treat it as a binary attr
> given we have no idea what it is.
In thinking about this more (and running some experiments): none of these are
strictly necessary in this initial implementation. No code currently uses
them directly.
I questioned these in the past and I've done so again over the weekend.
I was about to rip them out entirely when I remembered Gregory Price's
comments on Discord. There he indicating a desire to very carefully place
dax devices. Without at least the offset and length above (and to a
lesser extent the label) this can't be done.
One still has to create and delete dax devices carefully
to place a dax device in a specific place. But the above give the user
the information to do so. Without it the user must coordinate with the FM
even more (which we could require initially).
On particular issue is the simplification I made within the kernel to
track extents. The extents are no longer ordered within an xarray.
This means a user can't accurately predict which extent will be used when
allocating a dax device. One has to experiment and look at the resulting
mappings of the dax device to see if it got allocated in the right place.
For example:
| DC region |
|-----------------------------------------------------|
|--------| |--------| |
| (ext0) | | (ext1) | |
| (1G) | | (1G) | |
If the above extents were surfaced in the following order:
ext1
ext0
Then a dax device of size 1G was created. The dax mapping would be:
| DC region |
|-----------------------------------------------------|
|--------| |--------| |
| (ext0) | | (ext1) | |
| (1G) | | (1G) | |
| |(daxX.1)| |
Allocating another dax device would result in:
|(daxX.2)} |(daxX.1)| |
I don't think this is exactly what the user is going to expect. This can
be resolved by by looking at the dax device mappings though.[0] So I'm going
to leave this for now. But I expect some additional porcelain is going to
be required to fully meet Gregory's requirements.
[0]
/sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
/sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
Back to the label field: It is currently just the 'tag' of the individual
extent (because no interleaving). My vision for the interleave case would
be for the kernel to assemble device extents into a region extent only if
the tags match and export that.
Thinking on it more though we should leave label out for now. This is the
second time it has been questioned.
> Otherwise a query inline that may well be answered in later patches.
>
> >
> > The use of the extent devices by the DAX layer is deferred to later
> > patches.
> >
> > Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> > Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
[snip]
> > +int dax_region_create_ext(struct cxl_dax_region *cxlr_dax,
> > + struct range *hpa_range,
> > + const char *label,
> > + struct range *dpa_range,
> > + struct cxl_endpoint_decoder *cxled)
> > +{
> > + struct region_extent *reg_ext;
> > + struct device *dev;
> > + int rc, id;
> > +
> > + id = ida_alloc(&cxl_extent_ida, GFP_KERNEL);
> > + if (id < 0)
> > + return -ENOMEM;
>
> Whilst it doesn't matter hugely, it's nice if the release does things
> in opposite order of the creation. So perhaps move the ida_alloc
> after kzalloc or reg_ext?
Actually there is an ida resource leak here if the alloc fails. I'll fix
that too.
>
> > +
> > + reg_ext = kzalloc(sizeof(*reg_ext), GFP_KERNEL);
> > + if (!reg_ext)
> > + return -ENOMEM;
> > +
> > + reg_ext->hpa_range = *hpa_range;
> > + reg_ext->ed_ext.dpa_range = *dpa_range;
> > + reg_ext->ed_ext.cxled = cxled;
> > + snprintf(reg_ext->label, DAX_EXTENT_LABEL_LEN, "%s", label);
> > +
> > + dev = ®_ext->dev;
> > + device_initialize(dev);
> > + dev->id = id;
> > + device_set_pm_not_required(dev);
> > + dev->parent = &cxlr_dax->dev;
> > + dev->type = ®ion_extent_type;
> > + rc = dev_set_name(dev, "extent%d", dev->id);
> > + if (rc)
> > + goto err;
> > +
> > + rc = device_add(dev);
> > + if (rc)
> > + goto err;
> > +
> > + dev_dbg(dev, "DAX region extent HPA %#llx - %#llx\n",
> > + reg_ext->hpa_range.start, reg_ext->hpa_range.end);
> > +
> > + return devm_add_action_or_reset(&cxlr_dax->dev, region_extent_unregister,
> > + reg_ext);
>
> Indent
Yep.
>
> > +
> > +err:
> > + dev_err(&cxlr_dax->dev, "Failed to initialize DAX extent dev HPA %#llx - %#llx\n",
> > + reg_ext->hpa_range.start, reg_ext->hpa_range.end);
> > +
> > + put_device(dev);
> > + return rc;
> > +}
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 9e33a0976828..6b00e717e42b 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -1020,6 +1020,32 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> > return rc;
> > }
> >
> > +static int cxl_send_dc_cap_response(struct cxl_memdev_state *mds,
> > + struct range *extent, int opcode)
> > +{
> > + struct cxl_mbox_cmd mbox_cmd;
> > + size_t size;
> > +
> > + struct cxl_mbox_dc_response *dc_res __free(kfree);
> > + size = struct_size(dc_res, extent_list, 1);
> > + dc_res = kzalloc(size, GFP_KERNEL);
> > + if (!dc_res)
> > + return -ENOMEM;
> > +
> > + dc_res->extent_list[0].dpa_start = cpu_to_le64(extent->start);
> > + memset(dc_res->extent_list[0].reserved, 0, 8);
> > + dc_res->extent_list[0].length = cpu_to_le64(range_len(extent));
> > + dc_res->extent_list_size = cpu_to_le32(1);
>
> I guess this comes up later, but such a response means that if we are offered
> multiple extents in an add with the more flag set then we always reject all
> but the first one.
I've thought about how to best support for the more flag without major
complications. So far the use of the more flag is IMO more trouble than
it is worth. I agree that the spec is clear WRT the grouping of a
response with the more flag set but it is very vague on __why__.
>
> > +
> > + mbox_cmd = (struct cxl_mbox_cmd) {
> > + .opcode = opcode,
> > + .size_in = size,
> > + .payload_in = dc_res,
> > + };
> > +
> > + return cxl_internal_send_cmd(mds, &mbox_cmd);
> > +}
> > +
> > static struct cxl_memdev_state *
> > cxled_to_mds(struct cxl_endpoint_decoder *cxled)
> > {
> > @@ -1029,6 +1055,23 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
> > return container_of(cxlds, struct cxl_memdev_state, cxlds);
> > }
> >
> > +void cxl_release_ed_extent(struct cxl_ed_extent *extent)
> > +{
> > + struct cxl_endpoint_decoder *cxled = extent->cxled;
> > + struct cxl_memdev_state *mds = cxled_to_mds(cxled);
> > + struct device *dev = mds->cxlds.dev;
> > + int rc;
> > +
> > + dev_dbg(dev, "Releasing DC extent DPA %#llx - %#llx\n",
> > + extent->dpa_range.start, extent->dpa_range.end);
> > +
> > + rc = cxl_send_dc_cap_response(mds, &extent->dpa_range, CXL_MBOX_OP_RELEASE_DC);
>
> Long line that doesn't really need to be.
Yep
>
> > + if (rc)
> > + dev_dbg(dev, "Failed to respond releasing extent DPA %#llx - %#llx; %d\n",
> > + extent->dpa_range.start, extent->dpa_range.end, rc);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_release_ed_extent, CXL);
> > +
> > static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > enum cxl_event_log_type type)
> > {
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 3e563ab29afe..7635ff109578 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1450,11 +1450,81 @@ static int cxl_region_validate_position(struct cxl_region *cxlr,
> > return 0;
> > }
>
> > static int cxl_region_attach_position(struct cxl_region *cxlr,
> > @@ -2684,6 +2754,7 @@ static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr)
> >
> > dev = &cxlr_dax->dev;
> > cxlr_dax->cxlr = cxlr;
> > + cxlr->cxlr_dax = cxlr_dax;
> > device_initialize(dev);
> > lockdep_set_class(&dev->mutex, &cxl_dax_region_key);
> > device_set_pm_not_required(dev);
> > @@ -2799,7 +2870,10 @@ static int cxl_region_read_extents(struct cxl_region *cxlr)
> > static void cxlr_dax_unregister(void *_cxlr_dax)
> > {
> > struct cxl_dax_region *cxlr_dax = _cxlr_dax;
> > + struct cxl_region *cxlr = cxlr_dax->cxlr;
> >
> > + cxlr->cxlr_dax = NULL;
> > + cxlr_dax->cxlr = NULL;
> > device_unregister(&cxlr_dax->dev);
> > }
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index d585f5fdd3ae..5379ad7f5852 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
>
>
> > +/**
> > + * struct region_extent - CXL DAX region extent
> > + * @dev: device representing this extent
> > + * @hpa_range: HPA range of this extent
> > + * @label: label of the extent
> > + * @ed_ext: Endpoint decoder extent which backs this extent
> > + */
> > +#define DAX_EXTENT_LABEL_LEN 64
>
> Something called DAX_* doesn't belong in this header...
> Either give a CXL_DAX_ prefix or move the definition if appropriate.
>
I've remove this as well as the label sysfs instead.
Ira
next prev parent reply other threads:[~2024-04-30 3:23 UTC|newest]
Thread overview: 161+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-24 23:18 [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2024-03-24 23:18 ` [PATCH 01/26] cxl/mbox: Flag " ira.weiny
2024-03-25 16:11 ` Jonathan Cameron
2024-03-25 22:16 ` fan
2024-03-25 22:56 ` Davidlohr Bueso
2024-04-02 22:26 ` Ira Weiny
2024-03-26 16:34 ` Dave Jiang
2024-04-02 22:30 ` Ira Weiny
2024-04-10 18:15 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 02/26] cxl/core: Separate region mode from decoder mode ira.weiny
2024-03-25 16:20 ` Jonathan Cameron
2024-04-02 23:24 ` Ira Weiny
2024-03-25 23:18 ` Davidlohr Bueso
2024-03-28 5:22 ` Ira Weiny
2024-03-28 20:09 ` Dave Jiang
2024-04-02 23:27 ` Ira Weiny
2024-04-24 17:58 ` Ira Weiny
2024-04-02 23:25 ` Ira Weiny
2024-04-10 18:49 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 03/26] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-03-25 17:40 ` Jonathan Cameron
2024-04-03 22:22 ` Ira Weiny
2024-03-25 23:36 ` fan
2024-04-03 22:41 ` Ira Weiny
2024-04-02 11:41 ` Jørgen Hansen
2024-04-05 18:09 ` Ira Weiny
2024-04-09 8:42 ` Jørgen Hansen
2024-04-09 2:00 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 04/26] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-03-25 17:42 ` Jonathan Cameron
2024-03-26 16:17 ` fan
2024-03-27 15:43 ` Dave Jiang
2024-04-05 18:19 ` Ira Weiny
2024-04-06 0:01 ` Dave Jiang
2024-05-14 2:40 ` Zhijian Li (Fujitsu)
2024-03-24 23:18 ` [PATCH 05/26] cxl/core: Simplify cxl_dpa_set_mode() Ira Weiny
2024-03-25 17:46 ` Jonathan Cameron
2024-03-25 21:38 ` Davidlohr Bueso
2024-03-26 16:25 ` fan
2024-03-26 17:46 ` Dave Jiang
2024-04-05 19:21 ` Ira Weiny
2024-04-06 0:02 ` Dave Jiang
2024-04-09 0:43 ` Alison Schofield
2024-05-03 19:09 ` Ira Weiny
2024-05-03 20:33 ` Alison Schofield
2024-05-04 1:19 ` Dan Williams
2024-05-06 4:06 ` Ira Weiny
2024-05-04 4:13 ` Dan Williams
2024-05-06 3:46 ` Ira Weiny
2024-03-24 23:18 ` [PATCH 06/26] cxl/port: Add Dynamic Capacity mode support to endpoint decoders ira.weiny
2024-03-26 16:35 ` fan
2024-04-05 19:50 ` Ira Weiny
2024-03-26 17:58 ` Dave Jiang
2024-04-05 20:34 ` Ira Weiny
2024-04-04 8:32 ` Jonathan Cameron
2024-04-05 20:56 ` Ira Weiny
2024-05-06 16:22 ` Dan Williams
2024-05-10 5:31 ` Ira Weiny
2024-04-10 20:33 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 07/26] cxl/port: Add dynamic capacity size " ira.weiny
2024-04-05 13:54 ` Jonathan Cameron
2024-05-03 17:09 ` Ira Weiny
2024-05-03 17:21 ` Dan Williams
2024-05-06 4:07 ` Ira Weiny
2024-04-10 22:50 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 08/26] cxl/mem: Expose device dynamic capacity capabilities ira.weiny
2024-03-25 23:40 ` Davidlohr Bueso
2024-03-26 18:30 ` fan
2024-04-04 8:44 ` Jonathan Cameron
2024-04-04 8:51 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 09/26] cxl/region: Add Dynamic Capacity CXL region support ira.weiny
2024-03-26 22:31 ` fan
2024-04-10 4:25 ` Ira Weiny
2024-03-27 17:27 ` Dave Jiang
2024-04-10 4:35 ` Ira Weiny
2024-04-04 10:26 ` Jonathan Cameron
2024-04-10 4:40 ` Ira Weiny
2024-03-24 23:18 ` [PATCH 10/26] cxl/events: Factor out event msgnum configuration Ira Weiny
2024-03-27 17:38 ` Dave Jiang
2024-04-04 15:07 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 11/26] cxl/pci: Delay event buffer allocation Ira Weiny
2024-03-25 22:26 ` Davidlohr Bueso
2024-03-27 17:38 ` Dave Jiang
2024-04-04 15:08 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 12/26] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-03-27 17:41 ` Dave Jiang
2024-04-04 15:10 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 13/26] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-03-26 23:12 ` fan
2024-04-10 4:48 ` Ira Weiny
2024-03-27 17:54 ` Dave Jiang
2024-04-10 5:26 ` Ira Weiny
2024-04-04 15:22 ` Jonathan Cameron
2024-04-10 5:34 ` Ira Weiny
2024-04-10 23:23 ` Alison Schofield
2024-05-06 16:56 ` Dan Williams
2024-03-24 23:18 ` [PATCH 14/26] cxl/region: Read existing extents on region creation ira.weiny
2024-03-26 23:27 ` fan
2024-04-10 5:46 ` Ira Weiny
2024-03-27 17:45 ` fan
2024-04-10 6:19 ` Ira Weiny
2024-03-27 18:31 ` Dave Jiang
2024-04-10 6:09 ` Ira Weiny
2024-04-02 13:57 ` Jørgen Hansen
2024-04-10 6:29 ` Ira Weiny
2024-04-04 16:04 ` Jonathan Cameron
2024-04-04 16:13 ` Jonathan Cameron
2024-04-10 17:44 ` Alison Schofield
2024-05-06 18:34 ` Dan Williams
2024-06-29 3:47 ` Ira Weiny
2024-03-24 23:18 ` [PATCH 15/26] range: Add range_overlaps() Ira Weiny
2024-03-25 18:33 ` David Sterba
2024-03-25 21:24 ` Davidlohr Bueso
2024-03-26 12:51 ` Johannes Thumshirn
2024-03-27 17:36 ` fan
2024-03-28 20:09 ` Dave Jiang
2024-04-04 16:06 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 16/26] cxl/extent: Realize extent devices ira.weiny
2024-03-27 22:34 ` fan
2024-03-28 21:11 ` Dave Jiang
2024-04-24 19:57 ` Ira Weiny
2024-04-04 16:32 ` Jonathan Cameron
2024-04-30 3:23 ` Ira Weiny [this message]
2024-05-02 21:12 ` Dan Williams
2024-05-06 4:35 ` Ira Weiny
2024-04-11 0:09 ` Alison Schofield
2024-05-07 1:30 ` Dan Williams
2024-03-24 23:18 ` [PATCH 17/26] dax/region: Create extent resources on DAX region driver load ira.weiny
2024-04-04 16:36 ` Jonathan Cameron
2024-04-09 16:22 ` fan
2024-05-07 2:31 ` Dan Williams
2024-03-24 23:18 ` [PATCH 18/26] cxl/mem: Handle DCD add & release capacity events ira.weiny
2024-04-04 17:03 ` Jonathan Cameron
2024-05-07 5:04 ` Dan Williams
2024-03-24 23:18 ` [PATCH 19/26] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-04-04 17:15 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 20/26] dax: Document dax dev range tuple Ira Weiny
2024-04-01 17:06 ` Dave Jiang
2024-04-04 17:19 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 21/26] dax/region: Prevent range mapping allocation on sparse regions Ira Weiny
2024-04-01 17:07 ` Dave Jiang
2024-04-10 23:02 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 22/26] dax/region: Support DAX device creation on sparse DAX regions Ira Weiny
2024-04-04 17:36 ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 23/26] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-04-01 17:56 ` Dave Jiang
2024-04-04 17:38 ` Jonathan Cameron
2024-04-10 17:03 ` Alison Schofield
2024-03-24 23:18 ` [PATCH 24/26] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-03-24 23:18 ` [PATCH 25/26] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-03-24 23:18 ` [PATCH 26/26] tools/testing/cxl: Add Dynamic Capacity events Ira Weiny
2024-03-25 19:24 ` [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD) fan
2024-03-28 5:20 ` Ira Weiny
2024-04-03 20:39 ` Jonathan Cameron
2024-04-04 10:20 ` Jonathan Cameron
2024-04-04 17:49 ` Jonathan Cameron
2024-05-01 23:49 ` Ira Weiny
2024-05-03 9:20 ` Jonathan Cameron
2024-05-06 4:24 ` Ira Weiny
2024-05-08 14:43 ` Jonathan Cameron
2024-04-10 18:01 ` Alison Schofield
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6630641e5238e_e1f5829431@iweiny-mobl.notmuch \
--to=ira.weiny@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=fan.ni@samsung.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=navneet.singh@intel.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox