From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
"Navneet Singh" <navneet.singh@intel.com>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Petr Mladek <pmladek@suse.com>,
Steven Rostedt <rostedt@goodmis.org>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Jonathan Corbet <corbet@lwn.net>,
Andrew Morton <akpm@linux-foundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
<nvdimm@lists.linux.dev>
Subject: Re: [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents
Date: Fri, 30 Aug 2024 10:21:43 +0100 [thread overview]
Message-ID: <20240830102143.000048fc@Huawei.com> (raw)
In-Reply-To: <66d0e53e9d3e9_f937b294b7@iweiny-mobl.notmuch>
> > > +int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
> > > +{
> > > + u64 start_dpa = le64_to_cpu(extent->start_dpa);
> > > + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> > > + struct cxl_endpoint_decoder *cxled;
> > > + struct range hpa_range, dpa_range;
> > > + struct cxl_region *cxlr;
> > > +
> > > + dpa_range = (struct range) {
> > > + .start = start_dpa,
> > > + .end = start_dpa + le64_to_cpu(extent->length) - 1,
> > > + };
> > > +
> > > + guard(rwsem_read)(&cxl_region_rwsem);
> > > + cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled);
> > > + if (!cxlr) {
> > > + memdev_release_extent(mds, &dpa_range);
> >
> > How does this condition happen? Perhaps a comment needed.
>
> Fair enough. Proposed comment.
>
> /*
> * No region can happen here for a few reasons:
> *
> * 1) Extents were accepted and the host crashed/rebooted
> * leaving them in an accepted state. On reboot the host
> * has not yet created a region to own them.
> *
> * 2) Region destruction won the race with the device releasing
> * all the extents. Here the release will be a duplicate of
> * the one sent via region destruction.
> *
> * 3) The device is confused and releasing extents for which no
> * region ever existed.
> *
> * In all these cases make sure the device knows we are not
> * using this extent.
> */
>
> Item 2 is AFAICS ok with the spec.
I'm not sure I follow 2. Why would device be releasing extents
if we haven't given them back? We aren't supporting the mess that
is force removal.
>
> >
> > > + return -ENXIO;
> > > + }
> > > +
> > > + calc_hpa_range(cxled, cxlr->cxlr_dax, &dpa_range, &hpa_range);
> > > +
> > > + /* Remove region extents which overlap */
> > > + return device_for_each_child(&cxlr->cxlr_dax->dev, &hpa_range,
> > > + cxlr_rm_extent);
> > > +}
> > > +
> > > +/* Callers are expected to ensure cxled has been attached to a region */
> > > +int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
> > > +{
> > > + u64 start_dpa = le64_to_cpu(extent->start_dpa);
> > > + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> > > + struct cxl_endpoint_decoder *cxled;
> > > + struct range ed_range, ext_range;
> > > + struct cxl_dax_region *cxlr_dax;
> > > + struct cxled_extent *ed_extent;
> > > + struct cxl_region *cxlr;
> > > + struct device *dev;
> > > +
> > > + ext_range = (struct range) {
> > > + .start = start_dpa,
> > > + .end = start_dpa + le64_to_cpu(extent->length) - 1,
> > > + };
> > > +
> > > + guard(rwsem_read)(&cxl_region_rwsem);
> > > + cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled);
> > > + if (!cxlr)
> > > + return -ENXIO;
> > > +
> > > + cxlr_dax = cxled->cxld.region->cxlr_dax;
> > > + dev = &cxled->cxld.dev;
> > > + ed_range = (struct range) {
> > > + .start = cxled->dpa_res->start,
> > > + .end = cxled->dpa_res->end,
> > > + };
> > > +
> > > + dev_dbg(&cxled->cxld.dev, "Checking ED (%pr) for extent %par\n",
> > > + cxled->dpa_res, &ext_range);
> > > +
> > > + if (!range_contains(&ed_range, &ext_range)) {
> > > + dev_err_ratelimited(dev,
> > > + "DC extent DPA %par (%*phC) is not fully in ED %par\n",
> > > + &ext_range.start, CXL_EXTENT_TAG_LEN,
> > > + extent->tag, &ed_range);
> > > + return -ENXIO;
> > > + }
> > > +
> > > + if (extents_contain(cxlr_dax, cxled, &ext_range))
> >
> > This case confuses me. If the extents are already there I think we should
> > error out or at least print something as that's very wrong.
>
> I thought we discussed this in one of the community meetings that it would be
> ok to accept these. We could certainly print a warning here.
A warning probably does the job of indicating that 'something' odd is going on.
A device should never resend an extent overlapping one it sent before, (assuming
no removal happened inbetween) so this should never happen, but who knows :(
>
> In all honestly I'm wondering if these restrictions are really needed anymore.
> But at the same time I really, really, really don't think anyone has a good use
> case to have to support these cases. So I'm keeping the code simple for now.
Fair enough.
>
> >
> > > + return 0;
> > > +
> > > + if (extents_overlap(cxlr_dax, cxled, &ext_range))
> > > + return -ENXIO;
> > > +
> > > + ed_extent = kzalloc(sizeof(*ed_extent), GFP_KERNEL);
> > > + if (!ed_extent)
> > > + return -ENOMEM;
> > > +
> > > + ed_extent->cxled = cxled;
> > > + ed_extent->dpa_range = ext_range;
> > > + memcpy(ed_extent->tag, extent->tag, CXL_EXTENT_TAG_LEN);
> > > +
> > > + dev_dbg(dev, "Add extent %par (%*phC)\n", &ed_extent->dpa_range,
> > > + CXL_EXTENT_TAG_LEN, ed_extent->tag);
> > > +
> > > + return cxlr_add_extent(cxlr_dax, cxled, ed_extent);
> > > +}
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 01a447aaa1b1..f629ad7488ac 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -882,6 +882,48 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> > > }
> > > EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> > >
> > > +static int cxl_validate_extent(struct cxl_memdev_state *mds,
> > > + struct cxl_extent *extent)
> > > +{
> > > + u64 start = le64_to_cpu(extent->start_dpa);
> > > + u64 length = le64_to_cpu(extent->length);
> > > + struct device *dev = mds->cxlds.dev;
> > > +
> > > + struct range ext_range = (struct range){
> > > + .start = start,
> > > + .end = start + length - 1,
> > > + };
> > > +
> > > + if (le16_to_cpu(extent->shared_extn_seq) != 0) {
> >
> > That's not the 'main' way to tell if an extent is shared because
> > we could have a single extent (so seq == 0).
> > Should verify it's not in a DCD region that
> > is shareable to make this decision.
>
> Ah... :-/
>
> >
> > I've lost track on the region handling so maybe you already do
> > this by not including those regions at all?
>
> I don't think so.
>
> I'll add the region check. I see now why I glossed over this though. The
> shared nature of a DCD partition is defined in the DSMAS.
>
> Is that correct? Or am I missing something in the spec?
Yes. That's matches my understanding (I might also be missing something
of course :)
> > > +static int cxl_add_pending(struct cxl_memdev_state *mds)
> > > +{
> > > + struct device *dev = mds->cxlds.dev;
> > > + struct cxl_extent *extent;
> > > + unsigned long index;
> > > + unsigned long cnt = 0;
> > > + int rc;
> > > +
> > > + xa_for_each(&mds->pending_extents, index, extent) {
> > > + if (validate_add_extent(mds, extent)) {
> >
> >
> > Add a comment here that not accepting an extent but
> > accepting some or none means this one was rejected (I'd forgotten how
> > that bit worked)
>
> Ok yeah that may not be clear without reading the spec closely.
>
> /*
> * Any extents which are to be rejected are omitted from
> * the response. An empty response means all are
> * rejected.
> */
Perfect.
>
> >
> > > + dev_dbg(dev, "unconsumed DC extent DPA:%#llx LEN:%#llx\n",
> > > + le64_to_cpu(extent->start_dpa),
> > > + le64_to_cpu(extent->length));
> > > + xa_erase(&mds->pending_extents, index);
> > > + kfree(extent);
> > > + continue;
> > > + }
> > > + cnt++;
> > > + }
> > > + rc = cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> > > + &mds->pending_extents, cnt);
> > > + xa_for_each(&mds->pending_extents, index, extent) {
> > > + xa_erase(&mds->pending_extents, index);
> > > + kfree(extent);
> > > + }
> > > + return rc;
> > > +}
> > > +
> > > static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > > enum cxl_event_log_type type)
> > > {
> > > @@ -1044,9 +1287,17 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > > if (!nr_rec)
> > > break;
> > >
> > > - for (i = 0; i < nr_rec; i++)
> > > + for (i = 0; i < nr_rec; i++) {
> > > __cxl_event_trace_record(cxlmd, type,
> > > &payload->records[i]);
> > > + if (type == CXL_EVENT_TYPE_DCD) {
> > Bit of a deep indent so maybe flip logic?
> >
> > Logic wise it's a bit dubious as we might want to match other
> > types in future though so up to you.
>
> I was thinking more along these lines. But the rc is unneeded. That print
> can be in the handle function.
>
>
> Something like this:
Looks good to me. (cut to save on scrolling!)
Jonathan
next prev parent reply other threads:[~2024-08-30 9:21 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-16 14:44 [PATCH v3 00/25] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-08-16 14:44 ` [PATCH v3 01/25] range: Add range_overlaps() Ira Weiny
2024-08-16 14:44 ` [PATCH v3 02/25] printk: Add print format (%par) for struct range Ira Weiny
2024-08-20 14:08 ` Petr Mladek
2024-08-22 17:53 ` Ira Weiny
2024-08-22 18:10 ` Andy Shevchenko
2024-08-26 13:23 ` Petr Mladek
2024-08-26 17:23 ` Andy Shevchenko
2024-08-26 21:17 ` Ira Weiny
2024-08-27 7:43 ` Petr Mladek
2024-08-27 13:21 ` Andy Shevchenko
2024-08-27 21:44 ` Ira Weiny
2024-08-27 13:17 ` Andy Shevchenko
2024-08-28 4:12 ` Ira Weiny
2024-08-28 13:50 ` Andy Shevchenko
2024-08-26 13:17 ` Petr Mladek
2024-08-26 13:24 ` Andy Shevchenko
2024-08-16 14:44 ` [PATCH v3 03/25] dax: Document dax dev range tuple Ira Weiny
2024-08-16 20:58 ` Dave Jiang
2024-08-23 15:29 ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 04/25] cxl/pci: Delay event buffer allocation Ira Weiny
2024-09-03 6:49 ` Li, Ming4
2024-09-05 19:44 ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-09-03 6:50 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 06/25] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-08-16 21:45 ` Dave Jiang
2024-08-20 17:01 ` Fan Ni
2024-08-23 2:01 ` Ira Weiny
2024-08-23 2:02 ` Ira Weiny
2024-08-23 15:45 ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 07/25] cxl/core: Separate region mode from decoder mode ira.weiny
2024-08-16 22:11 ` Dave Jiang
2024-08-23 15:47 ` Jonathan Cameron
2024-09-03 6:56 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 08/25] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-08-16 22:14 ` Dave Jiang
2024-09-03 6:57 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 09/25] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-08-16 23:08 ` Dave Jiang
2024-08-23 2:26 ` Ira Weiny
2024-08-23 16:09 ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 10/25] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-08-16 23:17 ` Dave Jiang
2024-08-23 16:12 ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 11/25] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-08-16 23:42 ` Dave Jiang
2024-08-23 2:28 ` Ira Weiny
2024-08-23 14:58 ` Dave Jiang
2024-08-23 16:14 ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 12/25] cxl/region: Refactor common create region code Ira Weiny
2024-08-16 23:43 ` Dave Jiang
2024-08-22 18:51 ` Fan Ni
2024-08-23 16:17 ` Jonathan Cameron
2024-09-03 7:04 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 13/25] cxl/region: Add sparse DAX region support ira.weiny
2024-08-16 23:51 ` Dave Jiang
2024-08-22 18:50 ` Fan Ni
2024-08-23 16:59 ` Jonathan Cameron
2024-09-03 2:15 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 14/25] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-08-16 23:57 ` Dave Jiang
2024-08-22 21:39 ` Fan Ni
2024-08-23 17:01 ` Jonathan Cameron
2024-09-03 7:06 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 15/25] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-08-22 21:41 ` Fan Ni
2024-09-03 7:07 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 16/25] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-08-17 0:02 ` Dave Jiang
2024-08-23 17:08 ` Jonathan Cameron
2024-09-03 7:09 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 17/25] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-08-19 16:35 ` Dave Jiang
2024-08-23 17:12 ` Jonathan Cameron
2024-09-03 7:10 ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-08-19 18:51 ` Dave Jiang
2024-08-23 2:53 ` Ira Weiny
2024-08-23 21:32 ` Fan Ni
2024-08-27 12:08 ` Jonathan Cameron
2024-08-27 16:02 ` Fan Ni
2024-08-27 13:18 ` Jonathan Cameron
2024-08-29 21:16 ` Ira Weiny
2024-08-30 9:21 ` Jonathan Cameron [this message]
2024-09-03 6:37 ` Li, Ming4
2024-09-05 19:30 ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 19/25] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-08-19 19:05 ` Dave Jiang
2024-08-23 2:58 ` Ira Weiny
2024-08-23 17:17 ` Jonathan Cameron
2024-08-23 17:19 ` Jonathan Cameron
2024-08-28 17:44 ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 20/25] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-08-19 22:35 ` Dave Jiang
2024-08-27 13:26 ` Jonathan Cameron
2024-08-29 21:36 ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 21/25] dax/region: Create resources on sparse DAX regions ira.weiny
2024-08-18 11:38 ` Markus Elfring
2024-08-19 23:30 ` Dave Jiang
2024-08-23 14:28 ` Ira Weiny
2024-08-27 14:12 ` Jonathan Cameron
2024-08-29 21:54 ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 22/25] cxl/region: Read existing extents on region creation ira.weiny
2024-08-20 0:06 ` Dave Jiang
2024-08-23 21:31 ` Ira Weiny
2024-08-27 14:19 ` Jonathan Cameron
2024-09-05 19:35 ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 23/25] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-08-20 22:54 ` Dave Jiang
2024-08-26 18:02 ` Ira Weiny
2024-08-27 14:20 ` Jonathan Cameron
2024-09-05 19:38 ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 24/25] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-08-20 23:30 ` Dave Jiang
2024-08-27 14:32 ` Jonathan Cameron
2024-09-09 13:57 ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 25/25] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-08-27 14:39 ` Jonathan Cameron
2024-09-09 14:08 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240830102143.000048fc@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=alison.schofield@intel.com \
--cc=andriy.shevchenko@linux.intel.com \
--cc=clm@fb.com \
--cc=corbet@lwn.net \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=dsterba@suse.com \
--cc=fan.ni@samsung.com \
--cc=ira.weiny@intel.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=navneet.singh@intel.com \
--cc=nvdimm@lists.linux.dev \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=senozhatsky@chromium.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).