All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	"Navneet Singh" <navneet.singh@intel.com>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<nvdimm@lists.linux.dev>
Subject: Re: [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents
Date: Fri, 30 Aug 2024 10:21:43 +0100	[thread overview]
Message-ID: <20240830102143.000048fc@Huawei.com> (raw)
In-Reply-To: <66d0e53e9d3e9_f937b294b7@iweiny-mobl.notmuch>

> > > +int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
> > > +{
> > > +	u64 start_dpa = le64_to_cpu(extent->start_dpa);
> > > +	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> > > +	struct cxl_endpoint_decoder *cxled;
> > > +	struct range hpa_range, dpa_range;
> > > +	struct cxl_region *cxlr;
> > > +
> > > +	dpa_range = (struct range) {
> > > +		.start = start_dpa,
> > > +		.end = start_dpa + le64_to_cpu(extent->length) - 1,
> > > +	};
> > > +
> > > +	guard(rwsem_read)(&cxl_region_rwsem);
> > > +	cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled);
> > > +	if (!cxlr) {
> > > +		memdev_release_extent(mds, &dpa_range);  
> > 
> > How does this condition happen?  Perhaps a comment needed.  
> 
> Fair enough.  Proposed comment.
> 
> 	/*
> 	 * No region can happen here for a few reasons:
> 	 *
> 	 * 1) Extents were accepted and the host crashed/rebooted
> 	 *    leaving them in an accepted state.  On reboot the host
> 	 *    has not yet created a region to own them.
> 	 *
> 	 * 2) Region destruction won the race with the device releasing
> 	 *    all the extents.  Here the release will be a duplicate of
> 	 *    the one sent via region destruction.
> 	 *
> 	 * 3) The device is confused and releasing extents for which no
> 	 *    region ever existed.
> 	 *
> 	 * In all these cases make sure the device knows we are not
> 	 * using this extent.
> 	 */
> 
> Item 2 is AFAICS ok with the spec.

I'm not sure I follow 2.  Why would device be releasing extents
if we haven't given them back? We aren't supporting the mess that
is force removal.

> 
> >   
> > > +		return -ENXIO;
> > > +	}
> > > +
> > > +	calc_hpa_range(cxled, cxlr->cxlr_dax, &dpa_range, &hpa_range);
> > > +
> > > +	/* Remove region extents which overlap */
> > > +	return device_for_each_child(&cxlr->cxlr_dax->dev, &hpa_range,
> > > +				     cxlr_rm_extent);
> > > +}
> > > +
> > > +/* Callers are expected to ensure cxled has been attached to a region */
> > > +int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
> > > +{
> > > +	u64 start_dpa = le64_to_cpu(extent->start_dpa);
> > > +	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> > > +	struct cxl_endpoint_decoder *cxled;
> > > +	struct range ed_range, ext_range;
> > > +	struct cxl_dax_region *cxlr_dax;
> > > +	struct cxled_extent *ed_extent;
> > > +	struct cxl_region *cxlr;
> > > +	struct device *dev;
> > > +
> > > +	ext_range = (struct range) {
> > > +		.start = start_dpa,
> > > +		.end = start_dpa + le64_to_cpu(extent->length) - 1,
> > > +	};
> > > +
> > > +	guard(rwsem_read)(&cxl_region_rwsem);
> > > +	cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled);
> > > +	if (!cxlr)
> > > +		return -ENXIO;
> > > +
> > > +	cxlr_dax = cxled->cxld.region->cxlr_dax;
> > > +	dev = &cxled->cxld.dev;
> > > +	ed_range = (struct range) {
> > > +		.start = cxled->dpa_res->start,
> > > +		.end = cxled->dpa_res->end,
> > > +	};
> > > +
> > > +	dev_dbg(&cxled->cxld.dev, "Checking ED (%pr) for extent %par\n",
> > > +		cxled->dpa_res, &ext_range);
> > > +
> > > +	if (!range_contains(&ed_range, &ext_range)) {
> > > +		dev_err_ratelimited(dev,
> > > +				    "DC extent DPA %par (%*phC) is not fully in ED %par\n",
> > > +				    &ext_range.start, CXL_EXTENT_TAG_LEN,
> > > +				    extent->tag, &ed_range);
> > > +		return -ENXIO;
> > > +	}
> > > +
> > > +	if (extents_contain(cxlr_dax, cxled, &ext_range))  
> > 
> > This case confuses me. If the extents are already there I think we should
> > error out or at least print something as that's very wrong.  
> 
> I thought we discussed this in one of the community meetings that it would be
> ok to accept these.  We could certainly print a warning here.

A warning probably does the job of indicating that 'something' odd is going on.
A device should never resend an extent overlapping one it sent before, (assuming
no removal happened inbetween) so this should never happen, but who knows :(

> 
> In all honestly I'm wondering if these restrictions are really needed anymore.
> But at the same time I really, really, really don't think anyone has a good use
> case to have to support these cases.  So I'm keeping the code simple for now.

Fair enough.
> 
> >   
> > > +		return 0;
> > > +
> > > +	if (extents_overlap(cxlr_dax, cxled, &ext_range))
> > > +		return -ENXIO;
> > > +
> > > +	ed_extent = kzalloc(sizeof(*ed_extent), GFP_KERNEL);
> > > +	if (!ed_extent)
> > > +		return -ENOMEM;
> > > +
> > > +	ed_extent->cxled = cxled;
> > > +	ed_extent->dpa_range = ext_range;
> > > +	memcpy(ed_extent->tag, extent->tag, CXL_EXTENT_TAG_LEN);
> > > +
> > > +	dev_dbg(dev, "Add extent %par (%*phC)\n", &ed_extent->dpa_range,
> > > +		CXL_EXTENT_TAG_LEN, ed_extent->tag);
> > > +
> > > +	return cxlr_add_extent(cxlr_dax, cxled, ed_extent);
> > > +}
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 01a447aaa1b1..f629ad7488ac 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -882,6 +882,48 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> > >  
> > > +static int cxl_validate_extent(struct cxl_memdev_state *mds,
> > > +			       struct cxl_extent *extent)
> > > +{
> > > +	u64 start = le64_to_cpu(extent->start_dpa);
> > > +	u64 length = le64_to_cpu(extent->length);
> > > +	struct device *dev = mds->cxlds.dev;
> > > +
> > > +	struct range ext_range = (struct range){
> > > +		.start = start,
> > > +		.end = start + length - 1,
> > > +	};
> > > +
> > > +	if (le16_to_cpu(extent->shared_extn_seq) != 0) {  
> > 
> > That's not the 'main' way to tell if an extent is shared because
> > we could have a single extent (so seq == 0).
> > Should verify it's not in a DCD region that
> > is shareable to make this decision.  
> 
> Ah...  :-/
> 
> > 
> > I've lost track on the region handling so maybe you already do
> > this by not including those regions at all?  
> 
> I don't think so.
> 
> I'll add the region check.  I see now why I glossed over this though.  The
> shared nature of a DCD partition is defined in the DSMAS.
> 
> Is that correct?  Or am I missing something in the spec?

Yes. That's matches my understanding (I might also be missing something
of course :)


> > > +static int cxl_add_pending(struct cxl_memdev_state *mds)
> > > +{
> > > +	struct device *dev = mds->cxlds.dev;
> > > +	struct cxl_extent *extent;
> > > +	unsigned long index;
> > > +	unsigned long cnt = 0;
> > > +	int rc;
> > > +
> > > +	xa_for_each(&mds->pending_extents, index, extent) {
> > > +		if (validate_add_extent(mds, extent)) {  
> > 
> > 
> > Add a comment here that not accepting an extent but
> > accepting some or none means this one was rejected (I'd forgotten how
> > that bit worked)  
> 
> Ok yeah that may not be clear without reading the spec closely.
> 
> 	/*
> 	 * Any extents which are to be rejected are omitted from
> 	 * the response.  An empty response means all are
> 	 * rejected.
> 	 */

Perfect.

> 
> >   
> > > +			dev_dbg(dev, "unconsumed DC extent DPA:%#llx LEN:%#llx\n",
> > > +				le64_to_cpu(extent->start_dpa),
> > > +				le64_to_cpu(extent->length));
> > > +			xa_erase(&mds->pending_extents, index);
> > > +			kfree(extent);
> > > +			continue;
> > > +		}
> > > +		cnt++;
> > > +	}
> > > +	rc = cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> > > +				  &mds->pending_extents, cnt);
> > > +	xa_for_each(&mds->pending_extents, index, extent) {
> > > +		xa_erase(&mds->pending_extents, index);
> > > +		kfree(extent);
> > > +	}
> > > +	return rc;
> > > +}
> > > +

> > >  static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > >  				    enum cxl_event_log_type type)
> > >  {
> > > @@ -1044,9 +1287,17 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > >  		if (!nr_rec)
> > >  			break;
> > >  
> > > -		for (i = 0; i < nr_rec; i++)
> > > +		for (i = 0; i < nr_rec; i++) {
> > >  			__cxl_event_trace_record(cxlmd, type,
> > >  						 &payload->records[i]);
> > > +			if (type == CXL_EVENT_TYPE_DCD) {  
> > Bit of a deep indent so maybe flip logic?
> > 
> > Logic wise it's a bit dubious as we might want to match other
> > types in future though so up to you.  
> 
> I was thinking more along these lines.  But the rc is unneeded.  That print
> can be in the handle function.
> 
> 
> Something like this:
Looks good to me. (cut to save on scrolling!)

Jonathan

  reply	other threads:[~2024-08-30  9:21 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-16 14:44 [PATCH v3 00/25] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-08-16 14:44 ` [PATCH v3 01/25] range: Add range_overlaps() Ira Weiny
2024-08-16 14:44 ` [PATCH v3 02/25] printk: Add print format (%par) for struct range Ira Weiny
2024-08-20 14:08   ` Petr Mladek
2024-08-22 17:53     ` Ira Weiny
2024-08-22 18:10       ` Andy Shevchenko
2024-08-26 13:23         ` Petr Mladek
2024-08-26 17:23           ` Andy Shevchenko
2024-08-26 21:17             ` Ira Weiny
2024-08-27  7:43               ` Petr Mladek
2024-08-27 13:21                 ` Andy Shevchenko
2024-08-27 21:44                 ` Ira Weiny
2024-08-27 13:17               ` Andy Shevchenko
2024-08-28  4:12                 ` Ira Weiny
2024-08-28 13:50                   ` Andy Shevchenko
2024-08-26 13:17       ` Petr Mladek
2024-08-26 13:24         ` Andy Shevchenko
2024-08-16 14:44 ` [PATCH v3 03/25] dax: Document dax dev range tuple Ira Weiny
2024-08-16 20:58   ` Dave Jiang
2024-08-23 15:29   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 04/25] cxl/pci: Delay event buffer allocation Ira Weiny
2024-09-03  6:49   ` Li, Ming4
2024-09-05 19:44   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-09-03  6:50   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 06/25] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-08-16 21:45   ` Dave Jiang
2024-08-20 17:01     ` Fan Ni
2024-08-23  2:01       ` Ira Weiny
2024-08-23  2:02       ` Ira Weiny
2024-08-23 15:45   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 07/25] cxl/core: Separate region mode from decoder mode ira.weiny
2024-08-16 22:11   ` Dave Jiang
2024-08-23 15:47   ` Jonathan Cameron
2024-09-03  6:56   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 08/25] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-08-16 22:14   ` Dave Jiang
2024-09-03  6:57   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 09/25] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-08-16 23:08   ` Dave Jiang
2024-08-23  2:26     ` Ira Weiny
2024-08-23 16:09   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 10/25] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-08-16 23:17   ` Dave Jiang
2024-08-23 16:12   ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 11/25] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-08-16 23:42   ` Dave Jiang
2024-08-23  2:28     ` Ira Weiny
2024-08-23 14:58       ` Dave Jiang
2024-08-23 16:14       ` Jonathan Cameron
2024-08-16 14:44 ` [PATCH v3 12/25] cxl/region: Refactor common create region code Ira Weiny
2024-08-16 23:43   ` Dave Jiang
2024-08-22 18:51   ` Fan Ni
2024-08-23 16:17   ` Jonathan Cameron
2024-09-03  7:04   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 13/25] cxl/region: Add sparse DAX region support ira.weiny
2024-08-16 23:51   ` Dave Jiang
2024-08-22 18:50   ` Fan Ni
2024-08-23 16:59   ` Jonathan Cameron
2024-09-03  2:15   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 14/25] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-08-16 23:57   ` Dave Jiang
2024-08-22 21:39   ` Fan Ni
2024-08-23 17:01   ` Jonathan Cameron
2024-09-03  7:06   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 15/25] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-08-22 21:41   ` Fan Ni
2024-09-03  7:07   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 16/25] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-08-17  0:02   ` Dave Jiang
2024-08-23 17:08   ` Jonathan Cameron
2024-09-03  7:09   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 17/25] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-08-19 16:35   ` Dave Jiang
2024-08-23 17:12   ` Jonathan Cameron
2024-09-03  7:10   ` Li, Ming4
2024-08-16 14:44 ` [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-08-19 18:51   ` Dave Jiang
2024-08-23  2:53     ` Ira Weiny
2024-08-23 21:32   ` Fan Ni
2024-08-27 12:08     ` Jonathan Cameron
2024-08-27 16:02       ` Fan Ni
2024-08-27 13:18   ` Jonathan Cameron
2024-08-29 21:16     ` Ira Weiny
2024-08-30  9:21       ` Jonathan Cameron [this message]
2024-09-03  6:37   ` Li, Ming4
2024-09-05 19:30   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 19/25] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-08-19 19:05   ` Dave Jiang
2024-08-23  2:58     ` Ira Weiny
2024-08-23 17:17       ` Jonathan Cameron
2024-08-23 17:19   ` Jonathan Cameron
2024-08-28 17:44   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 20/25] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-08-19 22:35   ` Dave Jiang
2024-08-27 13:26   ` Jonathan Cameron
2024-08-29 21:36     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 21/25] dax/region: Create resources on sparse DAX regions ira.weiny
2024-08-18 11:38   ` Markus Elfring
2024-08-19 23:30   ` Dave Jiang
2024-08-23 14:28     ` Ira Weiny
2024-08-27 14:12   ` Jonathan Cameron
2024-08-29 21:54     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 22/25] cxl/region: Read existing extents on region creation ira.weiny
2024-08-20  0:06   ` Dave Jiang
2024-08-23 21:31     ` Ira Weiny
2024-08-27 14:19   ` Jonathan Cameron
2024-09-05 19:35   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 23/25] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-08-20 22:54   ` Dave Jiang
2024-08-26 18:02     ` Ira Weiny
2024-08-27 14:20   ` Jonathan Cameron
2024-09-05 19:38   ` Fan Ni
2024-08-16 14:44 ` [PATCH v3 24/25] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-08-20 23:30   ` Dave Jiang
2024-08-27 14:32   ` Jonathan Cameron
2024-09-09 13:57     ` Ira Weiny
2024-08-16 14:44 ` [PATCH v3 25/25] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-08-27 14:39   ` Jonathan Cameron
2024-09-09 14:08     ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240830102143.000048fc@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alison.schofield@intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=dsterba@suse.com \
    --cc=fan.ni@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=navneet.singh@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.