linux-btrfs.vger.kernel.org archive mirror
From: Ira Weiny <ira.weiny@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>, <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	"Navneet Singh" <navneet.singh@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <nvdimm@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 21/28] cxl/extent: Process DCD events and realize region extents
Date: Thu, 17 Oct 2024 16:39:57 -0500
Message-ID: <6711842d88fa_2cee2946a@iweiny-mobl.notmuch>
In-Reply-To: <20241010155821.00005079@Huawei.com>

Jonathan Cameron wrote:
> On Mon, 07 Oct 2024 18:16:27 -0500
> ira.weiny@intel.com wrote:
> 
> > From: Navneet Singh <navneet.singh@intel.com>
> > 
> > A dynamic capacity device (DCD) sends events to signal the host for
> > changes in the availability of Dynamic Capacity (DC) memory.  These
> > events contain extents describing a DPA range and metadata for memory
> > to be added or removed.  Events may be sent from the device at any time.
> > 
> > Three types of events can be signaled: Add, Release, and Force Release.
> > 
> > On add, the host may accept or reject the memory being offered.  If no
> > region exists, or the extent is invalid, the extent should be rejected.
> > Add extent events may be grouped by a 'more' bit which indicates those
> > extents should be processed as a group.
> > 
> > On remove, the host can delay the response until the host is safely not
> > using the memory.  If no region exists, the release can be sent
> > immediately.  The host may also release extents (or partial extents) at
> > any time.  Thus the 'more' bit grouping of release events is of less
> > value and can be ignored in favor of sending multiple release capacity
> > responses for groups of release events.
> 
> True today. I think that would be an error for shared extents,
> though, as they need to be released in one go.  We can deal with
> that when it matters.
> 
> 
> Mind you, the patch seems to try to handle the more bit anyway, so maybe
> just remove that discussion from this description?

It only handles the more bit in responses to ADD, because on RELEASE the count
is always 1.


+       if (cxl_send_dc_response(mds, CXL_MBOX_OP_RELEASE_DC, &extent_list, 1))
+               dev_dbg(dev, "Failed to release [range 0x%016llx-0x%016llx]\n",
+                       range->start, range->end);


For shared extents, a flag will need to be added to the extents, along with
additional logic to group these extents for checking use, etc.

I agree; we need to handle that later on and get this basic support in first.
For now I think my comments are correct WRT sending release responses.
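To make the ADD vs. RELEASE distinction concrete, here is a minimal standalone
sketch. The event types, structs, and dc_process_event() are hypothetical
stand-ins, not the driver's types, and Force Release is not modelled. It only
shows which events produce a response and when: Add records are collected
until the 'more' bit clears, while each Release is answered on its own with a
count of 1, so the 'more' bit never needs to be consulted for releases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; Force Release is not modelled here. */
enum dc_event_type { DC_EVENT_ADD, DC_EVENT_RELEASE };

/* Hypothetical, simplified event: one extent per event record. */
struct dc_event {
	enum dc_event_type type;
	bool more;	/* 'more' bit: further records belong to this group */
};

/* Counters standing in for the mailbox responses the host would send. */
struct dc_stats {
	int add_responses;	/* one per completed Add group */
	int release_responses;	/* one per released extent (count == 1) */
	size_t pending_adds;	/* extents collected for the current Add group */
};

static void dc_process_event(struct dc_stats *st, const struct dc_event *ev)
{
	assert(st && ev);

	switch (ev->type) {
	case DC_EVENT_ADD:
		st->pending_adds++;
		if (!ev->more) {
			/* Group complete: answer for all collected extents together. */
			st->add_responses++;
			st->pending_adds = 0;
		}
		break;
	case DC_EVENT_RELEASE:
		/* Answer each release on its own; the 'more' bit is ignored. */
		st->release_responses++;
		break;
	}
}
```

A shared-extent flag, as discussed above, would be the point where releases
also have to be grouped instead of answered one at a time.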

> > 
> > Simplify extent tracking with the following restrictions.
> > 
> > 	1) Flag for removal any extent which overlaps a requested
> > 	   release range.
> > 	2) Refuse the offer of extents which overlap already accepted
> > 	   memory ranges.
> > 	3) Accept again a range which has already been accepted by the
> > 	   host.  Eating duplicates serves three purposes.  First, this
> > 	   simplifies the code if the device should get out of sync with
> > 	   the host. 
> 
> Maybe scream about this a little.  AFAIK, if that happens it is a
> device bug.

Agreed, but because of the second purpose this is difficult to scream about:
the situation can come up in normal operation.  Here is the scenario:

1) Device has 2 DCD partitions active, A and B
2) Host crashes
3) Region X is created on A
4) Region Y is created on B
5) Region Y scans for existing extents
6) A new extent is surfaced in Region X while Y is scanning
7) The generation number changes due to the new extent in X
8) Region Y rescans for existing extents and sees duplicates.

These duplicates need to be ignored without signaling an error.
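The duplicate handling can be sketched as classifying each offered extent
against what has already been accepted. struct dpa_range, classify_offer(),
and the result enum below are hypothetical names for illustration; the real
code tracks extents differently. An exact match is eaten silently (accepted
again, no error), any other overlap is refused (restriction 2), and only a
non-overlapping range is new.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive DPA range, in the spirit of struct range. */
struct dpa_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

enum offer_result { OFFER_ACCEPT, OFFER_DUPLICATE, OFFER_REJECT };

static bool ranges_overlap(const struct dpa_range *a, const struct dpa_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Classify @offer against the @n extents already accepted: an exact
 * match is a duplicate (acknowledge again, no error), any partial
 * overlap is rejected, and anything else is a new extent.
 */
static enum offer_result classify_offer(const struct dpa_range *accepted,
					size_t n, const struct dpa_range *offer)
{
	assert(offer && (accepted || n == 0));

	for (size_t i = 0; i < n; i++) {
		if (accepted[i].start == offer->start &&
		    accepted[i].end == offer->end)
			return OFFER_DUPLICATE;
		if (ranges_overlap(&accepted[i], offer))
			return OFFER_REJECT;
	}
	return OFFER_ACCEPT;
}
```

Because accepted extents never overlap each other, a single offer cannot both
match one accepted extent exactly and partially overlap another, so the result
does not depend on the order in which accepted extents are checked.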

> 
> > And it should be safe to acknowledge the extent
> > 	   again.  Second, this simplifies the code to process existing
> > 	   extents if the extent list should change while the extent
> > 	   list is being read.

This is the 'normal' case.

> > Third, duplicates for a given region
> > 	   which are seen during a race between the hardware surfacing
> > 	   an extent and the cxl dax driver scanning for existing
> > 	   extents will be ignored.
> 
> This last one is a good justification.

I think the second justification is actually better than this one.  Regardless,
this makes everything OK and should work.

> 
> > 
> > 	   NOTE: Processing existing extents is done in a later patch.
> > 
> > Management of the region extent devices must be synchronized with
> > potential uses of the memory within the DAX layer.  Create region extent
> > devices as children of the cxl_dax_region device such that the DAX
> > region driver can co-drive them and synchronize with the DAX layer.
> > Synchronization and management are handled in a subsequent patch.
> > 
> > Tags are not yet supported within the DAX layer.  To maintain
> > compatibility with legacy DAX/region processing, only tags with a value
> > of 0 are allowed.  This defines existing DAX devices as having a 0 tag,
> > which makes the most logical sense as a default.
> > 
> > Process DCD events and create region devices.
> > 
> > Signed-off-by: Navneet Singh <navneet.singh@intel.com>
> > Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> A couple of minor comments from me.

I do appreciate the review.


[snip]

> >  
> > +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> > +				struct xarray *extent_array, int cnt)
> > +{
> > +	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> > +	struct cxl_mbox_dc_response *p;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	struct cxl_extent *extent;
> > +	unsigned long index;
> > +	u32 pl_index;
> > +	int rc;
> > +
> > +	size_t pl_size = struct_size(p, extent_list, cnt);
> > +	u32 max_extents = cnt;
> > +
> > +	/* May have to use more bit on response. */
> 
> I thought you argued in the patch description that it didn't matter if you
> didn't set it?

Only on RELEASE responses.  ADD responses might need it depending on the
payload size and the number of extents being added.

Sorry, that was not clear.
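An ADD response needs the more bit only when the extents being answered do not
fit in one mailbox payload.  A minimal sketch of that split follows.
DC_MORE_FLAG is a hypothetical stand-in for CXL_DCD_EVENT_MORE, and the
header/entry sizes are parameters rather than values taken from the spec.  The
flag is set with |= (an &= applied to a flags byte that was just zeroed always
yields 0), and the "more remaining" test uses a running total of extents sent,
so the final batch never carries the more bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DC_MORE_FLAG 0x1u	/* hypothetical stand-in for CXL_DCD_EVENT_MORE */

/* Max entries that fit in one payload: hdr + n * entry <= payload. */
static size_t dc_max_extents(size_t payload_size, size_t hdr_size,
			     size_t entry_size)
{
	assert(payload_size > hdr_size && entry_size > 0);
	return (payload_size - hdr_size) / entry_size;
}

/*
 * Split @cnt extents into responses of at most @max entries each.
 * flags[i] receives the flag byte for response i: every response but
 * the last sets the more bit.  Returns the number of responses planned
 * (at most @nflags).
 */
static size_t dc_plan_responses(size_t cnt, size_t max, uint8_t *flags,
				size_t nflags)
{
	size_t sent = 0, n = 0;

	assert(max > 0);
	while (sent < cnt && n < nflags) {
		size_t batch = (cnt - sent < max) ? cnt - sent : max;

		sent += batch;
		flags[n] = 0;
		if (sent < cnt)		/* running total, not per-batch index */
			flags[n] |= DC_MORE_FLAG;
		n++;
	}
	return n;
}
```

For example, with a 256-byte payload, an 8-byte header, and 24-byte entries,
10 extents fit per response, so 25 extents become three responses flagged
more, more, and 0.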

> 
> > +	if (pl_size > cxl_mbox->payload_size) {
> > +		max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> > +			      sizeof(struct updated_extent_list);
> > +		pl_size = struct_size(p, extent_list, max_extents);
> > +	}
> > +
> > +	struct cxl_mbox_dc_response *response __free(kfree) =
> > +						kzalloc(pl_size, GFP_KERNEL);
> > +	if (!response)
> > +		return -ENOMEM;
> > +
> > +	pl_index = 0;
> > +	xa_for_each(extent_array, index, extent) {
> > +
> > +		response->extent_list[pl_index].dpa_start = extent->start_dpa;
> > +		response->extent_list[pl_index].length = extent->length;
> > +		pl_index++;
> > +		response->extent_list_size = cpu_to_le32(pl_index);
> > +
> > +		if (pl_index == max_extents) {
> > +			mbox_cmd = (struct cxl_mbox_cmd) {
> > +				.opcode = opcode,
> > +				.size_in = struct_size(response, extent_list,
> > +						       pl_index),
> > +				.payload_in = response,
> > +			};
> > +
> > +			response->flags = 0;
> > +			if (pl_index < cnt)
> > +				response->flags &= CXL_DCD_EVENT_MORE;
> Covered in another branch of the thread.

Yep.


[snip]

> 
> >  
> > +/* See CXL 3.0 8.2.9.2.1.5 */
> 
> Maybe update to 3.1? Otherwise the patch reviewer needs to open two
> spec versions!  In 3.1 it is 8.2.9.2.1.6.

Yep, missed this one.  Thanks,
Ira


Thread overview: 134+ messages
2024-10-07 23:16 [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-10-07 23:16 ` [PATCH v4 01/28] test printk: Add very basic struct resource tests Ira Weiny
2024-10-08 16:35   ` Andy Shevchenko
2024-10-09 12:24   ` Jonathan Cameron
2024-10-09 17:09   ` Fan Ni
2024-10-10 14:59   ` Petr Mladek
2024-10-11 14:49     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 02/28] printk: Add print format (%pra) for struct range Ira Weiny
2024-10-08 16:56   ` Andy Shevchenko
2024-10-09 12:27     ` Jonathan Cameron
2024-10-09 14:42       ` Andy Shevchenko
2024-10-09 13:30   ` Rasmus Villemoes
2024-10-09 14:41     ` Andy Shevchenko
2024-10-14  0:08       ` Ira Weiny
2024-10-11 16:54     ` Ira Weiny
2024-10-09 17:33   ` Fan Ni
2024-10-11  2:09   ` Bagas Sanjaya
2024-10-17 20:57     ` Ira Weiny
2024-10-25 12:42       ` Bagas Sanjaya
2024-10-07 23:16 ` [PATCH v4 03/28] cxl/cdat: Use %pra for dpa range outputs Ira Weiny
2024-10-09 12:33   ` Jonathan Cameron
2024-10-09 17:34   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 04/28] range: Add range_overlaps() Ira Weiny
2024-10-08 16:10   ` David Sterba
2024-10-09 14:45     ` Andy Shevchenko
2024-10-09 14:46       ` Andy Shevchenko
2024-10-14  0:12         ` Ira Weiny
2024-10-09 15:36       ` David Sterba
2024-10-09 16:04         ` Andy Shevchenko
2024-10-10 15:24     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 05/28] dax: Document dax dev range tuple Ira Weiny
2024-10-09 12:42   ` Jonathan Cameron
2024-10-11 20:40     ` Ira Weiny
2024-10-16 15:48       ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 06/28] cxl/pci: Delay event buffer allocation Ira Weiny
2024-10-09 17:47   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 07/28] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) ira.weiny
2024-10-07 23:16 ` [PATCH v4 08/28] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-10-09 12:49   ` Jonathan Cameron
2024-10-14  0:05     ` Ira Weiny
2024-10-16 15:54       ` Jonathan Cameron
2024-10-16 16:59         ` Kees Cook
2024-10-07 23:16 ` [PATCH v4 09/28] cxl/core: Separate region mode from decoder mode ira.weiny
2024-10-09 12:51   ` Jonathan Cameron
2024-10-09 18:06   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 10/28] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-10-07 23:16 ` [PATCH v4 11/28] cxl/hdm: Add dynamic capacity size support to endpoint decoders ira.weiny
2024-10-10 12:45   ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
2024-10-09 14:42   ` Rafael J. Wysocki
2024-10-11 20:38     ` Ira Weiny
2024-10-14 20:52       ` Wysocki, Rafael J
2024-10-09 18:16   ` Fan Ni
2024-10-14  1:16     ` Ira Weiny
2024-10-10 12:51   ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 13/28] cxl/mem: Expose DCD partition capabilities in sysfs ira.weiny
2024-10-09 20:46   ` Fan Ni
2024-10-14  1:34     ` Ira Weiny
2024-10-10 13:04   ` Jonathan Cameron
2024-10-16 21:34     ` Ira Weiny
2024-10-11  2:15   ` Bagas Sanjaya
2024-10-07 23:16 ` [PATCH v4 14/28] cxl/port: Add endpoint decoder DC mode support to sysfs ira.weiny
2024-10-10 13:14   ` Jonathan Cameron
2024-10-17 17:51     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 15/28] cxl/region: Refactor common create region code Ira Weiny
2024-10-10 13:18   ` Jonathan Cameron
2024-10-17 20:29     ` Ira Weiny
2024-10-10 16:27   ` Fan Ni
2024-10-24  2:17   ` Alison Schofield
2024-10-07 23:16 ` [PATCH v4 16/28] cxl/region: Add sparse DAX region support ira.weiny
2024-10-10 13:46   ` Jonathan Cameron
2024-10-10 17:41   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 17/28] cxl/events: Split event msgnum configuration from irq setup Ira Weiny
2024-10-10 13:49   ` Jonathan Cameron
2024-10-10 17:58   ` Fan Ni
2024-10-24  2:33     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 18/28] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-10-10 18:07   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 19/28] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-10-10 14:15   ` Jonathan Cameron
2024-10-10 18:25   ` Fan Ni
2024-10-24  3:09     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 20/28] cxl/core: Return endpoint decoder information from region search Ira Weiny
2024-10-10 14:21   ` Jonathan Cameron
2024-10-10 18:29   ` Fan Ni
2024-10-24  2:30   ` Alison Schofield
2024-10-07 23:16 ` [PATCH v4 21/28] cxl/extent: Process DCD events and realize region extents ira.weiny
2024-10-09  1:56   ` Li, Ming4
2024-10-09 19:49     ` Ira Weiny
2024-10-10  3:06       ` Li, Ming4
2024-10-14  2:05         ` Ira Weiny
2024-10-10 14:50       ` Jonathan Cameron
2024-10-11 19:14         ` Fan Ni
2024-10-17 21:15         ` Ira Weiny
2024-10-18  9:03           ` Jonathan Cameron
2024-10-21 14:04             ` Ira Weiny
2024-10-21 14:47               ` Jonathan Cameron
2024-10-10 14:58   ` Jonathan Cameron
2024-10-17 21:39     ` Ira Weiny [this message]
2024-10-18  9:09       ` Jonathan Cameron
2024-10-21 18:45         ` Ira Weiny
2024-10-22 17:01           ` Jonathan Cameron
2024-10-07 23:16 ` [PATCH v4 22/28] cxl/region/extent: Expose region extent information in sysfs ira.weiny
2024-10-10 15:01   ` Jonathan Cameron
2024-10-18 18:26     ` Ira Weiny
2024-10-21  9:37       ` Jonathan Cameron
2024-10-14 16:08   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 23/28] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-10-10 15:06   ` Jonathan Cameron
2024-10-21 21:16     ` Ira Weiny
2024-10-14 16:56   ` Fan Ni
2024-10-07 23:16 ` [PATCH v4 24/28] dax/region: Create resources on sparse DAX regions ira.weiny
2024-10-10 15:27   ` Jonathan Cameron
2024-10-23  1:20     ` Ira Weiny
2024-10-23 11:22       ` Jonathan Cameron
2024-10-24  3:50         ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 25/28] cxl/region: Read existing extents on region creation ira.weiny
2024-10-10 15:33   ` Jonathan Cameron
2024-10-24  1:41     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 26/28] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-10-10 15:41   ` Jonathan Cameron
2024-10-24  1:52     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 27/28] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-10-10 15:49   ` Jonathan Cameron
2024-10-24  1:59     ` Ira Weiny
2024-10-07 23:16 ` [PATCH v4 28/28] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-10-10 15:58   ` Jonathan Cameron
2024-10-24  2:23     ` Ira Weiny
2024-10-08 22:57 ` [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni
2024-10-08 23:06   ` Fan Ni
2024-10-10 15:30     ` Ira Weiny
2024-10-10 15:31     ` Ira Weiny
2024-10-21 16:47 ` Fan Ni
2024-10-22 17:05   ` Jonathan Cameron
