Linux CXL
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, Dave Jiang <dave.jiang@intel.com>,
	"Alejandro Lucero" <alucerop@amd.com>,
	Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH 4/4] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
Date: Fri, 17 Jan 2025 10:37:05 -0800	[thread overview]
Message-ID: <678aa351ddd8e_20fa29490@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <20250117111215.000061c7@huawei.com>

Jonathan Cameron wrote:
> On Thu, 16 Jan 2025 22:10:50 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> > allocations being distinct from RAM allocations in specific ways when in
> > practice the allocation rules are only relative to DPA partition index.
> > 
> > The rules for cxl_dpa_alloc() are:
> > 
> > - allocations can only come from 1 partition
> > 
> > - if allocating at partition-index-N, all free space in partitions less
> >   than partition-index-N must be skipped over
> > 
> > Use the new 'struct cxl_dpa_partition' array to support allocation with
> > an arbitrary number of DPA partitions on the device.
> > 
> > A follow-on patch can go further to cleanup 'enum cxl_decoder_mode'
> > concept and supersede it with looking up the memory properties from
> > partition metadata.
> 
> If we'd move to meta data and these were tightly packed then I'd be fine
> with nr_partitions. Until that step, I find it confusing.
> 
> A few comments inline. This series does bring some advantages though
> at cost of code that needs a bit more documentation at the very least.
> 
> > 
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/hdm.c |  167 +++++++++++++++++++++++++++++++++---------------
> >  drivers/cxl/cxlmem.h   |    9 +++
> >  2 files changed, 125 insertions(+), 51 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 7e1559b3ed88..4a2816102a1e 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -223,6 +223,30 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
> >  
> 
> Some documentation would be useful. I'm not sure I understand
> this algorithm correctly.
> 
> I think this complexity is all about ensuring that skip regions have
> their resources broken up on partition boundaries?
> 
> Can we potentially relax constraints a little more to make this
> easier to read by not caring on the ordering?  Find overlap
> of skip region with any partition and remove that bit unconditionally.
> 
> > +static void release_skip(struct cxl_dev_state *cxlds,
> > +			 const resource_size_t skip_base,
> > +			 const resource_size_t skip_len)
> > +{
> > +	resource_size_t skip_start = skip_base, skip_rem = skip_len;
> > +
> > +	for (int i = 0; i < cxlds->nr_partitions; i++) {
> > +		const struct resource *part_res = &cxlds->part[i].res;
> > +		resource_size_t skip_end, skip_size;
> > +
> > +		if (skip_start < part_res->start || skip_start > part_res->end)
> > +			continue;
> > +
> > +		skip_end = min(part_res->end, skip_start + skip_rem - 1);
> > +		skip_size = skip_end - skip_start + 1;
> > +		__release_region(&cxlds->dpa_res, skip_start, skip_size);
> > +		skip_start += skip_size;
> > +		skip_rem -= skip_size;
> > +
> > +		if (!skip_rem)
> > +			break;
> > +	}
> > +}
> > +
> >  /*
> >   * Must be called in a context that synchronizes against this decoder's
> >   * port ->remove() callback (like an endpoint decoder sysfs attribute)
> > @@ -241,7 +265,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> >  	skip_start = res->start - cxled->skip;
> >  	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> >  	if (cxled->skip)
> > -		__release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> > +		release_skip(cxlds, skip_start, cxled->skip);
> >  	cxled->skip = 0;
> >  	cxled->dpa_res = NULL;
> >  	put_device(&cxled->cxld.dev);
> > @@ -268,6 +292,47 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> >  	__cxl_dpa_release(cxled);
> >  }
> >  
> > +static int request_skip(struct cxl_dev_state *cxlds,
> > +			struct cxl_endpoint_decoder *cxled,
> > +			const resource_size_t skip_base,
> > +			const resource_size_t skip_len)
> > +{
> > +	resource_size_t skip_start = skip_base, skip_rem = skip_len;
> > +
> > +	for (int i = 0; i < cxlds->nr_partitions; i++) {
> 
> Likewise, if we relax a constraint on ordering can we make this simpler?
> Would just need to keep track on whether we had reserved enough. I'm not
> 100% sure that is sufficient for the final error check.
> 
> > +		const struct resource *part_res = &cxlds->part[i].res;
> > +		struct cxl_port *port = cxled_to_port(cxled);
> > +		resource_size_t skip_end, skip_size;
> > +		struct resource *res;
> > +
> > +		if (skip_start < part_res->start || skip_start > part_res->end)
> > +			continue;
> > +
> > +		skip_end = min(part_res->end, skip_start + skip_rem - 1);
> > +		skip_size = skip_end - skip_start + 1;
> > +
> > +		res = __request_region(&cxlds->dpa_res, skip_start, skip_size,
> > +				       dev_name(&cxled->cxld.dev), 0);
> > +		if (!res) {
> > +			dev_dbg(cxlds->dev,
> > +				"decoder%d.%d: failed to reserve skipped space\n",
> > +				port->id, cxled->cxld.id);
> > +			break;
> > +		}
> > +		skip_start += skip_size;
> > +		skip_rem -= skip_size;
> > +		if (!skip_rem)
> > +			break;
> > +	}
> > +
> > +	if (skip_rem == 0)
> > +		return 0;
> > +
> > +	release_skip(cxlds, skip_base, skip_len - skip_rem);
> Ah, this complicates possibility of relaxations as we'd need to pass in what
> partion number we'd reached when fail occurred.
> Maybe this is the best algorithm, but I'd definitely like docs for this
> function to make it clear what it's assumptions are (paritions in order of DPA etc)

Yes, the software requirements and assumptions for tracking "DPABase"
deserve to be called out in comments for these helpers. Those "DPABase"
tracking assumptions are spelled out in the implementation note in
"8.2.4.19.13 Decoder Protection".

  reply	other threads:[~2025-01-17 18:37 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-17  6:10 [PATCH 0/4] cxl: DPA partition metadata is a mess Dan Williams
2025-01-17  6:10 ` [PATCH 1/4] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-17 10:03   ` Jonathan Cameron
2025-01-17 17:47     ` Dan Williams
2025-01-17 10:24   ` Alejandro Lucero Palau
2025-01-17 17:54     ` Dan Williams
2025-01-17 18:45   ` Ira Weiny
2025-01-17  6:10 ` [PATCH 2/4] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-17 10:20   ` Jonathan Cameron
2025-01-17 10:23     ` Jonathan Cameron
2025-01-17 17:55       ` Dan Williams
2025-01-17 13:33   ` Alejandro Lucero Palau
2025-01-17 20:47     ` Dan Williams
2025-01-17  6:10 ` [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-17 10:52   ` Jonathan Cameron
2025-01-17 13:38     ` Alejandro Lucero Palau
2025-01-17 18:23     ` Dan Williams
2025-01-17 20:32       ` Ira Weiny
2025-01-20 12:24       ` Alejandro Lucero Palau
2025-01-31 23:54         ` Dan Williams
2025-01-17 15:58   ` Alejandro Lucero Palau
2025-01-17 22:52     ` Dan Williams
2025-01-17 20:42   ` Ira Weiny
2025-01-17 22:08   ` Ira Weiny
2025-01-31 23:39     ` Dan Williams
2025-01-17  6:10 ` [PATCH 4/4] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-17 11:12   ` Jonathan Cameron
2025-01-17 18:37     ` Dan Williams [this message]
2025-01-17 15:42   ` Alejandro Lucero Palau
2025-01-17 20:57     ` Dan Williams
2025-01-20 12:39       ` Alejandro Lucero Palau
2025-02-01  0:08         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=678aa351ddd8e_20fa29490@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alucerop@amd.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox