public inbox for linux-cxl@vger.kernel.org
 help / color / mirror / Atom feed
From: "Fontenot, Nathan" <nafonten@amd.com>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
	Nathan Fontenot <nathan.fontenot@amd.com>
Cc: alison.schofield@intel.com, dan.j.williams@intel.com,
	linux-cxl@vger.kernel.org
Subject: Re: [RFC] cxl: Update Soft Reserved resources upon region creation
Date: Thu, 17 Oct 2024 09:49:01 -0500	[thread overview]
Message-ID: <45f3d3e7-4145-4bc9-96bb-98d7c10c35e0@amd.com> (raw)
In-Reply-To: <20241016164324.000037b9@Huawei.com>

Hi Jonathan,

On 10/16/2024 10:43 AM, Jonathan Cameron wrote:
> On Fri, 4 Oct 2024 13:17:54 -0500
> Nathan Fontenot <nathan.fontenot@amd.com> wrote:
> 
>> Update handling of SOFT RESERVE iomem resources that intersect with
>> CXL region resources to remove the intersections from the SOFT RESERVE
>> resources. The current approach of leaving the SOFT RESERVE
>> resource as is can cause failures during hotplug replace of CXL
>> devices because the resource is not available for reuse after
>> teardown.
> Good to give an example here + any debug error messages etc.

Good point. I'll include a better explanation of the failure with examples
in the next version.

> 
>>
>> The approach sought is to trim out any pieces of SOFT RESERVE
>> resources that intersect with CXL regions. To do this, first set
>> aside any SOFT RESERVE resources that intersect with a CFMWS
>> into a separate resource tree during e820__reserve_resources_late()
>> that would have been otherwise added to the iomem resource tree.
>>
>> As CXL regions are created the cxl resource created for the new
>> region is used to trim intersections from the SOFT RESERVE
>> resources that were previously set aside.
>>
>> The next steps are to add any SOFT RESERVE resources remaining to the
>> iomem resource tree after CXL device probe completes and to notify
>> the dax driver so it may consume the added SOFT RESERVE resources.
>>
>> This patch includes the use of a delayed work queue to wait
>> for CXL device probe completion and then have a worker thread add
>> any remaining SOFT RESERVE resources to the iomem resource tree.
>>
>> Not in this patch is notification of the dax driver so it may consume
>> the SOFT RESERVE resources.
>>
>> The goal of presenting this RFC is to drive discussion of the
>> current approach for trimming SOFT RESERVE resources, the use of
>> a delayed work queue to add remaining SOFT RESERVE resources to
>> the iomem resource tree, and methods for notifying the dax driver
>> of any newly added SOFT RESERVE resources.
>>
>> NOTE: As this is a RFC the temporary pr_err("CXL DEBUG...")  messages
>> have been left in to aid in testing and validation.
>>
>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
>> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> Trivial, but why Alison's SoB?  She didn't send the patch which would
> be normal reason for last SoB in a list like this (handling of the patch).
> 
> Maybe a Co-developed-by is missing?

Alison helped develop the code so a Co-developed-by should be here.

> 
> Whilst I appreciate this is an RFC, the arch specific bleed
> into CXL code needs to go.  + we need some info here on whether this
> is an x86 specific issue.  I'm fairly sure it's not and that should
> be called out as future work.

As far as I know this is an x86 issue for handling the SOFT RESERVE
resources, if other arches are seeing this than we need to update the approach.

> 
> Also, much of this is actually generic, with only a few bits
> being e820 related.  Can we pull the soft reserve list etc out
> of them into generic code.

Yes, we could do that. The handling of the soft reserve list doesn't
need to live in the e820 code.

> 
> To me the general approach seems reasonable but I'd suggest +CC linux-mm
> as the issue maybe more about the impacts on DAX than this working for CXL.

ok, I'll add linux-mm. One of the pieces I wanted to get feedback on was the
dax component and how we could invoke the dax driver to consume any soft
reserves that remain after the cxl driver removes intersecting cxl region
resources.

> 
> Comments inline are mostly general suggestions of simplifications
> and refactoring to aid readability.
> 
>> ---
>>  arch/x86/include/asm/e820/api.h |   3 +
>>  arch/x86/kernel/e820.c          | 156 +++++++++++++++++++++++++++++++-
>>  drivers/cxl/core/region.c       |  14 ++-
>>  drivers/cxl/port.c              |  34 +++++++
>>  4 files changed, 203 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
>> index 2e74a7f0e935..542a52871beb 100644
>> --- a/arch/x86/include/asm/e820/api.h
>> +++ b/arch/x86/include/asm/e820/api.h
>> @@ -44,6 +44,9 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn);
>>  
>>  extern int  e820__get_entry_type(u64 start, u64 end);
>>  
>> +extern void e820__trim_soft_reserves(const struct resource *cxl_res);
>> +extern void e820__insert_soft_reserves(void);
>> +
>>  /*
>>   * Returns true iff the specified range [start,end) is completely contained inside
>>   * the ISA region.
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>> index 4893d30ce438..855c26460bb9 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
> 
> 
> 
>> +static void e820__trim_soft_reserve(struct resource *res,
>> +				    const struct resource *cxl_res)
>> +{
>> +	resource_size_t new_start, new_end;
>> +	int rc;
>> +
>> +	pr_err("CXL DEBUG Trimming Soft Reserves for %pr\n", cxl_res);
>> +
>> +	if (res->start == cxl_res->start && res->end == cxl_res->end) {
>> +		pr_err("CXL DEBUG Releasing resource %pr\n", res);
>> +		release_resource(res);
>> +		kfree(res);
>> +	} else if (res->start == cxl_res->start || res->end == cxl_res->end) {
>> +		if (res->start == cxl_res->start) {
> Not a huge amount of sharing in here.
> 	} else if (res->start = cxl_res->start) {	
> 		/* Left over bit at top end */
> 		rc = adjust_resource(res, cxl_res->end + 1, res->end - cxl_res->end);
> 		if (rc)...
> 
> 	} else if (res->end == cxl_res->end) {
> 		/* Left over bit at bottom end */
> 		rc = adjust_resource(res, res->start, cxl_res->start - res->start);
> 		if (rc) ...
> 	} else {
> 		/* Left over bits on either side. */
> 	}

Yep, I'll update that.

>> +			new_start = cxl_res->end + 1;
>> +			new_end = res->end;
>> +		} else {
>> +			new_start = res->start;
>> +			new_end = cxl_res->start - 1;
>> +		}
>> +
>> +		pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n",
>> +		       res, new_start, new_end);
>> +		rc = adjust_resource(res, new_start, new_end - new_start + 1);
>> +		if (rc)
>> +			pr_debug("Cannot adjust soft reserved resource %pr\n",
>> +				 res);
>> +	} else {
>> +		new_start = res->start;
>> +		new_end = res->end;
> Confusing use of parameters here as they don't match use in early legs.
> Maybe just putting the values inline would be clearer.

will do.

>> +
>> +		/*Adjust existing to beginning resource */
>> +		pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", res,
>> +		       new_start, cxl_res->start);
>> +		adjust_resource(res, new_start, cxl_res->start - new_start + 1);
> 
> Off by 1?  It's different form above leftover bottom part (should be the same).

I thought I got the math right, maybe I didn't. I'll re-run tests to and check it.

> 
> 
>> +
>> +		/* Add new resource for end piece */
>> +		e820__add_soft_reserve(cxl_res->end + 1, new_end - cxl_res->end,
>> +				       res->flags);
>> +	}
>> +}
>> +
>> +void e820__trim_soft_reserves(const struct resource *cxl_res)
>> +{
>> +	struct resource *res, *next;
>> +
>> +	pr_err("CXL DEBUG Trimming Soft Reserves\n");
>> +	for (res = e820_sr_res.child; res; res = next) {
> 	for (res = e820_sr_res.child; res; res = res->sibling)
> is the same I think and avoids need for next.

Yes, it should be. I'll update it.

>> +		next = res->sibling;
>> +
>> +		if (resource_contains(res, cxl_res)) {
>> +			e820__trim_soft_reserve(res, cxl_res);
>> +			break;
> return;
> 
>> +		}
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(e820__trim_soft_reserves);
>> +
>> +static int __init e820_parse_cfmws(union acpi_subtable_headers *hdr, void *arg,
>> +				   const unsigned long unused)
>> +{
>> +	struct acpi_cedt_cfmws *cfmws;
>> +	struct resource *res = arg;
>> +	struct resource cfmws_res;
>> +
>> +	/* Validation check, remove when finished debugging */
>> +	if (!res->parent && res->end)
>> +		pr_err("CXL DEBUG Should insert %pr\n", res);
>> +
>> +	if (res->parent || !res->end)
>> +		return 0;
>> +
>> +	cfmws = (struct acpi_cedt_cfmws *)hdr;
>> +	cfmws_res = DEFINE_RES_MEM(cfmws->base_hpa,
>> +				   cfmws->base_hpa + cfmws->window_size);
>> +	pr_err("CXL DEBUG Found CFMWS: %pr\n", &cfmws_res);
>> +
>> +	if (resource_overlaps(&cfmws_res, res)) {
>> +		pr_err("CXL DEBUG Found SOFT RESERVE intersection %llx - %llx : %llx - %llx\n",
>> +		       res->start, res->end, cfmws_res.start, cfmws_res.end);
>> +		e820__add_soft_reserve(res->start, resource_size(res),
>> +				       res->flags);
>> +		return 1;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  void __init e820__reserve_resources_late(void)
>>  {
>> -	int i;
>> +	int i, rc;
>>  	struct resource *res;
>>  
>> +	/*
>> +	 * Prior to inserting SOFT_RESERVED resources we want to check for an
>> +	 * intersection with potential CXL resources. Any SOFT_RESERVED resources
>> +	 * that do intersect a potential CXL resource are set aside so they
>> +	 * can be trimmed to accommodate CXL resource intersections and added to
>> +	 * the iomem resource tree after the CXL drivers have completed their
>> +	 * device probe.
>> +	 */
>> +	pr_err("CXL DEBUG Checking e820 iomem resources\n");
>> +
>>  	res = e820_res;
>>  	for (i = 0; i < e820_table->nr_entries; i++) {
>> -		if (!res->parent && res->end)
>> -			insert_resource_expand_to_fit(&iomem_resource, res);
>> +		pr_err("CXL DEBUG Checking e820 iomem resource %llx - %llx\n",
>> +		       res->start, res->end);
>> +		if (res->desc == IORES_DESC_SOFT_RESERVED) {
>> +			rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS,
>> +						   e820_parse_cfmws, res);
> Can we make this somewhat generic by having a parse to find a match
> then call 
> e820__add_soft_reserve(res->start, resource_size(res), res->flags);
> if the match occurs. That way the check can be in arch independent code.
> 
> cxl_res_overlaps_cfmws() perhaps.
> 

I like this better than what I have. I was never real crazy about having the
e820 code parsing the cfmws.

> 
>> +			if (rc) {
>> +				res++;
> 
> Move the res++ into the for loop ... i++, res++)

will do.

> 
>> +				continue;
>> +			}
>> +		}
>> +		pr_err("CXL DEBUG Inserting %llx - %llx\n", res->start, res->end);
>> +		insert_resource_expand_to_fit(&iomem_resource, res);
>>  		res++;
>>  	}
>>  
> 
>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
>> index d7d5d982ce69..9f94730c488f 100644
>> --- a/drivers/cxl/port.c
>> +++ b/drivers/cxl/port.c
>> @@ -7,6 +7,10 @@
>>  #include "cxlmem.h"
>>  #include "cxlpci.h"
>>  
>> +#if CONFIG_X86
>> +#include <asm/e820/api.h>
>> +#endif
>> +
>>  /**
>>   * DOC: cxl port
>>   *
>> @@ -89,6 +93,35 @@ static int cxl_switch_port_probe(struct cxl_port *port)
>>  	return -ENXIO;
>>  }
>>  
>> +DECLARE_RWSEM(cxl_sr_rwsem);
>> +
>> +static void cxl_sr_update(struct work_struct *w)
>> +{
>> +	down_write(&cxl_sr_rwsem);
>> +	pr_err("CXL DEBUG Updating soft reserves\n");
>> +	e820__insert_soft_reserves();
> 
> That's arch specific code bleeding over to the cxl code.
> So needs at minimum to be hidden behind an arch call.

Agreed, keeping #ifdef <ARCH> out of the driver would be good. I'll look
at adding an arch call.

> 
>> +	up_write(&cxl_sr_rwsem);
>> +}
>> +
>> +DECLARE_DELAYED_WORK(cxl_sr_work, cxl_sr_update);
>> +
>> +static void schedule_soft_reserve_update(void)
>> +{
>> +	static bool update_scheduled;
>> +	int timeout = 5 * HZ;
>> +
>> +	down_write(&cxl_sr_rwsem);
> For the locks use guard()

Yes can switch to guard().

> 
> However, what are they for?

The concern is that we could have races on scheduling the soft reserve updates
and I wanted to try to keep only one work item on the queue. They may not be
needed, I was erring on the side of safety

> 
>> +	if (update_scheduled) {
>> +		pr_err("CXL DEBUG modifying delayed work timeout\n");
>> +		mod_delayed_work(system_wq, &cxl_sr_work, timeout);
> 
> See docs for mod_delayed_work_on()
> 
> "If @dwork is idle, equivalent to queue_delayed_work_on(); "
> which is where schedule_delayed_work() ends up.
> So other than maybe for the debug prints you don't need this dance
> just always call mod_delayed_work()
> 

Thanks, I'll look into it.

...and thanks for all the review comments.

-Nathan

> 
> 
>> +	} else {
>> +		pr_err("CXL DEBUG Adding delayed work\n");
>> +		schedule_delayed_work(&cxl_sr_work, timeout);
>> +		update_scheduled = true;
>> +	}
>> +	up_write(&cxl_sr_rwsem);
>> +}
>> +
>>  static int cxl_endpoint_port_probe(struct cxl_port *port)
>>  {
>>  	struct cxl_endpoint_dvsec_info info = { .port = port };
>> @@ -140,6 +173,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
>>  	 */
>>  	device_for_each_child(&port->dev, root, discover_region);
>>  
>> +	schedule_soft_reserve_update();
>>  	return 0;
>>  }
>>  
> 

  reply	other threads:[~2024-10-17 14:49 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-04 18:17 [RFC] cxl: Update Soft Reserved resources upon region creation Nathan Fontenot
2024-10-16 15:43 ` Jonathan Cameron
2024-10-17 14:49   ` Fontenot, Nathan [this message]
2024-10-17 16:53     ` Jonathan Cameron
2024-10-18 10:17       ` James Morse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45f3d3e7-4145-4bc9-96bb-98d7c10c35e0@amd.com \
    --to=nafonten@amd.com \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=nathan.fontenot@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox