Linux CXL
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, <linux-cxl@vger.kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	<dan.j.williams@intel.com>, <ira.weiny@intel.com>,
	<vishal.l.verma@intel.com>, <alison.schofield@intel.com>,
	<dave@stgolabs.net>
Subject: Re: [PATCH v3 3/3] cxl: Add memory hotplug notifier for cxl region
Date: Mon, 8 Jan 2024 12:15:38 +0000
Message-ID: <20240108121538.00001369@Huawei.com>
In-Reply-To: <87r0is9v6o.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Mon, 08 Jan 2024 14:49:03 +0800
"Huang, Ying" <ying.huang@intel.com> wrote:

> Dave Jiang <dave.jiang@intel.com> writes:
> 
> > When the CXL region is formed, the driver computes the performance
> > data for the region. However this data is not available at the node data
> > collection that has been populated by the HMAT during kernel
> > initialization. Add a memory hotplug notifier to update the performance
> > data to the node hmem_attrs to expose the newly calculated region
> > performance data. The CXL region is created under specific CFMWS. The
> > node for the CFMWS is created during SRAT parsing by acpi_parse_cfmws().
> > Additional regions may overwrite the initial data, but since this is
> > for the same proximity domain it's a don't care for now.
> >
> > node_set_perf_attrs() symbol is exported to allow update of perf attribs
> > for a node. The sysfs path of
> > /sys/devices/system/node/nodeX/access0/initiators/* is created by
> > node_set_perf_attrs() for the various attributes where nodeX is matched
> > to the proximity domain of the CXL region.

As per the discussion below: why is access1 not also relevant for CXL memory?
(It's probably more relevant than access0 in many cases!)

For historical reference, I wanted access0 to be the CPU-only one, but
review feedback was that access0 was already defined as 'initiator based',
so we couldn't just make the 0-indexed one the case most people care about.
Hence we grew access1 to cover the CPU-only case, which most software
cares about.
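
For anyone wanting to compare the two classes on a running system, here is
a minimal userspace sketch. It assumes the sysfs layout quoted above and
the attribute file names read_bandwidth, write_bandwidth, read_latency and
write_latency under accessN/initiators/. The helper names
node_perf_attr_path() and read_node_perf_attr() are made up for
illustration and are not part of any kernel or library API.

```c
#include <stdio.h>

/*
 * Build the sysfs path of one performance attribute for a NUMA node and
 * access class. Returns 0 on success, -1 if the path does not fit in buf.
 * (Hypothetical helper, for illustration only.)
 */
static int node_perf_attr_path(char *buf, size_t len, int nid,
			       int access_class, const char *attr)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/node/node%d/access%d/initiators/%s",
			 nid, access_class, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one attribute as an unsigned value. Returns 0 on success and -1 if
 * the file is missing (e.g. the class was never populated) or unparsable.
 */
static int read_node_perf_attr(int nid, int access_class, const char *attr,
			       unsigned long *val)
{
	char path[128];
	FILE *f;
	int ok;

	if (node_perf_attr_path(path, sizeof(path), nid, access_class, attr))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%lu", val) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

Calling read_node_perf_attr(nid, 0, "read_bandwidth", &v) and then the same
with access class 1 shows whether both classes are populated for the node
backing the region, and whether they differ.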

> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>
> > Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > ---
> > v3:
> > - Change EXPORT_SYMBOL_NS_GPL(,CXL) to EXPORT_SYMBOL_GPL() (Jonathan)
> > - use read_bandwidth as check for valid coords (Jonathan)
> > - Remove setting of coord access level 1. (Jonathan)
> > ---
> >  drivers/base/node.c       |    1 +
> >  drivers/cxl/core/region.c |   42 ++++++++++++++++++++++++++++++++++++++++++
> >  drivers/cxl/cxl.h         |    3 +++
> >  3 files changed, 46 insertions(+)
> >
> > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > index cb2b6cc7f6e6..48e5cb292765 100644
> > --- a/drivers/base/node.c
> > +++ b/drivers/base/node.c
> > @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
> >  		}
> >  	}
> >  }
> > +EXPORT_SYMBOL_GPL(node_set_perf_attrs);
> >  
> >  /**
> >   * struct node_cache_info - Internal tracking for memory node caches
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index d28d24524d41..bee65f535d6c 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -4,6 +4,7 @@
> >  #include <linux/genalloc.h>
> >  #include <linux/device.h>
> >  #include <linux/module.h>
> > +#include <linux/memory.h>
> >  #include <linux/slab.h>
> >  #include <linux/uuid.h>
> >  #include <linux/sort.h>
> > @@ -2972,6 +2973,42 @@ static int is_system_ram(struct resource *res, void *arg)
> >  	return 1;
> >  }
> >  
> > +static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
> > +					  unsigned long action, void *arg)
> > +{
> > +	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
> > +					       memory_notifier);
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	struct cxl_endpoint_decoder *cxled = p->targets[0];
> > +	struct cxl_decoder *cxld = &cxled->cxld;
> > +	struct memory_notify *mnb = arg;
> > +	int nid = mnb->status_change_nid;
> > +	int region_nid;
> > +
> > +	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
> > +		return NOTIFY_DONE;
> > +
> > +	region_nid = phys_to_target_node(cxld->hpa_range.start);
> > +	if (nid != region_nid)
> > +		return NOTIFY_DONE;
> > +
> > +	/* Don't set if there's no coordinate information */
> > +	if (!cxlr->coord.write_bandwidth)
> > +		return NOTIFY_DONE;  
> 
> Although the changelog says you will use "read_bandwidth" as the check,
> the code still tests "write_bandwidth".
> 
> > +
> > +	node_set_perf_attrs(nid, &cxlr->coord, 0);
> > +	node_set_perf_attrs(nid, &cxlr->coord, 1);  
> 
> And this.
> 
> But I don't think it's good to remove access level 1.  According to
> commit b9fffe47212c ("node: Add access1 class to represent CPU to memory
> characteristics"), access level 1 is for performance from CPU to
> memory, so we should keep it.  For a CXL memory device, access level 0
> and access level 1 should be equivalent.  Will the code be used for
> something like a GPU connected via CXL, where access level 0 may
> describe the performance from the GPU to the memory?
> 
I disagree. They are no more equivalent than they are on any other
complex system.

For example, a CXL root port described using the Generic Port
infrastructure may be on a different die in the package than the CPU
cores (IO dies are a common architecture), and that IO die may well have
generic initiators that are much nearer than the CPU cores.

In those cases access0 will cover initiators on the IO die, but access1
will cover the nearest CPU cores (initiators).

Both should arguably be there for CXL memory as both are as relevant as
they are for any other memory.

If / when we get some GPUs etc on CXL that are initiators this will all
get a lot more fun but for now we can kick that into the long grass.
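
To make the shape of that concrete, here is a rough userspace mock, not
kernel code and not a proposal for the patch. It assumes the region keeps
one set of coordinates per access class, which the posted patch does not
do (it has a single cxlr->coord). mock_node_set_perf_attrs() is a
stand-in that follows the argument order of the node_set_perf_attrs()
calls in the patch; publish_region_coords() and the recorder arrays are
hypothetical names.

```c
/* Userspace stand-in for the kernel's struct access_coordinate. */
struct access_coordinate {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

#define MOCK_ACCESS_CLASSES 2

/* Hypothetical recorder: one slot per access class. */
static struct access_coordinate mock_node_attrs[MOCK_ACCESS_CLASSES];
static int mock_node_attrs_set[MOCK_ACCESS_CLASSES];

/* Stub mirroring the node_set_perf_attrs() call shape; it only records. */
static void mock_node_set_perf_attrs(unsigned int nid,
				     const struct access_coordinate *coord,
				     unsigned int access)
{
	(void)nid;
	if (access >= MOCK_ACCESS_CLASSES)
		return;
	mock_node_attrs[access] = *coord;
	mock_node_attrs_set[access] = 1;
}

/*
 * Publish coordinates for both classes: access0 (nearest initiator of any
 * kind) and access1 (nearest CPU). Returns 1 if published, 0 if skipped.
 */
static int publish_region_coords(unsigned int nid,
				 const struct access_coordinate *access0,
				 const struct access_coordinate *access1)
{
	/* A zero read_bandwidth is treated as "no coordinate information". */
	if (!access0->read_bandwidth || !access1->read_bandwidth)
		return 0;

	mock_node_set_perf_attrs(nid, access0, 0);
	mock_node_set_perf_attrs(nid, access1, 1);
	return 1;
}
```

Taking two separate coordinate sets is deliberate in this sketch: it
leaves room for the IO die case above, where the nearest generic
initiator and the nearest CPU see different numbers.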

Jonathan


> --
> Best Regards,
> Huang, Ying
> 
> > +
> > +	return NOTIFY_OK;
> > +}
> > +
> > +static void remove_coord_notifier(void *data)
> > +{
> > +	struct cxl_region *cxlr = data;
> > +
> > +	unregister_memory_notifier(&cxlr->memory_notifier);
> > +}
> > +
> >  static int cxl_region_probe(struct device *dev)
> >  {
> >  	struct cxl_region *cxlr = to_cxl_region(dev);
> > @@ -2997,6 +3034,11 @@ static int cxl_region_probe(struct device *dev)
> >  		goto out;
> >  	}
> >  
> > +	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
> > +	cxlr->memory_notifier.priority = HMAT_CALLBACK_PRI;
> > +	register_memory_notifier(&cxlr->memory_notifier);
> > +	rc = devm_add_action_or_reset(&cxlr->dev, remove_coord_notifier, cxlr);
> > +
> >  	/*
> >  	 * From this point on any path that changes the region's state away from
> >  	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 4639d0d6ef54..2498086c8edc 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -6,6 +6,7 @@
> >  
> >  #include <linux/libnvdimm.h>
> >  #include <linux/bitfield.h>
> > +#include <linux/notifier.h>
> >  #include <linux/bitops.h>
> >  #include <linux/log2.h>
> >  #include <linux/node.h>
> > @@ -520,6 +521,7 @@ struct cxl_region_params {
> >   * @flags: Region state flags
> >   * @params: active + config params for the region
> >   * @coord: QoS access coordinates for the region
> > + * @memory_notifier: notifier for setting the access coordinates to node
> >   */
> >  struct cxl_region {
> >  	struct device dev;
> > @@ -531,6 +533,7 @@ struct cxl_region {
> >  	unsigned long flags;
> >  	struct cxl_region_params params;
> >  	struct access_coordinate coord;
> > +	struct notifier_block memory_notifier;
> >  };
> >  
> >  struct cxl_nvdimm_bridge {  
> 


Thread overview: 22+ messages
2024-01-04 23:48 [PATCH v3 0/3] cxl: Add support to report region access coordinates to numa nodes Dave Jiang
2024-01-04 23:48 ` [PATCH v3 1/3] cxl/region: Calculate performance data for a region Dave Jiang
2024-01-05  0:07   ` Dan Williams
2024-01-05 22:50     ` Dave Jiang
2024-01-04 23:48 ` [PATCH v3 2/3] cxl/region: Add sysfs attribute for locality attributes of CXL regions Dave Jiang
2024-01-05  0:19   ` Dan Williams
2024-01-08 12:07     ` Jonathan Cameron
2024-01-04 23:48 ` [PATCH v3 3/3] cxl: Add memory hotplug notifier for cxl region Dave Jiang
2024-01-05 22:00   ` Dan Williams
2024-01-08  6:49   ` Huang, Ying
2024-01-08 12:15     ` Jonathan Cameron [this message]
2024-01-08 18:18       ` Dave Jiang
2024-01-09  2:15         ` Huang, Ying
2024-01-09 15:55           ` Dave Jiang
2024-01-09 16:27         ` Jonathan Cameron
2024-01-09 19:28           ` Dan Williams
2024-01-10 10:00             ` Jonathan Cameron
2024-01-10 15:27               ` Dave Jiang
2024-01-12 11:30                 ` Jonathan Cameron
2024-01-12 15:57                   ` Dave Jiang
2024-01-09  0:26       ` Dan Williams
2024-01-08 16:12     ` Dave Jiang
