Linux CXL
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: "Olivi, Matteo" <molivi3@gatech.edu>
Cc: "linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Subject: Re: How to programmatically discover online and offline memory and its latency and bandwidth from user space?
Date: Fri, 10 Jan 2025 17:01:50 +0000	[thread overview]
Message-ID: <20250110170150.00005446@huawei.com> (raw)
In-Reply-To: <DM5PR07MB354837188B085472F4EE3B5397122@DM5PR07MB3548.namprd07.prod.outlook.com>

On Wed, 8 Jan 2025 17:55:41 +0000
"Olivi, Matteo" <molivi3@gatech.edu> wrote:

> Hello,
> I'm a PhD student working on orchestrator support for memory disaggregation.
> 
> I have some questions about how Linux presents CXL memory and its performance
> characteristics to user space.
> 
> 1. What is the simplest way for a user space program (with root privileges) to learn the
> latency and bandwidth between each pair of NUMA nodes (even non-CXL ones)? Are
> reading the HMAT and shelling out to the cxl cli the only two options? I've read
> https://docs.kernel.org/admin-guide/mm/numaperf.html but AFAIU given a memory target
> those sysfs files only report the performance from the local initiators. I care about each pair,
> not just local ones.

Unfortunately the interface indeed only presents a tiny part of the data in a full HMAT table.
The original discussion on this a few years back concluded that was all that made sense
until there was a clear use case for more complete data.

HMAT doesn't have to be complete but I'd assume it normally is.
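For what numaperf does expose, the local-initiator figures can be scraped from sysfs directly rather than shelling out. A minimal sketch, assuming the layout documented in Documentation/admin-guide/mm/numaperf.rst (the sysfs root is a parameter only so the function can be pointed at a test tree):

```python
import glob
import os


def node_local_perf(sysfs_root="/sys/devices/system/node"):
    """Collect the access-class performance attributes each memory node
    reports for its *local* initiators -- all that numaperf currently
    exposes.  Returns {node: {access_class: {attr: value}}}."""
    perf = {}
    for acc_dir in glob.glob(
            os.path.join(sysfs_root, "node*", "access*", "initiators")):
        parts = acc_dir.split(os.sep)
        node, access = parts[-3], parts[-2]
        vals = {}
        for attr in ("read_latency", "write_latency",
                     "read_bandwidth", "write_bandwidth"):
            path = os.path.join(acc_dir, attr)
            if os.path.exists(path):
                with open(path) as f:
                    vals[attr] = int(f.read().strip())
        perf.setdefault(node, {})[access] = vals
    return perf
```

Latencies are in nanoseconds and bandwidths in MB/s per the numaperf documentation; pairs beyond the local initiators simply are not there to read.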

> 
> 2. Is there a way to get the information question 1 asks for, but for memory that is physically
> connected to the host, but logically isn't? The ACPI spec https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/17_NUMA_Architecture_Platforms/NUMA_Architecture_Platforms.html#system-resource-affinity-table-definition 
> states that "The SRAT describes the system locality that all processors and memory
> present in a system belong to at system boot. This includes memory that can be hot-added (that
> is memory that can be added to the system while it is running, without requiring a reboot)."
> I interpret that to mean that if (CXL) memory is physically, but not logically, connected to the host,
> the SRAT will still describe the corresponding NUMA node.
> But what about the HMAT? The ACPI spec
> https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/17_NUMA_Architecture_Platforms/NUMA_Architecture_Platforms.html#heterogeneous-memory-attributes-information
> states that  "The static HMAT table provides the boot time description of the memory latency and bandwidth
> among all memory access Initiator and memory Target System Localities. For hot-added devices and
> dynamic reconfiguration of the system localities, the _HMA object must be used for runtime update."
> but it's unclear to me if that applies only to physically hot-plugged memory or to logically hot-plugged
> memory as well.

The BIOS may have configured the CXL memory and done the work for SRAT and HMAT to include
that memory.  Or it may present HMAT data only up to a generic port entry in SRAT, leaving
discovery of the rest of the path's performance to the OS when it sets up the memory mappings etc.
For now we present the data for the nearest initiator (CPU or other generic initiator) to the CXL memory.

> 
> 3. Is there a recommended way for a user space program to tell CXL NUMA nodes from local NUMA nodes
> (both online and offline ones)? One hack would be to check whether the NUMA node has CPUs or not.
> Another option would be shelling out to the cxl-cli. 

In general, not really. It's just memory; you should never care that it is CXL beyond the fact
that its performance characteristics are different, and maybe for error handling reasons.
You can indeed use cxl-cli, or read the sysfs entries that tool uses, to figure it out.
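The CPU-less-node hack mentioned in the question can at least be done without shelling out, by reading each node's cpulist file. A sketch, with the caveat that a CPU-less node may also be HBM or some other memory-only node, so this is a heuristic rather than a CXL test:

```python
import os


def cpuless_nodes(sysfs_root="/sys/devices/system/node"):
    """Return node directory names whose cpulist is empty -- candidate
    memory-only (possibly CXL) nodes.  Heuristic only: non-CXL
    memory-only nodes show up here too; the cxl sysfs hierarchy that
    cxl-cli reads is the authoritative source."""
    nodes = []
    for entry in sorted(os.listdir(sysfs_root)):
        cpulist = os.path.join(sysfs_root, entry, "cpulist")
        if entry.startswith("node") and os.path.isfile(cpulist):
            with open(cpulist) as f:
                if not f.read().strip():
                    nodes.append(entry)
    return nodes
```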

> 
> 4. Is there a way for a user space program (with root privileges) to learn IDs of CXL NUMA
> nodes (both online and offline ones) that are globally unique? What I want is:
> a. if two hosts are both connected to the same CXL memory, they should see that memory
> with the same ID.

Look at serial numbers of the devices.  Those are not connected to NUMA node IDs, which are
local to a given host.  The serial numbers can be obtained with lspci and are unique (assuming
the manufacturer set them - which sometimes doesn't happen in prototype parts).
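Those serials live in the PCIe Device Serial Number extended capability, which `lspci -vv` prints. A sketch that pairs each device's PCI address with its serial from that output (the sample text in any test is illustrative, not from real hardware):

```python
import re


def lspci_serials(lspci_vv_text):
    """Map PCI address -> device serial number, parsed from `lspci -vv`
    output.  Devices without the Device Serial Number capability are
    simply absent from the result."""
    serials = {}
    current = None
    for line in lspci_vv_text.splitlines():
        # Device headers are unindented, e.g. "35:00.0 CXL: ..."
        if line and not line[0].isspace():
            current = line.split()[0]
        m = re.search(r"Device Serial Number\s+([0-9a-fA-F-]+)", line)
        if m and current:
            serials[current] = m.group(1)
    return serials
```

Mapping a serial back to a NUMA node still requires walking from the PCI device through the cxl sysfs hierarchy to the region/dax device, which is what cxl-cli does.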

> b. two different CXL memory pools will never be seen with the same ID by different hosts.

The ID here can't be a NUMA node ID, as those are used to index non-sparse structures, so it
wouldn't scale.

Once we get upstream support for DCD (the only sensible way to do pools and remain compliant
with the spec) and for the tagging it provides, the globally unique ID will be associated
with a particular bit of shared memory on the device rather than the whole device.
My guess is that will take a few kernel cycles though.

> 
> All my questions talk about NUMA nodes. I understand that Linux has multiple
> layers of abstractions to represent memory, and NUMA nodes are one of the highest ones.
> If any of the questions above can be answered but at a lower level of abstraction than NUMA
> nodes, that's fine as long as there's a way to map the entity in the lower level of abstraction
> to the corresponding NUMA node.

Hope that helps a little!

Jonathan

> 
> Thanks,
> Matteo Olivi.
> 



Thread overview: 5+ messages
2025-01-08 17:55 How to programmatically discover online and offline memory and its latency and bandwidth from user space? Olivi, Matteo
2025-01-10 17:01 ` Jonathan Cameron [this message]
2025-08-22  2:38   ` Olivi, Matteo
2025-08-26 13:58     ` Jonathan Cameron
2025-08-26 17:31     ` Dave Jiang
