From: Balbir Singh <bsingharora@gmail.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-kernel@vger.kernel.org
Cc: "Anaczkowski, Lukasz" <lukasz.anaczkowski@intel.com>,
"Box, David E" <david.e.box@intel.com>,
"Kogut, Jaroslaw" <Jaroslaw.Kogut@intel.com>,
"Lahtinen, Joonas" <joonas.lahtinen@intel.com>,
"Moore, Robert" <robert.moore@intel.com>,
"Nachimuthu, Murugasamy" <murugasamy.nachimuthu@intel.com>,
"Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
"Schmauss, Erik" <erik.schmauss@intel.com>,
"Verma, Vishal L" <vishal.l.verma@intel.com>,
"Zheng, Lv" <lv.zheng@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jerome Glisse <jglisse@redhat.com>, Len Brown <lenb@kernel.org>,
Tim Chen <tim.c.chen@linux.intel.com>,
devel@acpica.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org,
linux-nvdimm@lists.01.org
Subject: Re: [RFC v2 0/5] surface heterogeneous memory performance information
Date: Fri, 07 Jul 2017 16:27:16 +1000
Message-ID: <1499408836.23251.3.camel@gmail.com>
In-Reply-To: <20170706215233.11329-1-ross.zwisler@linux.intel.com>
On Thu, 2017-07-06 at 15:52 -0600, Ross Zwisler wrote:
> ==== Quick Summary ====
>
> Platforms in the very near future will have multiple types of memory
> attached to a single CPU. These disparate memory ranges will have some
> characteristics in common, such as CPU cache coherence, but they can have
> wide ranges of performance both in terms of latency and bandwidth.
>
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.
>
> With the current Linux code NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node. This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.
>
> We solve this issue by providing userspace with performance information on
> individual memory ranges. This performance information is exposed via
> sysfs:
>
> # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
> mem_tgt2/firmware_id:1
> mem_tgt2/is_cached:0
> mem_tgt2/is_enabled:1
> mem_tgt2/is_isolated:0
Could you please explain these characteristics? Are they in the patches
to follow?
> mem_tgt2/phys_addr_base:0x0
> mem_tgt2/phys_length_bytes:0x800000000
> mem_tgt2/local_init/read_bw_MBps:30720
> mem_tgt2/local_init/read_lat_nsec:100
> mem_tgt2/local_init/write_bw_MBps:30720
> mem_tgt2/local_init/write_lat_nsec:100
>
How do these numbers compare to normal system memory?
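
For comparing one target against another, the `grep .` listing quoted above
could be parsed along these lines. This is only a sketch: it assumes the
attribute names exactly as they appear in the quoted listing, and
parse_grep_output() is a hypothetical helper, not something from the series.

```python
def parse_grep_output(text):
    """Parse `grep . dir/*` output ("path:value" per line) into nested dicts.

    Returns {target_name: {attribute_path: integer_value}}.
    """
    targets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        path, value = line.split(":", 1)
        target, _, attr = path.partition("/")
        # Values in the listing are decimal or 0x-prefixed hex;
        # int(..., 0) accepts both forms.
        targets.setdefault(target, {})[attr] = int(value, 0)
    return targets


# Sample input copied from the listing quoted in this mail.
SAMPLE = """\
mem_tgt2/firmware_id:1
mem_tgt2/is_cached:0
mem_tgt2/is_enabled:1
mem_tgt2/is_isolated:0
mem_tgt2/phys_addr_base:0x0
mem_tgt2/phys_length_bytes:0x800000000
mem_tgt2/local_init/read_bw_MBps:30720
mem_tgt2/local_init/read_lat_nsec:100
mem_tgt2/local_init/write_bw_MBps:30720
mem_tgt2/local_init/write_lat_nsec:100
"""

info = parse_grep_output(SAMPLE)
size_gib = info["mem_tgt2"]["phys_length_bytes"] // 2**30
read_bw = info["mem_tgt2"]["local_init/read_bw_MBps"]
print(f"mem_tgt2: {size_gib} GiB, read bandwidth {read_bw} MB/s")
```

With the same parsing applied to a second target, the bandwidth and latency
values could be compared side by side.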
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.
>
> This series is built upon acpica-1705:
>
> https://github.com/zetalog/linux/commits/acpica-1705
>
> And you can find a working tree here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/zwisler/linux.git/log/?h=hmem_sysfs
>
> ==== Lots of Details ====
>
> This patch set is only concerned with CPU-addressable memory types, not
> on-device memory like what we have with Jerome Glisse's HMM series:
>
> https://lwn.net/Articles/726691/
>
> This patch set works by enabling the new Heterogeneous Memory Attribute
> Table (HMAT) table, newly defined in ACPI 6.2. One major conceptual change
> in ACPI 6.2 related to this work is that proximity domains no longer need
> to contain a processor. We can now have memory-only proximity domains,
> which means that we can now have memory-only Linux NUMA nodes.
>
> Here is an example configuration where we have a single processor, one
> range of regular memory and one range of HBM:
>
>  +---------------+   +----------------+
>  | Processor     |   | Memory         |
>  | prox domain 0 +---+ prox domain 1  |
>  | NUMA node 1   |   | NUMA node 2    |
>  +-------+-------+   +----------------+
>          |
>  +-------+----------+
>  | HBM              |
>  | prox domain 2    |
>  | NUMA node 0      |
>  +------------------+
>
> This gives us one initiator (the processor) and two targets (the two memory
> ranges). Each of these three has its own ACPI proximity domain and
> associated Linux NUMA node. Note also that while there is a 1:1 mapping
> from each proximity domain to each NUMA node, the numbers don't necessarily
> match up. Additionally we can have extra NUMA nodes that don't map back to
> ACPI proximity domains.
Could you expand on proximity domains? Are they the same as node distance,
or is this ACPI terminology for something more?
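
Read off the example diagram quoted above, the numbering would come out as
the lookup table below. This is a sketch for illustration only: the values
are taken from the diagram, and the names (PXM_TO_NODE, node_for_pxm() and so
on) are made up here rather than taken from the patches.

```python
# ACPI proximity domain -> Linux NUMA node, as drawn in the example diagram.
# The mapping is 1:1, but the numbers do not match up.
PXM_TO_NODE = {0: 1, 1: 2, 2: 0}

# Reverse direction, derived from the table above.
NODE_TO_PXM = {node: pxm for pxm, node in PXM_TO_NODE.items()}

# Roles in the example: one initiator (the processor), two memory targets.
INITIATOR_PXMS = {0}
TARGET_PXMS = {1, 2}


def node_for_pxm(pxm):
    """Return the NUMA node for a proximity domain, or None if unknown."""
    return PXM_TO_NODE.get(pxm)


def pxm_for_node(node):
    """Return the proximity domain for a NUMA node.

    None models an extra NUMA node that does not map back to any
    ACPI proximity domain.
    """
    return NODE_TO_PXM.get(node)


for pxm in sorted(TARGET_PXMS):
    print(f"target prox domain {pxm} -> NUMA node {node_for_pxm(pxm)}")
```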
>
> The above configuration could also have the processor and one of the two
> memory ranges sharing a proximity domain and NUMA node, but for the
> purposes of the HMAT the two memory ranges will always need to be
> separated.
>
> The overall goal of this series and of the HMAT is to allow users to
> identify memory using its performance characteristics. This can broadly be
> done in one of two ways:
>
> Option 1: Provide the user with a way to map between proximity domains and
> NUMA nodes and a way to access the HMAT directly (probably via
> /sys/firmware/acpi/tables). Then, through possibly a library and a daemon,
> provide an API so that applications can either request information about
> memory ranges, or request memory allocations that meet a given set of
> performance characteristics.
>
> Option 2: Provide the user with HMAT performance data directly in sysfs,
> allowing applications to directly access it without the need for the
> library and daemon.
>
> The kernel work for option 1 is started by patches 1-3. These just surface
> the minimal amount of information in sysfs to allow userspace to map
> between proximity domains and NUMA nodes so that the raw data in the HMAT
> table can be understood.
>
> Patches 4 and 5 enable option 2, adding performance information from the
> HMAT to sysfs. The second option is complicated by the amount of HMAT data
> that could be present in very large systems, so in this series we only
> surface performance information for local (initiator,target) pairings. The
> changelog for patch 5 discusses this in detail.
>
> The naming collision between Jerome's "Heterogeneous Memory Management
> (HMM)" and this "Heterogeneous Memory (HMEM)" series is unfortunate, but I
> was trying to stick with the word "Heterogeneous" because of the naming of
> the ACPI 6.2 Heterogeneous Memory Attribute Table table. Suggestions for
> better naming are welcome.
>
Balbir Singh.