* [RFC v2 1/5] acpi: add missing include in acpi_numa.h
[not found] ` <20170706215233.11329-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
@ 2017-07-06 21:52 ` Ross Zwisler
[not found] ` <20170706215233.11329-2-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-07-06 21:52 ` [RFC v2 2/5] acpi: HMAT support in acpi_parse_entries_array() Ross Zwisler
` (4 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Ross Zwisler @ 2017-07-06 21:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Box, David E, Dave Hansen, Zheng, Lv,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw, Rafael J. Wysocki,
Anaczkowski, Lukasz, Moore, Robert,
linux-acpi-u79uwXL29TY76Z2rM5mHXA, Odzioba, Lukasz,
Schmauss, Erik, Len Brown, Jerome Glisse,
devel-E0kO6a4B6psdnm+yROfE0A, Kogut, Jaroslaw,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Greg Kroah-Hartman,
Nachimuthu, Murugasamy, Rafael J. Wysocki, Lahtinen, Joonas,
Andrew Morton, Tim Chen
Right now, if a file includes acpi_numa.h without having included
linux/numa.h first, the build emits the following warning:
./include/acpi/acpi_numa.h:9:5: warning: "MAX_NUMNODES" is not defined [-Wundef]
#if MAX_NUMNODES > 256
^~~~~~~~~~~~
Signed-off-by: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
---
include/acpi/acpi_numa.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index d4b7294..1e3a74f 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -3,6 +3,7 @@
#ifdef CONFIG_ACPI_NUMA
#include <linux/kernel.h>
+#include <linux/numa.h>
/* Proximity bitmap length */
#if MAX_NUMNODES > 256
--
2.9.4
^ permalink raw reply related [flat|nested] 21+ messages in thread

* [RFC v2 2/5] acpi: HMAT support in acpi_parse_entries_array()
[not found] ` <20170706215233.11329-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-07-06 21:52 ` [RFC v2 1/5] acpi: add missing include in acpi_numa.h Ross Zwisler
@ 2017-07-06 21:52 ` Ross Zwisler
2017-07-06 22:13 ` Rafael J. Wysocki
2017-07-06 21:52 ` [RFC v2 3/5] hmem: add heterogeneous memory sysfs support Ross Zwisler
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Ross Zwisler @ 2017-07-06 21:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Box, David E, Dave Hansen, Zheng, Lv,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw, Rafael J. Wysocki,
Anaczkowski, Lukasz, Moore, Robert,
linux-acpi-u79uwXL29TY76Z2rM5mHXA, Odzioba, Lukasz,
Schmauss, Erik, Len Brown, Jerome Glisse,
devel-E0kO6a4B6psdnm+yROfE0A, Kogut, Jaroslaw,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Greg Kroah-Hartman,
Nachimuthu, Murugasamy, Rafael J. Wysocki, Lahtinen, Joonas,
Andrew Morton, Tim Chen
The current implementation of acpi_parse_entries_array() assumes that each
subtable has a standard ACPI subtable header of type struct
acpi_subtable_header. This standard subtable header consists of a one byte
type followed by a one byte length.
The HMAT subtables have to allow for longer lengths, so they have subtable
headers of type struct acpi_hmat_structure, which has a two byte type and
a four byte length.
Enhance the subtable parsing in acpi_parse_entries_array() so that it can
handle these new HMAT subtables.
Signed-off-by: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
---
drivers/acpi/numa.c | 2 +-
drivers/acpi/tables.c | 52 ++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 42 insertions(+), 12 deletions(-)
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index edb0c79..917f1cc 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -443,7 +443,7 @@ int __init acpi_numa_init(void)
* So go over all cpu entries in SRAT to get apicid to node mapping.
*/
- /* SRAT: Static Resource Affinity Table */
+ /* SRAT: System Resource Affinity Table */
if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
struct acpi_subtable_proc srat_proc[3];
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index ff42539..7979171 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -218,6 +218,33 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
}
}
+static unsigned long __init
+acpi_get_entry_type(char *id, void *entry)
+{
+ if (!strncmp(id, ACPI_SIG_HMAT, 4))
+ return ((struct acpi_hmat_structure *)entry)->type;
+ else
+ return ((struct acpi_subtable_header *)entry)->type;
+}
+
+static unsigned long __init
+acpi_get_entry_length(char *id, void *entry)
+{
+ if (!strncmp(id, ACPI_SIG_HMAT, 4))
+ return ((struct acpi_hmat_structure *)entry)->length;
+ else
+ return ((struct acpi_subtable_header *)entry)->length;
+}
+
+static unsigned long __init
+acpi_get_subtable_header_length(char *id)
+{
+ if (!strncmp(id, ACPI_SIG_HMAT, 4))
+ return sizeof(struct acpi_hmat_structure);
+ else
+ return sizeof(struct acpi_subtable_header);
+}
+
/**
* acpi_parse_entries_array - for each proc_num find a suitable subtable
*
@@ -242,10 +269,10 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
struct acpi_subtable_proc *proc, int proc_num,
unsigned int max_entries)
{
- struct acpi_subtable_header *entry;
- unsigned long table_end;
+ unsigned long table_end, subtable_header_length;
int count = 0;
int errs = 0;
+ void *entry;
int i;
if (acpi_disabled)
@@ -263,19 +290,23 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
}
table_end = (unsigned long)table_header + table_header->length;
+ subtable_header_length = acpi_get_subtable_header_length(id);
/* Parse all entries looking for a match. */
- entry = (struct acpi_subtable_header *)
- ((unsigned long)table_header + table_size);
+ entry = (void *)table_header + table_size;
+
+ while (((unsigned long)entry) + subtable_header_length < table_end) {
+ unsigned long entry_type, entry_length;
- while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
- table_end) {
if (max_entries && count >= max_entries)
break;
+ entry_type = acpi_get_entry_type(id, entry);
+ entry_length = acpi_get_entry_length(id, entry);
+
for (i = 0; i < proc_num; i++) {
- if (entry->type != proc[i].id)
+ if (entry_type != proc[i].id)
continue;
if (!proc[i].handler ||
(!errs && proc[i].handler(entry, table_end))) {
@@ -290,16 +321,15 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
count++;
/*
- * If entry->length is 0, break from this loop to avoid
+ * If entry_length is 0, break from this loop to avoid
* infinite loop.
*/
- if (entry->length == 0) {
+ if (entry_length == 0) {
pr_err("[%4.4s:0x%02x] Invalid zero length\n", id, proc->id);
return -EINVAL;
}
- entry = (struct acpi_subtable_header *)
- ((unsigned long)entry + entry->length);
+ entry += entry_length;
}
if (max_entries && count > max_entries) {
--
2.9.4
^ permalink raw reply related [flat|nested] 21+ messages in thread

* Re: [RFC v2 2/5] acpi: HMAT support in acpi_parse_entries_array()
2017-07-06 21:52 ` [RFC v2 2/5] acpi: HMAT support in acpi_parse_entries_array() Ross Zwisler
@ 2017-07-06 22:13 ` Rafael J. Wysocki
[not found] ` <CAJZ5v0gUA4d+NFqEdsXPVktXf+2AX9MurEQAiCFGxU_eaoYE5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-07-06 22:13 UTC (permalink / raw)
To: Ross Zwisler
Cc: Linux Kernel Mailing List, Anaczkowski, Lukasz, Box, David E,
Kogut, Jaroslaw, Lahtinen, Joonas, Moore, Robert,
Nachimuthu, Murugasamy, Odzioba, Lukasz, Rafael J. Wysocki,
Rafael J. Wysocki, Schmauss, Erik, Verma, Vishal L, Zheng, Lv,
Andrew Morton, Dan Williams, Dave Hansen, Greg Kroah-Hartman
On Thu, Jul 6, 2017 at 11:52 PM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> The current implementation of acpi_parse_entries_array() assumes that each
> subtable has a standard ACPI subtable header of type struct
> acpi_subtable_header. This standard subtable header consists of a one byte
> type followed by a one byte length.
>
> The HMAT subtables have to allow for longer lengths, so they have subtable
> headers of type struct acpi_hmat_structure, which has a two byte type and
> a four byte length.
>
> Enhance the subtable parsing in acpi_parse_entries_array() so that it can
> handle these new HMAT subtables.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
> drivers/acpi/numa.c | 2 +-
> drivers/acpi/tables.c | 52 ++++++++++++++++++++++++++++++++++++++++-----------
> 2 files changed, 42 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index edb0c79..917f1cc 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -443,7 +443,7 @@ int __init acpi_numa_init(void)
> * So go over all cpu entries in SRAT to get apicid to node mapping.
> */
>
> - /* SRAT: Static Resource Affinity Table */
> + /* SRAT: System Resource Affinity Table */
> if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
> struct acpi_subtable_proc srat_proc[3];
>
This change is unrelated to the rest of the patch.
Maybe send it separately?
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index ff42539..7979171 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -218,6 +218,33 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
> }
> }
>
> +static unsigned long __init
> +acpi_get_entry_type(char *id, void *entry)
> +{
> + if (!strncmp(id, ACPI_SIG_HMAT, 4))
> + return ((struct acpi_hmat_structure *)entry)->type;
> + else
> + return ((struct acpi_subtable_header *)entry)->type;
> +}
I slightly prefer to use ? : in similar situations.
> +
> +static unsigned long __init
> +acpi_get_entry_length(char *id, void *entry)
> +{
> + if (!strncmp(id, ACPI_SIG_HMAT, 4))
> + return ((struct acpi_hmat_structure *)entry)->length;
> + else
> + return ((struct acpi_subtable_header *)entry)->length;
> +}
> +
> +static unsigned long __init
> +acpi_get_subtable_header_length(char *id)
> +{
> + if (!strncmp(id, ACPI_SIG_HMAT, 4))
> + return sizeof(struct acpi_hmat_structure);
> + else
> + return sizeof(struct acpi_subtable_header);
> +}
> +
> /**
> * acpi_parse_entries_array - for each proc_num find a suitable subtable
> *
> @@ -242,10 +269,10 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
> struct acpi_subtable_proc *proc, int proc_num,
> unsigned int max_entries)
> {
> - struct acpi_subtable_header *entry;
> - unsigned long table_end;
> + unsigned long table_end, subtable_header_length;
> int count = 0;
> int errs = 0;
> + void *entry;
> int i;
>
> if (acpi_disabled)
> @@ -263,19 +290,23 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
> }
>
> table_end = (unsigned long)table_header + table_header->length;
> + subtable_header_length = acpi_get_subtable_header_length(id);
>
> /* Parse all entries looking for a match. */
>
> - entry = (struct acpi_subtable_header *)
> - ((unsigned long)table_header + table_size);
> + entry = (void *)table_header + table_size;
> +
> + while (((unsigned long)entry) + subtable_header_length < table_end) {
> + unsigned long entry_type, entry_length;
>
> - while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
> - table_end) {
> if (max_entries && count >= max_entries)
> break;
>
> + entry_type = acpi_get_entry_type(id, entry);
> + entry_length = acpi_get_entry_length(id, entry);
> +
> for (i = 0; i < proc_num; i++) {
> - if (entry->type != proc[i].id)
> + if (entry_type != proc[i].id)
> continue;
> if (!proc[i].handler ||
> (!errs && proc[i].handler(entry, table_end))) {
> @@ -290,16 +321,15 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
> count++;
>
> /*
> - * If entry->length is 0, break from this loop to avoid
> + * If entry_length is 0, break from this loop to avoid
> * infinite loop.
> */
> - if (entry->length == 0) {
> + if (entry_length == 0) {
> pr_err("[%4.4s:0x%02x] Invalid zero length\n", id, proc->id);
> return -EINVAL;
> }
>
> - entry = (struct acpi_subtable_header *)
> - ((unsigned long)entry + entry->length);
> + entry += entry_length;
> }
>
> if (max_entries && count > max_entries) {
> --
The rest is fine by me.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC v2 3/5] hmem: add heterogeneous memory sysfs support
[not found] ` <20170706215233.11329-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-07-06 21:52 ` [RFC v2 1/5] acpi: add missing include in acpi_numa.h Ross Zwisler
2017-07-06 21:52 ` [RFC v2 2/5] acpi: HMAT support in acpi_parse_entries_array() Ross Zwisler
@ 2017-07-06 21:52 ` Ross Zwisler
2017-07-07 5:53 ` John Hubbard
2017-07-06 21:52 ` [RFC v2 4/5] sysfs: add sysfs_add_group_link() Ross Zwisler
` (2 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Ross Zwisler @ 2017-07-06 21:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Box, David E, Dave Hansen, Zheng, Lv,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw, Rafael J. Wysocki,
Anaczkowski, Lukasz, Moore, Robert,
linux-acpi-u79uwXL29TY76Z2rM5mHXA, Odzioba, Lukasz,
Schmauss, Erik, Len Brown, Jerome Glisse,
devel-E0kO6a4B6psdnm+yROfE0A, Kogut, Jaroslaw,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Greg Kroah-Hartman,
Nachimuthu, Murugasamy, Rafael J. Wysocki, Lahtinen, Joonas,
Andrew Morton, Tim Chen
Add a new sysfs subsystem, /sys/devices/system/hmem, which surfaces
information about memory initiators and memory targets to the user. These
initiators and targets are described by the ACPI SRAT and HMAT tables.
A "memory initiator" in this case is any device such as a CPU or a separate
memory I/O device that can initiate a memory request. A "memory target" is
a CPU-accessible physical address range.
The key piece of information surfaced by this patch is the mapping between
the ACPI table "proximity domain" numbers, held in the "firmware_id"
attribute, and Linux NUMA node numbers.
Initiators are found at /sys/devices/system/hmem/mem_initX, and the
attributes for a given initiator look like this:
# tree mem_init0/
mem_init0/
├── cpu0 -> ../../cpu/cpu0
├── firmware_id
├── is_enabled
├── node0 -> ../../node/node0
├── power
│ ├── async
│ ...
├── subsystem -> ../../../../bus/hmem
└── uevent
Where "mem_init0" on my system represents the CPU acting as a memory
initiator at NUMA node 0.
Targets are found at /sys/devices/system/hmem/mem_tgtX, and the attributes
for a given target look like this:
# tree mem_tgt2/
mem_tgt2/
├── firmware_id
├── is_cached
├── is_enabled
├── is_isolated
├── node2 -> ../../node/node2
├── phys_addr_base
├── phys_length_bytes
├── power
│ ├── async
│ ...
├── subsystem -> ../../../../bus/hmem
└── uevent
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
MAINTAINERS | 5 +
drivers/acpi/Kconfig | 1 +
drivers/acpi/Makefile | 1 +
drivers/acpi/hmem/Kconfig | 7 +
drivers/acpi/hmem/Makefile | 2 +
drivers/acpi/hmem/core.c | 569 ++++++++++++++++++++++++++++++++++++++++++
drivers/acpi/hmem/hmem.h | 47 ++++
drivers/acpi/hmem/initiator.c | 61 +++++
drivers/acpi/hmem/target.c | 97 +++++++
9 files changed, 790 insertions(+)
create mode 100644 drivers/acpi/hmem/Kconfig
create mode 100644 drivers/acpi/hmem/Makefile
create mode 100644 drivers/acpi/hmem/core.c
create mode 100644 drivers/acpi/hmem/hmem.h
create mode 100644 drivers/acpi/hmem/initiator.c
create mode 100644 drivers/acpi/hmem/target.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 053c3bd..554b833 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6085,6 +6085,11 @@ S: Supported
F: drivers/scsi/hisi_sas/
F: Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
+HMEM (ACPI HETEROGENEOUS MEMORY SUPPORT)
+M: Ross Zwisler <ross.zwisler@linux.intel.com>
+S: Supported
+F: drivers/acpi/hmem/
+
HOST AP DRIVER
M: Jouni Malinen <j@w1.fi>
L: linux-wireless@vger.kernel.org
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 1ce52f8..44dd97f 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -460,6 +460,7 @@ config ACPI_REDUCED_HARDWARE_ONLY
If you are unsure what to do, do not enable this option.
source "drivers/acpi/nfit/Kconfig"
+source "drivers/acpi/hmem/Kconfig"
source "drivers/acpi/apei/Kconfig"
source "drivers/acpi/dptf/Kconfig"
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index b1aacfc..31e3f20 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_ACPI_PROCESSOR) += processor.o
obj-$(CONFIG_ACPI) += container.o
obj-$(CONFIG_ACPI_THERMAL) += thermal.o
obj-$(CONFIG_ACPI_NFIT) += nfit/
+obj-$(CONFIG_ACPI_HMEM) += hmem/
obj-$(CONFIG_ACPI) += acpi_memhotplug.o
obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
obj-$(CONFIG_ACPI_BATTERY) += battery.o
diff --git a/drivers/acpi/hmem/Kconfig b/drivers/acpi/hmem/Kconfig
new file mode 100644
index 0000000..09282be
--- /dev/null
+++ b/drivers/acpi/hmem/Kconfig
@@ -0,0 +1,7 @@
+config ACPI_HMEM
+ bool "ACPI Heterogeneous Memory Support"
+ depends on ACPI_NUMA
+ depends on SYSFS
+ help
+ Exports a sysfs representation of the ACPI Heterogeneous Memory
+ Attributes Table (HMAT).
diff --git a/drivers/acpi/hmem/Makefile b/drivers/acpi/hmem/Makefile
new file mode 100644
index 0000000..d2aa546
--- /dev/null
+++ b/drivers/acpi/hmem/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_ACPI_HMEM) := hmem.o
+hmem-y := core.o initiator.o target.o
diff --git a/drivers/acpi/hmem/core.c b/drivers/acpi/hmem/core.c
new file mode 100644
index 0000000..f7638db
--- /dev/null
+++ b/drivers/acpi/hmem/core.c
@@ -0,0 +1,569 @@
+/*
+ * Heterogeneous memory representation in sysfs
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <acpi/acpi_numa.h>
+#include <linux/acpi.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "hmem.h"
+
+static LIST_HEAD(target_list);
+static LIST_HEAD(initiator_list);
+
+static bool bad_hmem;
+
+static int link_node_for_kobj(unsigned int node, struct kobject *kobj)
+{
+ if (node_devices[node])
+ return sysfs_create_link(kobj, &node_devices[node]->dev.kobj,
+ kobject_name(&node_devices[node]->dev.kobj));
+
+ return 0;
+}
+
+static void remove_node_for_kobj(unsigned int node, struct kobject *kobj)
+{
+ if (node_devices[node])
+ sysfs_remove_link(kobj,
+ kobject_name(&node_devices[node]->dev.kobj));
+}
+
+#define HMEM_CLASS_NAME "hmem"
+
+static struct bus_type hmem_subsys = {
+ /*
+ * .dev_name is set before device_register() based on the type of
+ * device we are registering.
+ */
+ .name = HMEM_CLASS_NAME,
+};
+
+/* memory initiators */
+static int link_cpu_under_mem_init(struct memory_initiator *init)
+{
+ struct device *cpu_dev;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ cpu_dev = get_cpu_device(cpu);
+ if (!cpu_dev)
+ continue;
+
+ if (pxm_to_node(init->pxm) == cpu_to_node(cpu)) {
+ return sysfs_create_link(&init->dev.kobj,
+ &cpu_dev->kobj,
+ kobject_name(&cpu_dev->kobj));
+ }
+
+ }
+ return 0;
+}
+
+static void remove_cpu_under_mem_init(struct memory_initiator *init)
+{
+ struct device *cpu_dev;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ cpu_dev = get_cpu_device(cpu);
+ if (!cpu_dev)
+ continue;
+
+ if (pxm_to_node(init->pxm) == cpu_to_node(cpu)) {
+ sysfs_remove_link(&init->dev.kobj,
+ kobject_name(&cpu_dev->kobj));
+ return;
+ }
+
+ }
+}
+
+static void release_memory_initiator(struct device *dev)
+{
+ struct memory_initiator *init = to_memory_initiator(dev);
+
+ list_del(&init->list);
+ kfree(init);
+}
+
+static void __init remove_memory_initiator(struct memory_initiator *init)
+{
+ if (init->is_registered) {
+ remove_cpu_under_mem_init(init);
+ remove_node_for_kobj(pxm_to_node(init->pxm), &init->dev.kobj);
+ device_unregister(&init->dev);
+ } else
+ release_memory_initiator(&init->dev);
+}
+
+static int __init register_memory_initiator(struct memory_initiator *init)
+{
+ int ret;
+
+ hmem_subsys.dev_name = "mem_init";
+ init->dev.bus = &hmem_subsys;
+ init->dev.id = pxm_to_node(init->pxm);
+ init->dev.release = release_memory_initiator;
+ init->dev.groups = memory_initiator_attribute_groups;
+
+ ret = device_register(&init->dev);
+ if (ret < 0)
+ return ret;
+
+ init->is_registered = true;
+
+ ret = link_cpu_under_mem_init(init);
+ if (ret < 0)
+ return ret;
+
+ return link_node_for_kobj(pxm_to_node(init->pxm), &init->dev.kobj);
+}
+
+static struct memory_initiator * __init add_memory_initiator(int pxm)
+{
+ struct memory_initiator *init;
+
+ if (pxm_to_node(pxm) == NUMA_NO_NODE) {
+ pr_err("HMEM: No NUMA node for PXM %d\n", pxm);
+ bad_hmem = true;
+ return ERR_PTR(-EINVAL);
+ }
+
+ /*
+ * Make sure we haven't already added an initiator for this proximity
+ * domain. We don't care about any other differences in the SRAT
+ * tables (apic_id, etc), so we just use the data from the first table
+ * we see for a given proximity domain.
+ */
+ list_for_each_entry(init, &initiator_list, list)
+ if (init->pxm == pxm)
+ return 0;
+
+ init = kzalloc(sizeof(*init), GFP_KERNEL);
+ if (!init) {
+ bad_hmem = true;
+ return ERR_PTR(-ENOMEM);
+ }
+
+ init->pxm = pxm;
+
+ list_add_tail(&init->list, &initiator_list);
+ return init;
+}
+
+/* memory targets */
+static void release_memory_target(struct device *dev)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ list_del(&tgt->list);
+ kfree(tgt);
+}
+
+static void __init remove_memory_target(struct memory_target *tgt)
+{
+ if (tgt->is_registered) {
+ remove_node_for_kobj(pxm_to_node(tgt->ma->proximity_domain),
+ &tgt->dev.kobj);
+ device_unregister(&tgt->dev);
+ } else
+ release_memory_target(&tgt->dev);
+}
+
+static int __init register_memory_target(struct memory_target *tgt)
+{
+ int ret;
+
+ if (!tgt->ma || !tgt->spa) {
+ pr_err("HMEM: Incomplete memory target found\n");
+ return -EINVAL;
+ }
+
+ hmem_subsys.dev_name = "mem_tgt";
+ tgt->dev.bus = &hmem_subsys;
+ tgt->dev.id = pxm_to_node(tgt->ma->proximity_domain);
+ tgt->dev.release = release_memory_target;
+ tgt->dev.groups = memory_target_attribute_groups;
+
+ ret = device_register(&tgt->dev);
+ if (ret < 0)
+ return ret;
+
+ tgt->is_registered = true;
+
+ return link_node_for_kobj(pxm_to_node(tgt->ma->proximity_domain),
+ &tgt->dev.kobj);
+}
+
+static int __init add_memory_target(struct acpi_srat_mem_affinity *ma)
+{
+ struct memory_target *tgt;
+
+ if (pxm_to_node(ma->proximity_domain) == NUMA_NO_NODE) {
+ pr_err("HMEM: No NUMA node for PXM %d\n", ma->proximity_domain);
+ bad_hmem = true;
+ return -EINVAL;
+ }
+
+ tgt = kzalloc(sizeof(*tgt), GFP_KERNEL);
+ if (!tgt) {
+ bad_hmem = true;
+ return -ENOMEM;
+ }
+
+ tgt->ma = ma;
+
+ list_add_tail(&tgt->list, &target_list);
+ return 0;
+}
+
+/* ACPI parsing code, starting with the HMAT */
+static int __init hmem_noop_parse(struct acpi_table_header *table)
+{
+ /* real work done by the hmat_parse_* and srat_parse_* routines */
+ return 0;
+}
+
+static bool __init hmat_spa_matches_srat(struct acpi_hmat_address_range *spa,
+ struct acpi_srat_mem_affinity *ma)
+{
+ if (spa->physical_address_base != ma->base_address ||
+ spa->physical_address_length != ma->length)
+ return false;
+
+ return true;
+}
+
+static void find_local_initiator(struct memory_target *tgt)
+{
+ struct memory_initiator *init;
+
+ if (!(tgt->spa->flags & ACPI_HMAT_PROCESSOR_PD_VALID) ||
+ pxm_to_node(tgt->spa->processor_PD) == NUMA_NO_NODE)
+ return;
+
+ list_for_each_entry(init, &initiator_list, list) {
+ if (init->pxm == tgt->spa->processor_PD) {
+ tgt->local_init = init;
+ return;
+ }
+ }
+}
+
+/* ACPI HMAT parsing routines */
+static int __init
+hmat_parse_address_range(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_hmat_address_range *spa;
+ struct memory_target *tgt;
+
+ if (bad_hmem)
+ return 0;
+
+ spa = (struct acpi_hmat_address_range *)header;
+ if (!spa) {
+ pr_err("HMEM: NULL table entry\n");
+ goto err;
+ }
+
+ if (spa->header.length != sizeof(*spa)) {
+ pr_err("HMEM: Unexpected header length: %d\n",
+ spa->header.length);
+ goto err;
+ }
+
+ list_for_each_entry(tgt, &target_list, list) {
+ if ((spa->flags & ACPI_HMAT_MEMORY_PD_VALID) &&
+ spa->memory_PD == tgt->ma->proximity_domain) {
+ if (!hmat_spa_matches_srat(spa, tgt->ma)) {
+ pr_err("HMEM: SRAT and HMAT disagree on "
+ "address range info\n");
+ goto err;
+ }
+ tgt->spa = spa;
+ find_local_initiator(tgt);
+ return 0;
+ }
+ }
+
+ return 0;
+err:
+ bad_hmem = true;
+ return -EINVAL;
+}
+
+static int __init hmat_parse_cache(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_hmat_cache *cache;
+ struct memory_target *tgt;
+
+ if (bad_hmem)
+ return 0;
+
+ cache = (struct acpi_hmat_cache *)header;
+ if (!cache) {
+ pr_err("HMEM: NULL table entry\n");
+ goto err;
+ }
+
+ if (cache->header.length < sizeof(*cache)) {
+ pr_err("HMEM: Unexpected header length: %d\n",
+ cache->header.length);
+ goto err;
+ }
+
+ list_for_each_entry(tgt, &target_list, list) {
+ if (cache->memory_PD == tgt->ma->proximity_domain) {
+ tgt->is_cached = true;
+ return 0;
+ }
+ }
+
+ pr_err("HMEM: Couldn't find cached target PXM %d\n", cache->memory_PD);
+err:
+ bad_hmem = true;
+ return -EINVAL;
+}
+
+/*
+ * SRAT parsing. We use srat_disabled() and pxm_to_node() so we don't redo
+ * any of the SRAT sanity checking done in drivers/acpi/numa.c.
+ */
+static int __init
+srat_parse_processor_affinity(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_srat_cpu_affinity *cpu;
+ struct memory_initiator *init;
+ u32 pxm;
+
+ if (bad_hmem)
+ return 0;
+
+ cpu = (struct acpi_srat_cpu_affinity *)header;
+ if (!cpu) {
+ pr_err("HMEM: NULL table entry\n");
+ bad_hmem = true;
+ return -EINVAL;
+ }
+
+ pxm = cpu->proximity_domain_lo;
+ if (acpi_srat_revision >= 2)
+ pxm |= *((unsigned int *)cpu->proximity_domain_hi) << 8;
+
+ if (!(cpu->flags & ACPI_SRAT_CPU_ENABLED))
+ return 0;
+
+ init = add_memory_initiator(pxm);
+ if (IS_ERR_OR_NULL(init))
+ return PTR_ERR(init);
+
+ init->cpu = cpu;
+ return 0;
+}
+
+static int __init
+srat_parse_x2apic_affinity(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_srat_x2apic_cpu_affinity *x2apic;
+ struct memory_initiator *init;
+
+ if (bad_hmem)
+ return 0;
+
+ x2apic = (struct acpi_srat_x2apic_cpu_affinity *)header;
+ if (!x2apic) {
+ pr_err("HMEM: NULL table entry\n");
+ bad_hmem = true;
+ return -EINVAL;
+ }
+
+ if (!(x2apic->flags & ACPI_SRAT_CPU_ENABLED))
+ return 0;
+
+ init = add_memory_initiator(x2apic->proximity_domain);
+ if (IS_ERR_OR_NULL(init))
+ return PTR_ERR(init);
+
+ init->x2apic = x2apic;
+ return 0;
+}
+
+static int __init
+srat_parse_gicc_affinity(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_srat_gicc_affinity *gicc;
+ struct memory_initiator *init;
+
+ if (bad_hmem)
+ return 0;
+
+ gicc = (struct acpi_srat_gicc_affinity *)header;
+ if (!gicc) {
+ pr_err("HMEM: NULL table entry\n");
+ bad_hmem = true;
+ return -EINVAL;
+ }
+
+ if (!(gicc->flags & ACPI_SRAT_GICC_ENABLED))
+ return 0;
+
+ init = add_memory_initiator(gicc->proximity_domain);
+ if (IS_ERR_OR_NULL(init))
+ return PTR_ERR(init);
+
+ init->gicc = gicc;
+ return 0;
+}
+
+static int __init
+srat_parse_memory_affinity(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_srat_mem_affinity *ma;
+
+ if (bad_hmem)
+ return 0;
+
+ ma = (struct acpi_srat_mem_affinity *)header;
+ if (!ma) {
+ pr_err("HMEM: NULL table entry\n");
+ bad_hmem = true;
+ return -EINVAL;
+ }
+
+ if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
+ return 0;
+
+ return add_memory_target(ma);
+}
+
+/*
+ * Remove our sysfs entries, unregister our devices and free allocated memory.
+ */
+static void hmem_cleanup(void)
+{
+ struct memory_initiator *init, *init_iter;
+ struct memory_target *tgt, *tgt_iter;
+
+ list_for_each_entry_safe(tgt, tgt_iter, &target_list, list)
+ remove_memory_target(tgt);
+
+ list_for_each_entry_safe(init, init_iter, &initiator_list, list)
+ remove_memory_initiator(init);
+}
+
+static int __init hmem_init(void)
+{
+ struct acpi_table_header *tbl;
+ struct memory_initiator *init;
+ struct memory_target *tgt;
+ acpi_status status = AE_OK;
+ int ret;
+
+ if (srat_disabled())
+ return 0;
+
+ /*
+ * We take a permanent reference to both the HMAT and SRAT in ACPI
+ * memory so we can keep pointers to their subtables. These tables
+ * already had references on them which would never be released, taken
+ * by acpi_sysfs_init(), so this shouldn't negatively impact anything.
+ */
+ status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
+ if (ACPI_FAILURE(status))
+ return 0;
+
+ status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+ if (ACPI_FAILURE(status))
+ return 0;
+
+ ret = subsys_system_register(&hmem_subsys, NULL);
+ if (ret)
+ return ret;
+
+ if (!acpi_table_parse(ACPI_SIG_SRAT, hmem_noop_parse)) {
+ struct acpi_subtable_proc srat_proc[4];
+
+ memset(srat_proc, 0, sizeof(srat_proc));
+ srat_proc[0].id = ACPI_SRAT_TYPE_CPU_AFFINITY;
+ srat_proc[0].handler = srat_parse_processor_affinity;
+ srat_proc[1].id = ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY;
+ srat_proc[1].handler = srat_parse_x2apic_affinity;
+ srat_proc[2].id = ACPI_SRAT_TYPE_GICC_AFFINITY;
+ srat_proc[2].handler = srat_parse_gicc_affinity;
+ srat_proc[3].id = ACPI_SRAT_TYPE_MEMORY_AFFINITY;
+ srat_proc[3].handler = srat_parse_memory_affinity;
+
+ acpi_table_parse_entries_array(ACPI_SIG_SRAT,
+ sizeof(struct acpi_table_srat),
+ srat_proc, ARRAY_SIZE(srat_proc), 0);
+ }
+
+ if (!acpi_table_parse(ACPI_SIG_HMAT, hmem_noop_parse)) {
+ struct acpi_subtable_proc hmat_proc[2];
+
+ memset(hmat_proc, 0, sizeof(hmat_proc));
+ hmat_proc[0].id = ACPI_HMAT_TYPE_ADDRESS_RANGE;
+ hmat_proc[0].handler = hmat_parse_address_range;
+ hmat_proc[1].id = ACPI_HMAT_TYPE_CACHE;
+ hmat_proc[1].handler = hmat_parse_cache;
+
+ acpi_table_parse_entries_array(ACPI_SIG_HMAT,
+ sizeof(struct acpi_table_hmat),
+ hmat_proc, ARRAY_SIZE(hmat_proc), 0);
+ }
+
+ if (bad_hmem) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ list_for_each_entry(init, &initiator_list, list) {
+ ret = register_memory_initiator(init);
+ if (ret)
+ goto err;
+ }
+
+ list_for_each_entry(tgt, &target_list, list) {
+ ret = register_memory_target(tgt);
+ if (ret)
+ goto err;
+ }
+
+ return 0;
+err:
+ pr_err("HMEM: Error during initialization\n");
+ hmem_cleanup();
+ return ret;
+}
+
+static __exit void hmem_exit(void)
+{
+ hmem_cleanup();
+}
+
+module_init(hmem_init);
+module_exit(hmem_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/acpi/hmem/hmem.h b/drivers/acpi/hmem/hmem.h
new file mode 100644
index 0000000..38ff540
--- /dev/null
+++ b/drivers/acpi/hmem/hmem.h
@@ -0,0 +1,47 @@
+/*
+ * Heterogeneous memory representation in sysfs
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef _ACPI_HMEM_H_
+#define _ACPI_HMEM_H_
+
+struct memory_initiator {
+ struct list_head list;
+ struct device dev;
+
+ /* only one of the following three will be set */
+ struct acpi_srat_cpu_affinity *cpu;
+ struct acpi_srat_x2apic_cpu_affinity *x2apic;
+ struct acpi_srat_gicc_affinity *gicc;
+
+ int pxm;
+ bool is_registered;
+};
+#define to_memory_initiator(d) container_of((d), struct memory_initiator, dev)
+
+struct memory_target {
+ struct list_head list;
+ struct device dev;
+ struct acpi_srat_mem_affinity *ma;
+ struct acpi_hmat_address_range *spa;
+ struct memory_initiator *local_init;
+
+ bool is_cached;
+ bool is_registered;
+};
+#define to_memory_target(d) container_of((d), struct memory_target, dev)
+
+extern const struct attribute_group *memory_initiator_attribute_groups[];
+extern const struct attribute_group *memory_target_attribute_groups[];
+#endif /* _ACPI_HMEM_H_ */
diff --git a/drivers/acpi/hmem/initiator.c b/drivers/acpi/hmem/initiator.c
new file mode 100644
index 0000000..905f030
--- /dev/null
+++ b/drivers/acpi/hmem/initiator.c
@@ -0,0 +1,61 @@
+/*
+ * Heterogeneous memory initiator sysfs attributes
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <acpi/acpi_numa.h>
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+#include "hmem.h"
+
+static ssize_t firmware_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_initiator *init = to_memory_initiator(dev);
+
+ return sprintf(buf, "%d\n", init->pxm);
+}
+static DEVICE_ATTR_RO(firmware_id);
+
+static ssize_t is_enabled_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_initiator *init = to_memory_initiator(dev);
+ int is_enabled;
+
+ if (init->cpu)
+ is_enabled = !!(init->cpu->flags & ACPI_SRAT_CPU_ENABLED);
+ else if (init->x2apic)
+ is_enabled = !!(init->x2apic->flags & ACPI_SRAT_CPU_ENABLED);
+ else
+ is_enabled = !!(init->gicc->flags & ACPI_SRAT_GICC_ENABLED);
+
+ return sprintf(buf, "%d\n", is_enabled);
+}
+static DEVICE_ATTR_RO(is_enabled);
+
+static struct attribute *memory_initiator_attributes[] = {
+ &dev_attr_firmware_id.attr,
+ &dev_attr_is_enabled.attr,
+ NULL,
+};
+
+static struct attribute_group memory_initiator_attribute_group = {
+ .attrs = memory_initiator_attributes,
+};
+
+const struct attribute_group *memory_initiator_attribute_groups[] = {
+ &memory_initiator_attribute_group,
+ NULL,
+};
diff --git a/drivers/acpi/hmem/target.c b/drivers/acpi/hmem/target.c
new file mode 100644
index 0000000..dd57437
--- /dev/null
+++ b/drivers/acpi/hmem/target.c
@@ -0,0 +1,97 @@
+/*
+ * Heterogeneous memory target sysfs attributes
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <acpi/acpi_numa.h>
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+#include "hmem.h"
+
+/* attributes for memory targets */
+static ssize_t phys_addr_base_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ return sprintf(buf, "%#llx\n", tgt->ma->base_address);
+}
+static DEVICE_ATTR_RO(phys_addr_base);
+
+static ssize_t phys_length_bytes_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ return sprintf(buf, "%#llx\n", tgt->ma->length);
+}
+static DEVICE_ATTR_RO(phys_length_bytes);
+
+static ssize_t firmware_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ return sprintf(buf, "%d\n", tgt->ma->proximity_domain);
+}
+static DEVICE_ATTR_RO(firmware_id);
+
+static ssize_t is_cached_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ return sprintf(buf, "%d\n", tgt->is_cached);
+}
+static DEVICE_ATTR_RO(is_cached);
+
+static ssize_t is_isolated_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ return sprintf(buf, "%d\n",
+ !!(tgt->spa->flags & ACPI_HMAT_RESERVATION_HINT));
+}
+static DEVICE_ATTR_RO(is_isolated);
+
+static ssize_t is_enabled_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct memory_target *tgt = to_memory_target(dev);
+
+ return sprintf(buf, "%d\n",
+ !!(tgt->ma->flags & ACPI_SRAT_MEM_ENABLED));
+}
+static DEVICE_ATTR_RO(is_enabled);
+
+static struct attribute *memory_target_attributes[] = {
+ &dev_attr_phys_addr_base.attr,
+ &dev_attr_phys_length_bytes.attr,
+ &dev_attr_firmware_id.attr,
+ &dev_attr_is_cached.attr,
+ &dev_attr_is_isolated.attr,
+ &dev_attr_is_enabled.attr,
+ NULL
+};
+
+/* attributes which are present for all memory targets */
+static struct attribute_group memory_target_attribute_group = {
+ .attrs = memory_target_attributes,
+};
+
+const struct attribute_group *memory_target_attribute_groups[] = {
+ &memory_target_attribute_group,
+ NULL,
+};
--
2.9.4
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [RFC v2 3/5] hmem: add heterogeneous memory sysfs support
2017-07-06 21:52 ` [RFC v2 3/5] hmem: add heterogeneous memory sysfs support Ross Zwisler
@ 2017-07-07 5:53 ` John Hubbard
[not found] ` <9ea40a37-3549-2294-8605-036b37aec023-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 21+ messages in thread
From: John Hubbard @ 2017-07-07 5:53 UTC (permalink / raw)
To: Ross Zwisler, linux-kernel
Cc: Anaczkowski, Lukasz, Box, David E, Kogut, Jaroslaw,
Lahtinen, Joonas, Moore, Robert, Nachimuthu, Murugasamy,
Odzioba, Lukasz, Rafael J. Wysocki, Rafael J. Wysocki,
Schmauss, Erik, Verma, Vishal L, Zheng, Lv, Andrew Morton,
Dan Williams, Dave Hansen, Greg Kroah-Hartman, Jerome Glisse, Len
On 07/06/2017 02:52 PM, Ross Zwisler wrote:
[...]
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index b1aacfc..31e3f20 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -72,6 +72,7 @@ obj-$(CONFIG_ACPI_PROCESSOR) += processor.o
> obj-$(CONFIG_ACPI) += container.o
> obj-$(CONFIG_ACPI_THERMAL) += thermal.o
> obj-$(CONFIG_ACPI_NFIT) += nfit/
> +obj-$(CONFIG_ACPI_HMEM) += hmem/
> obj-$(CONFIG_ACPI) += acpi_memhotplug.o
> obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
> obj-$(CONFIG_ACPI_BATTERY) += battery.o
Hi Ross,
Following are a series of suggestions, intended to clarify naming just
enough so that, when Jerome's HMM patchset lands, we'll be able to
tell the difference between the two types of Heterogeneous Memory.
> diff --git a/drivers/acpi/hmem/Kconfig b/drivers/acpi/hmem/Kconfig
> new file mode 100644
> index 0000000..09282be
> --- /dev/null
> +++ b/drivers/acpi/hmem/Kconfig
> @@ -0,0 +1,7 @@
> +config ACPI_HMEM
> + bool "ACPI Heterogeneous Memory Support"
How about:
bool "ACPI Heterogeneous Memory Attribute Table Support"
The idea here, and throughout, is that this type of
Heterogeneous Memory support is all about "the Heterogeneous Memory
that you found via ACPI's Heterogeneous Memory Attribute Table".
That's different from "the Heterogeneous Memory that you found
when you installed a PCIe device that supports HMM". Or, at least
it is different, until the day that someone decides to burn in
support for an HMM device, into the ACPI tables. Seems unlikely,
though. :) And even so, I think it would still work.
> + depends on ACPI_NUMA
> + depends on SYSFS
> + help
> + Exports a sysfs representation of the ACPI Heterogeneous Memory
> + Attributes Table (HMAT).
> diff --git a/drivers/acpi/hmem/Makefile b/drivers/acpi/hmem/Makefile
> new file mode 100644
> index 0000000..d2aa546
> --- /dev/null
> +++ b/drivers/acpi/hmem/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_ACPI_HMEM) := hmem.o
> +hmem-y := core.o initiator.o target.o
> diff --git a/drivers/acpi/hmem/core.c b/drivers/acpi/hmem/core.c
> new file mode 100644
> index 0000000..f7638db
> --- /dev/null
> +++ b/drivers/acpi/hmem/core.c
> @@ -0,0 +1,569 @@
> +/*
> + * Heterogeneous memory representation in sysfs
Heterogeneous memory, as discovered via ACPI's Heterogeneous Memory
Attribute Table: representation in sysfs
> + *
> + * Copyright (c) 2017, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +
> +#include <acpi/acpi_numa.h>
[...]
> diff --git a/drivers/acpi/hmem/hmem.h b/drivers/acpi/hmem/hmem.h
> new file mode 100644
> index 0000000..38ff540
> --- /dev/null
> +++ b/drivers/acpi/hmem/hmem.h
> @@ -0,0 +1,47 @@
> +/*
> + * Heterogeneous memory representation in sysfs
Heterogeneous memory, as discovered via ACPI's Heterogeneous Memory
Attribute Table: representation in sysfs
> + *
> + * Copyright (c) 2017, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +
> +#ifndef _ACPI_HMEM_H_
> +#define _ACPI_HMEM_H_
> +
> +struct memory_initiator {
> + struct list_head list;
> + struct device dev;
> +
> + /* only one of the following three will be set */
> + struct acpi_srat_cpu_affinity *cpu;
> + struct acpi_srat_x2apic_cpu_affinity *x2apic;
> + struct acpi_srat_gicc_affinity *gicc;
> +
> + int pxm;
> + bool is_registered;
> +};
> +#define to_memory_initiator(d) container_of((d), struct memory_initiator, dev)
> +
> +struct memory_target {
> + struct list_head list;
> + struct device dev;
> + struct acpi_srat_mem_affinity *ma;
> + struct acpi_hmat_address_range *spa;
> + struct memory_initiator *local_init;
> +
> + bool is_cached;
> + bool is_registered;
> +};
> +#define to_memory_target(d) container_of((d), struct memory_target, dev)
> +
> +extern const struct attribute_group *memory_initiator_attribute_groups[];
> +extern const struct attribute_group *memory_target_attribute_groups[];
> +#endif /* _ACPI_HMEM_H_ */
> diff --git a/drivers/acpi/hmem/initiator.c b/drivers/acpi/hmem/initiator.c
> new file mode 100644
> index 0000000..905f030
> --- /dev/null
> +++ b/drivers/acpi/hmem/initiator.c
> @@ -0,0 +1,61 @@
> +/*
> + * Heterogeneous memory initiator sysfs attributes
HMAT (Heterogeneous Memory Attribute Table)-based memory: initiator sysfs attributes
> + *
> + * Copyright (c) 2017, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +
> +#include <acpi/acpi_numa.h>
> +#include <linux/acpi.h>
> +#include <linux/device.h>
> +#include <linux/sysfs.h>
> +#include "hmem.h"
> +
> +static ssize_t firmware_id_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct memory_initiator *init = to_memory_initiator(dev);
> +
> + return sprintf(buf, "%d\n", init->pxm);
> +}
> +static DEVICE_ATTR_RO(firmware_id);
> +
> +static ssize_t is_enabled_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct memory_initiator *init = to_memory_initiator(dev);
> + int is_enabled;
> +
> + if (init->cpu)
> + is_enabled = !!(init->cpu->flags & ACPI_SRAT_CPU_ENABLED);
> + else if (init->x2apic)
> + is_enabled = !!(init->x2apic->flags & ACPI_SRAT_CPU_ENABLED);
> + else
> + is_enabled = !!(init->gicc->flags & ACPI_SRAT_GICC_ENABLED);
> +
> + return sprintf(buf, "%d\n", is_enabled);
> +}
> +static DEVICE_ATTR_RO(is_enabled);
> +
> +static struct attribute *memory_initiator_attributes[] = {
> + &dev_attr_firmware_id.attr,
> + &dev_attr_is_enabled.attr,
> + NULL,
> +};
> +
> +static struct attribute_group memory_initiator_attribute_group = {
> + .attrs = memory_initiator_attributes,
> +};
> +
> +const struct attribute_group *memory_initiator_attribute_groups[] = {
> + &memory_initiator_attribute_group,
> + NULL,
> +};
> diff --git a/drivers/acpi/hmem/target.c b/drivers/acpi/hmem/target.c
> new file mode 100644
> index 0000000..dd57437
> --- /dev/null
> +++ b/drivers/acpi/hmem/target.c
> @@ -0,0 +1,97 @@
> +/*
> + * Heterogeneous memory target sysfs attributes
HMAT (Heterogeneous Memory Attribute Table)-based memory: target sysfs attributes
So, maybe those will help.
thanks
john h
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC v2 4/5] sysfs: add sysfs_add_group_link()
[not found] ` <20170706215233.11329-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
` (2 preceding siblings ...)
2017-07-06 21:52 ` [RFC v2 3/5] hmem: add heterogeneous memory sysfs support Ross Zwisler
@ 2017-07-06 21:52 ` Ross Zwisler
2017-07-06 21:52 ` [RFC v2 5/5] hmem: add performance attributes Ross Zwisler
2017-07-07 5:30 ` [RFC v2 0/5] surface heterogeneous memory performance information John Hubbard
5 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2017-07-06 21:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Box, David E, Dave Hansen, Zheng, Lv,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw, Rafael J. Wysocki,
Anaczkowski, Lukasz, Moore, Robert,
linux-acpi-u79uwXL29TY76Z2rM5mHXA, Odzioba, Lukasz,
Schmauss, Erik, Len Brown, Jerome Glisse,
devel-E0kO6a4B6psdnm+yROfE0A, Kogut, Jaroslaw,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Greg Kroah-Hartman,
Nachimuthu, Murugasamy, Rafael J. Wysocki, Lahtinen, Joonas,
Andrew Morton, Tim Chen
The current __compat_only_sysfs_link_entry_to_kobj() code allows us to
create symbolic links in sysfs to groups or attributes. Something like:
/sys/.../entry1/groupA -> /sys/.../entry2/groupA
This patch extends this functionality with a new sysfs_add_group_link()
call that allows the link to have a different name than the group or
attribute, so:
/sys/.../entry1/link_name -> /sys/.../entry2/groupA
__compat_only_sysfs_link_entry_to_kobj() now just calls
sysfs_add_group_link(), passing in the same name for both the
group/attribute and for the link name.
This is needed by the ACPI HMAT enabling work because we want to have a
group of performance attributes that lives in a memory target. This
group represents the performance between a memory target and its local
initiator. In the target the attribute group is named "local_init":
# tree mem_tgt2/local_init/
mem_tgt2/local_init/
├── mem_init0 -> ../../mem_init0
├── mem_tgt2 -> ../../mem_tgt2
├── read_bw_MBps
├── read_lat_nsec
├── write_bw_MBps
└── write_lat_nsec
We then want to link to this attribute group from the initiator, but change
the name of the link to "mem_tgtX" since we're now looking at it from the
initiator's perspective, and because a given initiator can have multiple
local memory targets:
# ls -l mem_init0/mem_tgt2
lrwxrwxrwx. 1 root root 0 Jul 5 14:38 mem_init0/mem_tgt2 ->
../mem_tgt2/local_init
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
fs/sysfs/group.c | 30 +++++++++++++++++++++++-------
include/linux/sysfs.h | 2 ++
2 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index ac2de0e..19db57c8 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -367,15 +367,15 @@ void sysfs_remove_link_from_group(struct kobject *kobj, const char *group_name,
EXPORT_SYMBOL_GPL(sysfs_remove_link_from_group);
/**
- * __compat_only_sysfs_link_entry_to_kobj - add a symlink to a kobject pointing
- * to a group or an attribute
+ * sysfs_add_group_link - add a symlink to a kobject pointing to a group or
+ * an attribute
* @kobj: The kobject containing the group.
* @target_kobj: The target kobject.
* @target_name: The name of the target group or attribute.
+ * @link_name: The name of the link to the target group or attribute.
*/
-int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
- struct kobject *target_kobj,
- const char *target_name)
+int sysfs_add_group_link(struct kobject *kobj, struct kobject *target_kobj,
+ const char *target_name, const char *link_name)
{
struct kernfs_node *target;
struct kernfs_node *entry;
@@ -400,12 +400,28 @@ int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
return -ENOENT;
}
- link = kernfs_create_link(kobj->sd, target_name, entry);
+ link = kernfs_create_link(kobj->sd, link_name, entry);
if (IS_ERR(link) && PTR_ERR(link) == -EEXIST)
- sysfs_warn_dup(kobj->sd, target_name);
+ sysfs_warn_dup(kobj->sd, link_name);
kernfs_put(entry);
kernfs_put(target);
return IS_ERR(link) ? PTR_ERR(link) : 0;
}
+EXPORT_SYMBOL_GPL(sysfs_add_group_link);
+
+/**
+ * __compat_only_sysfs_link_entry_to_kobj - add a symlink to a kobject pointing
+ * to a group or an attribute
+ * @kobj: The kobject containing the group.
+ * @target_kobj: The target kobject.
+ * @target_name: The name of the target group or attribute.
+ */
+int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
+ struct kobject *target_kobj,
+ const char *target_name)
+{
+ return sysfs_add_group_link(kobj, target_kobj, target_name,
+ target_name);
+}
EXPORT_SYMBOL_GPL(__compat_only_sysfs_link_entry_to_kobj);
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c6f0f0d..865f499 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -278,6 +278,8 @@ int sysfs_add_link_to_group(struct kobject *kobj, const char *group_name,
struct kobject *target, const char *link_name);
void sysfs_remove_link_from_group(struct kobject *kobj, const char *group_name,
const char *link_name);
+int sysfs_add_group_link(struct kobject *kobj, struct kobject *target_kobj,
+ const char *target_name, const char *link_name);
int __compat_only_sysfs_link_entry_to_kobj(struct kobject *kobj,
struct kobject *target_kobj,
const char *target_name);
--
2.9.4
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC v2 5/5] hmem: add performance attributes
[not found] ` <20170706215233.11329-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
` (3 preceding siblings ...)
2017-07-06 21:52 ` [RFC v2 4/5] sysfs: add sysfs_add_group_link() Ross Zwisler
@ 2017-07-06 21:52 ` Ross Zwisler
2017-07-07 5:30 ` [RFC v2 0/5] surface heterogeneous memory performance information John Hubbard
5 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2017-07-06 21:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Box, David E, Dave Hansen, Zheng, Lv,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw, Rafael J. Wysocki,
Anaczkowski, Lukasz, Moore, Robert,
linux-acpi-u79uwXL29TY76Z2rM5mHXA, Odzioba, Lukasz,
Schmauss, Erik, Len Brown, Jerome Glisse,
devel-E0kO6a4B6psdnm+yROfE0A, Kogut, Jaroslaw,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Greg Kroah-Hartman,
Nachimuthu, Murugasamy, Rafael J. Wysocki, Lahtinen, Joonas,
Andrew Morton, Tim Chen
Add performance information found in the HMAT to the sysfs representation.
This information lives as an attribute group named "local_init" in the
memory target:
# tree mem_tgt2/
mem_tgt2/
├── firmware_id
├── is_cached
├── is_enabled
├── is_isolated
├── local_init
│ ├── mem_init0 -> ../../mem_init0
│ ├── mem_init1 -> ../../mem_init1
│ ├── mem_tgt2 -> ../../mem_tgt2
│ ├── read_bw_MBps
│ ├── read_lat_nsec
│ ├── write_bw_MBps
│ └── write_lat_nsec
├── node2 -> ../../node/node2
├── phys_addr_base
├── phys_length_bytes
├── power
│ ├── async
│ ...
├── subsystem -> ../../../../bus/hmem
└── uevent
This attribute group surfaces latency and bandwidth performance for a memory
target and its local initiators. For example:
# grep . mem_tgt2/local_init/* 2>/dev/null
mem_tgt2/local_init/read_bw_MBps:30720
mem_tgt2/local_init/read_lat_nsec:100
mem_tgt2/local_init/write_bw_MBps:30720
mem_tgt2/local_init/write_lat_nsec:100
The local initiators have a symlink to the performance information which lives
in the target's attribute group:
# ls -l mem_init0/mem_tgt2
lrwxrwxrwx. 1 root root 0 Jul 5 14:38 mem_init0/mem_tgt2 ->
../mem_tgt2/local_init
We create performance attribute groups only for local (initiator,target)
pairings, where the first local initiator for a given target is defined
by the "Processor Proximity Domain" field in the HMAT's Memory Subsystem
Address Range Structure table. Once we have that first local initiator
we scan the performance data and also link any other initiators whose
performance to the given memory target matches those local values.
A given target only has one set of local performance values, so each target
will have at most one "local_init" attribute group, though that group can
contain links to multiple initiators that all have local performance. A
given memory initiator may have multiple local memory targets, so multiple
"mem_tgtX" links may exist for a given initiator.
If a given memory target is cached we give performance numbers only for the
media itself, and rely on the "is_cached" attribute to represent the
fact that there is a caching layer.
We expose only a subset of the performance information presented in the
HMAT via sysfs as a compromise, driven by the fact that those usages
will be the highest performing and because representing all possible
paths could cause an unmanageable explosion of sysfs entries.
If we dump everything from the HMAT into sysfs we end up with
O(num_targets * num_initiators * num_caching_levels) attributes. Each of
these attributes only takes up 2 bytes in a System Locality Latency and
Bandwidth Information Structure, but if we have to create a directory entry
for each it becomes much more expensive.
For example, very large systems today can have on the order of thousands of
NUMA nodes. Say we have a system which used to have 1,000 NUMA nodes that
each had both a CPU and local memory. The HMAT allows us to separate the
CPUs and memory into separate NUMA nodes, so we can end up with 1,000 CPU
initiator NUMA nodes and 1,000 memory target NUMA nodes. If we represented
the performance information for each possible CPU/memory pair in sysfs we
would end up with 1,000,000 attribute groups.
This is a lot to pass in a set of packed data tables, but I think we'll
break sysfs if we try to create millions of attributes, regardless of how
we nest them in a directory hierarchy.
By only representing performance information for local (initiator,target)
pairings, we reduce the number of sysfs entries to O(num_targets).
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
drivers/acpi/hmem/Makefile | 2 +-
drivers/acpi/hmem/core.c | 268 +++++++++++++++++++++++++++++++++++-
drivers/acpi/hmem/hmem.h | 17 +++
drivers/acpi/hmem/perf_attributes.c | 56 ++++++++
4 files changed, 341 insertions(+), 2 deletions(-)
create mode 100644 drivers/acpi/hmem/perf_attributes.c
diff --git a/drivers/acpi/hmem/Makefile b/drivers/acpi/hmem/Makefile
index d2aa546..44e8304 100644
--- a/drivers/acpi/hmem/Makefile
+++ b/drivers/acpi/hmem/Makefile
@@ -1,2 +1,2 @@
obj-$(CONFIG_ACPI_HMEM) := hmem.o
-hmem-y := core.o initiator.o target.o
+hmem-y := core.o initiator.o target.o perf_attributes.o
diff --git a/drivers/acpi/hmem/core.c b/drivers/acpi/hmem/core.c
index f7638db..8f3ab35 100644
--- a/drivers/acpi/hmem/core.c
+++ b/drivers/acpi/hmem/core.c
@@ -23,11 +23,230 @@
#include <linux/slab.h>
#include "hmem.h"
+#define NO_VALUE -1
+#define LOCAL_INIT "local_init"
+
static LIST_HEAD(target_list);
static LIST_HEAD(initiator_list);
+LIST_HEAD(locality_list);
static bool bad_hmem;
+/* Performance attributes for an initiator/target pair. */
+static int get_performance_data(u32 init_pxm, u32 tgt_pxm,
+ struct acpi_hmat_locality *hmat_loc)
+{
+ int num_init = hmat_loc->number_of_initiator_Pds;
+ int num_tgt = hmat_loc->number_of_target_Pds;
+ int init_idx = NO_VALUE;
+ int tgt_idx = NO_VALUE;
+ u32 *initiators, *targets;
+ u16 *entries, val;
+ int i;
+
+ /* the initiator array is after the struct acpi_hmat_locality fields */
+ initiators = (u32 *)(hmat_loc + 1);
+ targets = &initiators[num_init];
+ entries = (u16 *)&targets[num_tgt];
+
+ for (i = 0; i < num_init; i++) {
+ if (initiators[i] == init_pxm) {
+ init_idx = i;
+ break;
+ }
+ }
+
+ if (init_idx == NO_VALUE)
+ return NO_VALUE;
+
+ for (i = 0; i < num_tgt; i++) {
+ if (targets[i] == tgt_pxm) {
+ tgt_idx = i;
+ break;
+ }
+ }
+
+ if (tgt_idx == NO_VALUE)
+ return NO_VALUE;
+
+ val = entries[init_idx*num_tgt + tgt_idx];
+ if (val < 10 || val == 0xFFFF)
+ return NO_VALUE;
+
+ return (val * hmat_loc->entry_base_unit) / 10;
+}
+
+/*
+ * 'direction' is either READ or WRITE
+ * Latency is reported in nanoseconds and bandwidth is reported in MB/s.
+ */
+static int hmem_get_attribute(int init_pxm, int tgt_pxm, int direction,
+ enum hmem_attr_type type)
+{
+ struct memory_locality *loc;
+ int value;
+
+ list_for_each_entry(loc, &locality_list, list) {
+ struct acpi_hmat_locality *hmat_loc = loc->hmat_loc;
+
+ if (direction == READ && type == LATENCY &&
+ (hmat_loc->data_type == ACPI_HMAT_ACCESS_LATENCY ||
+ hmat_loc->data_type == ACPI_HMAT_READ_LATENCY)) {
+ value = get_performance_data(init_pxm, tgt_pxm,
+ hmat_loc);
+ if (value != NO_VALUE)
+ return value;
+ }
+
+ if (direction == WRITE && type == LATENCY &&
+ (hmat_loc->data_type == ACPI_HMAT_ACCESS_LATENCY ||
+ hmat_loc->data_type == ACPI_HMAT_WRITE_LATENCY)) {
+ value = get_performance_data(init_pxm, tgt_pxm,
+ hmat_loc);
+ if (value != NO_VALUE)
+ return value;
+ }
+
+ if (direction == READ && type == BANDWIDTH &&
+ (hmat_loc->data_type == ACPI_HMAT_ACCESS_BANDWIDTH ||
+ hmat_loc->data_type == ACPI_HMAT_READ_BANDWIDTH)) {
+ value = get_performance_data(init_pxm, tgt_pxm,
+ hmat_loc);
+ if (value != NO_VALUE)
+ return value;
+ }
+
+ if (direction == WRITE && type == BANDWIDTH &&
+ (hmat_loc->data_type == ACPI_HMAT_ACCESS_BANDWIDTH ||
+ hmat_loc->data_type == ACPI_HMAT_WRITE_BANDWIDTH)) {
+ value = get_performance_data(init_pxm, tgt_pxm,
+ hmat_loc);
+ if (value != NO_VALUE)
+ return value;
+ }
+ }
+
+ return NO_VALUE;
+}
+
+/*
+ * 'direction' is either READ or WRITE
+ * Latency is reported in nanoseconds and bandwidth is reported in MB/s.
+ */
+int hmem_local_attribute(struct device *tgt_dev, int direction,
+ enum hmem_attr_type type)
+{
+ struct memory_target *tgt = to_memory_target(tgt_dev);
+ int tgt_pxm = tgt->ma->proximity_domain;
+ int init_pxm;
+
+ if (!tgt->local_init)
+ return NO_VALUE;
+
+ init_pxm = tgt->local_init->pxm;
+ return hmem_get_attribute(init_pxm, tgt_pxm, direction, type);
+}
+
+static bool is_local_init(int init_pxm, int tgt_pxm, int read_lat,
+ int write_lat, int read_bw, int write_bw)
+{
+ if (read_lat != hmem_get_attribute(init_pxm, tgt_pxm, READ, LATENCY))
+ return false;
+
+ if (write_lat != hmem_get_attribute(init_pxm, tgt_pxm, WRITE, LATENCY))
+ return false;
+
+ if (read_bw != hmem_get_attribute(init_pxm, tgt_pxm, READ, BANDWIDTH))
+ return false;
+
+ if (write_bw != hmem_get_attribute(init_pxm, tgt_pxm, WRITE, BANDWIDTH))
+ return false;
+
+ return true;
+}
+
+static const struct attribute_group performance_attribute_group = {
+ .attrs = performance_attributes,
+ .name = LOCAL_INIT,
+};
+
+static void remove_performance_attributes(struct memory_target *tgt)
+{
+ if (!tgt->local_init)
+ return;
+
+ /*
+ * FIXME: Need to enhance the core sysfs code to remove all the links
+ * in both the attribute group and in the device itself when those are
+ * removed.
+ */
+ sysfs_remove_group(&tgt->dev.kobj, &performance_attribute_group);
+}
+
+static int add_performance_attributes(struct memory_target *tgt)
+{
+ int read_lat, write_lat, read_bw, write_bw;
+ int tgt_pxm = tgt->ma->proximity_domain;
+ struct kobject *init_kobj, *tgt_kobj;
+ struct device *init_dev, *tgt_dev;
+ struct memory_initiator *init;
+ int ret;
+
+ if (!tgt->local_init)
+ return 0;
+
+ tgt_dev = &tgt->dev;
+ tgt_kobj = &tgt_dev->kobj;
+
+ /* Create entries for initiator/target pair in the target. */
+ ret = sysfs_create_group(tgt_kobj, &performance_attribute_group);
+ if (ret < 0)
+ return ret;
+
+ ret = sysfs_add_link_to_group(tgt_kobj, LOCAL_INIT, tgt_kobj,
+ dev_name(tgt_dev));
+ if (ret < 0)
+ goto err;
+
+ /*
+ * Iterate through initiators and find all the ones that have the same
+ * performance as the local initiator.
+ */
+ read_lat = hmem_local_attribute(tgt_dev, READ, LATENCY);
+ write_lat = hmem_local_attribute(tgt_dev, WRITE, LATENCY);
+ read_bw = hmem_local_attribute(tgt_dev, READ, BANDWIDTH);
+ write_bw = hmem_local_attribute(tgt_dev, WRITE, BANDWIDTH);
+
+ list_for_each_entry(init, &initiator_list, list) {
+ init_dev = &init->dev;
+ init_kobj = &init_dev->kobj;
+
+ if (init == tgt->local_init ||
+ is_local_init(init->pxm, tgt_pxm, read_lat,
+ write_lat, read_bw, write_bw)) {
+ ret = sysfs_add_link_to_group(tgt_kobj, LOCAL_INIT,
+ init_kobj, dev_name(init_dev));
+ if (ret < 0)
+ goto err;
+
+ /*
+ * Create a link in the initiator to the performance
+ * attributes.
+ */
+ ret = sysfs_add_group_link(init_kobj, tgt_kobj,
+ LOCAL_INIT, dev_name(tgt_dev));
+ if (ret < 0)
+ goto err;
+ }
+
+ }
+ tgt->has_perf_attributes = true;
+ return 0;
+err:
+ remove_performance_attributes(tgt);
+ return ret;
+}
+
static int link_node_for_kobj(unsigned int node, struct kobject *kobj)
{
if (node_devices[node])
@@ -178,6 +397,9 @@ static void release_memory_target(struct device *dev)
static void __init remove_memory_target(struct memory_target *tgt)
{
+ if (tgt->has_perf_attributes)
+ remove_performance_attributes(tgt);
+
if (tgt->is_registered) {
remove_node_for_kobj(pxm_to_node(tgt->ma->proximity_domain),
&tgt->dev.kobj);
@@ -309,6 +531,38 @@ hmat_parse_address_range(struct acpi_subtable_header *header,
return -EINVAL;
}
+static int __init hmat_parse_locality(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+ struct acpi_hmat_locality *hmat_loc;
+ struct memory_locality *loc;
+
+ if (bad_hmem)
+ return 0;
+
+ hmat_loc = (struct acpi_hmat_locality *)header;
+ if (!hmat_loc) {
+ pr_err("HMEM: NULL table entry\n");
+ bad_hmem = true;
+ return -EINVAL;
+ }
+
+ /* We don't report cached performance information in sysfs. */
+ if (hmat_loc->flags == ACPI_HMAT_MEMORY ||
+ hmat_loc->flags == ACPI_HMAT_LAST_LEVEL_CACHE) {
+ loc = kzalloc(sizeof(*loc), GFP_KERNEL);
+ if (!loc) {
+ bad_hmem = true;
+ return -ENOMEM;
+ }
+
+ loc->hmat_loc = hmat_loc;
+ list_add_tail(&loc->list, &locality_list);
+ }
+
+ return 0;
+}
+
static int __init hmat_parse_cache(struct acpi_subtable_header *header,
const unsigned long end)
{
@@ -464,6 +718,7 @@ srat_parse_memory_affinity(struct acpi_subtable_header *header,
static void hmem_cleanup(void)
{
struct memory_initiator *init, *init_iter;
+ struct memory_locality *loc, *loc_iter;
struct memory_target *tgt, *tgt_iter;
list_for_each_entry_safe(tgt, tgt_iter, &target_list, list)
@@ -471,6 +726,11 @@ static void hmem_cleanup(void)
list_for_each_entry_safe(init, init_iter, &initiator_list, list)
remove_memory_initiator(init);
+
+ list_for_each_entry_safe(loc, loc_iter, &locality_list, list) {
+ list_del(&loc->list);
+ kfree(loc);
+ }
}
static int __init hmem_init(void)
@@ -521,13 +781,15 @@ static int __init hmem_init(void)
}
if (!acpi_table_parse(ACPI_SIG_HMAT, hmem_noop_parse)) {
- struct acpi_subtable_proc hmat_proc[2];
+ struct acpi_subtable_proc hmat_proc[3];
memset(hmat_proc, 0, sizeof(hmat_proc));
hmat_proc[0].id = ACPI_HMAT_TYPE_ADDRESS_RANGE;
hmat_proc[0].handler = hmat_parse_address_range;
hmat_proc[1].id = ACPI_HMAT_TYPE_CACHE;
hmat_proc[1].handler = hmat_parse_cache;
+ hmat_proc[2].id = ACPI_HMAT_TYPE_LOCALITY;
+ hmat_proc[2].handler = hmat_parse_locality;
acpi_table_parse_entries_array(ACPI_SIG_HMAT,
sizeof(struct acpi_table_hmat),
@@ -549,6 +811,10 @@ static int __init hmem_init(void)
ret = register_memory_target(tgt);
if (ret)
goto err;
+
+ ret = add_performance_attributes(tgt);
+ if (ret)
+ goto err;
}
return 0;
diff --git a/drivers/acpi/hmem/hmem.h b/drivers/acpi/hmem/hmem.h
index 38ff540..71b10f0 100644
--- a/drivers/acpi/hmem/hmem.h
+++ b/drivers/acpi/hmem/hmem.h
@@ -16,6 +16,11 @@
#ifndef _ACPI_HMEM_H_
#define _ACPI_HMEM_H_
+enum hmem_attr_type {
+ LATENCY,
+ BANDWIDTH,
+};
+
struct memory_initiator {
struct list_head list;
struct device dev;
@@ -39,9 +44,21 @@ struct memory_target {
bool is_cached;
bool is_registered;
+ bool has_perf_attributes;
};
#define to_memory_target(d) container_of((d), struct memory_target, dev)
+struct memory_locality {
+ struct list_head list;
+ struct acpi_hmat_locality *hmat_loc;
+};
+
extern const struct attribute_group *memory_initiator_attribute_groups[];
extern const struct attribute_group *memory_target_attribute_groups[];
+extern struct attribute *performance_attributes[];
+
+extern struct list_head locality_list;
+
+int hmem_local_attribute(struct device *tgt_dev, int direction,
+ enum hmem_attr_type type);
#endif /* _ACPI_HMEM_H_ */
diff --git a/drivers/acpi/hmem/perf_attributes.c b/drivers/acpi/hmem/perf_attributes.c
new file mode 100644
index 0000000..72a710a
--- /dev/null
+++ b/drivers/acpi/hmem/perf_attributes.c
@@ -0,0 +1,56 @@
+/*
+ * Heterogeneous memory performance attributes
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+#include "hmem.h"
+
+static ssize_t read_lat_nsec_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", hmem_local_attribute(dev, READ, LATENCY));
+}
+static DEVICE_ATTR_RO(read_lat_nsec);
+
+static ssize_t write_lat_nsec_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", hmem_local_attribute(dev, WRITE, LATENCY));
+}
+static DEVICE_ATTR_RO(write_lat_nsec);
+
+static ssize_t read_bw_MBps_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", hmem_local_attribute(dev, READ, BANDWIDTH));
+}
+static DEVICE_ATTR_RO(read_bw_MBps);
+
+static ssize_t write_bw_MBps_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n",
+ hmem_local_attribute(dev, WRITE, BANDWIDTH));
+}
+static DEVICE_ATTR_RO(write_bw_MBps);
+
+struct attribute *performance_attributes[] = {
+ &dev_attr_read_lat_nsec.attr,
+ &dev_attr_write_lat_nsec.attr,
+ &dev_attr_read_bw_MBps.attr,
+ &dev_attr_write_bw_MBps.attr,
+ NULL
+};
--
2.9.4
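[Editorial note] The four attributes added by perf_attributes.c surface as plain sysfs files under each memory target. The snippet below only simulates that layout in a temporary directory so the read loop is runnable anywhere; the `mem_tgt2/local_init` path and the sample values are assumptions based on this series' naming, not something verified against a running kernel.

```shell
# Simulate the sysfs files this patch would create; the real files would
# live under /sys/devices/system/hmem/ (exact layout assumed here).
d=$(mktemp -d)/mem_tgt2/local_init
mkdir -p "$d"
echo 100   > "$d/read_lat_nsec"
echo 110   > "$d/write_lat_nsec"
echo 30720 > "$d/read_bw_MBps"
echo 20480 > "$d/write_bw_MBps"

# Read the attributes back, the way a userspace consumer would.
for f in read_lat_nsec write_lat_nsec read_bw_MBps write_bw_MBps; do
    printf '%s: %s\n' "$f" "$(cat "$d/$f")"
done
```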
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 21+ messages in thread

* Re: [RFC v2 0/5] surface heterogeneous memory performance information
[not found] ` <20170706215233.11329-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
` (4 preceding siblings ...)
2017-07-06 21:52 ` [RFC v2 5/5] hmem: add performance attributes Ross Zwisler
@ 2017-07-07 5:30 ` John Hubbard
[not found] ` <7cb3b9c4-9082-97e9-ebfd-542243bf652b-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
5 siblings, 1 reply; 21+ messages in thread
From: John Hubbard @ 2017-07-07 5:30 UTC (permalink / raw)
To: Ross Zwisler, linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Box, David E, Dave Hansen, Zheng, Lv,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw, Rafael J. Wysocki,
Anaczkowski, Lukasz, Moore, Robert,
linux-acpi-u79uwXL29TY76Z2rM5mHXA, Odzioba, Lukasz,
Schmauss, Erik, Len Brown, Jerome Glisse,
devel-E0kO6a4B6psdnm+yROfE0A, Kogut, Jaroslaw,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Greg Kroah-Hartman,
Nachimuthu, Murugasamy, Rafael J. Wysocki, Lahtinen, Joonas,
Andrew Morton, Tim Chen
On 07/06/2017 02:52 PM, Ross Zwisler wrote:
[...]
>
> The naming collision between Jerome's "Heterogeneous Memory Management
> (HMM)" and this "Heterogeneous Memory (HMEM)" series is unfortunate, but I
> was trying to stick with the word "Heterogeneous" because of the naming of
> the ACPI 6.2 Heterogeneous Memory Attribute Table. Suggestions for
> better naming are welcome.
>
Hi Ross,
Say, most of the places (file names, function and variable names, and even
print statements) where this patchset uses hmem or HMEM, it really seems to
mean the Heterogeneous Memory Attribute Table. That's not *always* true, but
given that it's a pretty severe naming conflict, how about just changing:
hmem --> hmat
HMEM --> HMAT
...everywhere? Then you still have Heterogeneous Memory in the name, but
there is enough lexical distance (is that a thing? haha) between HMM and HMAT
to keep us all sane. :)
With or without the above suggestion, there are a few places (Kconfig, comments,
prints) where we can more easily make it clear that HMM != HMEM (or HMAT);
I'll comment on those separately in the individual patches.
thanks,
john h
^ permalink raw reply [flat|nested] 21+ messages in thread