public inbox for linux-acpi@vger.kernel.org
* [PATCH 0/4] Add interfaces for ACPI MRRM table
@ 2025-02-10 21:12 Tony Luck
  2025-02-10 21:12 ` [PATCH 1/4] ACPICA: Define MRRM ACPI table Tony Luck
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Tony Luck @ 2025-02-10 21:12 UTC (permalink / raw)
  To: Robert Moore, Rafael J. Wysocki, Len Brown
  Cc: linux-acpi, acpica-devel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
	Danilo Krummrich, Andrew Morton, linux-kernel, Tony Luck

Memory used to be homogeneous. Then NUMA came along. Later, different
types of memory arrived (persistent memory, on-package high bandwidth
memory, CXL attached memory).

Each type of memory has its own performance characteristics, and users
will need to monitor and control access by type.

The MRRM solution is to tag physical address ranges with "region IDs"
so that platform firmware[1] can indicate the type of memory for each
range (with separate tags available for local vs. remote access to
each range).

The region IDs will be used by "perf" to provide separate event counts
for each region, and by the "resctrl" file system to monitor and
control memory bandwidth per region.

Users will need to know the address range(s) that are part of each
region. This patch series adds /sys/devices/system/memory/rangeX
directories to provide user space accessible enumeration.

-Tony

[1] The MRRM definition allows for future expansion for the OS to
assign these region IDs.

Fenghua Yu (1):
  ACPICA: Define MRRM ACPI table

Tony Luck (3):
  ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI
  ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  ACPI/MRRM: ABI documentation for /sys/devices/system/memory/rangeX

 include/linux/memory.h                        |   9 +
 include/acpi/actbl3.h                         |  40 ++++
 drivers/acpi/acpi_mrrm.c                      | 188 ++++++++++++++++++
 drivers/base/memory.c                         |   9 +
 .../ABI/testing/sysfs-devices-memory          |  32 +++
 arch/x86/Kconfig                              |   1 +
 drivers/acpi/Kconfig                          |   4 +
 drivers/acpi/Makefile                         |   1 +
 8 files changed, 284 insertions(+)
 create mode 100644 drivers/acpi/acpi_mrrm.c


base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
-- 
2.48.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] ACPICA: Define MRRM ACPI table
  2025-02-10 21:12 [PATCH 0/4] Add interfaces for ACPI MRRM table Tony Luck
@ 2025-02-10 21:12 ` Tony Luck
  2025-02-11 12:16   ` Rafael J. Wysocki
  2025-02-10 21:12 ` [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI Tony Luck
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Tony Luck @ 2025-02-10 21:12 UTC (permalink / raw)
  To: Robert Moore, Rafael J. Wysocki, Len Brown
  Cc: linux-acpi, acpica-devel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
	Danilo Krummrich, Andrew Morton, linux-kernel, Fenghua Yu,
	Tony Luck

From: Fenghua Yu <fenghua.yu@intel.com>

The MRRM table describes the association between physical address
ranges and "region numbers". This is used by:

1) The /sys/fs/resctrl filesystem to report memory traffic per-RMID for
each region.
2) Perf subsystem to report memory related uncore events per region.

The structure is defined in the Intel Resource Director Technology (RDT)
Architecture specification, downloadable from www.intel.com/sdm.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/acpi/actbl3.h | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/include/acpi/actbl3.h b/include/acpi/actbl3.h
index 5cd755143b7d..1b9a03ff73ba 100644
--- a/include/acpi/actbl3.h
+++ b/include/acpi/actbl3.h
@@ -42,6 +42,7 @@
 #define ACPI_SIG_WSMT           "WSMT"	/* Windows SMM Security Mitigations Table */
 #define ACPI_SIG_XENV           "XENV"	/* Xen Environment table */
 #define ACPI_SIG_XXXX           "XXXX"	/* Intermediate AML header for ASL/ASL+ converter */
+#define ACPI_SIG_MRRM           "MRRM"	/* Memory Range and Region Mapping table */
 
 /*
  * All tables must be byte-packed to match the ACPI specification, since
@@ -793,6 +794,45 @@ struct acpi_table_xenv {
 	u8 event_flags;
 };
 
+/*******************************************************************************
+ *
+ * MRRM - Memory Range and Region Mapping (MRRM) table
+ *
+ ******************************************************************************/
+
+struct acpi_table_mrrm {
+	struct acpi_table_header header;
+	u8 max_mem_region;	/* Max Memory Regions supported */
+	u8 flags;		/* Region assignment type */
+	u8 reserved[26];
+	/* Memory range entry array */
+};
+#define ACPI_MRRM_FLAGS_REGION_ASSIGNMENT_OS	(1<<0)
+
+/*******************************************************************************
+ *
+ * Memory Range entry - Memory Range entry in MRRM table
+ *
+ ******************************************************************************/
+
+struct acpi_table_mrrm_mem_range_entry {
+	u16 type;		/* Type 0="MRRM" */
+	u16 length;		/* 32B + sizeof(Region-ID Programming Reg[]) */
+	u32 reserved;		/* Reserved */
+	u32 base_addr_low;	/* Low 32 bits of base addr of the mem range */
+	u32 base_addr_high;	/* High 32 bits of base addr of the mem range */
+	u32 len_low;		/* Low 32 bits of length of the mem range */
+	u32 len_high;		/* High 32 bits of length of the mem range */
+	u16 region_id_flags;	/* Valid local or remote Region-ID */
+	u8  local_region_id;	/* Platform-assigned static local Region-ID */
+	u8  remote_region_id;	/* Platform-assigned static remote Region-ID */
+	u32 reserved1;		/* Reserved */
+	/* Region-ID Programming Registers[] */
+};
+
+#define ACPI_MRRM_VALID_REGION_ID_FLAGS_LOCAL 	(1<<0)
+#define ACPI_MRRM_VALID_REGION_ID_FLAGS_REMOTE	(1<<1)
+
 /* Reset to default packing */
 
 #pragma pack()
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI
  2025-02-10 21:12 [PATCH 0/4] Add interfaces for ACPI MRRM table Tony Luck
  2025-02-10 21:12 ` [PATCH 1/4] ACPICA: Define MRRM ACPI table Tony Luck
@ 2025-02-10 21:12 ` Tony Luck
  2025-02-11  0:21   ` Luck, Tony
  2025-02-11 13:08   ` David Hildenbrand
  2025-02-10 21:12 ` [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX Tony Luck
  2025-02-10 21:12 ` [PATCH 4/4] ACPI/MRRM: ABI documentation for /sys/devices/system/memory/rangeX Tony Luck
  3 siblings, 2 replies; 15+ messages in thread
From: Tony Luck @ 2025-02-10 21:12 UTC (permalink / raw)
  To: Robert Moore, Rafael J. Wysocki, Len Brown
  Cc: linux-acpi, acpica-devel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
	Danilo Krummrich, Andrew Morton, linux-kernel, Tony Luck

Perf and resctrl users need an enumeration of which memory addresses
are bound to which "region" tag.

Parse the ACPI MRRM table and add /sys entries for each memory range
describing base address, length, and which region tags apply for
same-socket and cross-socket access.

[Derived from code developed by Fenghua Yu <fenghua.yu@intel.com>]

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/memory.h   |   9 +++
 drivers/acpi/acpi_mrrm.c | 159 +++++++++++++++++++++++++++++++++++++++
 drivers/base/memory.c    |   9 +++
 arch/x86/Kconfig         |   1 +
 drivers/acpi/Kconfig     |   4 +
 drivers/acpi/Makefile    |   1 +
 6 files changed, 183 insertions(+)
 create mode 100644 drivers/acpi/acpi_mrrm.c

diff --git a/include/linux/memory.h b/include/linux/memory.h
index c0afee5d126e..0a21943ce44d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -189,4 +189,13 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
  */
 extern struct mutex text_mutex;
 
+#ifdef CONFIG_ACPI_MRRM
+int mrrm_max_mem_region(void);
+int memory_subsys_device_register(struct device *dev);
+#else
+static inline int mrrm_max_mem_region(void) { return -EONENT; }
+static inline int memory_subsys_device_register(struct device *dev) { return -EINVAL; }
+#define memory_subsys_device_register memory_subsys_device_register
+#endif
+
 #endif /* _LINUX_MEMORY_H_ */
diff --git a/drivers/acpi/acpi_mrrm.c b/drivers/acpi/acpi_mrrm.c
new file mode 100644
index 000000000000..51ed9064e025
--- /dev/null
+++ b/drivers/acpi/acpi_mrrm.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2025, Intel Corporation.
+ *
+ * Memory Range and Region Mapping (MRRM) structure
+ *
+ * Parse and report the platform's MRRM table in /sys.
+ */
+
+#define pr_fmt(fmt) "acpi/mrrm: " fmt
+
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/sysfs.h>
+
+static int max_mem_region = -ENOENT;
+
+/* Access for use by resctrl file system */
+int mrrm_max_mem_region(void)
+{
+	return max_mem_region;
+}
+
+struct mrrm_mem_range_entry {
+	struct device dev;
+	u64 base;
+	u64 length;
+	u8  local_region_id;
+	u8  remote_region_id;
+};
+
+static struct mrrm_mem_range_entry *mrrm_mem_range_entry;
+static u32 mrrm_mem_entry_num;
+
+static __init int acpi_parse_mrrm(struct acpi_table_header *table)
+{
+	struct acpi_table_mrrm_mem_range_entry *mre_entry;
+	struct acpi_table_mrrm *mrrm;
+	void *mre, *mrrm_end;
+	int mre_count = 0;
+
+	mrrm = (struct acpi_table_mrrm *)table;
+	if (!mrrm)
+		return -ENODEV;
+
+	if (mrrm->flags & ACPI_MRRM_FLAGS_REGION_ASSIGNMENT_OS)
+		return -EOPNOTSUPP;
+
+	mrrm_end = (void *)mrrm + mrrm->header.length - 1;
+	mre = (void *)mrrm + sizeof(struct acpi_table_mrrm);
+	while (mre < mrrm_end) {
+		mre_entry = mre;
+		mre_count++;
+		mre += mre_entry->length;
+	}
+	if (!mre_count) {
+		pr_info(FW_BUG "No ranges listed in MRRM table\n");
+		return -EINVAL;
+	}
+
+	mrrm_mem_range_entry = kmalloc_array(mre_count, sizeof(*mrrm_mem_range_entry),
+					     GFP_KERNEL | __GFP_ZERO);
+	if (!mrrm_mem_range_entry)
+		return -ENOMEM;
+
+	mre = (void *)mrrm + sizeof(struct acpi_table_mrrm);
+	while (mre < mrrm_end) {
+		struct mrrm_mem_range_entry *e;
+
+		mre_entry = mre;
+		e = mrrm_mem_range_entry + mrrm_mem_entry_num;
+
+		e->base = ((u64)mre_entry->base_addr_high << 32) + mre_entry->base_addr_low;
+		e->length = ((u64)mre_entry->len_high << 32) + mre_entry->len_low;
+
+		if (mre_entry->region_id_flags & ACPI_MRRM_VALID_REGION_ID_FLAGS_LOCAL)
+			e->local_region_id = mre_entry->local_region_id;
+		else
+			e->local_region_id = -1;
+		if (mre_entry->region_id_flags & ACPI_MRRM_VALID_REGION_ID_FLAGS_REMOTE)
+			e->remote_region_id = mre_entry->remote_region_id;
+		else
+			e->remote_region_id = -1;
+
+		mrrm_mem_entry_num++;
+		mre += mre_entry->length;
+	}
+
+	max_mem_region = mrrm->max_mem_region;
+
+	return 0;
+}
+
+#define RANGE_ATTR(name)						\
+static ssize_t name##_show(struct device *dev,				\
+			  struct device_attribute *attr, char *buf)	\
+{									\
+	struct mrrm_mem_range_entry *mre;				\
+									\
+	mre = container_of(dev, struct mrrm_mem_range_entry, dev);	\
+	return sysfs_emit(buf, "0x%lx\n", (unsigned long)mre->name);	\
+}									\
+static DEVICE_ATTR_RO(name)
+
+RANGE_ATTR(base);
+RANGE_ATTR(length);
+RANGE_ATTR(local_region_id);
+RANGE_ATTR(remote_region_id);
+
+static struct attribute *memory_range_attrs[] = {
+	&dev_attr_base.attr,
+	&dev_attr_length.attr,
+	&dev_attr_local_region_id.attr,
+	&dev_attr_remote_region_id.attr,
+	NULL
+};
+
+ATTRIBUTE_GROUPS(memory_range);
+
+static __init int add_boot_memory_ranges(void)
+{
+	char name[16];
+	int i, ret;
+
+	for (i = 0; i < mrrm_mem_entry_num; i++) {
+		struct mrrm_mem_range_entry *entry;
+
+		entry = mrrm_mem_range_entry + i;
+
+		sprintf(name, "range%d", i);
+		entry->dev.init_name = name;
+
+		entry->dev.id = i;
+		entry->dev.groups = memory_range_groups;
+
+		ret = memory_subsys_device_register(&entry->dev);
+		if (ret) {
+			put_device(&entry->dev);
+			return ret;
+		}
+	}
+
+	return ret;
+}
+
+static __init int mrrm_init(void)
+{
+	int ret;
+
+	ret = acpi_table_parse(ACPI_SIG_MRRM, acpi_parse_mrrm);
+
+	if (ret < 0)
+		return ret;
+
+	return add_boot_memory_ranges();
+}
+device_initcall(mrrm_init);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 348c5dbbfa68..1f7853a4df5c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -699,6 +699,15 @@ static int __add_memory_block(struct memory_block *memory)
 	return ret;
 }
 
+#ifndef memory_subsys_device_register
+int memory_subsys_device_register(struct device *dev)
+{
+	dev->bus = &memory_subsys;
+
+	return device_register(dev);
+}
+#endif
+
 static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
 						     int nid)
 {
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 87198d957e2f..96aa73e8fb13 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -36,6 +36,7 @@ config X86_64
 	select ARCH_HAS_ELFCORE_COMPAT
 	select ZONE_DMA32
 	select EXECMEM if DYNAMIC_FTRACE
+	select ACPI_MRRM			if MEMORY_HOTPLUG
 
 config FORCE_DYNAMIC_FTRACE
 	def_bool y
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index d81b55f5068c..c3d1b0217e99 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -576,6 +576,10 @@ config ACPI_FFH
 	  Enable this feature if you want to set up and install the FFH Address
 	  Space handler to handle FFH OpRegion in the firmware.
 
+config ACPI_MRRM
+	bool
+	depends on MEMORY_HOTPLUG
+
 source "drivers/acpi/pmic/Kconfig"
 
 config ACPI_VIOT
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 40208a0f5dfb..5092b518fc9b 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -62,6 +62,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)	+= acpi_watchdog.o
 acpi-$(CONFIG_ACPI_PRMT)	+= prmt.o
 acpi-$(CONFIG_ACPI_PCC)		+= acpi_pcc.o
 acpi-$(CONFIG_ACPI_FFH)		+= acpi_ffh.o
+acpi-$(CONFIG_ACPI_MRRM)	+= acpi_mrrm.o
 
 # Address translation
 acpi-$(CONFIG_ACPI_ADXL)	+= acpi_adxl.o
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-10 21:12 [PATCH 0/4] Add interfaces for ACPI MRRM table Tony Luck
  2025-02-10 21:12 ` [PATCH 1/4] ACPICA: Define MRRM ACPI table Tony Luck
  2025-02-10 21:12 ` [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI Tony Luck
@ 2025-02-10 21:12 ` Tony Luck
  2025-02-11  6:51   ` Greg Kroah-Hartman
  2025-02-10 21:12 ` [PATCH 4/4] ACPI/MRRM: ABI documentation for /sys/devices/system/memory/rangeX Tony Luck
  3 siblings, 1 reply; 15+ messages in thread
From: Tony Luck @ 2025-02-10 21:12 UTC (permalink / raw)
  To: Robert Moore, Rafael J. Wysocki, Len Brown
  Cc: linux-acpi, acpica-devel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
	Danilo Krummrich, Andrew Morton, linux-kernel, Tony Luck

Users will likely want to know which node owns each memory range
and which CPUs are local to the range.

Add a symlink to the node directory to provide both pieces of information.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/acpi_mrrm.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/acpi/acpi_mrrm.c b/drivers/acpi/acpi_mrrm.c
index 51ed9064e025..28b484943bbd 100644
--- a/drivers/acpi/acpi_mrrm.c
+++ b/drivers/acpi/acpi_mrrm.c
@@ -119,6 +119,31 @@ static struct attribute *memory_range_attrs[] = {
 
 ATTRIBUTE_GROUPS(memory_range);
 
+static __init int add_node_link(struct mrrm_mem_range_entry *entry)
+{
+	struct node *node = NULL;
+	int ret = 0;
+	int nid;
+
+	for_each_online_node(nid) {
+		for (int z = 0; z < MAX_NR_ZONES; z++) {
+			struct zone *zone = NODE_DATA(nid)->node_zones + z;
+
+			if (!populated_zone(zone))
+				continue;
+			if (zone_intersects(zone, PHYS_PFN(entry->base), PHYS_PFN(entry->length))) {
+				node = node_devices[zone->node];
+				goto found;
+			}
+		}
+	}
+found:
+	if (node)
+		ret = sysfs_create_link(&entry->dev.kobj, &node->dev.kobj, "node");
+
+	return ret;
+}
+
 static __init int add_boot_memory_ranges(void)
 {
 	char name[16];
@@ -140,6 +165,10 @@ static __init int add_boot_memory_ranges(void)
 			put_device(&entry->dev);
 			return ret;
 		}
+
+		ret = add_node_link(entry);
+		if (ret)
+			break;
 	}
 
 	return ret;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] ACPI/MRRM: ABI documentation for /sys/devices/system/memory/rangeX
  2025-02-10 21:12 [PATCH 0/4] Add interfaces for ACPI MRRM table Tony Luck
                   ` (2 preceding siblings ...)
  2025-02-10 21:12 ` [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX Tony Luck
@ 2025-02-10 21:12 ` Tony Luck
  3 siblings, 0 replies; 15+ messages in thread
From: Tony Luck @ 2025-02-10 21:12 UTC (permalink / raw)
  To: Robert Moore, Rafael J. Wysocki, Len Brown
  Cc: linux-acpi, acpica-devel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
	Danilo Krummrich, Andrew Morton, linux-kernel, Tony Luck

Add ABI documentation providing users with the mapping between physical
address ranges and the "region id" used by perf for uncore memory
events and by the resctrl file system for per-region memory monitoring
and control.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../ABI/testing/sysfs-devices-memory          | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index cec65827e602..fb01739e016e 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -118,3 +118,35 @@ Description:
 		(RO) indicates whether or not the kernel updates relevant kexec
 		segments on memory hot un/plug and/or on/offline events, avoiding the
 		need to reload kdump kernel.
+
+What:           /sys/devices/system/memory/rangeX/base
+Date:           January 2025
+Contact:	Tony Luck <tony.luck@intel.com>
+Description:
+		On systems with the ACPI MRRM table, reports the
+		base system physical address of memory range X.
+
+What:           /sys/devices/system/memory/rangeX/length
+Date:           January 2025
+Contact:	Tony Luck <tony.luck@intel.com>
+Description:
+		On systems with the ACPI MRRM table, reports the
+		size of memory range X.
+
+What:           /sys/devices/system/memory/rangeX/local_region_id
+Date:           January 2025
+Contact:	Tony Luck <tony.luck@intel.com>
+Description:
+		On systems with the ACPI MRRM table, reports the
+		region id associated with memory access by agents
+		local to this range of addresses. Reports 0xff
+		when no region id has been assigned.
+
+What:           /sys/devices/system/memory/rangeX/remote_region_id
+Date:           January 2025
+Contact:	Tony Luck <tony.luck@intel.com>
+Description:
+		On systems with the ACPI MRRM table, reports the
+		region id associated with memory access by agents
+		non-local to this range of addresses. Reports 0xff
+		when no region id has been assigned.
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI
  2025-02-10 21:12 ` [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI Tony Luck
@ 2025-02-11  0:21   ` Luck, Tony
  2025-02-11 13:08   ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2025-02-11  0:21 UTC (permalink / raw)
  To: Moore, Robert, Wysocki, Rafael J, Len Brown
  Cc: linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H. Peter Anvin, David Hildenbrand, Oscar Salvador,
	Greg Kroah-Hartman, Danilo Krummrich, Andrew Morton,
	linux-kernel@vger.kernel.org

> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index c0afee5d126e..0a21943ce44d 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -189,4 +189,13 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
>   */
>  extern struct mutex text_mutex;
>
> +#ifdef CONFIG_ACPI_MRRM
> +int mrrm_max_mem_region(void);
> +int memory_subsys_device_register(struct device *dev);
> +#else
> +static inline int mrrm_max_mem_region(void) { return -EONENT; }

The lkp robot just pointed out my spelling error. Should be ENOENT.

> +static inline int memory_subsys_device_register(struct device *dev) { return -EINVAL; }
> +#define memory_subsys_device_register memory_subsys_device_register
> +#endif
> +
>  #endif /* _LINUX_MEMORY_H_ */

-Tony

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-10 21:12 ` [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX Tony Luck
@ 2025-02-11  6:51   ` Greg Kroah-Hartman
  2025-02-11 13:27     ` David Hildenbrand
  2025-02-11 17:02     ` Luck, Tony
  0 siblings, 2 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-11  6:51 UTC (permalink / raw)
  To: Tony Luck
  Cc: Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi,
	acpica-devel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, David Hildenbrand,
	Oscar Salvador, Danilo Krummrich, Andrew Morton, linux-kernel

On Mon, Feb 10, 2025 at 01:12:22PM -0800, Tony Luck wrote:
> Users will likely want to know which node owns each memory range
> and which CPUs are local to the range.
> 
> Add a symlink to the node directory to provide both pieces of information.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/acpi/acpi_mrrm.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/acpi/acpi_mrrm.c b/drivers/acpi/acpi_mrrm.c
> index 51ed9064e025..28b484943bbd 100644
> --- a/drivers/acpi/acpi_mrrm.c
> +++ b/drivers/acpi/acpi_mrrm.c
> @@ -119,6 +119,31 @@ static struct attribute *memory_range_attrs[] = {
>  
>  ATTRIBUTE_GROUPS(memory_range);
>  
> +static __init int add_node_link(struct mrrm_mem_range_entry *entry)
> +{
> +	struct node *node = NULL;
> +	int ret = 0;
> +	int nid;
> +
> +	for_each_online_node(nid) {
> +		for (int z = 0; z < MAX_NR_ZONES; z++) {
> +			struct zone *zone = NODE_DATA(nid)->node_zones + z;
> +
> +			if (!populated_zone(zone))
> +				continue;
> +			if (zone_intersects(zone, PHYS_PFN(entry->base), PHYS_PFN(entry->length))) {
> +				node = node_devices[zone->node];
> +				goto found;
> +			}
> +		}
> +	}
> +found:
> +	if (node)
> +		ret = sysfs_create_link(&entry->dev.kobj, &node->dev.kobj, "node");

What is going to remove this symlink if the memory goes away?  Or do
these never get removed?

symlinks in sysfs created like this always worry me.  What is going to
use it?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] ACPICA: Define MRRM ACPI table
  2025-02-10 21:12 ` [PATCH 1/4] ACPICA: Define MRRM ACPI table Tony Luck
@ 2025-02-11 12:16   ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2025-02-11 12:16 UTC (permalink / raw)
  To: Tony Luck
  Cc: Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi,
	acpica-devel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, David Hildenbrand,
	Oscar Salvador, Greg Kroah-Hartman, Danilo Krummrich,
	Andrew Morton, linux-kernel, Fenghua Yu

On Mon, Feb 10, 2025 at 10:12 PM Tony Luck <tony.luck@intel.com> wrote:
>
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> The MRRM table describes the association between physical address
> ranges and "region numbers". This is used by:
>
> 1) The /sys/fs/resctrl filesystem to report memory traffic per-RMID for
> each region.
> 2) Perf subsystem to report memory related uncore events per region.
>
> The structure is defined in the Intel Resource Director Technology (RDT)
> Architecture specification, downloadable from www.intel.com/sdm.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

So the process for ACPICA changes is that they first need to go into
the upstream ACPICA project on GitHub.

Once merged there, you can submit a corresponding Linux patch pointing
to the upstream commit, but submitting it is not mandatory because
upstream material lands in the Linux kernel eventually automatically.

> ---
>  include/acpi/actbl3.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/include/acpi/actbl3.h b/include/acpi/actbl3.h
> index 5cd755143b7d..1b9a03ff73ba 100644
> --- a/include/acpi/actbl3.h
> +++ b/include/acpi/actbl3.h
> @@ -42,6 +42,7 @@
>  #define ACPI_SIG_WSMT           "WSMT" /* Windows SMM Security Mitigations Table */
>  #define ACPI_SIG_XENV           "XENV" /* Xen Environment table */
>  #define ACPI_SIG_XXXX           "XXXX" /* Intermediate AML header for ASL/ASL+ converter */
> +#define ACPI_SIG_MRRM           "MRRM" /* Memory Range and Region Mapping table */
>
>  /*
>   * All tables must be byte-packed to match the ACPI specification, since
> @@ -793,6 +794,45 @@ struct acpi_table_xenv {
>         u8 event_flags;
>  };
>
> +/*******************************************************************************
> + *
> + * MRRM - Memory Range and Region Mapping (MRRM) table
> + *
> + ******************************************************************************/
> +
> +struct acpi_table_mrrm {
> +       struct acpi_table_header header;
> +       u8 max_mem_region;      /* Max Memory Regions supported */
> +       u8 flags;               /* Region assignment type */
> +       u8 reserved[26];
> +       /* Memory range entry array */
> +};
> +#define ACPI_MRRM_FLAGS_REGION_ASSIGNMENT_OS   (1<<0)
> +
> +/*******************************************************************************
> + *
> + * Memory Range entry - Memory Range entry in MRRM table
> + *
> + ******************************************************************************/
> +
> +struct acpi_table_mrrm_mem_range_entry {
> +       u16 type;               /* Type 0="MRRM" */
> +       u16 length;             /* 32B + sizeof(Region-ID Programming Reg[]) */
> +       u32 reserved;           /* Reserved */
> +       u32 base_addr_low;      /* Low 32 bits of base addr of the mem range */
> +       u32 base_addr_high;     /* High 32 bits of base addr of the mem range */
> +       u32 len_low;            /* Low 32 bits of length of the mem range */
> +       u32 len_high;           /* High 32 bits of length of the mem range */
> +       u16 region_id_flags;    /* Valid local or remote Region-ID */
> +       u8  local_region_id;    /* Platform-assigned static local Region-ID */
> +       u8  remote_region_id;   /* Platform-assigned static remote Region-ID */
> +       u32 reserved1;          /* Reserved */
> +       /* Region-ID Programming Registers[] */
> +};
> +
> +#define ACPI_MRRM_VALID_REGION_ID_FLAGS_LOCAL  (1<<0)
> +#define ACPI_MRRM_VALID_REGION_ID_FLAGS_REMOTE (1<<1)
> +
>  /* Reset to default packing */
>
>  #pragma pack()
> --
> 2.48.1
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI
  2025-02-10 21:12 ` [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI Tony Luck
  2025-02-11  0:21   ` Luck, Tony
@ 2025-02-11 13:08   ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-02-11 13:08 UTC (permalink / raw)
  To: Tony Luck, Robert Moore, Rafael J. Wysocki, Len Brown
  Cc: linux-acpi, acpica-devel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Oscar Salvador,
	Greg Kroah-Hartman, Danilo Krummrich, Andrew Morton, linux-kernel

On 10.02.25 22:12, Tony Luck wrote:
> Perf and resctrl users need an enumeration of which memory addresses
> are bound to which "region" tag.
> 
> Parse the ACPI MRRM table and add /sys entries for each memory range
> describing base address, length, and which region tags apply for
> same-socket and cross-socket access.

What does an example in /sys/devices/system/memory/ look like later?

 From a quick glimpse, I am not sure if this really belongs in 
/sys/devices/system/memory/, but I am missing some information in the 
cover letter / patch.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-11  6:51   ` Greg Kroah-Hartman
@ 2025-02-11 13:27     ` David Hildenbrand
  2025-02-11 18:05       ` Luck, Tony
  2025-02-11 17:02     ` Luck, Tony
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-11 13:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tony Luck
  Cc: Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi,
	acpica-devel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Oscar Salvador,
	Danilo Krummrich, Andrew Morton, linux-kernel

On 11.02.25 07:51, Greg Kroah-Hartman wrote:
> On Mon, Feb 10, 2025 at 01:12:22PM -0800, Tony Luck wrote:
>> Users will likely want to know which node owns each memory range
>> and which CPUs are local to the range.
>>
>> Add a symlink to the node directory to provide both pieces of information.
>>
>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>> ---
>>   drivers/acpi/acpi_mrrm.c | 29 +++++++++++++++++++++++++++++
>>   1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/acpi/acpi_mrrm.c b/drivers/acpi/acpi_mrrm.c
>> index 51ed9064e025..28b484943bbd 100644
>> --- a/drivers/acpi/acpi_mrrm.c
>> +++ b/drivers/acpi/acpi_mrrm.c
>> @@ -119,6 +119,31 @@ static struct attribute *memory_range_attrs[] = {
>>   
>>   ATTRIBUTE_GROUPS(memory_range);
>>   
>> +static __init int add_node_link(struct mrrm_mem_range_entry *entry)
>> +{
>> +	struct node *node = NULL;
>> +	int ret = 0;
>> +	int nid;
>> +
>> +	for_each_online_node(nid) {
>> +		for (int z = 0; z < MAX_NR_ZONES; z++) {
>> +			struct zone *zone = NODE_DATA(nid)->node_zones + z;
>> +
>> +			if (!populated_zone(zone))
>> +				continue;
>> +			if (zone_intersects(zone, PHYS_PFN(entry->base), PHYS_PFN(entry->length))) {
>> +				node = node_devices[zone->node];
>> +				goto found;
>> +			}
>> +		}
>> +	}
>> +found:
>> +	if (node)
>> +		ret = sysfs_create_link(&entry->dev.kobj, &node->dev.kobj, "node");
> 
> What is going to remove this symlink if the memory goes away?  Or do
> these never get removed?
> 
> symlinks in sysfs created like this always worry me.  What is going to
> use it?

On top of that, we seem to be building a separate hierarchy here.

/sys/devices/system/memory/ operates in memory block granularity.

/sys/devices/system/node/nodeX/ links to memory blocks that belong to it.

Why is the memory-block granularity insufficient, and why do we have to 
squeeze in another range API here?

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-11  6:51   ` Greg Kroah-Hartman
  2025-02-11 13:27     ` David Hildenbrand
@ 2025-02-11 17:02     ` Luck, Tony
  2025-02-12  7:48       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 15+ messages in thread
From: Luck, Tony @ 2025-02-11 17:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Moore, Robert, Wysocki, Rafael J, Len Brown,
	linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H. Peter Anvin, David Hildenbrand, Oscar Salvador,
	Danilo Krummrich, Andrew Morton, linux-kernel@vger.kernel.org

>> +	if (node)
>> +		ret = sysfs_create_link(&entry->dev.kobj, &node->dev.kobj, "node");
>
> What is going to remove this symlink if the memory goes away?  Or do
> these never get removed?

There's currently no method for runtime changes to these memory ranges. They
are described by a static ACPI table.  I need to poke the folks that came up
with this to ask how memory hotplug will be handled (since CXL seems to be
making that fashionable again).

> symlinks in sysfs created like this always worry me.  What is going to
> use it?

<hand waves>User space tools that want to understand what the "per-region"
monitoring and control features are actually operating on.</hand waves>

-Tony

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-11 13:27     ` David Hildenbrand
@ 2025-02-11 18:05       ` Luck, Tony
  2025-02-13 13:30         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Luck, Tony @ 2025-02-11 18:05 UTC (permalink / raw)
  To: David Hildenbrand, Greg Kroah-Hartman
  Cc: Moore, Robert, Wysocki, Rafael J, Len Brown,
	linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H. Peter Anvin, Oscar Salvador, Danilo Krummrich,
	Andrew Morton, linux-kernel@vger.kernel.org

> > What is going to remove this symlink if the memory goes away?  Or do
> > these never get removed?
> >
> > symlinks in sysfs created like this always worry me.  What is going to
> > use it?
>
> On top of that, we seem to be building a separate hierarchy here.
>
> /sys/devices/system/memory/ operates in memory block granularity.

What defines the memory blocks? I'd initially assumed some connection
to the ACPI SRAT table. But on my test system there are only three
entries in SRAT that define non-zero sized memory blocks (two on
socket/node 0 and one on socket/node 1), yet there are:
    memory0 .. memory32 directories
in /sys/devices/system/memory.

The phys_device and phys_index files aren't helping me figure out
what each of them mean.

> /sys/devices/system/node/nodeX/ links to memory blocks that belong to it.
>
> Why is the memory-block granularity insufficient, and why do we have to
> squeeze in another range API here?

If an MRRM range consists of some set of memory blocks (making
sure that no memory block spans across MRRM range boundaries),
then I could add the {local,remote}_region_id files into the memory
block directories.

This could work now while the region assignments are done by the
BIOS. But in the future, when the OS gets the opportunity to change
them, it might be weird if an MRRM range consists of multiple memory
block ranges, since the region_ids in each would all update together.

/sys/devices/system/memory seemed like a logical place for
memory ranges. But should I jump up a level and make a new
/sys/devices/system/memory_regions directory to expose these
ranges?

-Tony

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-11 17:02     ` Luck, Tony
@ 2025-02-12  7:48       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-12  7:48 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Moore, Robert, Wysocki, Rafael J, Len Brown,
	linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H. Peter Anvin, David Hildenbrand, Oscar Salvador,
	Danilo Krummrich, Andrew Morton, linux-kernel@vger.kernel.org

On Tue, Feb 11, 2025 at 05:02:11PM +0000, Luck, Tony wrote:
> >> +	if (node)
> >> +		ret = sysfs_create_link(&entry->dev.kobj, &node->dev.kobj, "node");
> >
> > What is going to remove this symlink if the memory goes away?  Or do
> > these never get removed?
> 
> There's currently no method for runtime changes to these memory ranges. They
> are described by a static ACPI table.  I need to poke the folks that came up
> with this to ask how memory hotplug will be handled (since CXL seems to be
> making that fashionable again).

ACPI should be supporting memory hotplug today, at the very least
"memory add", so surely you have some old boxes to test this with?

> > symlinks in sysfs created like this always worry me.  What is going to
> > use it?
> 
> <hand waves>User space tools that want to understand what the "per-region"
> monitoring and control features are actually operating on.</hand waves>

If you don't have a real user today, please don't include it now.  Wait
until it is actually needed.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-11 18:05       ` Luck, Tony
@ 2025-02-13 13:30         ` David Hildenbrand
  2025-02-13 19:05           ` Luck, Tony
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-13 13:30 UTC (permalink / raw)
  To: Luck, Tony, Greg Kroah-Hartman
  Cc: Moore, Robert, Wysocki, Rafael J, Len Brown,
	linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H. Peter Anvin, Oscar Salvador, Danilo Krummrich,
	Andrew Morton, linux-kernel@vger.kernel.org

On 11.02.25 19:05, Luck, Tony wrote:
>>> What is going to remove this symlink if the memory goes away?  Or do
>>> these never get removed?
>>>
>>> symlinks in sysfs created like this always worry me.  What is going to
>>> use it?
>>
>> On top of that, we seem to be building a separate hierarchy here.
>>
>> /sys/devices/system/memory/ operates in memory block granularity.
> 
> What defines the memory blocks? I'd initially assumed some connection
> to the ACPI SRAT table. But on my test system there are only three
> entries in SRAT that define non-zero sized memory blocks (two on
> socket/node 0 and one on socket/node 1), yet there are:
>      memory0 .. memory32 directories
> in /sys/devices/system/memory.

Each memory block is the same size (e.g., 128 MiB .. 2 GiB on x86-64).

The default is memory section granularity (e.g., 128 MiB on x86-64), but 
some configs allow for increasing it: see 
arch/x86/mm/init_64.c:memory_block_size_bytes(), and in particular 
probe_memory_block_size().

They define the granularity in which we can online/offline/add/remove 
physical memory managed by the buddy.

We create these blocks during boot/during hotplug, and link them to the 
relevant nodes.

They do not reflect the HW state, but rather how Linux manages that 
memory (through the buddy).

> 
> The phys_device and phys_index files aren't helping me figure out
> what each of them mean.

Yes, see Documentation/admin-guide/mm/memory-hotplug.rst

phys_device is a legacy thing for s390x, and phys_index is just the 
memory block ID.

You can derive the address range corresponding to a memory block using 
the ID.

/sys/devices/system/memory/block_size_bytes tells you the size of each 
block.

Address range of block X:
   [ X*block_size_bytes .. (X+1)*block_size_bytes )
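For example, the ID-to-range mapping can be sketched like this (block
size here is assumed to be the 128 MiB x86-64 default; in practice it
comes from /sys/devices/system/memory/block_size_bytes):

```python
# Sketch: derive the physical address range of memory block X from its ID.
# The block size is an assumption for illustration; read the real value
# from /sys/devices/system/memory/block_size_bytes on a live system.
def memory_block_range(block_id, block_size_bytes):
    start = block_id * block_size_bytes
    end = (block_id + 1) * block_size_bytes  # exclusive upper bound
    return start, end

# memory9 on a system with 128 MiB blocks:
start, end = memory_block_range(9, 128 << 20)
print(hex(start), hex(end))  # 0x48000000 0x50000000
```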


Now, the whole interface here is designed for handling memory hotplug:

obj-$(CONFIG_MEMORY_HOTPLUG) += memory.o

It's worth noting that

1) Blocks might not be all-memory (e.g., memory holes). In that case,
    offlining/unplug is not supported.
2) Blocks might span multiple NUMA nodes (e.g., node ends / starts in
    the middle of a block). Similarly, in that case
    offlining/unplug is not supported.

I assume 1) is not a problem. I assume 2) could be a problem for your 
use case.

> 
>> /sys/devices/system/node/nodeX/ links to memory blocks that belong to it.
>>
>> Why is the memory-block granularity insufficient, and why do we have to
>> squeeze in another range API here?
> 
> If an MRRM range consists of some set of memory blocks (making
> sure that no memory block spans across MRRM range boundaries,
> then I could add the {local,remote}_region_id files into the memory
> block directories.
> 
> This could work now while the region assignments are done by the
> BIOS. But in the future when OS gets the opportunity to change them
> it might be weird if an MRRM range consists of multiple memory
> block range, since the region_ids in each all update together.

What about memory ranges not managed by the buddy (e.g., dax/pmem ranges 
not exposed to the buddy through the dax/kmem driver, memory hidden from 
Linux using mem=X, etc.)?

> 
> /sys/devices/system/memory seemed like a logical place for
> memory ranges. But should I jump up a level and make a new
> /sys/devices/system/memory_regions directory to expose these
> ranges?

Let's take one step back. We do have

1) /proc/iomem to list physical device ranges, without a notion of nodes 
/ other information. Maybe we could extend it, but it might be hard, 
depending on *what* information we need to expose and for which memory.

/proc/iomem also doesn't indicate "System RAM" for memory not managed by 
the buddy.

2) /sys/devices/system/memory/memoryX and /sys/devices/system/node/

Again, the memory part is more hotplug focused, and we treat 
individual memory blocks as "memory block devices".


Reading:

"
The MRRM solution is to tag physical address ranges with "region IDs"
so that platform firmware[1] can indicate the type of memory for each
range (with separate tags available for local vs. remote access to
each range).

The region IDs will be used to provide separate event counts for each
region for "perf" and for the "resctrl" file system to monitor and
control memory bandwidth in each region.

Users will need to know the address range(s) that are part of each
region."

A couple of questions:

a) How volatile is that information at runtime? Can ranges / IDs change?
    I read above that user space might in the future be able to
    reconfigure the ranges.

b) How is hotplug/unplug handled?

c) How are memory ranges not managed by Linux handled?

It might make sense to expose what you need in a more specialized, 
acpi/MRRM/perf specific form, and not as generic as you currently 
envision it.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX
  2025-02-13 13:30         ` David Hildenbrand
@ 2025-02-13 19:05           ` Luck, Tony
  0 siblings, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2025-02-13 19:05 UTC (permalink / raw)
  To: David Hildenbrand, Greg Kroah-Hartman
  Cc: Moore, Robert, Wysocki, Rafael J, Len Brown,
	linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H. Peter Anvin, Oscar Salvador, Danilo Krummrich,
	Andrew Morton, linux-kernel@vger.kernel.org

> A couple of questions:
>
> a) How volatile is that information at runtime? Can ranges / IDs change?
>     I read above that user space might in the future be able to
>     reconfigure the ranges.

Initial implementation has BIOS making all the choices. When there
is a system that supports OS changes I envision making the "local_region_id"
and "remote_region_id" files writeable for the sysadmin to make changes.

Note that the address ranges are fixed (and this isn't going to change).

> b) How is hotplug/unplug handled?

I'm looking for answers to this very good question. Plausibly systems
might reserve address ranges for later hotplug. Those ranges could be
enumerated in the MRRM table. But that is just a guess.

> c) How are memory ranges not managed by Linux handled?

It appears that all system memory is included in the range information,
so access by the BIOS to reserved memory ranges would be counted
and controlled (unless SMI were to disable this on entry). I'll ask this
question.

> It might make sense to expose what you need in a more specialized,
> acpi/MRRM/perf specific form, and not as generic as you currently
> envision it.

Agreed. The only thing going for /sys/devices/system/memory is the name.
The actual semantics of everything below there don't match well with this
usage.

Rafael: How do you feel about this (not implemented yet, just looking for
a new spot to expose this)?

$ cd /sys/firmware/acpi
$ tree memory_ranges/
memory_ranges/
├── range0
│   ├── base
│   ├── length
│   ├── local_region_id
│   └── remote_region_id
├── range1
│   ├── base
│   ├── length
│   ├── local_region_id
│   └── remote_region_id
└── range2
    ├── base
    ├── length
    ├── local_region_id
    └── remote_region_id

4 directories, 12 files
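
For what it's worth, a user space consumer of that layout could be as
simple as the sketch below. The lookup logic is the point; the table
values are invented for illustration, and the file paths it mirrors are
only the proposal above, not an existing ABI:

```python
# Sketch: map a physical address to its MRRM local/remote region IDs.
# The tuples stand in for values that would be read from the proposed
# /sys/firmware/acpi/memory_ranges/rangeX/{base,length,local_region_id,
# remote_region_id} files. All numbers below are made up.
ranges = [
    # (base, length, local_region_id, remote_region_id)
    (0x000000000, 0x080000000, 0, 0),
    (0x080000000, 0x180000000, 1, 2),
]

def region_ids(paddr):
    for base, length, local_id, remote_id in ranges:
        if base <= paddr < base + length:
            return local_id, remote_id
    return None  # address not covered by any MRRM range

print(region_ids(0x100000000))  # inside the second range -> (1, 2)
```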

-Tony

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-13 19:05 UTC | newest]

Thread overview: 15+ messages
-- links below jump to the message on this page --
2025-02-10 21:12 [PATCH 0/4] Add interfaces for ACPI MRRM table Tony Luck
2025-02-10 21:12 ` [PATCH 1/4] ACPICA: Define MRRM ACPI table Tony Luck
2025-02-11 12:16   ` Rafael J. Wysocki
2025-02-10 21:12 ` [PATCH 2/4] ACPI/MRRM: Create /sys/devices/system/memory/rangeX ABI Tony Luck
2025-02-11  0:21   ` Luck, Tony
2025-02-11 13:08   ` David Hildenbrand
2025-02-10 21:12 ` [PATCH 3/4] ACPI/MRRM: Add "node" symlink to /sys/devices/system/memory/rangeX Tony Luck
2025-02-11  6:51   ` Greg Kroah-Hartman
2025-02-11 13:27     ` David Hildenbrand
2025-02-11 18:05       ` Luck, Tony
2025-02-13 13:30         ` David Hildenbrand
2025-02-13 19:05           ` Luck, Tony
2025-02-11 17:02     ` Luck, Tony
2025-02-12  7:48       ` Greg Kroah-Hartman
2025-02-10 21:12 ` [PATCH 4/4] ACPI/MRRM: ABI documentation for /sys/devices/system/memory/rangeX Tony Luck
