* [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 2/7] mm/memory_hotplug: Allow memmap on memory hotplug request to fallback Aneesh Kumar K.V
` (6 subsequent siblings)
7 siblings, 0 replies; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
Instead of adding a menu entry with all supported architectures, add an
mm/Kconfig variable and select it from the supported architectures.
No functional change in this patch.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/arm64/Kconfig | 4 +---
arch/x86/Kconfig | 4 +---
mm/Kconfig | 3 +++
3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1573257a4d6..0f749cfab8e6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_KEEP_MEMBLOCK
+ select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_GNU_PROPERTY
select ARCH_USE_MEMTEST
@@ -347,9 +348,6 @@ config GENERIC_CSUM
config GENERIC_CALIBRATE_DELAY
def_bool y
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
- def_bool y
-
config SMP
def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78224aa76409..d0258e92a8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+ select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
def_bool y
depends on ARCH_ENABLE_MEMORY_HOTPLUG
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
- def_bool y
-
menu "Power management and ACPI options"
config ARCH_HIBERNATION_HEADER
diff --git a/mm/Kconfig b/mm/Kconfig
index 5fe49c030961..721dc88423c7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -571,6 +571,9 @@ config MHP_MEMMAP_ON_MEMORY
endif # MEMORY_HOTPLUG
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+ bool
+
# Heavily threaded applications may benefit from splitting the mm-wide
# page_table_lock, so that faults on different parts of the user address
# space can be handled with less contention: split it at this NR_CPUS.
--
2.41.0
* [PATCH v7 2/7] mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 3/7] mm/memory_hotplug: Allow architecture to override memmap on memory support check Aneesh Kumar K.V
` (5 subsequent siblings)
7 siblings, 0 replies; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
If not supported, fall back to not using memmap on memory. This avoids
the need for callers to do the fallback themselves.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
drivers/acpi/acpi_memhotplug.c | 3 +--
include/linux/memory_hotplug.h | 3 ++-
mm/memory_hotplug.c | 13 ++++++-------
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24f662d8bd39..d0c1a71007d0 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -211,8 +211,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (!info->length)
continue;
- if (mhp_supports_memmap_on_memory(info->length))
- mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+ mhp_flags |= MHP_MEMMAP_ON_MEMORY;
result = __add_memory(mgid, info->start_addr, info->length,
mhp_flags);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 013c69753c91..7d2076583494 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -97,6 +97,8 @@ typedef int __bitwise mhp_t;
* To do so, we will use the beginning of the hot-added range to build
* the page tables for the memmap array that describes the entire range.
* Only selected architectures support it with SPARSE_VMEMMAP.
+ * This is only a hint, the core kernel can decide to not do this based on
+ * different alignment checks.
*/
#define MHP_MEMMAP_ON_MEMORY ((__force mhp_t)BIT(1))
/*
@@ -354,7 +356,6 @@ extern struct zone *zone_for_pfn_range(int online_type, int nid,
extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
struct mhp_params *params);
void arch_remove_linear_mapping(u64 start, u64 size);
-extern bool mhp_supports_memmap_on_memory(unsigned long size);
#endif /* CONFIG_MEMORY_HOTPLUG */
#endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7cfd13c91568..eca32ccd45cc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,7 +1247,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}
-bool mhp_supports_memmap_on_memory(unsigned long size)
+static bool mhp_supports_memmap_on_memory(unsigned long size)
{
unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
@@ -1339,13 +1339,12 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
* Self hosted memmap array
*/
if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
- if (!mhp_supports_memmap_on_memory(size)) {
- ret = -EINVAL;
- goto error;
+ if (mhp_supports_memmap_on_memory(size)) {
+ mhp_altmap.free = PHYS_PFN(size);
+ mhp_altmap.base_pfn = PHYS_PFN(start);
+ params.altmap = &mhp_altmap;
}
- mhp_altmap.free = PHYS_PFN(size);
- mhp_altmap.base_pfn = PHYS_PFN(start);
- params.altmap = &mhp_altmap;
+ /* fallback to not using altmap */
}
/* call arch's memory hotadd */
--
2.41.0
* [PATCH v7 3/7] mm/memory_hotplug: Allow architecture to override memmap on memory support check
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 2/7] mm/memory_hotplug: Allow memmap on memory hotplug request to fallback Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks Aneesh Kumar K.V
` (4 subsequent siblings)
7 siblings, 0 replies; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
Some architectures may want different restrictions. Hence add an
architecture-specific override hook; the PMD_SIZE check is moved into
its generic default implementation.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/memory_hotplug.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index eca32ccd45cc..746cb7c08c64 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1247,10 +1247,26 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}
+static inline unsigned long memory_block_memmap_size(void)
+{
+ return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
+}
+
+#ifndef arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
+{
+ /*
+ * As default, we want the vmemmap to span a complete PMD such that we
+ * can map the vmemmap using a single PMD if supported by the
+ * architecture.
+ */
+ return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+}
+#endif
+
static bool mhp_supports_memmap_on_memory(unsigned long size)
{
- unsigned long nr_vmemmap_pages = size / PAGE_SIZE;
- unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
+ unsigned long vmemmap_size = memory_block_memmap_size();
unsigned long remaining_size = size - vmemmap_size;
/*
@@ -1281,8 +1297,8 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
*/
return mhp_memmap_on_memory() &&
size == memory_block_size_bytes() &&
- IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
- IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
+ IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
+ arch_supports_memmap_on_memory(vmemmap_size);
}
/*
--
2.41.0
* [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
` (2 preceding siblings ...)
2023-08-01 4:41 ` [PATCH v7 3/7] mm/memory_hotplug: Allow architecture to override memmap on memory support check Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 9:04 ` Michal Hocko
2023-08-01 4:41 ` [PATCH v7 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
` (3 subsequent siblings)
7 siblings, 1 reply; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
Currently, the memmap_on_memory feature is only supported with memory block
sizes that result in vmemmap pages covering full pageblocks. This is
because the memory onlining/offlining code requires the applicable ranges
to be pageblock-aligned, for example, to set the migratetypes properly.
Lift that restriction by reserving more pages than required for the
vmemmap space, so that the start address of the hotplugged memory is
pageblock-aligned with different memory block sizes. Using this facility
implies the kernel will reserve some pages in every memory block.
This makes the memmap on memory feature widely useful with
different memory block size values.
For example: with a 64K page size and a 256MiB memory block size, we require
4 pages to map the vmemmap pages. To align things correctly we end up adding
a reserve of 28 pages, i.e., for every 4096 pages, 28 pages get reserved.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
.../admin-guide/mm/memory-hotplug.rst | 12 ++
mm/memory_hotplug.c | 120 +++++++++++++++---
2 files changed, 113 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index bd77841041af..2994958c7ce8 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -433,6 +433,18 @@ The following module parameters are currently defined:
memory in a way that huge pages in bigger
granularity cannot be formed on hotplugged
memory.
+
+ With value "force" it could result in memory
+ wastage due to memmap size limitations. For
+ example, if the memmap for a memory block
+ requires 1 MiB, but the pageblock size is 2
+ MiB, 1 MiB of hotplugged memory will be wasted.
+ Note that there are still cases where the
+ feature cannot be enforced: for example, if the
+ memmap is smaller than a single page, or if the
+ architecture does not support the forced mode
+ in all configurations.
+
``online_policy`` read-write: Set the basic policy used for
automatic zone selection when onlining memory
blocks without specifying a target zone.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 746cb7c08c64..76b813991bdc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -41,17 +41,83 @@
#include "internal.h"
#include "shuffle.h"
+enum {
+ MEMMAP_ON_MEMORY_DISABLE = 0,
+ MEMMAP_ON_MEMORY_ENABLE,
+ MEMMAP_ON_MEMORY_FORCE,
+};
+
+static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
+
+static inline unsigned long memory_block_memmap_size(void)
+{
+ return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
+}
+
+static inline unsigned long memory_block_memmap_on_memory_pages(void)
+{
+ unsigned long nr_pages = PFN_UP(memory_block_memmap_size());
+
+ /*
+ * In "forced" memmap_on_memory mode, we add extra pages to align the
+ * vmemmap size to cover full pageblocks. That way, we can add memory
+ * even if the vmemmap size is not properly aligned, however, we might waste
+ * memory.
+ */
+ if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
+ return pageblock_align(nr_pages);
+ return nr_pages;
+}
+
#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
/*
* memory_hotplug.memmap_on_memory parameter
*/
-static bool memmap_on_memory __ro_after_init;
-module_param(memmap_on_memory, bool, 0444);
-MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+static int set_memmap_mode(const char *val, const struct kernel_param *kp)
+{
+ int ret, mode;
+ bool enabled;
+
+ if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
+ mode = MEMMAP_ON_MEMORY_FORCE;
+ } else {
+ ret = kstrtobool(val, &enabled);
+ if (ret < 0)
+ return ret;
+ if (enabled)
+ mode = MEMMAP_ON_MEMORY_ENABLE;
+ else
+ mode = MEMMAP_ON_MEMORY_DISABLE;
+ }
+ *((int *)kp->arg) = mode;
+ if (mode == MEMMAP_ON_MEMORY_FORCE) {
+ unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
+
+ pr_info_once("Memory hotplug will waste %ld pages in each memory block\n",
+ memmap_pages - PFN_UP(memory_block_memmap_size()));
+ }
+ return 0;
+}
+
+static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
+{
+ if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE)
+ return sprintf(buffer, "force\n");
+ return param_get_bool(buffer, kp);
+}
+
+static const struct kernel_param_ops memmap_mode_ops = {
+ .set = set_memmap_mode,
+ .get = get_memmap_mode,
+};
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
+ "With value \"force\" it could result in memory wastage due "
+ "to memmap size limitations (Y/N/force)");
static inline bool mhp_memmap_on_memory(void)
{
- return memmap_on_memory;
+ return memmap_mode != MEMMAP_ON_MEMORY_DISABLE;
}
#else
static inline bool mhp_memmap_on_memory(void)
@@ -1247,11 +1313,6 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}
-static inline unsigned long memory_block_memmap_size(void)
-{
- return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
-}
-
#ifndef arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
{
@@ -1267,7 +1328,7 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
static bool mhp_supports_memmap_on_memory(unsigned long size)
{
unsigned long vmemmap_size = memory_block_memmap_size();
- unsigned long remaining_size = size - vmemmap_size;
+ unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
/*
* Besides having arch support and the feature enabled at runtime, we
@@ -1295,10 +1356,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
* altmap as an alternative source of memory, and we do not exactly
* populate a single PMD.
*/
- return mhp_memmap_on_memory() &&
- size == memory_block_size_bytes() &&
- IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
- arch_supports_memmap_on_memory(vmemmap_size);
+ if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
+ return false;
+
+ /*
+ * Make sure the vmemmap allocation is fully contained
+ * so that we always allocate vmemmap memory from altmap area.
+ */
+ if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
+ return false;
+
+ /*
+ * start pfn should be pageblock_nr_pages aligned for correctly
+ * setting migrate types
+ */
+ if (!pageblock_aligned(memmap_pages))
+ return false;
+
+ if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
+ /* No effective hotplugged memory doesn't make sense. */
+ return false;
+
+ return arch_supports_memmap_on_memory(vmemmap_size);
}
/*
@@ -1311,7 +1390,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
{
struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
- struct vmem_altmap mhp_altmap = {};
+ struct vmem_altmap mhp_altmap = {
+ .base_pfn = PHYS_PFN(res->start),
+ .end_pfn = PHYS_PFN(res->end),
+ };
struct memory_group *group = NULL;
u64 start, size;
bool new_node = false;
@@ -1356,8 +1438,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
*/
if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
if (mhp_supports_memmap_on_memory(size)) {
- mhp_altmap.free = PHYS_PFN(size);
- mhp_altmap.base_pfn = PHYS_PFN(start);
+ mhp_altmap.free = memory_block_memmap_on_memory_pages();
params.altmap = &mhp_altmap;
}
/* fallback to not using altmap */
@@ -1369,8 +1450,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
goto error;
/* create memory block devices after memory was added */
- ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
- group);
+ ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
if (ret) {
arch_remove_memory(start, size, NULL);
goto error;
@@ -2096,6 +2176,8 @@ static int __ref try_remove_memory(u64 start, u64 size)
* right thing if we used vmem_altmap when hot-adding
* the range.
*/
+ mhp_altmap.base_pfn = PHYS_PFN(start);
+ mhp_altmap.free = nr_vmemmap_pages;
mhp_altmap.alloc = nr_vmemmap_pages;
altmap = &mhp_altmap;
}
--
2.41.0
* Re: [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
2023-08-01 4:41 ` [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks Aneesh Kumar K.V
@ 2023-08-01 9:04 ` Michal Hocko
0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2023-08-01 9:04 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On Tue 01-08-23 10:11:13, Aneesh Kumar K.V wrote:
[...]
> + if (mode == MEMMAP_ON_MEMORY_FORCE) {
> + unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
unsigned long wastage = memmap_pages - PFN_UP(memory_block_memmap_size());
if (wastage)
pr_info_once("memmap_on_memory=force will waste %ld pages in each memory block %ld\n",
wastage, memory_block_size_bytes() >> PAGE_SHIFT);
would be more useful IMHO.
> +
> + pr_info_once("Memory hotplug will waste %ld pages in each memory block\n",
> + memmap_pages - PFN_UP(memory_block_memmap_size()));
--
Michal Hocko
SUSE Labs
* [PATCH v7 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
` (3 preceding siblings ...)
2023-08-01 4:41 ` [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
` (2 subsequent siblings)
7 siblings, 0 replies; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
Radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks. Hence we skip the
restriction that the vmemmap size be a multiple of PMD_SIZE. This also
makes the feature widely useful, because to use a PMD_SIZE vmemmap area we
would require a memory block size of 2GiB.
We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY so that the feature
can work with a memory block size of 256MB, using the altmap.reserve
feature to align things correctly at pageblock granularity. We can end up
losing some pages in memory with this. For example: with a 256MiB memory
block size, we require 4 pages to map the vmemmap pages. In order to align
things correctly we end up adding a reserve of 28 pages, i.e., for every
4096 pages, 28 pages get reserved.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/pgtable.h | 21 +++++++++++++++++++
.../platforms/pseries/hotplug-memory.c | 2 +-
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d0497d13f5b4..938294c996dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@ config PPC
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_KEEP_MEMBLOCK
+ select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index a4893b17705a..33464e6d6431 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -161,6 +161,27 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
unsigned long page_size);
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
+ * some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fallback correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
+{
+ if (!radix_enabled())
+ return false;
+ /*
+ * With 4K page size and 2M PMD_SIZE, we can align
+ * things better with memory block size value
+ * starting from 128MB. Hence align things with PMD_SIZE.
+ */
+ if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+ return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+ return true;
+}
+
#endif /* CONFIG_PPC64 */
#endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..4f3d6a2f9065 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -637,7 +637,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = first_online_node;
/* Add the memory */
- rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+ rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);
if (rc) {
invalidate_lmb_associativity_index(lmb);
return rc;
--
2.41.0
* [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
` (4 preceding siblings ...)
2023-08-01 4:41 ` [PATCH v7 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 23:10 ` Verma, Vishal L
2023-08-01 4:41 ` [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
2023-08-01 9:07 ` [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Michal Hocko
7 siblings, 1 reply; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
With memmap on memory, some architectures need more details w.r.t. the
altmap, such as base_pfn, end_pfn, etc., to unmap the vmemmap memory.
Instead of computing them again when we remove a memory block, embed the
vmem_altmap details in struct memory_block if we are using the memmap
on memory feature.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
drivers/base/memory.c | 27 +++++++++++++--------
include/linux/memory.h | 8 ++----
mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++----------------
3 files changed, 53 insertions(+), 37 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..8191709c9ad2 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -105,7 +105,8 @@ EXPORT_SYMBOL(unregister_memory_notifier);
static void memory_block_release(struct device *dev)
{
struct memory_block *mem = to_memory_block(dev);
-
+ /* Verify that the altmap is freed */
+ WARN_ON(mem->altmap);
kfree(mem);
}
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
- unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+ unsigned long nr_vmemmap_pages = 0;
struct zone *zone;
int ret;
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
* stage helps to keep accounting easier to follow - e.g vmemmaps
* belong to the same zone as the memory they backed.
*/
+ if (mem->altmap)
+ nr_vmemmap_pages = mem->altmap->free;
+
if (nr_vmemmap_pages) {
ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
- unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+ unsigned long nr_vmemmap_pages = 0;
int ret;
if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
* Unaccount before offlining, such that unpopulated zone and kthreads
* can properly be torn down in offline_pages().
*/
+ if (mem->altmap)
+ nr_vmemmap_pages = mem->altmap->free;
+
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
#endif
static int add_memory_block(unsigned long block_id, unsigned long state,
- unsigned long nr_vmemmap_pages,
+ struct vmem_altmap *altmap,
struct memory_group *group)
{
struct memory_block *mem;
@@ -744,7 +751,7 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
mem->start_section_nr = block_id * sections_per_block;
mem->state = state;
mem->nid = NUMA_NO_NODE;
- mem->nr_vmemmap_pages = nr_vmemmap_pages;
+ mem->altmap = altmap;
INIT_LIST_HEAD(&mem->group_next);
#ifndef CONFIG_NUMA
@@ -783,14 +790,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
if (section_count == 0)
return 0;
return add_memory_block(memory_block_id(base_section_nr),
- MEM_ONLINE, 0, NULL);
+ MEM_ONLINE, NULL, NULL);
}
static int add_hotplug_memory_block(unsigned long block_id,
- unsigned long nr_vmemmap_pages,
+ struct vmem_altmap *altmap,
struct memory_group *group)
{
- return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+ return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
}
static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +825,7 @@ static void remove_memory_block(struct memory_block *memory)
* Called under device_hotplug_lock.
*/
int create_memory_block_devices(unsigned long start, unsigned long size,
- unsigned long vmemmap_pages,
+ struct vmem_altmap *altmap,
struct memory_group *group)
{
const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +839,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
return -EINVAL;
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
- ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+ ret = add_hotplug_memory_block(block_id, altmap, group);
if (ret)
break;
}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
*/
struct zone *zone;
struct device dev;
- /*
- * Number of vmemmap pages. These pages
- * lay at the beginning of the memory block.
- */
- unsigned long nr_vmemmap_pages;
+ struct vmem_altmap *altmap;
struct memory_group *group; /* group (if any) for this block */
struct list_head group_next; /* next block inside memory group */
#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
extern int register_memory_notifier(struct notifier_block *nb);
extern void unregister_memory_notifier(struct notifier_block *nb);
int create_memory_block_devices(unsigned long start, unsigned long size,
- unsigned long vmemmap_pages,
+ struct vmem_altmap *altmap,
struct memory_group *group);
void remove_memory_block_devices(unsigned long start, unsigned long size);
extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 76b813991bdc..1ce8ad04a980 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,7 +1439,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
if (mhp_supports_memmap_on_memory(size)) {
mhp_altmap.free = memory_block_memmap_on_memory_pages();
- params.altmap = &mhp_altmap;
+ params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+ if (!params.altmap)
+ goto error;
+
+ memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
}
/* fallback to not using altmap */
}
@@ -1447,13 +1451,13 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
/* call arch's memory hotadd */
ret = arch_add_memory(nid, start, size, ¶ms);
if (ret < 0)
- goto error;
+ goto error_free;
/* create memory block devices after memory was added */
- ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
+ ret = create_memory_block_devices(start, size, params.altmap, group);
if (ret) {
arch_remove_memory(start, size, NULL);
- goto error;
+ goto error_free;
}
if (new_node) {
@@ -1490,6 +1494,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
walk_memory_blocks(start, size, NULL, online_memory_block);
return ret;
+error_free:
+ kfree(params.altmap);
error:
if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
memblock_remove(start, size);
@@ -2056,12 +2062,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
return 0;
}
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int test_has_altmap_cb(struct memory_block *mem, void *arg)
{
+ struct memory_block **mem_ptr = (struct memory_block **)arg;
/*
- * If not set, continue with the next block.
+ * return the memblock if we have altmap
+ * and break callback.
*/
- return mem->nr_vmemmap_pages;
+ if (mem->altmap) {
+ *mem_ptr = mem;
+ return 1;
+ }
+ return 0;
}
static int check_cpu_on_node(int nid)
@@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
static int __ref try_remove_memory(u64 start, u64 size)
{
- struct vmem_altmap mhp_altmap = {};
- struct vmem_altmap *altmap = NULL;
- unsigned long nr_vmemmap_pages;
+ int ret;
+ struct memory_block *mem;
int rc = 0, nid = NUMA_NO_NODE;
+ struct vmem_altmap *altmap = NULL;
BUG_ON(check_hotplug_memory_range(start, size));
@@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
* the same granularity it was added - a single memory block.
*/
if (mhp_memmap_on_memory()) {
- nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
- get_nr_vmemmap_pages_cb);
- if (nr_vmemmap_pages) {
+ ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
+ if (ret) {
if (size != memory_block_size_bytes()) {
pr_warn("Refuse to remove %#llx - %#llx,"
"wrong granularity\n",
start, start + size);
return -EINVAL;
}
-
+ altmap = mem->altmap;
/*
- * Let remove_pmd_table->free_hugepage_table do the
- * right thing if we used vmem_altmap when hot-adding
- * the range.
+ * Mark altmap NULL so that we can add a debug
+ * check on memblock free.
*/
- mhp_altmap.base_pfn = PHYS_PFN(start);
- mhp_altmap.free = nr_vmemmap_pages;
- mhp_altmap.alloc = nr_vmemmap_pages;
- altmap = &mhp_altmap;
+ mem->altmap = NULL;
}
}
@@ -2196,6 +2203,12 @@ static int __ref try_remove_memory(u64 start, u64 size)
arch_remove_memory(start, size, altmap);
+ /* Verify that all vmemmap pages have actually been freed. */
+ if (altmap) {
+ WARN(altmap->alloc, "Altmap not fully unmapped");
+ kfree(altmap);
+ }
+
if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
memblock_phys_free(start, size);
memblock_remove(start, size);
--
2.41.0
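The ownership model this patch introduces - add_memory_resource() heap-allocates a copy of the on-stack altmap, each memory block keeps the pointer, and try_remove_memory() takes it back, clears the block's pointer, and frees it after checking that every altmap page was released - can be sketched in plain C. This is a userspace model only: malloc()/free() stand in for kmalloc()/kfree(), the struct fields are trimmed to the ones the patch touches, and the model_* function names are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (illustrative only). */
struct vmem_altmap {
	unsigned long base_pfn;
	unsigned long free;	/* pages reserved for the vmemmap */
	unsigned long alloc;	/* pages currently allocated from the altmap */
};

struct memory_block {
	struct vmem_altmap *altmap;	/* owned by the block while it exists */
};

/* Hot-add side: heap-allocate a copy of the on-stack altmap template,
 * as add_memory_resource() now does with kmalloc()/memcpy(). */
static int model_add_memory(struct memory_block *mem,
			    const struct vmem_altmap *tmpl)
{
	mem->altmap = malloc(sizeof(*mem->altmap));
	if (!mem->altmap)
		return -1;	/* the kernel path jumps to its error label */
	memcpy(mem->altmap, tmpl, sizeof(*tmpl));
	return 0;
}

/* Hot-remove side: take ownership of the altmap, clear the block's
 * pointer (so block teardown can sanity-check that it is gone), then
 * free it after verifying all altmap-backed pages were unmapped. */
static void model_remove_memory(struct memory_block *mem)
{
	struct vmem_altmap *altmap = mem->altmap;

	mem->altmap = NULL;
	if (altmap) {
		/* arch_remove_memory(..., altmap) drops ->alloc to 0 */
		assert(altmap->alloc == 0 && "Altmap not fully unmapped");
		free(altmap);
	}
}
```

With this shape, create_memory_block_devices() receives the heap pointer and try_remove_memory() can recover base_pfn/free/alloc from the block itself instead of recomputing them, which is the point of the patch.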
^ permalink raw reply related [flat|nested] 32+ messages in thread

* Re: [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
2023-08-01 4:41 ` [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
@ 2023-08-01 23:10 ` Verma, Vishal L
2023-08-02 4:47 ` Aneesh Kumar K V
0 siblings, 1 reply; 32+ messages in thread
From: Verma, Vishal L @ 2023-08-01 23:10 UTC (permalink / raw)
To: aneesh.kumar@linux.ibm.com, linux-mm@kvack.org,
mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org,
npiggin@gmail.com, christophe.leroy@csgroup.eu,
akpm@linux-foundation.org
Cc: Hocko, Michal, david@redhat.com, osalvador@suse.de
On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
> With memmap on memory, some architecture needs more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using memmap on memory block
> feature.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> drivers/base/memory.c | 27 +++++++++++++--------
> include/linux/memory.h | 8 ++----
> mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++----------------
> 3 files changed, 53 insertions(+), 37 deletions(-)
>
<snip>
> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>
> static int __ref try_remove_memory(u64 start, u64 size)
> {
> - struct vmem_altmap mhp_altmap = {};
> - struct vmem_altmap *altmap = NULL;
> - unsigned long nr_vmemmap_pages;
> + int ret;
Minor nit - there is already an 'int rc' below - just use that, or
rename it to 'ret' if that's better for consistency.
> + struct memory_block *mem;
> int rc = 0, nid = NUMA_NO_NODE;
> + struct vmem_altmap *altmap = NULL;
>
> BUG_ON(check_hotplug_memory_range(start, size));
>
> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
> * the same granularity it was added - a single memory block.
> */
> if (mhp_memmap_on_memory()) {
> - nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> - get_nr_vmemmap_pages_cb);
> - if (nr_vmemmap_pages) {
> + ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
> + if (ret) {
> if (size != memory_block_size_bytes()) {
> pr_warn("Refuse to remove %#llx - %#llx,"
> "wrong granularity\n",
> start, start + size);
> return -EINVAL;
> }
> -
> + altmap = mem->altmap;
> /*
> - * Let remove_pmd_table->free_hugepage_table do the
> - * right thing if we used vmem_altmap when hot-adding
> - * the range.
> + * Mark altmap NULL so that we can add a debug
> + * check on memblock free.
> */
> - mhp_altmap.base_pfn = PHYS_PFN(start);
> - mhp_altmap.free = nr_vmemmap_pages;
> - mhp_altmap.alloc = nr_vmemmap_pages;
> - altmap = &mhp_altmap;
> + mem->altmap = NULL;
> }
> }
>
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
2023-08-01 23:10 ` Verma, Vishal L
@ 2023-08-02 4:47 ` Aneesh Kumar K V
0 siblings, 0 replies; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-02 4:47 UTC (permalink / raw)
To: Verma, Vishal L, linux-mm@kvack.org, mpe@ellerman.id.au,
linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com,
christophe.leroy@csgroup.eu, akpm@linux-foundation.org
Cc: Hocko, Michal, david@redhat.com, osalvador@suse.de
On 8/2/23 4:40 AM, Verma, Vishal L wrote:
> On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
>> With memmap on memory, some architecture needs more details w.r.t altmap
>> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
>> computing them again when we remove a memory block, embed vmem_altmap
>> details in struct memory_block if we are using memmap on memory block
>> feature.
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>> drivers/base/memory.c | 27 +++++++++++++--------
>> include/linux/memory.h | 8 ++----
>> mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++----------------
>> 3 files changed, 53 insertions(+), 37 deletions(-)
>>
> <snip>
>
>> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>>
>> static int __ref try_remove_memory(u64 start, u64 size)
>> {
>> - struct vmem_altmap mhp_altmap = {};
>> - struct vmem_altmap *altmap = NULL;
>> - unsigned long nr_vmemmap_pages;
>> + int ret;
>
> Minor nit - there is already an 'int rc' below - just use that, or
> rename it to 'ret' if that's better for consistency.
>
I reused the existing rc variable.
>> + struct memory_block *mem;
>> int rc = 0, nid = NUMA_NO_NODE;
>> + struct vmem_altmap *altmap = NULL;
>>
>> BUG_ON(check_hotplug_memory_range(start, size));
>>
>> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
>> * the same granularity it was added - a single memory block.
>> */
>> if (mhp_memmap_on_memory()) {
>> - nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
>> - get_nr_vmemmap_pages_cb);
>> - if (nr_vmemmap_pages) {
>> + ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
>> + if (ret) {
>> if (size != memory_block_size_bytes()) {
>> pr_warn("Refuse to remove %#llx - %#llx,"
>> "wrong granularity\n",
>> start, start + size);
>> return -EINVAL;
>> }
>> -
>> + altmap = mem->altmap;
>> /*
>> - * Let remove_pmd_table->free_hugepage_table do the
>> - * right thing if we used vmem_altmap when hot-adding
>> - * the range.
>> + * Mark altmap NULL so that we can add a debug
>> + * check on memblock free.
>> */
>> - mhp_altmap.base_pfn = PHYS_PFN(start);
>> - mhp_altmap.free = nr_vmemmap_pages;
>> - mhp_altmap.alloc = nr_vmemmap_pages;
>> - altmap = &mhp_altmap;
>> + mem->altmap = NULL;
>> }
>> }
>>
Thank you for taking the time to review the patch.
-aneesh
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
` (5 preceding siblings ...)
2023-08-01 4:41 ` [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 8:58 ` Michal Hocko
2023-08-01 9:07 ` [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Michal Hocko
7 siblings, 1 reply; 32+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Vishal Verma, David Hildenbrand, Michal Hocko, Aneesh Kumar K.V,
Oscar Salvador
Allow updating memmap_on_memory mode after the kernel boot. Memory
hotplug done after the mode update will use the new memmap_on_memory
value.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/memory_hotplug.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1ce8ad04a980..d282664f558e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -89,7 +89,10 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp)
else
mode = MEMMAP_ON_MEMORY_DISABLE;
}
+ /* Avoid changing memmap mode during hotplug. */
+ get_online_mems();
*((int *)kp->arg) = mode;
+ put_online_mems();
if (mode == MEMMAP_ON_MEMORY_FORCE) {
unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
@@ -110,7 +113,7 @@ static const struct kernel_param_ops memmap_mode_ops = {
.set = set_memmap_mode,
.get = get_memmap_mode,
};
-module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0644);
MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
"With value \"force\" it could result in memory wastage due "
"to memmap size limitations (Y/N/force)");
@@ -2172,22 +2175,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
* We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
* the same granularity it was added - a single memory block.
*/
- if (mhp_memmap_on_memory()) {
- ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
- if (ret) {
- if (size != memory_block_size_bytes()) {
- pr_warn("Refuse to remove %#llx - %#llx,"
- "wrong granularity\n",
- start, start + size);
- return -EINVAL;
- }
- altmap = mem->altmap;
- /*
- * Mark altmap NULL so that we can add a debug
- * check on memblock free.
- */
- mem->altmap = NULL;
+ ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
+ if (ret) {
+ if (size != memory_block_size_bytes()) {
+ pr_warn("Refuse to remove %#llx - %#llx,"
+ "wrong granularity\n",
+ start, start + size);
+ return -EINVAL;
}
+ altmap = mem->altmap;
+ /*
+ * Mark altmap NULL so that we can add a debug
+ * check on memblock free.
+ */
+ mem->altmap = NULL;
}
/* remove memmap entry */
--
2.41.0
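A minimal userspace model of the set path that the now-writable parameter exposes. The mode names mirror the patch, but kstrtobool(), the param lock, and the get_online_mems()/put_online_mems() exclusion are elided or simplified, and parse_memmap_mode() is an invented helper name - treat this as a sketch, not the kernel code:

```c
#include <strings.h>	/* strcasecmp() is POSIX, not ISO C */

enum memmap_mode {
	MEMMAP_ON_MEMORY_DISABLE,
	MEMMAP_ON_MEMORY_ENABLE,
	MEMMAP_ON_MEMORY_FORCE,
};

/* Mirrors the spirit of set_memmap_mode(): "force" selects the force
 * mode, any other value is treated as a boolean enable/disable (Y/N). */
static enum memmap_mode parse_memmap_mode(const char *val)
{
	if (strcasecmp(val, "force") == 0)
		return MEMMAP_ON_MEMORY_FORCE;
	/* crude stand-in for kstrtobool() */
	if (val[0] == 'y' || val[0] == 'Y' || val[0] == '1')
		return MEMMAP_ON_MEMORY_ENABLE;
	return MEMMAP_ON_MEMORY_DISABLE;
}
```

With the permission bits changed from 0444 to 0644, the value can then be rewritten at runtime, e.g. `echo force > /sys/module/memory_hotplug/parameters/memmap_on_memory` (path assumed from the standard built-in module parameter layout). Hotplug requests issued after the write pick up the new mode, and the get_online_mems()/put_online_mems() pair in the patch keeps the update from racing with an in-flight hotplug operation.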
^ permalink raw reply related [flat|nested] 32+ messages in thread

* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-01 4:41 ` [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
@ 2023-08-01 8:58 ` Michal Hocko
2023-08-01 9:28 ` Aneesh Kumar K V
0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2023-08-01 8:58 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> Allow updating memmap_on_memory mode after the kernel boot. Memory
> hotplug done after the mode update will use the new mmemap_on_memory
> value.
Well, this is a user-space kABI extension, and as such you should spend
more words on the usecase. Why could we live with this being static
before, and why does it need to be dynamic now?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-01 8:58 ` Michal Hocko
@ 2023-08-01 9:28 ` Aneesh Kumar K V
2023-08-01 10:50 ` Michal Hocko
0 siblings, 1 reply; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-01 9:28 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On 8/1/23 2:28 PM, Michal Hocko wrote:
> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>> hotplug done after the mode update will use the new mmemap_on_memory
>> value.
>
> Well, this is a user space kABI extension and as such you should spend
> more words about the usecase. Why we could live with this static and now
> need dynamic?
>
This enables easy testing of the memmap_on_memory feature without a kernel
reboot. I also expect people to want it when they find dax kmem memory
onlining failing because of struct page allocation failures[1]. The user
could reboot back with the memmap_on_memory=y kernel parameter, but being
able to enable it via sysfs makes the feature much more useful.
-aneesh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-01 9:28 ` Aneesh Kumar K V
@ 2023-08-01 10:50 ` Michal Hocko
2023-08-02 4:45 ` Aneesh Kumar K V
0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2023-08-01 10:50 UTC (permalink / raw)
To: Aneesh Kumar K V
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> On 8/1/23 2:28 PM, Michal Hocko wrote:
> > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> >> Allow updating memmap_on_memory mode after the kernel boot. Memory
> >> hotplug done after the mode update will use the new mmemap_on_memory
> >> value.
> >
> > Well, this is a user space kABI extension and as such you should spend
> > more words about the usecase. Why we could live with this static and now
> > need dynamic?
> >
>
> This enables easy testing of memmap_on_memory feature without a kernel reboot.
Testing alone is a rather weak argument, to be honest.
> I also expect people wanting to use that when they find dax kmem memory online
> failing because of struct page allocation failures[1]. User could reboot back with
> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> the feature much more useful.
Sure it can be useful, but that holds for any feature, right? The main
question is whether this is worth maintaining. The current implementation
seems rather trivial, which is an argument to have it, but are there any
risks long term? Have you evaluated a potential long-term maintenance
cost? There is no easy way to go back and disable it later on without
breaking some userspace.
All that should be in the changelog!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-01 10:50 ` Michal Hocko
@ 2023-08-02 4:45 ` Aneesh Kumar K V
2023-08-02 15:50 ` Michal Hocko
0 siblings, 1 reply; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-02 4:45 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On 8/1/23 4:20 PM, Michal Hocko wrote:
> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>> value.
>>>
>>> Well, this is a user space kABI extension and as such you should spend
>>> more words about the usecase. Why we could live with this static and now
>>> need dynamic?
>>>
>>
>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>
> Testing alone is rather weak argument to be honest.
>
>> I also expect people wanting to use that when they find dax kmem memory online
>> failing because of struct page allocation failures[1]. User could reboot back with
>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>> the feature much more useful.
>
> Sure it can be useful but that holds for any feature, right. The main
> question is whether this is worth maintaing. The current implementation
> seems rather trivial which is an argument to have it but are there any
> risks long term? Have you evaluated a potential long term maintenance
> cost? There is no easy way to go back and disable it later on without
> breaking some userspace.
>
> All that should be in the changelog!
I updated it as below.
mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
Allow updating memmap_on_memory mode after the kernel boot. Memory
hotplug done after the mode update will use the new memmap_on_memory
value.
It is now possible to test the memmap_on_memory feature easily without
the need for a kernel reboot. Additionally, for those encountering
struct page allocation failures while using dax kmem memory online, this
feature may prove useful. Instead of rebooting with the
memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
which greatly enhances its usefulness. Supporting runtime updates is a
small, self-contained change that should not hinder the addition of new
features or result in long-term maintenance problems.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 4:45 ` Aneesh Kumar K V
@ 2023-08-02 15:50 ` Michal Hocko
2023-08-02 15:54 ` David Hildenbrand
0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2023-08-02 15:50 UTC (permalink / raw)
To: Aneesh Kumar K V
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> On 8/1/23 4:20 PM, Michal Hocko wrote:
> > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> >> On 8/1/23 2:28 PM, Michal Hocko wrote:
> >>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> >>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
> >>>> hotplug done after the mode update will use the new mmemap_on_memory
> >>>> value.
> >>>
> >>> Well, this is a user space kABI extension and as such you should spend
> >>> more words about the usecase. Why we could live with this static and now
> >>> need dynamic?
> >>>
> >>
> >> This enables easy testing of memmap_on_memory feature without a kernel reboot.
> >
> > Testing alone is rather weak argument to be honest.
> >
> >> I also expect people wanting to use that when they find dax kmem memory online
> >> failing because of struct page allocation failures[1]. User could reboot back with
> >> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> >> the feature much more useful.
> >
> > Sure it can be useful but that holds for any feature, right. The main
> > question is whether this is worth maintaing. The current implementation
> > seems rather trivial which is an argument to have it but are there any
> > risks long term? Have you evaluated a potential long term maintenance
> > cost? There is no easy way to go back and disable it later on without
> > breaking some userspace.
> >
> > All that should be in the changelog!
>
> I updated it as below.
>
> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>
> Allow updating memmap_on_memory mode after the kernel boot. Memory
> hotplug done after the mode update will use the new mmemap_on_memory
> value.
>
> It is now possible to test the memmap_on_memory feature easily without
> the need for a kernel reboot. Additionally, for those encountering
> struct page allocation failures while using dax kmem memory online, this
> feature may prove useful. Instead of rebooting with the
> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> which greatly enhances its usefulness.
I do not really see a solid argument for why rebooting is a problem,
TBH. Also, is the global policy knob really the right fit for existing
hotplug usecases? In other words, if we really want to make
memmap_on_memory more flexible, would it make more sense to have it as a
per-memory-block property instead (with the global knob being the
default if the admin doesn't specify it differently)?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:50 ` Michal Hocko
@ 2023-08-02 15:54 ` David Hildenbrand
2023-08-02 15:57 ` Aneesh Kumar K V
2023-08-02 16:59 ` Michal Hocko
0 siblings, 2 replies; 32+ messages in thread
From: David Hildenbrand @ 2023-08-02 15:54 UTC (permalink / raw)
To: Michal Hocko, Aneesh Kumar K V
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 02.08.23 17:50, Michal Hocko wrote:
> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
>> On 8/1/23 4:20 PM, Michal Hocko wrote:
>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>>>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>>>> value.
>>>>>
>>>>> Well, this is a user space kABI extension and as such you should spend
>>>>> more words about the usecase. Why we could live with this static and now
>>>>> need dynamic?
>>>>>
>>>>
>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>>>
>>> Testing alone is rather weak argument to be honest.
>>>
>>>> I also expect people wanting to use that when they find dax kmem memory online
>>>> failing because of struct page allocation failures[1]. User could reboot back with
>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>>>> the feature much more useful.
>>>
>>> Sure it can be useful but that holds for any feature, right. The main
>>> question is whether this is worth maintaing. The current implementation
>>> seems rather trivial which is an argument to have it but are there any
>>> risks long term? Have you evaluated a potential long term maintenance
>>> cost? There is no easy way to go back and disable it later on without
>>> breaking some userspace.
>>>
>>> All that should be in the changelog!
>>
>> I updated it as below.
>>
>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>>
>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>> hotplug done after the mode update will use the new mmemap_on_memory
>> value.
>>
>> It is now possible to test the memmap_on_memory feature easily without
>> the need for a kernel reboot. Additionally, for those encountering
>> struct page allocation failures while using dax kmem memory online, this
>> feature may prove useful. Instead of rebooting with the
>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
>> which greatly enhances its usefulness.
>
>
> I do not really see a solid argument why rebooting is really a problem
> TBH. Also is the global policy knob really the right fit for existing
> hotplug usecases? In other words, if we really want to make
> memmap_on_memory more flexible would it make more sense to have it per
> memory block property instead (the global knob being the default if
> admin doesn't specify it differently).
Per memory block isn't possible due to the creation order. Also, I
think it's not the right approach.
I thought about driver toggles. At least for dax/kmem people are looking
into that:
https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
Where that can also be toggled per device.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:54 ` David Hildenbrand
@ 2023-08-02 15:57 ` Aneesh Kumar K V
2023-08-02 16:02 ` Verma, Vishal L
2023-08-02 16:59 ` Michal Hocko
1 sibling, 1 reply; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-02 15:57 UTC (permalink / raw)
To: David Hildenbrand, Michal Hocko
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 8/2/23 9:24 PM, David Hildenbrand wrote:
> On 02.08.23 17:50, Michal Hocko wrote:
>> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
>>> On 8/1/23 4:20 PM, Michal Hocko wrote:
>>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>>>>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>>>>> value.
>>>>>>
>>>>>> Well, this is a user space kABI extension and as such you should spend
>>>>>> more words about the usecase. Why we could live with this static and now
>>>>>> need dynamic?
>>>>>>
>>>>>
>>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>>>>
>>>> Testing alone is rather weak argument to be honest.
>>>>
>>>>> I also expect people wanting to use that when they find dax kmem memory online
>>>>> failing because of struct page allocation failures[1]. User could reboot back with
>>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>>>>> the feature much more useful.
>>>>
>>>> Sure it can be useful but that holds for any feature, right. The main
>>>> question is whether this is worth maintaing. The current implementation
>>>> seems rather trivial which is an argument to have it but are there any
>>>> risks long term? Have you evaluated a potential long term maintenance
>>>> cost? There is no easy way to go back and disable it later on without
>>>> breaking some userspace.
>>>>
>>>> All that should be in the changelog!
>>>
>>> I updated it as below.
>>>
>>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>>>
>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>> hotplug done after the mode update will use the new mmemap_on_memory
>>> value.
>>>
>>> It is now possible to test the memmap_on_memory feature easily without
>>> the need for a kernel reboot. Additionally, for those encountering
>>> struct page allocation failures while using dax kmem memory online, this
>>> feature may prove useful. Instead of rebooting with the
>>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
>>> which greatly enhances its usefulness.
>>
>>
>> I do not really see a solid argument why rebooting is really a problem
>> TBH. Also is the global policy knob really the right fit for existing
>> hotplug usecases? In other words, if we really want to make
>> memmap_on_memory more flexible would it make more sense to have it per
>> memory block property instead (the global knob being the default if
>> admin doesn't specify it differently).
>
> Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach.
>
> I thought about driver toggles. At least for dax/kmem people are looking into that:
>
> https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
>
> Where that can also be toggled per device.
>
That is still dependent on the global memmap_on_memory toggle being enabled, right? Could we make the dax
facility independent of the global feature toggle?
-aneesh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:57 ` Aneesh Kumar K V
@ 2023-08-02 16:02 ` Verma, Vishal L
0 siblings, 0 replies; 32+ messages in thread
From: Verma, Vishal L @ 2023-08-02 16:02 UTC (permalink / raw)
To: aneesh.kumar@linux.ibm.com, Hocko, Michal, david@redhat.com
Cc: npiggin@gmail.com, linux-mm@kvack.org, akpm@linux-foundation.org,
linuxppc-dev@lists.ozlabs.org, osalvador@suse.de
On Wed, 2023-08-02 at 21:27 +0530, Aneesh Kumar K V wrote:
> On 8/2/23 9:24 PM, David Hildenbrand wrote:
> > On 02.08.23 17:50, Michal Hocko wrote:
> > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > > > > > value.
> > > > > > >
> > > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > > more words about the usecase. Why we could live with this static and now
> > > > > > > need dynamic?
> > > > > > >
> > > > > >
> > > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot.
> > > > >
> > > > > Testing alone is rather weak argument to be honest.
> > > > >
> > > > > > I also expect people wanting to use that when they find dax kmem memory online
> > > > > > failing because of struct page allocation failures[1]. User could reboot back with
> > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> > > > > > the feature much more useful.
> > > > >
> > > > > Sure it can be useful but that holds for any feature, right. The main
> > > > > question is whether this is worth maintaing. The current implementation
> > > > > seems rather trivial which is an argument to have it but are there any
> > > > > risks long term? Have you evaluated a potential long term maintenance
> > > > > cost? There is no easy way to go back and disable it later on without
> > > > > breaking some userspace.
> > > > >
> > > > > All that should be in the changelog!
> > > >
> > > > I updated it as below.
> > > >
> > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > > >
> > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > value.
> > > >
> > > > It is now possible to test the memmap_on_memory feature easily without
> > > > the need for a kernel reboot. Additionally, for those encountering
> > > > struct page allocation failures while using dax kmem memory online, this
> > > > feature may prove useful. Instead of rebooting with the
> > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > > which greatly enhances its usefulness.
> > >
> > >
> > > I do not really see a solid argument why rebooting is really a problem
> > > TBH. Also is the global policy knob really the right fit for existing
> > > hotplug usecases? In other words, if we really want to make
> > > memmap_on_memory more flexible would it make more sense to have it per
> > > memory block property instead (the global knob being the default if
> > > admin doesn't specify it differently).
> >
> > Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach.
> >
> > I thought about driver toggles. At least for dax/kmem people are looking into that:
> >
> > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
> >
> > Where that can also be toggled per device.
> >
>
> That still is dependent on the global memmap_on_memory enabled right? We could make the dax facility independent of the
> global feature toggle?
Correct - the latest version of the patches David linked does depend on
the global memmap_on_memory param. Since kmem's behavior for dax devices
is set up to be turned on/off dynamically via sysfs, it would be nice to
have a similar facility for this flag.
I did try making dax independent of memmap_on_memory in my first
posting:
https://lore.kernel.org/nvdimm/20230613-vv-kmem_memmap-v1-1-f6de9c6af2c6@intel.com/
We can certainly revisit that if it looks more suitable.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:54 ` David Hildenbrand
2023-08-02 15:57 ` Aneesh Kumar K V
@ 2023-08-02 16:59 ` Michal Hocko
2023-08-03 8:52 ` Michal Hocko
1 sibling, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2023-08-02 16:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: Vishal Verma, Aneesh Kumar K V, npiggin, linux-mm, akpm,
linuxppc-dev, Oscar Salvador
On Wed 02-08-23 17:54:04, David Hildenbrand wrote:
> On 02.08.23 17:50, Michal Hocko wrote:
> > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > hotplug done after the mode update will use the new memmap_on_memory
> > > > > > > value.
> > > > > >
> > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > more words about the usecase. Why could we live with this static before but
> > > > > > now need it dynamic?
> > > > > >
> > > > >
> > > > > This enables easy testing of the memmap_on_memory feature without a kernel reboot.
> > > >
> > > > Testing alone is rather weak argument to be honest.
> > > >
> > > > > I also expect people wanting to use that when they find dax kmem memory online
> > > > > failing because of struct page allocation failures[1]. A user could reboot with the
> > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> > > > > the feature much more useful.
> > > >
> > > > Sure it can be useful but that holds for any feature, right. The main
> > > > question is whether this is worth maintaining. The current implementation
> > > > seems rather trivial which is an argument to have it but are there any
> > > > risks long term? Have you evaluated a potential long term maintenance
> > > > cost? There is no easy way to go back and disable it later on without
> > > > breaking some userspace.
> > > >
> > > > All that should be in the changelog!
> > >
> > > I updated it as below.
> > >
> > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > >
> > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > hotplug done after the mode update will use the new memmap_on_memory
> > > value.
> > >
> > > It is now possible to test the memmap_on_memory feature easily without
> > > the need for a kernel reboot. Additionally, for those encountering
> > > struct page allocation failures while using dax kmem memory online, this
> > > feature may prove useful. Instead of rebooting with the
> > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > which greatly enhances its usefulness.
> >
> >
> > I do not really see a solid argument why rebooting is really a problem
> > TBH. Also is the global policy knob really the right fit for existing
> > hotplug usecases? In other words, if we really want to make
> > memmap_on_memory more flexible would it make more sense to have it per
> > memory block property instead (the global knob being the default if
> > admin doesn't specify it differently).
>
> Per memory block isn't possible, due to the creation order.
I am not sure I follow. Could you elaborate more?
> Also, I think it's not the right approach.
>
> I thought about driver toggles. At least for dax/kmem people are looking
> into that:
>
> https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
>
> Where that can also be toggled per device.
Per device flag makes sense to me as well. But what should we do about
hotplugged memory with a backing device (like regular RAM)? I can see
some reason to have large hotplugged memory ranges be self-vmemmap
hosted while smaller blocks are backed by external vmemmaps.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 16:59 ` Michal Hocko
@ 2023-08-03 8:52 ` Michal Hocko
2023-08-03 9:24 ` David Hildenbrand
0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2023-08-03 8:52 UTC (permalink / raw)
To: David Hildenbrand
Cc: Vishal Verma, Aneesh Kumar K V, npiggin, linux-mm, akpm,
linuxppc-dev, Oscar Salvador
On Wed 02-08-23 18:59:04, Michal Hocko wrote:
> On Wed 02-08-23 17:54:04, David Hildenbrand wrote:
> > On 02.08.23 17:50, Michal Hocko wrote:
> > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > > hotplug done after the mode update will use the new memmap_on_memory
> > > > > > > > value.
> > > > > > >
> > > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > > more words about the usecase. Why could we live with this static before but
> > > > > > > now need it dynamic?
> > > > > > >
> > > > > >
> > > > > > This enables easy testing of the memmap_on_memory feature without a kernel reboot.
> > > > >
> > > > > Testing alone is rather weak argument to be honest.
> > > > >
> > > > > > I also expect people wanting to use that when they find dax kmem memory online
> > > > > > failing because of struct page allocation failures[1]. A user could reboot with the
> > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> > > > > > the feature much more useful.
> > > > >
> > > > > Sure it can be useful but that holds for any feature, right. The main
> > > > > question is whether this is worth maintaining. The current implementation
> > > > > seems rather trivial which is an argument to have it but are there any
> > > > > risks long term? Have you evaluated a potential long term maintenance
> > > > > cost? There is no easy way to go back and disable it later on without
> > > > > breaking some userspace.
> > > > >
> > > > > All that should be in the changelog!
> > > >
> > > > I updated it as below.
> > > >
> > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > > >
> > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > hotplug done after the mode update will use the new memmap_on_memory
> > > > value.
> > > >
> > > > It is now possible to test the memmap_on_memory feature easily without
> > > > the need for a kernel reboot. Additionally, for those encountering
> > > > struct page allocation failures while using dax kmem memory online, this
> > > > feature may prove useful. Instead of rebooting with the
> > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > > which greatly enhances its usefulness.
> > >
> > >
> > > I do not really see a solid argument why rebooting is really a problem
> > > TBH. Also is the global policy knob really the right fit for existing
> > > hotplug usecases? In other words, if we really want to make
> > > memmap_on_memory more flexible would it make more sense to have it per
> > > memory block property instead (the global knob being the default if
> > > admin doesn't specify it differently).
> >
> > Per memory block isn't possible, due to the creation order.
>
> I am not sure I follow. Could you elaborate more?
Must have been a tired brain. Now I see what you mean of course. vmemmap
is allocated at the time the range is registered and therefore memory
block sysfs created. So you are right that it is too late in some sense
but I still believe that this would be doable. The memmap_on_memory file
would be writable only when the block is offline and it would reallocate
vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
walkers?
--
Michal Hocko
SUSE Labs
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 8:52 ` Michal Hocko
@ 2023-08-03 9:24 ` David Hildenbrand
2023-08-03 11:30 ` Michal Hocko
0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2023-08-03 9:24 UTC (permalink / raw)
To: Michal Hocko
Cc: Vishal Verma, Aneesh Kumar K V, npiggin, linux-mm, akpm,
linuxppc-dev, Oscar Salvador
On 03.08.23 10:52, Michal Hocko wrote:
> On Wed 02-08-23 18:59:04, Michal Hocko wrote:
>> On Wed 02-08-23 17:54:04, David Hildenbrand wrote:
>>> On 02.08.23 17:50, Michal Hocko wrote:
>>>> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
>>>>> On 8/1/23 4:20 PM, Michal Hocko wrote:
>>>>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>>>>>>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>>>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>>>>> hotplug done after the mode update will use the new memmap_on_memory
>>>>>>>>> value.
>>>>>>>>
>>>>>>>> Well, this is a user space kABI extension and as such you should spend
>>>>>>>> more words about the usecase. Why could we live with this static before but
>>>>>>>> now need it dynamic?
>>>>>>>>
>>>>>>>
>>>>>>> This enables easy testing of the memmap_on_memory feature without a kernel reboot.
>>>>>>
>>>>>> Testing alone is rather weak argument to be honest.
>>>>>>
>>>>>>> I also expect people wanting to use that when they find dax kmem memory online
>>>>>>> failing because of struct page allocation failures[1]. A user could reboot with the
>>>>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>>>>>>> the feature much more useful.
>>>>>>
>>>>>> Sure it can be useful but that holds for any feature, right. The main
>>>>>> question is whether this is worth maintaining. The current implementation
>>>>>> seems rather trivial which is an argument to have it but are there any
>>>>>> risks long term? Have you evaluated a potential long term maintenance
>>>>>> cost? There is no easy way to go back and disable it later on without
>>>>>> breaking some userspace.
>>>>>>
>>>>>> All that should be in the changelog!
>>>>>
>>>>> I updated it as below.
>>>>>
>>>>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>>>>>
>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>>>>> hotplug done after the mode update will use the new memmap_on_memory
>>>>> value.
>>>>>
>>>>> It is now possible to test the memmap_on_memory feature easily without
>>>>> the need for a kernel reboot. Additionally, for those encountering
>>>>> struct page allocation failures while using dax kmem memory online, this
>>>>> feature may prove useful. Instead of rebooting with the
>>>>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
>>>>> which greatly enhances its usefulness.
>>>>
>>>>
>>>> I do not really see a solid argument why rebooting is really a problem
>>>> TBH. Also is the global policy knob really the right fit for existing
>>>> hotplug usecases? In other words, if we really want to make
>>>> memmap_on_memory more flexible would it make more sense to have it per
>>>> memory block property instead (the global knob being the default if
>>>> admin doesn't specify it differently).
>>>
>>> Per memory block isn't possible, due to the creation order.
>>
>> I am not sure I follow. Could you elaborate more?
>
> Must have been a tired brain. Now I see what you mean of course. vmemmap
> is allocated at the time the range is registered and therefore memory
> block sysfs created. So you are right that it is too late in some sense
> but I still believe that this would be doable. The memmap_on_memory file
Exactly.
> would be writable only when the block is offline and it would reallocate
> vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
> walkers?
The question is: is it of any real value such that it would be worth the
cost and risk?
One of the primary reasons for memmap_on_memory is that *memory hotplug*
succeeds even in low-memory situations (including, low on ZONE_NORMAL
situations). So you want that behavior already when hotplugging such
devices. While there might be value to relocate it later, I'm not sure
if that is really worth it, and it does not solve the main use case.
For dax/kmem memory hotplug this is controlled by user space, so user
space can easily configure it. For DIMMs it's just a hotplug event that
triggers all that in the kernel, and one has to plan ahead (e.g., set
the global kernel parameter).
Besides, devices like virtio-mem really cannot universally support
memmap_on_memory.
--
Cheers,
David / dhildenb
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 9:24 ` David Hildenbrand
@ 2023-08-03 11:30 ` Michal Hocko
2023-08-05 14:24 ` Aneesh Kumar K V
2023-08-07 12:54 ` David Hildenbrand
0 siblings, 2 replies; 32+ messages in thread
From: Michal Hocko @ 2023-08-03 11:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: Vishal Verma, Aneesh Kumar K V, npiggin, linux-mm, akpm,
linuxppc-dev, Oscar Salvador
On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
[...]
> > would be writable only when the block is offline and it would reallocate
> > vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
> > walkers?
>
> The question is: is it of any real value such that it would be worth the
> cost and risk?
>
>
> One of the primary reasons for memmap_on_memory is that *memory hotplug*
> succeeds even in low-memory situations (including, low on ZONE_NORMAL
> situations).
One usecase I would have in mind is a mix of smaller and larger memory
blocks. For larger ones you want memmap_on_memory in general because
they do not eat memory from outside, but small(er) ones might be more
tricky, because you can then add a lot of blocks that are internally
fragmented, preventing larger allocations from forming.
> So you want that behavior already when hotplugging such
> devices. While there might be value to relocate it later, I'm not sure if
> that is really worth it, and it does not solve the main use case.
Is it worth it? TBH I am not sure, just as I am not sure the global
default should be writable after boot. If we want to make it more
dynamic, we should however talk about the proper layer this is
implemented on.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 11:30 ` Michal Hocko
@ 2023-08-05 14:24 ` Aneesh Kumar K V
2023-08-07 12:27 ` Michal Hocko
2023-08-07 12:54 ` David Hildenbrand
1 sibling, 1 reply; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-05 14:24 UTC (permalink / raw)
To: Michal Hocko, David Hildenbrand
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 8/3/23 5:00 PM, Michal Hocko wrote:
> On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
> [...]
>>> would be writable only when the block is offline and it would reallocate
>>> vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
>>> walkers?
>>
>> The question is: is it of any real value such that it would be worth the
>> cost and risk?
>>
>>
>> One of the primary reasons for memmap_on_memory is that *memory hotplug*
>> succeeds even in low-memory situations (including, low on ZONE_NORMAL
>> situations).
>
> One usecase I would have in mind is a mix of smaller and larger memory
> blocks. For larger ones you want memmap_on_memory in general because
> they do not eat memory from outside, but small(er) ones might be more
> tricky, because you can then add a lot of blocks that are internally
> fragmented, preventing larger allocations from forming.
>
I guess that closely aligns with device memory and being able to add
device memory via dax/kmem using a larger memory block size.
We can then make sure we enable the memmap_on_memory feature
at the device level for this device memory. Do you see a need for
firmware-managed memory to be hotplugged in with different memory block sizes?
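[The device-level enable mentioned above already has a precedent in how dax/kmem hotplug itself is controlled per device with daxctl. A sketch of that existing interface; the reconfigure-device/--mode usage is standard daxctl, while dax0.0 is a placeholder device name, and a per-device memmap-on-memory option is only what the linked series proposes, not assumed to exist today.]

```shell
DEV=dax0.0          # placeholder dax device name
MODE=system-ram     # kmem mode: hotplug the device as regular system RAM

if command -v daxctl >/dev/null 2>&1; then
    # Hand the device to kmem so its memory gets hotplugged.
    daxctl reconfigure-device --mode="$MODE" "$DEV" || echo "reconfigure failed"
else
    echo "daxctl not installed; would run: daxctl reconfigure-device --mode=$MODE $DEV"
fi
```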
>> So you want that behavior already when hotplugging such
>> devices. While there might be value to relocate it later, I'm not sure if
>> that is really worth it, and it does not solve the main use case.
>
> Is it worth it? TBH I am not sure, just as I am not sure the global
> default should be writable after boot. If we want to make it more
> dynamic, we should however talk about the proper layer this is
> implemented on.
-aneesh
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-05 14:24 ` Aneesh Kumar K V
@ 2023-08-07 12:27 ` Michal Hocko
2023-08-07 12:41 ` David Hildenbrand
0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2023-08-07 12:27 UTC (permalink / raw)
To: Aneesh Kumar K V
Cc: David Hildenbrand, npiggin, linux-mm, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
[...]
> Do you see a need for firmware-managed memory to be hotplugged in with
> different memory block sizes?
In short. Yes. Slightly longer, a fixed-size memory block semantic is
just standing in the way and I would even argue it is actively harmful.
Just have a look at the ridiculously small memory blocks on ppc. I do
understand that it makes some sense to be aligned to the memory model
(so sparsemem section aligned). In an ideal world, a memory hotplug v2
interface (if we ever go that path) should be physical memory range based.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 12:27 ` Michal Hocko
@ 2023-08-07 12:41 ` David Hildenbrand
2023-08-07 18:35 ` David Hildenbrand
0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2023-08-07 12:41 UTC (permalink / raw)
To: Michal Hocko, Aneesh Kumar K V
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 07.08.23 14:27, Michal Hocko wrote:
> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
> [...]
>> Do you see a need for firmware-managed memory to be hotplugged in with
>> different memory block sizes?
>
> In short. Yes. Slightly longer, a fixed-size memory block semantic is
> just standing in the way and I would even argue it is actively harmful.
> Just have a look at the ridiculously small memory blocks on ppc. I do
> understand that it makes some sense to be aligned to the memory model
> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
> interface (if we ever go that path) should be physical memory range based.
Yes, we discussed that a couple of times already (and so far nobody
cared to implement any of that).
Small memory block sizes are very beneficial for use cases like PPC
DLPAR, virtio-mem, hyperv-balloon, ... essentially in most virtual
environments where you might want to add/remove memory in very small
granularity. I don't see that changing any time soon. Rather the opposite.
Small memory block sizes are suboptimal for large machines where you
might never end up removing such memory (boot memory), or when dealing
with devices that can only be removed in one piece (DIMM/kmem). We
already have memory groups in place to model that.
For the latter it might be beneficial to have memory blocks of larger
size that correspond to the physical memory ranges. That might also make
a memmap (re-)configuration easier.
Not sure if that is standing in the way or is harmful, though.
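[The fixed-size block semantics discussed above are directly visible in sysfs; a minimal sketch of inspecting them, using the standard memory hotplug sysfs layout (block_size_bytes is a hex value, and each memoryN directory is one fixed-size block with its own state file):]

```shell
# Sketch: show today's fixed memory block size and one block's state.
SYSMEM=/sys/devices/system/memory

if [ -f "$SYSMEM/block_size_bytes" ]; then
    # The file holds a hex value without a 0x prefix.
    bytes=$(( 0x$(cat "$SYSMEM/block_size_bytes") ))
    echo "memory block size: $(( bytes / 1024 / 1024 )) MiB"
    for s in "$SYSMEM"/memory*/state; do
        [ -f "$s" ] && echo "$s: $(cat "$s")" && break
    done
else
    echo "no memory hotplug sysfs on this system"
fi
```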
--
Cheers,
David / dhildenb
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 12:41 ` David Hildenbrand
@ 2023-08-07 18:35 ` David Hildenbrand
2023-08-08 6:09 ` Aneesh Kumar K V
2023-08-08 6:29 ` Aneesh Kumar K V
0 siblings, 2 replies; 32+ messages in thread
From: David Hildenbrand @ 2023-08-07 18:35 UTC (permalink / raw)
To: Michal Hocko, Aneesh Kumar K V
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 07.08.23 14:41, David Hildenbrand wrote:
> On 07.08.23 14:27, Michal Hocko wrote:
>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>> [...]
>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>> different memory block sizes?
>>
>> In short. Yes. Slightly longer, a fixed-size memory block semantic is
>> just standing in the way and I would even argue it is actively harmful.
>> Just have a look at the ridiculously small memory blocks on ppc. I do
>> understand that it makes some sense to be aligned to the memory model
>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>> interface (if we ever go that path) should be physical memory range based.
>
> Yes, we discussed that a couple of times already (and so far nobody
> cared to implement any of that).
>
> Small memory block sizes are very beneficial for use cases like PPC
> DLPAR, virtio-mem, hyperv-balloon, ... essentially in most virtual
> environments where you might want to add/remove memory in very small
> granularity. I don't see that changing any time soon. Rather the opposite.
>
> Small memory block sizes are suboptimal for large machines where you
> might never end up removing such memory (boot memory), or when dealing
> with devices that can only be removed in one piece (DIMM/kmem). We
> already have memory groups in place to model that.
>
> For the latter it might be beneficial to have memory blocks of larger
> size that correspond to the physical memory ranges. That might also make
> a memmap (re-)configuration easier.
>
> Not sure if that is standing in the way or is harmful, though.
>
Just because I thought of something right now, I'll share it, maybe it
makes sense.
Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled
by the admin:
1) We create a single altmap at the beginning of the memory
2) We create the existing fixed-size memory block devices, but flag them
to be part of a single "altmap" unit.
3) Whenever we trigger offlining of a single such memory block, we
offline *all* memory blocks belonging to that altmap, essentially
using a single offline_pages() call and updating all memory block
states accordingly.
4) Whenever we trigger onlining of a single such memory block, we
online *all* memory blocks belonging to that altmap, using a single
online_pages() call.
5) We fail remove_memory() if it doesn't cover the same (altmap) range.
So we can avoid having a memory block v2 (and all that comes with that
...) for now and still get that altmap stuff sorted out. As that altmap
behavior can be controlled by the admin, we should be fine for now.
I think all memory notifiers should already be able to handle bigger
granularity, but it would be easy to check. Some internal things might
require a bit of tweaking.
Just a thought.
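[For contrast with step 3) above: today an admin offlines fixed-size blocks one by one, which the grouped scheme would collapse into a single offline_pages() call spanning the whole altmap unit. A hedged sketch of that current per-block loop; the block numbers 32..39 are purely illustrative, and writes require root on a real system:]

```shell
# Sketch: per-block offlining as it works today. Under the proposal,
# offlining any one of these blocks would instead offline the whole
# altmap group at once.
SYSMEM=/sys/devices/system/memory
for n in 32 33 34 35 36 37 38 39; do
    state="$SYSMEM/memory$n/state"
    if [ -f "$state" ]; then
        echo offline > "$state" 2>/dev/null || echo "memory$n: offline failed"
    else
        echo "memory$n: not present"
    fi
done
```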
--
Cheers,
David / dhildenb
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 18:35 ` David Hildenbrand
@ 2023-08-08 6:09 ` Aneesh Kumar K V
2023-08-08 6:29 ` Aneesh Kumar K V
1 sibling, 0 replies; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-08 6:09 UTC (permalink / raw)
To: David Hildenbrand, Michal Hocko
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 8/8/23 12:05 AM, David Hildenbrand wrote:
> On 07.08.23 14:41, David Hildenbrand wrote:
>> On 07.08.23 14:27, Michal Hocko wrote:
>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>>> [...]
>>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>>> different memory block sizes?
>>>
>>> In short. Yes. Slightly longer, a fixed-size memory block semantic is
>>> just standing in the way and I would even argue it is actively harmful.
>>> Just have a look at the ridiculously small memory blocks on ppc. I do
>>> understand that it makes some sense to be aligned to the memory model
>>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>>> interface (if we ever go that path) should be physical memory range based.
>>
>> Yes, we discussed that a couple of times already (and so far nobody
>> cared to implement any of that).
>>
>> Small memory block sizes are very beneficial for use cases like PPC
>> DLPAR, virtio-mem, hyperv-balloon, ... essentially in most virtual
>> environments where you might want to add/remove memory in very small
>> granularity. I don't see that changing any time soon. Rather the opposite.
>>
>> Small memory block sizes are suboptimal for large machines where you
>> might never end up removing such memory (boot memory), or when dealing
>> with devices that can only be removed in one piece (DIMM/kmem). We
>> already have memory groups in place to model that.
>>
>> For the latter it might be beneficial to have memory blocks of larger
>> size that correspond to the physical memory ranges. That might also make
>> a memmap (re-)configuration easier.
>>
>> Not sure if that is standing in the way or is harmful, though.
>>
>
> Just because I thought of something right now, I'll share it, maybe it makes sense.
>
> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin:
>
> 1) We create a single altmap at the beginning of the memory
>
> 2) We create the existing fixed-size memory block devices, but flag them
> to be part of a single "altmap" unit.
>
> 3) Whenever we trigger offlining of a single such memory block, we
> offline *all* memory blocks belonging to that altmap, essentially
> using a single offline_pages() call and updating all memory block
> states accordingly.
>
> 4) Whenever we trigger onlining of a single such memory block, we
> online *all* memory blocks belonging to that altmap, using a single
> online_pages() call.
>
> 5) We fail remove_memory() if it doesn't cover the same (altmap) range.
>
> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now.
>
> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking.
>
> Just a thought.
>
W.r.t. enabling memmap_on_memory for ppc64, I guess I will drop patch 7 and resend the series so that Andrew can pick up the rest of the patches?
-aneesh
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 18:35 ` David Hildenbrand
2023-08-08 6:09 ` Aneesh Kumar K V
@ 2023-08-08 6:29 ` Aneesh Kumar K V
2023-08-08 7:46 ` David Hildenbrand
1 sibling, 1 reply; 32+ messages in thread
From: Aneesh Kumar K V @ 2023-08-08 6:29 UTC (permalink / raw)
To: David Hildenbrand, Michal Hocko
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 8/8/23 12:05 AM, David Hildenbrand wrote:
> On 07.08.23 14:41, David Hildenbrand wrote:
>> On 07.08.23 14:27, Michal Hocko wrote:
>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>>> [...]
>>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>>> different memory block sizes?
>>>
>>> In short. Yes. Slightly longer, a fixed-size memory block semantic is
>>> just standing in the way and I would even argue it is actively harmful.
>>> Just have a look at the ridiculously small memory blocks on ppc. I do
>>> understand that it makes some sense to be aligned to the memory model
>>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>>> interface (if we ever go that path) should be physical memory range based.
>>
>> Yes, we discussed that a couple of times already (and so far nobody
>> cared to implement any of that).
>>
>> Small memory block sizes are very beneficial for use cases like PPC
>> DLPAR, virtio-mem, hyperv-balloon, ... essentially in most virtual
>> environments where you might want to add/remove memory in very small
>> granularity. I don't see that changing any time soon. Rather the opposite.
>>
>> Small memory block sizes are suboptimal for large machines where you
>> might never end up removing such memory (boot memory), or when dealing
>> with devices that can only be removed in one piece (DIMM/kmem). We
>> already have memory groups in place to model that.
>>
>> For the latter it might be beneficial to have memory blocks of larger
>> size that correspond to the physical memory ranges. That might also make
>> a memmap (re-)configuration easier.
>>
>> Not sure if that is standing in the way or is harmful, though.
>>
>
> Just because I thought of something right now, I'll share it, maybe it makes sense.
>
> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin:
>
> 1) We create a single altmap at the beginning of the memory
>
> 2) We create the existing fixed-size memory block devices, but flag them
> to be part of a single "altmap" unit.
>
> 3) Whenever we trigger offlining of a single such memory block, we
> offline *all* memory blocks belonging to that altmap, essentially
> using a single offline_pages() call and updating all memory block
> states accordingly.
>
> 4) Whenever we trigger onlining of a single such memory block, we
> online *all* memory blocks belonging to that altmap, using a single
> online_pages() call.
>
> 5) We fail remove_memory() if it doesn't cover the same (altmap) range.
>
> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now.
>
> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking.
>
We can look at the possibility of using the altmap space reserved for a namespace (via option -M dev) for allocating struct page memory even with dax/kmem.
-aneesh
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-08 6:29 ` Aneesh Kumar K V
@ 2023-08-08 7:46 ` David Hildenbrand
0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2023-08-08 7:46 UTC (permalink / raw)
To: Aneesh Kumar K V, Michal Hocko
Cc: linux-mm, npiggin, Vishal Verma, akpm, linuxppc-dev,
Oscar Salvador
On 08.08.23 08:29, Aneesh Kumar K V wrote:
> On 8/8/23 12:05 AM, David Hildenbrand wrote:
>> On 07.08.23 14:41, David Hildenbrand wrote:
>>> On 07.08.23 14:27, Michal Hocko wrote:
>>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>>>> [...]
>>>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>>>> different memory block sizes?
>>>>
>>>> In short. Yes. Slightly longer, a fixed-size memory block semantic is
>>>> just standing in the way and I would even argue it is actively harmful.
>>>> Just have a look at the ridiculously small memory blocks on ppc. I do
>>>> understand that it makes some sense to be aligned to the memory model
>>>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>>>> interface (if we ever go that path) should be physical memory range based.
>>>
>>> Yes, we discussed that a couple of times already (and so far nobody
>>> cared to implement any of that).
>>>
>>> Small memory block sizes are very beneficial for use cases like PPC
>>> DLPAR, virtio-mem, hyperv-balloon, ... essentially in most virtual
>>> environments where you might want to add/remove memory in very small
>>> granularity. I don't see that changing any time soon. Rather the opposite.
>>>
>>> Small memory block sizes are suboptimal for large machines where you
>>> might never end up removing such memory (boot memory), or when dealing
>>> with devices that can only be removed in one piece (DIMM/kmem). We
>>> already have memory groups in place to model that.
>>>
>>> For the latter it might be beneficial to have memory blocks of larger
>>> size that correspond to the physical memory ranges. That might also make
>>> a memmap (re-)configuration easier.
>>>
>>> Not sure if that is standing in any way or is harmful, though.
>>>
>>
>> Just because I thought of something right now, I'll share it, maybe it makes sense.
>>
>> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin:
>>
>> 1) We create a single altmap at the beginning of the memory
>>
>> 2) We create the existing fixed-size memory block devices, but flag them
>> to be part of a single "altmap" unit.
>>
>> 3) Whenever we trigger offlining of a single such memory block, we
>> offline *all* memory blocks belonging to that altmap, essentially
>> using a single offline_pages() call and updating all memory block
>> states accordingly.
>>
>> 4) Whenever we trigger onlining of a single such memory block, we
>> online *all* memory blocks belonging to that altmap, using a single
>> online_pages() call.
>>
>> 5) We fail remove_memory() if it doesn't cover the same (altmap) range.
>>
>> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now.
>>
>> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking.
>>
>
> We can look at the possibility of using the altmap space reserved for a namespace (via option -M dev) for allocating struct page memory even with dax/kmem.
Right, an alternative would also be for the caller to pass in the
altmap. Individual memory blocks can then get onlined/offlined as is.
One issue might be, how to get that altmap considered online memory
(e.g., pfn_to_online_page(), kdump, ...). Right now, the altmap always
falls into an online section once the memory block is online, so
pfn_to_online_page() and especially online_section() hold as soon as
the altmap has reasonable content -- that is, once the memory block is
online and has initialized the memmap. Maybe that can be worked around,
just pointing
it out.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 11:30 ` Michal Hocko
2023-08-05 14:24 ` Aneesh Kumar K V
@ 2023-08-07 12:54 ` David Hildenbrand
1 sibling, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2023-08-07 12:54 UTC (permalink / raw)
To: Michal Hocko
Cc: Vishal Verma, Aneesh Kumar K V, npiggin, linux-mm, akpm,
linuxppc-dev, Oscar Salvador
On 03.08.23 13:30, Michal Hocko wrote:
> On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
> [...]
>>> would be readable only when the block is offline and it would reallocate
>>> vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
>>> walkers?
>>
>> The question is: is it of any real value such that it would be worth the
>> cost and risk?
>>
>>
>> One of the primary reasons for memmap_on_memory is that *memory hotplug*
>> succeeds even in low-memory situations (including, low on ZONE_NORMAL
>> situations).
Sorry for the late reply, I'm busy with 100 other things.
>
> One usecase I would have in mind is a mix of smaller and larger memory
> blocks. For larger ones you want to have memmap_on_memory in general
> because they do not eat memory from outside but small(er) ones might be
> more tricky because now you can add a lot of blocks that would be
internally fragmented, preventing larger allocations from forming.
Okay, I see what you mean.
The internal fragmentation might become an issue at some point: for
x86-64 with 128 MiB blocks / 2 MiB THP it's not a real issue right now.
For an arm64 64k configuration with 512 MiB blocks and 512 MiB THP / hugetlb it could be
one.
I recall discussing that with Oscar back when he added memmap_on_memory,
where we also discussed the variable-sized memory blocks to avoid such
internal fragmentation.
For small ones you probably want to only use memmap_on_memory when
unavoidable: for example, when adding without memmap_on_memory would
fail / already failed. Possibly some later memmap relocation might make
sense in some scenarios.
>
>> So you want that behavior already when hotplugging such
>> devices. While there might be value to relocate it later, I'm not sure if
>> that is really worth it, and it does not solve the main use case.
>
> Is it worth it? TBH I am not sure same as I am not sure the global
> default should be writable after boot. If we want to make it more
> dynamic we should however talk about the proper layer this is
> implemented on.
Agreed.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v7 0/7] Add support for memmap on memory feature on ppc64
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
` (6 preceding siblings ...)
2023-08-01 4:41 ` [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Aneesh Kumar K.V
@ 2023-08-01 9:07 ` Michal Hocko
7 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2023-08-01 9:07 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: David Hildenbrand, linux-mm, npiggin, Vishal Verma, akpm,
linuxppc-dev, Oscar Salvador
I cannot really judge the ppc specific part but other patches seem
reasonable. Patch 4 could print more useful information about the
wastage but this is nothing really earth shattering. I am not sure about
the last patch which makes the on-memory property dynamic. This needs
much more justification and use case description IMHO.
That being said for patches 1 - 4 and 6 feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
On Tue 01-08-23 10:11:09, Aneesh Kumar K.V wrote:
> This patch series updates the memmap on memory feature to fall back to
> memmap allocation outside the memory block if the alignment rules are
> not met. This makes the feature more useful on architectures like
> ppc64, where the alignment rules differ with a 64K page size.
>
> This patch series is dependent on dax vmemmap optimization series
> posted here
> https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/
>
> Changes from v6:
> * Update comments in the code
> * Update commit message for patch 7
>
> Changes from v5:
> * Update commit message
> * Move memory alloc/free to the callers in patch 6
> * Address review feedback w.r.t patch 4
>
> Changes from v4:
> * Use altmap.free instead of altmap.reserve
> * Address review feedback
>
> Changes from v3:
> * Extend the module parameter memmap_on_memory to force allocation even
> though we can waste hotplug memory.
>
> Changes from v2:
> * Rebase to latest linus tree
> * Redo the series based on review feedback. Multiple changes to the patchset.
>
> Changes from v1:
> * update the memblock to store vmemmap_altmap details. This is required
> so that when we remove the memory we can find the altmap details which
> is needed on some architectures.
> * rebase to latest linus tree
>
>
>
> Aneesh Kumar K.V (7):
> mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
> mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
> mm/memory_hotplug: Allow architecture to override memmap on memory
> support check
> mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned
> to pageblocks
> powerpc/book3s64/memhotplug: Enable memmap on memory for radix
> mm/memory_hotplug: Embed vmem_altmap details in memory block
> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>
> .../admin-guide/mm/memory-hotplug.rst | 12 +
> arch/arm64/Kconfig | 4 +-
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/include/asm/pgtable.h | 21 ++
> .../platforms/pseries/hotplug-memory.c | 2 +-
> arch/x86/Kconfig | 4 +-
> drivers/acpi/acpi_memhotplug.c | 3 +-
> drivers/base/memory.c | 27 ++-
> include/linux/memory.h | 8 +-
> include/linux/memory_hotplug.h | 3 +-
> mm/Kconfig | 3 +
> mm/memory_hotplug.c | 205 ++++++++++++++----
> 12 files changed, 220 insertions(+), 73 deletions(-)
>
> --
> 2.41.0
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 32+ messages in thread