* [PATCH v7 0/7] Add support for memmap on memory feature on ppc64
@ 2023-08-01 4:41 Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
` (5 more replies)
0 siblings, 6 replies; 25+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
Aneesh Kumar K.V
This patch series updates the memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64, where the alignment rules are different with a 64K page size.
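Concretely, the fallback amounts to something like the sketch below
(illustrative only, not a hunk from this series; the helper names are
the ones that appear elsewhere in this thread):

        /*
         * Rough sketch of the fallback in add_memory_resource(): if the
         * memmap cannot be placed on the hotplugged block itself, drop
         * the request and continue with a regular, externally allocated
         * vmemmap instead of failing the hotplug.
         */
        if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
                if (mhp_supports_memmap_on_memory(memory_block_size_bytes()))
                        mhp_altmap.free = memory_block_memmap_on_memory_pages();
                else
                        mhp_flags &= ~MHP_MEMMAP_ON_MEMORY;     /* fall back */
        }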
This patch series is dependent on the dax vmemmap optimization series
posted here:
https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/
Changes from v6:
* Update comments in the code
* Update commit message for patch 7
Changes from v5:
* Update commit message
* Move memory alloc/free to the callers in patch 6
* Address review feedback w.r.t patch 4
Changes from v4:
* Use altmap.free instead of altmap.reserve
* Address review feedback
Changes from v3:
* Extend the module parameter memmap_on_memory to force allocation even
though we can waste hotplug memory.
Changes from v2:
* Rebase to latest linus tree
* Redo the series based on review feedback. Multiple changes to the patchset.
Changes from v1:
* Update the memory block to store vmemmap_altmap details. This is
required so that when we remove the memory we can find the altmap
details, which are needed on some architectures.
* rebase to latest linus tree
Aneesh Kumar K.V (7):
mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
mm/memory_hotplug: Allow architecture to override memmap on memory
support check
mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned
to pageblocks
powerpc/book3s64/memhotplug: Enable memmap on memory for radix
mm/memory_hotplug: Embed vmem_altmap details in memory block
mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
.../admin-guide/mm/memory-hotplug.rst | 12 +
arch/arm64/Kconfig | 4 +-
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/pgtable.h | 21 ++
.../platforms/pseries/hotplug-memory.c | 2 +-
arch/x86/Kconfig | 4 +-
drivers/acpi/acpi_memhotplug.c | 3 +-
drivers/base/memory.c | 27 ++-
include/linux/memory.h | 8 +-
include/linux/memory_hotplug.h | 3 +-
mm/Kconfig | 3 +
mm/memory_hotplug.c | 205 ++++++++++++++----
12 files changed, 220 insertions(+), 73 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
` (4 subsequent siblings)
5 siblings, 0 replies; 25+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
Aneesh Kumar K.V
Instead of adding a menu entry with all supported architectures, add
an mm/Kconfig variable and have the supported architectures select it.
No functional change in this patch.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/arm64/Kconfig | 4 +---
arch/x86/Kconfig | 4 +---
mm/Kconfig | 3 +++
3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1573257a4d6..0f749cfab8e6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_KEEP_MEMBLOCK
+ select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_GNU_PROPERTY
select ARCH_USE_MEMTEST
@@ -347,9 +348,6 @@ config GENERIC_CSUM
config GENERIC_CALIBRATE_DELAY
def_bool y
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
- def_bool y
-
config SMP
def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78224aa76409..d0258e92a8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+ select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
@@ -2610,9 +2611,6 @@ config ARCH_HAS_ADD_PAGES
def_bool y
depends on ARCH_ENABLE_MEMORY_HOTPLUG
-config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
- def_bool y
-
menu "Power management and ACPI options"
config ARCH_HIBERNATION_HEADER
diff --git a/mm/Kconfig b/mm/Kconfig
index 5fe49c030961..721dc88423c7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -571,6 +571,9 @@ config MHP_MEMMAP_ON_MEMORY
endif # MEMORY_HOTPLUG
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+ bool
+
# Heavily threaded applications may benefit from splitting the mm-wide
# page_table_lock, so that faults on different parts of the user address
# space can be handled with less contention: split it at this NR_CPUS.
--
2.41.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v7 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
@ 2023-08-01 4:41 ` Aneesh Kumar K.V
[not found] ` <20230801044116.10674-8-aneesh.kumar@linux.ibm.com>
` (3 subsequent siblings)
5 siblings, 0 replies; 25+ messages in thread
From: Aneesh Kumar K.V @ 2023-08-01 4:41 UTC (permalink / raw)
To: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy
Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
Aneesh Kumar K.V
Radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks. Hence we skip the
restriction that the vmemmap size be a multiple of PMD_SIZE. This also
makes the feature more widely useful, because using a PMD_SIZE vmemmap
area requires a memory block size of 2GiB.

We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY so that the feature
can work with a memory block size of 256MB. The altmap.reserve feature
is used to align things correctly at pageblock granularity. We can end
up losing some pages in memory with this. For example, with a 256MiB
memory block size we require 4 pages to map the vmemmap pages; in order
to align things correctly we end up adding a reserve of 28 pages, i.e.
for every 4096 pages 28 pages get reserved.
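For reference, a minimal sketch of where those numbers come from (the
64-byte struct page and the 2 MiB, i.e. 32-page, pageblock are
assumptions for a typical 64K radix configuration, not values spelled
out in this patch):

        /* Illustrative only; constants are assumptions, see above. */
        static unsigned long reserve_pages_example(void)
        {
                unsigned long page_size    = 64UL * 1024;               /* 64K base pages */
                unsigned long block_pages  = (256UL << 20) / page_size; /* 4096 pages in a 256MiB block */
                unsigned long memmap_bytes = block_pages * 64;          /* struct page array size */
                unsigned long memmap_pages = memmap_bytes / page_size;  /* 4 vmemmap pages */
                unsigned long pageblock    = (2UL << 20) / page_size;   /* 32 pages per pageblock */

                /* Pad the memmap up to a whole pageblock; the padding is the altmap reserve. */
                return pageblock - memmap_pages;                        /* 28 pages */
        }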
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/pgtable.h | 21 +++++++++++++++++++
.../platforms/pseries/hotplug-memory.c | 2 +-
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d0497d13f5b4..938294c996dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@ config PPC
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_KEEP_MEMBLOCK
+ select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index a4893b17705a..33464e6d6431 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -161,6 +161,27 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_map_size);
bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
unsigned long page_size);
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
+ * some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fallback correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+#define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
+static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
+{
+ if (!radix_enabled())
+ return false;
+ /*
+ * With 4K page size and 2M PMD_SIZE, we can align
+ * things better with memory block size value
+ * starting from 128MB. Hence align things with PMD_SIZE.
+ */
+ if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
+ return IS_ALIGNED(vmemmap_size, PMD_SIZE);
+ return true;
+}
+
#endif /* CONFIG_PPC64 */
#endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..4f3d6a2f9065 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -637,7 +637,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = first_online_node;
/* Add the memory */
- rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+ rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_MEMMAP_ON_MEMORY);
if (rc) {
invalidate_lmb_associativity_index(lmb);
return rc;
--
2.41.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
[not found] ` <20230801044116.10674-8-aneesh.kumar@linux.ibm.com>
@ 2023-08-01 8:58 ` Michal Hocko
[not found] ` <a32fe748-fa18-bd92-3a10-5da8dbad96e6@linux.ibm.com>
0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2023-08-01 8:58 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, David Hildenbrand, Vishal Verma
On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> Allow updating memmap_on_memory mode after the kernel boot. Memory
> hotplug done after the mode update will use the new mmemap_on_memory
> value.
Well, this is a user space kABI extension and as such you should spend
more words on the use case. Why could we live with this being static
until now, and why does it need to be dynamic?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
[not found] ` <20230801044116.10674-5-aneesh.kumar@linux.ibm.com>
@ 2023-08-01 9:04 ` Michal Hocko
0 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2023-08-01 9:04 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, David Hildenbrand, Vishal Verma
On Tue 01-08-23 10:11:13, Aneesh Kumar K.V wrote:
[...]
> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();

		unsigned long wastage = memmap_pages - PFN_UP(memory_block_memmap_size());
		if (wastage)
			pr_info_once("memmap_on_memory=force will waste %ld pages in each memory block %ld\n",
				     wastage, memory_block_size_bytes() >> PAGE_SHIFT);

would be more useful IMHO.

> +
> +		pr_info_once("Memory hotplug will waste %ld pages in each memory block\n",
> +			     memmap_pages - PFN_UP(memory_block_memmap_size()));
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 0/7] Add support for memmap on memory feature on ppc64
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
` (3 preceding siblings ...)
[not found] ` <20230801044116.10674-5-aneesh.kumar@linux.ibm.com>
@ 2023-08-01 9:07 ` Michal Hocko
[not found] ` <20230801044116.10674-7-aneesh.kumar@linux.ibm.com>
5 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2023-08-01 9:07 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, David Hildenbrand, Vishal Verma
I cannot really judge the ppc-specific part, but the other patches seem
reasonable. Patch 4 could print more useful information about the
wastage, but this is nothing really earth shattering. I am not sure about
the last patch, which makes the on-memory property dynamic. That needs
much more justification and a use case description IMHO.
That being said, for patches 1 - 4 and 6 feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
On Tue 01-08-23 10:11:09, Aneesh Kumar K.V wrote:
> This patch series update memmap on memory feature to fall back to
> memmap allocation outside the memory block if the alignment rules are
> not met. This makes the feature more useful on architectures like
> ppc64 where alignment rules are different with 64K page size.
>
> This patch series is dependent on dax vmemmap optimization series
> posted here
> https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/
>
> Changes from v6:
> * Update comments in the code
> * Update commit message for patch 7
>
> Changes from v5:
> * Update commit message
> * Move memory alloc/free to the callers in patch 6
> * Address review feedback w.r.t patch 4
>
> Changes from v4:
> * Use altmap.free instead of altmap.reserve
> * Address review feedback
>
> Changes from v3:
> * Extend the module parameter memmap_on_memory to force allocation even
> though we can waste hotplug memory.
>
> Changes from v2:
> * Rebase to latest linus tree
> * Redo the series based on review feedback. Multiple changes to the patchset.
>
> Changes from v1:
> * update the memblock to store vmemmap_altmap details. This is required
> so that when we remove the memory we can find the altmap details which
> is needed on some architectures.
> * rebase to latest linus tree
>
>
>
> Aneesh Kumar K.V (7):
> mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
> mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
> mm/memory_hotplug: Allow architecture to override memmap on memory
> support check
> mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned
> to pageblocks
> powerpc/book3s64/memhotplug: Enable memmap on memory for radix
> mm/memory_hotplug: Embed vmem_altmap details in memory block
> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>
> .../admin-guide/mm/memory-hotplug.rst | 12 +
> arch/arm64/Kconfig | 4 +-
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/include/asm/pgtable.h | 21 ++
> .../platforms/pseries/hotplug-memory.c | 2 +-
> arch/x86/Kconfig | 4 +-
> drivers/acpi/acpi_memhotplug.c | 3 +-
> drivers/base/memory.c | 27 ++-
> include/linux/memory.h | 8 +-
> include/linux/memory_hotplug.h | 3 +-
> mm/Kconfig | 3 +
> mm/memory_hotplug.c | 205 ++++++++++++++----
> 12 files changed, 220 insertions(+), 73 deletions(-)
>
> --
> 2.41.0
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
[not found] ` <a32fe748-fa18-bd92-3a10-5da8dbad96e6@linux.ibm.com>
@ 2023-08-01 10:50 ` Michal Hocko
2023-08-02 4:45 ` Aneesh Kumar K V
0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2023-08-01 10:50 UTC (permalink / raw)
To: Aneesh Kumar K V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, David Hildenbrand, Vishal Verma
On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> On 8/1/23 2:28 PM, Michal Hocko wrote:
> > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> >> Allow updating memmap_on_memory mode after the kernel boot. Memory
> >> hotplug done after the mode update will use the new mmemap_on_memory
> >> value.
> >
> > Well, this is a user space kABI extension and as such you should spend
> > more words about the usecase. Why we could live with this static and now
> > need dynamic?
> >
>
> This enables easy testing of memmap_on_memory feature without a kernel reboot.
Testing alone is a rather weak argument, to be honest.
> I also expect people wanting to use that when they find dax kmem memory online
> failing because of struct page allocation failures[1]. User could reboot back with
> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> the feature much more useful.
Sure, it can be useful, but that holds for any feature, right? The main
question is whether this is worth maintaining. The current implementation
seems rather trivial, which is an argument to have it, but are there any
long-term risks? Have you evaluated a potential long term maintenance
cost? There is no easy way to go back and disable it later on without
breaking some userspace.
All that should be in the changelog!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
[not found] ` <20230801044116.10674-7-aneesh.kumar@linux.ibm.com>
@ 2023-08-01 23:10 ` Verma, Vishal L
2023-08-02 4:47 ` Aneesh Kumar K V
0 siblings, 1 reply; 25+ messages in thread
From: Verma, Vishal L @ 2023-08-01 23:10 UTC (permalink / raw)
To: aneesh.kumar@linux.ibm.com, linux-mm@kvack.org,
mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org,
npiggin@gmail.com, christophe.leroy@csgroup.eu,
akpm@linux-foundation.org
Cc: Hocko, Michal, osalvador@suse.de, david@redhat.com
On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
> With memmap on memory, some architecture needs more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using memmap on memory block
> feature.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> drivers/base/memory.c | 27 +++++++++++++--------
> include/linux/memory.h | 8 ++----
> mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++----------------
> 3 files changed, 53 insertions(+), 37 deletions(-)
>
<snip>
> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>
> static int __ref try_remove_memory(u64 start, u64 size)
> {
> - struct vmem_altmap mhp_altmap = {};
> - struct vmem_altmap *altmap = NULL;
> - unsigned long nr_vmemmap_pages;
> + int ret;
Minor nit - there is already an 'int rc' below - just use that, or
rename it to 'ret' if that's better for consistency.
> + struct memory_block *mem;
> int rc = 0, nid = NUMA_NO_NODE;
> + struct vmem_altmap *altmap = NULL;
>
> BUG_ON(check_hotplug_memory_range(start, size));
>
> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
> * the same granularity it was added - a single memory block.
> */
> if (mhp_memmap_on_memory()) {
> - nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> - get_nr_vmemmap_pages_cb);
> - if (nr_vmemmap_pages) {
> + ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
> + if (ret) {
> if (size != memory_block_size_bytes()) {
> pr_warn("Refuse to remove %#llx - %#llx,"
> "wrong granularity\n",
> start, start + size);
> return -EINVAL;
> }
> -
> + altmap = mem->altmap;
> /*
> - * Let remove_pmd_table->free_hugepage_table do the
> - * right thing if we used vmem_altmap when hot-adding
> - * the range.
> + * Mark altmap NULL so that we can add a debug
> + * check on memblock free.
> */
> - mhp_altmap.base_pfn = PHYS_PFN(start);
> - mhp_altmap.free = nr_vmemmap_pages;
> - mhp_altmap.alloc = nr_vmemmap_pages;
> - altmap = &mhp_altmap;
> + mem->altmap = NULL;
> }
> }
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-01 10:50 ` Michal Hocko
@ 2023-08-02 4:45 ` Aneesh Kumar K V
2023-08-02 15:50 ` Michal Hocko
0 siblings, 1 reply; 25+ messages in thread
From: Aneesh Kumar K V @ 2023-08-02 4:45 UTC (permalink / raw)
To: Michal Hocko
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, David Hildenbrand, Vishal Verma
On 8/1/23 4:20 PM, Michal Hocko wrote:
> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>> value.
>>>
>>> Well, this is a user space kABI extension and as such you should spend
>>> more words about the usecase. Why we could live with this static and now
>>> need dynamic?
>>>
>>
>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>
> Testing alone is rather weak argument to be honest.
>
>> I also expect people wanting to use that when they find dax kmem memory online
>> failing because of struct page allocation failures[1]. User could reboot back with
>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>> the feature much more useful.
>
> Sure it can be useful but that holds for any feature, right. The main
> question is whether this is worth maintaing. The current implementation
> seems rather trivial which is an argument to have it but are there any
> risks long term? Have you evaluated a potential long term maintenance
> cost? There is no easy way to go back and disable it later on without
> breaking some userspace.
>
> All that should be in the changelog!
I updated it as below.
mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
Allow updating the memmap_on_memory mode after kernel boot. Memory
hotplug done after the mode update will use the new memmap_on_memory
value.
It is now possible to test the memmap_on_memory feature easily without
the need for a kernel reboot. Additionally, the feature may prove useful
for those who see dax kmem memory onlining fail because of struct page
allocation failures. Instead of rebooting with the memmap_on_memory=y
kernel parameter, users can now enable it via sysfs, which makes the
feature much more useful. The change needed to support runtime updates
is small and should not get in the way of new features or result in
long-term maintenance problems.
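As an illustration, enabling it at runtime would look something like the
minimal userspace sketch below (the sysfs path assumes the parameter
keeps the "memory_hotplug." module prefix and becomes writable with this
patch; both are assumptions, not verified against this exact revision):

        /* Minimal userspace sketch: flip memmap_on_memory without a reboot. */
        #include <stdio.h>

        int main(void)
        {
                const char *path =
                        "/sys/module/memory_hotplug/parameters/memmap_on_memory";
                FILE *f = fopen(path, "w");

                if (!f) {
                        perror(path);
                        return 1;
                }
                /* "y" enables it for memory hotplugged after this point. */
                fputs("y\n", f);
                return fclose(f) ? 1 : 0;
        }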
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
2023-08-01 23:10 ` [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block Verma, Vishal L
@ 2023-08-02 4:47 ` Aneesh Kumar K V
0 siblings, 0 replies; 25+ messages in thread
From: Aneesh Kumar K V @ 2023-08-02 4:47 UTC (permalink / raw)
To: Verma, Vishal L, linux-mm@kvack.org, mpe@ellerman.id.au,
linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com,
christophe.leroy@csgroup.eu, akpm@linux-foundation.org
Cc: Hocko, Michal, osalvador@suse.de, david@redhat.com
On 8/2/23 4:40 AM, Verma, Vishal L wrote:
> On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
>> With memmap on memory, some architecture needs more details w.r.t altmap
>> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
>> computing them again when we remove a memory block, embed vmem_altmap
>> details in struct memory_block if we are using memmap on memory block
>> feature.
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>> drivers/base/memory.c | 27 +++++++++++++--------
>> include/linux/memory.h | 8 ++----
>> mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++----------------
>> 3 files changed, 53 insertions(+), 37 deletions(-)
>>
> <snip>
>
>> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>>
>> static int __ref try_remove_memory(u64 start, u64 size)
>> {
>> - struct vmem_altmap mhp_altmap = {};
>> - struct vmem_altmap *altmap = NULL;
>> - unsigned long nr_vmemmap_pages;
>> + int ret;
>
> Minor nit - there is already an 'int rc' below - just use that, or
> rename it to 'ret' if that's better for consistency.
>
I reused the existing rc variable.
>> + struct memory_block *mem;
>> int rc = 0, nid = NUMA_NO_NODE;
>> + struct vmem_altmap *altmap = NULL;
>>
>> BUG_ON(check_hotplug_memory_range(start, size));
>>
>> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
>> * the same granularity it was added - a single memory block.
>> */
>> if (mhp_memmap_on_memory()) {
>> - nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
>> - get_nr_vmemmap_pages_cb);
>> - if (nr_vmemmap_pages) {
>> + ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
>> + if (ret) {
>> if (size != memory_block_size_bytes()) {
>> pr_warn("Refuse to remove %#llx - %#llx,"
>> "wrong granularity\n",
>> start, start + size);
>> return -EINVAL;
>> }
>> -
>> + altmap = mem->altmap;
>> /*
>> - * Let remove_pmd_table->free_hugepage_table do the
>> - * right thing if we used vmem_altmap when hot-adding
>> - * the range.
>> + * Mark altmap NULL so that we can add a debug
>> + * check on memblock free.
>> */
>> - mhp_altmap.base_pfn = PHYS_PFN(start);
>> - mhp_altmap.free = nr_vmemmap_pages;
>> - mhp_altmap.alloc = nr_vmemmap_pages;
>> - altmap = &mhp_altmap;
>> + mem->altmap = NULL;
>> }
>> }
>>
Thank you for taking the time to review the patch.
-aneesh
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 4:45 ` Aneesh Kumar K V
@ 2023-08-02 15:50 ` Michal Hocko
2023-08-02 15:54 ` David Hildenbrand
0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2023-08-02 15:50 UTC (permalink / raw)
To: Aneesh Kumar K V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, David Hildenbrand, Vishal Verma
On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> On 8/1/23 4:20 PM, Michal Hocko wrote:
> > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> >> On 8/1/23 2:28 PM, Michal Hocko wrote:
> >>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> >>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
> >>>> hotplug done after the mode update will use the new mmemap_on_memory
> >>>> value.
> >>>
> >>> Well, this is a user space kABI extension and as such you should spend
> >>> more words about the usecase. Why we could live with this static and now
> >>> need dynamic?
> >>>
> >>
> >> This enables easy testing of memmap_on_memory feature without a kernel reboot.
> >
> > Testing alone is rather weak argument to be honest.
> >
> >> I also expect people wanting to use that when they find dax kmem memory online
> >> failing because of struct page allocation failures[1]. User could reboot back with
> >> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> >> the feature much more useful.
> >
> > Sure it can be useful but that holds for any feature, right. The main
> > question is whether this is worth maintaing. The current implementation
> > seems rather trivial which is an argument to have it but are there any
> > risks long term? Have you evaluated a potential long term maintenance
> > cost? There is no easy way to go back and disable it later on without
> > breaking some userspace.
> >
> > All that should be in the changelog!
>
> I updated it as below.
>
> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>
> Allow updating memmap_on_memory mode after the kernel boot. Memory
> hotplug done after the mode update will use the new mmemap_on_memory
> value.
>
> It is now possible to test the memmap_on_memory feature easily without
> the need for a kernel reboot. Additionally, for those encountering
> struct page allocation failures while using dax kmem memory online, this
> feature may prove useful. Instead of rebooting with the
> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> which greatly enhances its usefulness.
I do not really see a solid argument why rebooting is really a problem
TBH. Also is the global policy knob really the right fit for existing
hotplug usecases? In other words, if we really want to make
memmap_on_memory more flexible would it make more sense to have it per
memory block property instead (the global knob being the default if
admin doesn't specify it differently).
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:50 ` Michal Hocko
@ 2023-08-02 15:54 ` David Hildenbrand
2023-08-02 15:57 ` Aneesh Kumar K V
2023-08-02 16:59 ` Michal Hocko
0 siblings, 2 replies; 25+ messages in thread
From: David Hildenbrand @ 2023-08-02 15:54 UTC (permalink / raw)
To: Michal Hocko, Aneesh Kumar K V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 02.08.23 17:50, Michal Hocko wrote:
> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
>> On 8/1/23 4:20 PM, Michal Hocko wrote:
>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>>>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>>>> value.
>>>>>
>>>>> Well, this is a user space kABI extension and as such you should spend
>>>>> more words about the usecase. Why we could live with this static and now
>>>>> need dynamic?
>>>>>
>>>>
>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>>>
>>> Testing alone is rather weak argument to be honest.
>>>
>>>> I also expect people wanting to use that when they find dax kmem memory online
>>>> failing because of struct page allocation failures[1]. User could reboot back with
>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>>>> the feature much more useful.
>>>
>>> Sure it can be useful but that holds for any feature, right. The main
>>> question is whether this is worth maintaing. The current implementation
>>> seems rather trivial which is an argument to have it but are there any
>>> risks long term? Have you evaluated a potential long term maintenance
>>> cost? There is no easy way to go back and disable it later on without
>>> breaking some userspace.
>>>
>>> All that should be in the changelog!
>>
>> I updated it as below.
>>
>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>>
>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>> hotplug done after the mode update will use the new mmemap_on_memory
>> value.
>>
>> It is now possible to test the memmap_on_memory feature easily without
>> the need for a kernel reboot. Additionally, for those encountering
>> struct page allocation failures while using dax kmem memory online, this
>> feature may prove useful. Instead of rebooting with the
>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
>> which greatly enhances its usefulness.
>
>
> I do not really see a solid argument why rebooting is really a problem
> TBH. Also is the global policy knob really the right fit for existing
> hotplug usecases? In other words, if we really want to make
> memmap_on_memory more flexible would it make more sense to have it per
> memory block property instead (the global knob being the default if
> admin doesn't specify it differently).
Per memory block isn't possible, due to the creation order. Also, I
think it's not the right approach.
I thought about driver toggles. At least for dax/kmem people are looking
into that:
https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
Where that can also be toggled per device.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:54 ` David Hildenbrand
@ 2023-08-02 15:57 ` Aneesh Kumar K V
2023-08-02 16:02 ` Verma, Vishal L
2023-08-02 16:59 ` Michal Hocko
1 sibling, 1 reply; 25+ messages in thread
From: Aneesh Kumar K V @ 2023-08-02 15:57 UTC (permalink / raw)
To: David Hildenbrand, Michal Hocko
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 8/2/23 9:24 PM, David Hildenbrand wrote:
> On 02.08.23 17:50, Michal Hocko wrote:
>> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
>>> On 8/1/23 4:20 PM, Michal Hocko wrote:
>>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>>>>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>>>>> value.
>>>>>>
>>>>>> Well, this is a user space kABI extension and as such you should spend
>>>>>> more words about the usecase. Why we could live with this static and now
>>>>>> need dynamic?
>>>>>>
>>>>>
>>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>>>>
>>>> Testing alone is rather weak argument to be honest.
>>>>
>>>>> I also expect people wanting to use that when they find dax kmem memory online
>>>>> failing because of struct page allocation failures[1]. User could reboot back with
>>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>>>>> the feature much more useful.
>>>>
>>>> Sure it can be useful but that holds for any feature, right. The main
>>>> question is whether this is worth maintaing. The current implementation
>>>> seems rather trivial which is an argument to have it but are there any
>>>> risks long term? Have you evaluated a potential long term maintenance
>>>> cost? There is no easy way to go back and disable it later on without
>>>> breaking some userspace.
>>>>
>>>> All that should be in the changelog!
>>>
>>> I updated it as below.
>>>
>>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>>>
>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>> hotplug done after the mode update will use the new mmemap_on_memory
>>> value.
>>>
>>> It is now possible to test the memmap_on_memory feature easily without
>>> the need for a kernel reboot. Additionally, for those encountering
>>> struct page allocation failures while using dax kmem memory online, this
>>> feature may prove useful. Instead of rebooting with the
>>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
>>> which greatly enhances its usefulness.
>>
>>
>> I do not really see a solid argument why rebooting is really a problem
>> TBH. Also is the global policy knob really the right fit for existing
>> hotplug usecases? In other words, if we really want to make
>> memmap_on_memory more flexible would it make more sense to have it per
>> memory block property instead (the global knob being the default if
>> admin doesn't specify it differently).
>
> Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach.
>
> I thought about driver toggles. At least for dax/kmem people are looking into that:
>
> https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
>
> Where that can also be toggled per device.
>
That still depends on the global memmap_on_memory being enabled, right?
We could make the dax facility independent of the global feature toggle?
-aneesh
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:57 ` Aneesh Kumar K V
@ 2023-08-02 16:02 ` Verma, Vishal L
0 siblings, 0 replies; 25+ messages in thread
From: Verma, Vishal L @ 2023-08-02 16:02 UTC (permalink / raw)
To: aneesh.kumar@linux.ibm.com, Hocko, Michal, david@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
mpe@ellerman.id.au, christophe.leroy@csgroup.eu,
npiggin@gmail.com, osalvador@suse.de, akpm@linux-foundation.org
On Wed, 2023-08-02 at 21:27 +0530, Aneesh Kumar K V wrote:
> On 8/2/23 9:24 PM, David Hildenbrand wrote:
> > On 02.08.23 17:50, Michal Hocko wrote:
> > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > > > > > value.
> > > > > > >
> > > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > > more words about the usecase. Why we could live with this static and now
> > > > > > > need dynamic?
> > > > > > >
> > > > > >
> > > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot.
> > > > >
> > > > > Testing alone is rather weak argument to be honest.
> > > > >
> > > > > > I also expect people wanting to use that when they find dax kmem memory online
> > > > > > failing because of struct page allocation failures[1]. User could reboot back with
> > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> > > > > > the feature much more useful.
> > > > >
> > > > > Sure it can be useful but that holds for any feature, right. The main
> > > > > question is whether this is worth maintaing. The current implementation
> > > > > seems rather trivial which is an argument to have it but are there any
> > > > > risks long term? Have you evaluated a potential long term maintenance
> > > > > cost? There is no easy way to go back and disable it later on without
> > > > > breaking some userspace.
> > > > >
> > > > > All that should be in the changelog!
> > > >
> > > > I updated it as below.
> > > >
> > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > > >
> > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > value.
> > > >
> > > > It is now possible to test the memmap_on_memory feature easily without
> > > > the need for a kernel reboot. Additionally, for those encountering
> > > > struct page allocation failures while using dax kmem memory online, this
> > > > feature may prove useful. Instead of rebooting with the
> > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > > which greatly enhances its usefulness.
> > >
> > >
> > > I do not really see a solid argument why rebooting is really a problem
> > > TBH. Also is the global policy knob really the right fit for existing
> > > hotplug usecases? In other words, if we really want to make
> > > memmap_on_memory more flexible would it make more sense to have it per
> > > memory block property instead (the global knob being the default if
> > > admin doesn't specify it differently).
> >
> > Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach.
> >
> > I thought about driver toggles. At least for dax/kmem people are looking into that:
> >
> > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
> >
> > Where that can also be toggled per device.
> >
>
> That still is dependent on the global memmap_on_memory enabled right? We could make the dax facility independent of the
> global feature toggle?
Correct - the latest version of the patches David linked does depend on
the global memmap_on_memory param. Since kmem's behavior for dax devices
is set up to be turned on / off dynamically via sysfs, it would be nice
to have a similar facility for this flag.
I did try making dax independent of memmap_on_memory in my first
posting:
https://lore.kernel.org/nvdimm/20230613-vv-kmem_memmap-v1-1-f6de9c6af2c6@intel.com/
We can certainly revisit that if it looks more suitable.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 15:54 ` David Hildenbrand
2023-08-02 15:57 ` Aneesh Kumar K V
@ 2023-08-02 16:59 ` Michal Hocko
2023-08-03 8:52 ` Michal Hocko
1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2023-08-02 16:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
christophe.leroy, Oscar Salvador, Vishal Verma
On Wed 02-08-23 17:54:04, David Hildenbrand wrote:
> On 02.08.23 17:50, Michal Hocko wrote:
> > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > > > > value.
> > > > > >
> > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > more words about the usecase. Why we could live with this static and now
> > > > > > need dynamic?
> > > > > >
> > > > >
> > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot.
> > > >
> > > > Testing alone is rather weak argument to be honest.
> > > >
> > > > > I also expect people wanting to use that when they find dax kmem memory online
> > > > > failing because of struct page allocation failures[1]. User could reboot back with
> > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> > > > > the feature much more useful.
> > > >
> > > > Sure it can be useful but that holds for any feature, right. The main
> > > > question is whether this is worth maintaing. The current implementation
> > > > seems rather trivial which is an argument to have it but are there any
> > > > risks long term? Have you evaluated a potential long term maintenance
> > > > cost? There is no easy way to go back and disable it later on without
> > > > breaking some userspace.
> > > >
> > > > All that should be in the changelog!
> > >
> > > I updated it as below.
> > >
> > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > >
> > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > hotplug done after the mode update will use the new mmemap_on_memory
> > > value.
> > >
> > > It is now possible to test the memmap_on_memory feature easily without
> > > the need for a kernel reboot. Additionally, for those encountering
> > > struct page allocation failures while using dax kmem memory online, this
> > > feature may prove useful. Instead of rebooting with the
> > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > which greatly enhances its usefulness.
> >
> >
> > I do not really see a solid argument why rebooting is really a problem
> > TBH. Also is the global policy knob really the right fit for existing
> > hotplug usecases? In other words, if we really want to make
> > memmap_on_memory more flexible would it make more sense to have it per
> > memory block property instead (the global knob being the default if
> > admin doesn't specify it differently).
>
> Per memory block isn't possible, due to the creation order.
I am not sure I follow. Could you elaborate more?
> Also, I think it's not the right approach.
>
> I thought about driver toggles. At least for dax/kmem people are looking
> into that:
>
> https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com
>
> Where that can also be toggled per device.
A per device flag makes sense to me as well. But what should we do about
hotplugged memory with a backing device (like regular RAM)? I can see
some reason to have large hotplugged memory ranges be self-vmemmap
hosted while smaller blocks are backed by external vmemmaps.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-02 16:59 ` Michal Hocko
@ 2023-08-03 8:52 ` Michal Hocko
2023-08-03 9:24 ` David Hildenbrand
0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2023-08-03 8:52 UTC (permalink / raw)
To: David Hildenbrand
Cc: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
christophe.leroy, Oscar Salvador, Vishal Verma
On Wed 02-08-23 18:59:04, Michal Hocko wrote:
> On Wed 02-08-23 17:54:04, David Hildenbrand wrote:
> > On 02.08.23 17:50, Michal Hocko wrote:
> > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > > > > > value.
> > > > > > >
> > > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > > more words about the usecase. Why we could live with this static and now
> > > > > > > need dynamic?
> > > > > > >
> > > > > >
> > > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot.
> > > > >
> > > > > Testing alone is rather weak argument to be honest.
> > > > >
> > > > > > I also expect people wanting to use that when they find dax kmem memory online
> > > > > > failing because of struct page allocation failures[1]. User could reboot back with
> > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> > > > > > the feature much more useful.
> > > > >
> > > > > Sure it can be useful but that holds for any feature, right. The main
> > > > > question is whether this is worth maintaing. The current implementation
> > > > > seems rather trivial which is an argument to have it but are there any
> > > > > risks long term? Have you evaluated a potential long term maintenance
> > > > > cost? There is no easy way to go back and disable it later on without
> > > > > breaking some userspace.
> > > > >
> > > > > All that should be in the changelog!
> > > >
> > > > I updated it as below.
> > > >
> > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > > >
> > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > hotplug done after the mode update will use the new mmemap_on_memory
> > > > value.
> > > >
> > > > It is now possible to test the memmap_on_memory feature easily without
> > > > the need for a kernel reboot. Additionally, for those encountering
> > > > struct page allocation failures while using dax kmem memory online, this
> > > > feature may prove useful. Instead of rebooting with the
> > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > > which greatly enhances its usefulness.
> > >
> > >
> > > I do not really see a solid argument why rebooting is really a problem
> > > TBH. Also is the global policy knob really the right fit for existing
> > > hotplug usecases? In other words, if we really want to make
> > > memmap_on_memory more flexible would it make more sense to have it per
> > > memory block property instead (the global knob being the default if
> > > admin doesn't specify it differently).
> >
> > Per memory block isn't possible, due to the creation order.
>
> I am not sure I follow. Could you elaborate more?
Must have been a tired brain. Now I see what you mean, of course. The
vmemmap is allocated at the time the range is registered, which is also
when the memory block sysfs entry is created. So you are right that it
is too late in some sense, but I still believe that this would be
doable. The memmap_on_memory file would be readable only when the block
is offline, and it would reallocate the vmemmap on a change. Makes
sense? Are there any risks? Maybe pfn walkers?
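For the sake of discussion, a rough sketch of what such a per-block
toggle could look like (purely hypothetical, not part of this series; it
assumes the change is only accepted while the block is offline and
leaves the actual vmemmap reallocation out):

        /* Hypothetical per memory block attribute (would live in drivers/base/memory.c). */
        static ssize_t memmap_on_memory_store(struct device *dev,
                                              struct device_attribute *attr,
                                              const char *buf, size_t count)
        {
                struct memory_block *mem = to_memory_block(dev);
                bool enable;

                if (kstrtobool(buf, &enable))
                        return -EINVAL;
                /* Only allow reconfiguration while the block is offline. */
                if (mem->state != MEM_OFFLINE)
                        return -EBUSY;

                /* ... tear down and re-create the vmemmap with or without an altmap ... */
                return count;
        }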
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 8:52 ` Michal Hocko
@ 2023-08-03 9:24 ` David Hildenbrand
2023-08-03 11:30 ` Michal Hocko
0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2023-08-03 9:24 UTC (permalink / raw)
To: Michal Hocko
Cc: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
christophe.leroy, Oscar Salvador, Vishal Verma
On 03.08.23 10:52, Michal Hocko wrote:
> On Wed 02-08-23 18:59:04, Michal Hocko wrote:
>> On Wed 02-08-23 17:54:04, David Hildenbrand wrote:
>>> On 02.08.23 17:50, Michal Hocko wrote:
>>>> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
>>>>> On 8/1/23 4:20 PM, Michal Hocko wrote:
>>>>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>>>>>>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>>>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>>>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>>>>>>> value.
>>>>>>>>
>>>>>>>> Well, this is a user space kABI extension and as such you should spend
>>>>>>>> more words about the usecase. Why we could live with this static and now
>>>>>>>> need dynamic?
>>>>>>>>
>>>>>>>
>>>>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>>>>>>
>>>>>> Testing alone is rather weak argument to be honest.
>>>>>>
>>>>>>> I also expect people wanting to use that when they find dax kmem memory online
>>>>>>> failing because of struct page allocation failures[1]. User could reboot back with
>>>>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>>>>>>> the feature much more useful.
>>>>>>
>>>>>> Sure it can be useful but that holds for any feature, right. The main
>>>>>> question is whether this is worth maintaing. The current implementation
>>>>>> seems rather trivial which is an argument to have it but are there any
>>>>>> risks long term? Have you evaluated a potential long term maintenance
>>>>>> cost? There is no easy way to go back and disable it later on without
>>>>>> breaking some userspace.
>>>>>>
>>>>>> All that should be in the changelog!
>>>>>
>>>>> I updated it as below.
>>>>>
>>>>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>>>>>
>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>>> value.
>>>>>
>>>>> It is now possible to test the memmap_on_memory feature easily without
>>>>> the need for a kernel reboot. Additionally, for those encountering
>>>>> struct page allocation failures while using dax kmem memory online, this
>>>>> feature may prove useful. Instead of rebooting with the
>>>>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
>>>>> which greatly enhances its usefulness.
>>>>
>>>>
>>>> I do not really see a solid argument why rebooting is really a problem
>>>> TBH. Also is the global policy knob really the right fit for existing
>>>> hotplug usecases? In other words, if we really want to make
>>>> memmap_on_memory more flexible would it make more sense to have it per
>>>> memory block property instead (the global knob being the default if
>>>> admin doesn't specify it differently).
>>>
>>> Per memory block isn't possible, due to the creation order.
>>
>> I am not sure I follow. Could you elaborate more?
>
> Must have been a tired brain. Now I see what you mean of course. vmemmap
> is allocated at the time the range is registered and therefore memory
> block sysfs created. So you are right that it is too late in some sense
> but I still believe that this would be doable. The memmap_on_memory file
Exactly.
> would be readable only when the block is offline and it would reallocate
> vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
> walkers?
The question is: is it of any real value such that it would be worth the
cost and risk?
One of the primary reasons for memmap_on_memory is that *memory hotplug*
succeeds even in low-memory situations (including low-on-ZONE_NORMAL
situations). So you want that behavior already when hotplugging such
devices. While there might be value in relocating it later, I'm not sure
if that is really worth it, and it does not solve the main use case.
For dax/kmem memory hotplug this is controlled by user space, so user
space can easily configure it. For DIMMs it's just a hotplug event that
triggers all that in the kernel, and one has to plan ahead (e.g., set
the global kernel parameter).
Besides, devices like virtio-mem really cannot universally support
memmap_on_memory.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 9:24 ` David Hildenbrand
@ 2023-08-03 11:30 ` Michal Hocko
[not found] ` <d71a85b1-c0ea-6451-d65c-d7c5040caf77@linux.ibm.com>
2023-08-07 12:54 ` David Hildenbrand
0 siblings, 2 replies; 25+ messages in thread
From: Michal Hocko @ 2023-08-03 11:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
christophe.leroy, Oscar Salvador, Vishal Verma
On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
[...]
> > would be readable only when the block is offline and it would reallocate
> > vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
> > walkers?
>
> The question is: is it of any real value such that it would be worth the
> cost and risk?
>
>
> One of the primary reasons for memmap_on_memory is that *memory hotplug*
> succeeds even in low-memory situations (including, low on ZONE_NORMAL
> situations).
One use case I would have in mind is a mix of smaller and larger memory
blocks. For larger ones you want to have memmap_on_memory in general
because they do not eat memory from outside, but small(er) ones might be
more tricky, because now you can add a lot of blocks that are internally
fragmented, preventing larger allocations from forming.
> So you want that behavior already when hotplugging such
> devices. While there might be value to relocate it later, I'm not sure if
> that is really worth it, and it does not solve the main use case.
Is it worth it? TBH I am not sure, just as I am not sure the global
default should be writable after boot. If we want to make it more
dynamic, however, we should talk about the proper layer this is
implemented on.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
[not found] ` <d71a85b1-c0ea-6451-d65c-d7c5040caf77@linux.ibm.com>
@ 2023-08-07 12:27 ` Michal Hocko
2023-08-07 12:41 ` David Hildenbrand
0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2023-08-07 12:27 UTC (permalink / raw)
To: Aneesh Kumar K V
Cc: David Hildenbrand, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
christophe.leroy, Oscar Salvador, Vishal Verma
On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
[...]
> Do you see a need for firmware-managed memory to be hotplugged in with
> different memory block sizes?
In short: yes. Slightly longer: a fixed size memory block semantic is
just standing in the way and I would even argue it is actively harmful.
Just have a look at the ridiculously small memory blocks on ppc. I do
understand that it makes some sense to be aligned to the memory model
(so sparsemem section aligned). In an ideal world, a memory hotplug v2
interface (if we ever go that path) should be physical memory range
based.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 12:27 ` Michal Hocko
@ 2023-08-07 12:41 ` David Hildenbrand
2023-08-07 18:35 ` David Hildenbrand
0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2023-08-07 12:41 UTC (permalink / raw)
To: Michal Hocko, Aneesh Kumar K V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 07.08.23 14:27, Michal Hocko wrote:
> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
> [...]
>> Do you see a need for firmware-managed memory to be hotplugged in with
>> different memory block sizes?
>
> In short. Yes. Slightly longer, a fixed size memory block semantic is
> just standing in the way and I would even argue it is actively harmful.
> Just have a look at the ridiculously small memory blocks on ppc. I do
> understand that it makes some sense to be aligned to the memory model
> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
> interface (if we ever go that path) should be physical memory range based.
Yes, we discussed that a couple of times already (and so far nobody
cared to implement any of that).
Small memory block sizes are very beneficial for use cases like PPC
dlpar, virtio-mem, hyperv-balloon, ... essentially in most virtual
environments where you might want to add/remove memory in very small
granularity. I don't see that changing any time soon. Rather the opposite.
Small memory block sizes are suboptimal for large machines where you
might never end up removing such memory (boot memory), or when dealing
with devices that can only be removed in one piece (DIMM/kmem). We
already have memory groups in place to model that.
For the latter it might be beneficial to have memory blocks of larger
size that correspond to the physical memory ranges. That might also make
a memmap (re-)configuration easier.
Not sure if that is standing in the way or is harmful, though.
--
Cheers,
David / dhildenb
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-03 11:30 ` Michal Hocko
[not found] ` <d71a85b1-c0ea-6451-d65c-d7c5040caf77@linux.ibm.com>
@ 2023-08-07 12:54 ` David Hildenbrand
1 sibling, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2023-08-07 12:54 UTC (permalink / raw)
To: Michal Hocko
Cc: Aneesh Kumar K V, linux-mm, akpm, mpe, linuxppc-dev, npiggin,
christophe.leroy, Oscar Salvador, Vishal Verma
On 03.08.23 13:30, Michal Hocko wrote:
> On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
> [...]
>>> would be readable only when the block is offline and it would reallocate
>>> vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
>>> walkers?
>>
>> The question is: is it of any real value such that it would be worth the
>> cost and risk?
>>
>>
>> One of the primary reasons for memmap_on_memory is that *memory hotplug*
>> succeeds even in low-memory situations (including, low on ZONE_NORMAL
>> situations).
Sorry for the late reply, I'm busy with 100 other things.
>
> One use case I would have in mind is a mix of smaller and larger memory
> blocks. For larger ones you want to have memmap_on_memory in general
> because they do not eat memory from outside, but small(er) ones might be
> more tricky because now you can add a lot of blocks that would be
> internally fragmented, preventing larger allocations from forming.
Okay, I see what you mean.
The internal fragmentation might become an issue at some point: for
x86-64 with 128 MiB blocks / 2 MiB THP it's not a real issue right now.
For an arm64 64k configuration with 512 MiB blocks and 512 MiB THP /
hugetlb it could be one.
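As a rough back-of-the-envelope check (a standalone sketch assuming the
usual 64-byte struct page; only the arithmetic matters here):

    #include <stdio.h>

    int main(void)
    {
            /* { memory block size, base page size, huge page size } in bytes */
            unsigned long long cfg[][3] = {
                    { 128ULL << 20,  4ULL << 10,   2ULL << 20 },  /* x86-64 4k */
                    { 512ULL << 20, 64ULL << 10, 512ULL << 20 },  /* arm64 64k */
            };

            for (int i = 0; i < 2; i++) {
                    /* memmap size = number of base pages * sizeof(struct page) */
                    unsigned long long memmap = cfg[i][0] / cfg[i][1] * 64;

                    printf("block %3llu MiB -> memmap %4llu KiB (huge page unit %llu MiB)\n",
                           cfg[i][0] >> 20, memmap >> 10, cfg[i][2] >> 20);
            }
            return 0;
    }

So on x86-64 the 2 MiB of memmap costs exactly one 2 MiB THP per 128 MiB
block, while on arm64/64k even 512 KiB of memmap means the 512 MiB block
can no longer host a single 512 MiB huge page.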
I recall discussing that with Oscar back when he added memmap_on_memory,
where we also discussed the variable-sized memory blocks to avoid such
internal fragmentation.
For small ones you probably want to only use memmap_on_memory when
unavoidable: for example, when adding without memmap_on_memory would
fail / already failed. Possibly some later memmap relocation might make
sense in some scenarios.
>
>> So you want that behavior already when hotplugging such
>> devices. While there might be value in relocating it later, I'm not sure if
>> that is really worth it, and it does not solve the main use case.
>
> Is it worth it? TBH I am not sure, just as I am not sure the global
> default should be writable after boot. If we want to make it more
> dynamic, we should however talk about the proper layer this is
> implemented on.
Agreed.
--
Cheers,
David / dhildenb
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 12:41 ` David Hildenbrand
@ 2023-08-07 18:35 ` David Hildenbrand
2023-08-08 6:09 ` Aneesh Kumar K V
2023-08-08 6:29 ` Aneesh Kumar K V
0 siblings, 2 replies; 25+ messages in thread
From: David Hildenbrand @ 2023-08-07 18:35 UTC (permalink / raw)
To: Michal Hocko, Aneesh Kumar K V
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 07.08.23 14:41, David Hildenbrand wrote:
> On 07.08.23 14:27, Michal Hocko wrote:
>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>> [...]
>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>> different memory block sizes?
>>
>> In short. Yes. Slightly longer, a fixed size memory block semantic is
>> just standing in the way and I would even argue it is actively harmful.
>> Just have a look at the ridiculously small memory blocks on ppc. I do
>> understand that it makes some sense to be aligned to the memory model
>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>> interface (if we ever go that path) should be physical memory range based.
>
> Yes, we discussed that a couple of times already (and so far nobody
> cared to implement any of that).
>
> Small memory block sizes are very beneficial for use cases like PPC
> dlpar, virtio-mem, hyperv-balloon, ... essentially in most virtual
> environments where you might want to add/remove memory in very small
> granularity. I don't see that changing any time soon. Rather the opposite.
>
> Small memory block sizes are suboptimal for large machines where you
> might never end up removing such memory (boot memory), or when dealing
> with devices that can only be removed in one piece (DIMM/kmem). We
> already have memory groups in place to model that.
>
> For the latter it might be beneficial to have memory blocks of larger
> size that correspond to the physical memory ranges. That might also make
> a memmap (re-)configuration easier.
>
> Not sure if that is standing in the way or is harmful, though.
>
Just because I thought of something right now, I'll share it, maybe it
makes sense.
Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled
by the admin:
1) We create a single altmap at the beginning of the memory
2) We create the existing fixed-size memory block devices, but flag them
to be part of a single "altmap" unit.
3) Whenever we trigger offlining of a single such memory block, we
offline *all* memory blocks belonging to that altmap, essentially
using a single offline_pages() call and updating all memory block
states accordingly.
4) Whenever we trigger onlining of a single such memory block, we
online *all* memory blocks belonging to that altmap, using a single
online_pages() call.
5) We fail remove_memory() if it doesn't cover the same (altmap) range.
So we can avoid having a memory block v2 (and all that comes with that
...) for now and still get that altmap stuff sorted out. As that altmap
behavior can be controlled by the admin, we should be fine for now.
I think all memory notifiers should already be able to handle bigger
granularity, but it would be easy to check. Some internal things might
require a bit of tweaking.
Just a thought.
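If it helps, a very rough sketch of 3)/4); every name below is made up
purely for illustration (nothing here is existing kernel API). The point
is only that a single offline/online operation spans the whole range
backing one altmap, and the per-block device state is updated afterwards:

    struct altmap_group {
            unsigned long start_pfn;
            unsigned long nr_pages;     /* covers all member memory blocks */
            int nr_blocks;              /* fixed-size memory block devices */
    };

    /* offline (online == 0) or online (online == 1) the whole group */
    static int altmap_group_set_state(struct altmap_group *g, int online,
                                      int (*range_op)(unsigned long start_pfn,
                                                      unsigned long nr_pages),
                                      void (*set_block_state)(int idx, int online))
    {
            int ret, i;

            /* one offline_pages()/online_pages()-style call, not one per block */
            ret = range_op(g->start_pfn, g->nr_pages);
            if (ret)
                    return ret;

            /* keep the per-block sysfs state consistent with the range */
            for (i = 0; i < g->nr_blocks; i++)
                    set_block_state(i, online);
            return 0;
    }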
--
Cheers,
David / dhildenb
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 18:35 ` David Hildenbrand
@ 2023-08-08 6:09 ` Aneesh Kumar K V
2023-08-08 6:29 ` Aneesh Kumar K V
1 sibling, 0 replies; 25+ messages in thread
From: Aneesh Kumar K V @ 2023-08-08 6:09 UTC (permalink / raw)
To: David Hildenbrand, Michal Hocko
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 8/8/23 12:05 AM, David Hildenbrand wrote:
> On 07.08.23 14:41, David Hildenbrand wrote:
>> On 07.08.23 14:27, Michal Hocko wrote:
>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>>> [...]
>>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>>> different memory block sizes?
>>>
>>> In short. Yes. Slightly longer, a fixed size memory block semantic is
>>> just standing in the way and I would even argue it is actively harmful.
>>> Just have a look at the ridiculously small memory blocks on ppc. I do
>>> understand that it makes some sense to be aligned to the memory model
>>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>>> interface (if we ever go that path) should be physical memory range based.
>>
>> Yes, we discussed that a couple of times already (and so far nobody
>> cared to implement any of that).
>>
>> Small memory block sizes are very beneficial for use cases like PPC
>> dlpar, virtio-mem, hyperv-balloon, ... essentially in most virtual
>> environments where you might want to add/remove memory in very small
>> granularity. I don't see that changing any time soon. Rather the opposite.
>>
>> Small memory block sizes are suboptimal for large machines where you
>> might never end up removing such memory (boot memory), or when dealing
>> with devices that can only be removed in one piece (DIMM/kmem). We
>> already have memory groups in place to model that.
>>
>> For the latter it might be beneficial to have memory blocks of larger
>> size that correspond to the physical memory ranges. That might also make
>> a memmap (re-)configuration easier.
>>
>> Not sure if that is standing in the way or is harmful, though.
>>
>
> Just because I thought of something right now, I'll share it, maybe it makes sense.
>
> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin:
>
> 1) We create a single altmap at the beginning of the memory
>
> 2) We create the existing fixed-size memory block devices, but flag them
> to be part of a single "altmap" unit.
>
> 3) Whenever we trigger offlining of a single such memory block, we
> offline *all* memory blocks belonging to that altmap, essentially
> using a single offline_pages() call and updating all memory block
> states accordingly.
>
> 4) Whenever we trigger onlining of a single such memory block, we
> online *all* memory blocks belonging to that altmap, using a single
> online_pages() call.
>
> 5) We fail remove_memory() if it doesn't cover the same (altmap) range.
>
> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now.
>
> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking.
>
> Just a thought.
>
W.r.t. enabling memmap_on_memory for ppc64, I guess I will drop patch 7 and resend the series so that Andrew can pick up the rest of the patches?
-aneesh
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-07 18:35 ` David Hildenbrand
2023-08-08 6:09 ` Aneesh Kumar K V
@ 2023-08-08 6:29 ` Aneesh Kumar K V
2023-08-08 7:46 ` David Hildenbrand
1 sibling, 1 reply; 25+ messages in thread
From: Aneesh Kumar K V @ 2023-08-08 6:29 UTC (permalink / raw)
To: David Hildenbrand, Michal Hocko
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 8/8/23 12:05 AM, David Hildenbrand wrote:
> On 07.08.23 14:41, David Hildenbrand wrote:
>> On 07.08.23 14:27, Michal Hocko wrote:
>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>>> [...]
>>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>>> different memory block sizes?
>>>
>>> In short. Yes. Slightly longer, a fixed size memory block semantic is
>>> just standing in the way and I would even argue it is actively harmful.
>>> Just have a look at the ridiculously small memory blocks on ppc. I do
>>> understand that it makes some sense to be aligned to the memory model
>>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>>> interface (if we ever go that path) should be physical memory range based.
>>
>> Yes, we discussed that a couple of times already (and so far nobody
>> cared to implement any of that).
>>
>> Small memory block sizes are very beneficial for use cases like PPC
>> dlpar, virtio-mem, hyperv-balloon, ... essentially in most virtual
>> environments where you might want to add/remove memory in very small
>> granularity. I don't see that changing any time soon. Rather the opposite.
>>
>> Small memory block sizes are suboptimal for large machines where you
>> might never end up removing such memory (boot memory), or when dealing
>> with devices that can only be removed in one piece (DIMM/kmem). We
>> already have memory groups in place to model that.
>>
>> For the latter it might be beneficial to have memory blocks of larger
>> size that correspond to the physical memory ranges. That might also make
>> a memmap (re-)configuration easier.
>>
>> Not sure if that is standing in the way or is harmful, though.
>>
>
> Just because I thought of something right now, I'll share it, maybe it makes sense.
>
> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin:
>
> 1) We create a single altmap at the beginning of the memory
>
> 2) We create the existing fixed-size memory block devices, but flag them
> to be part of a single "altmap" unit.
>
> 3) Whenever we trigger offlining of a single such memory block, we
> offline *all* memory blocks belonging to that altmap, essentially
> using a single offline_pages() call and updating all memory block
> states accordingly.
>
> 4) Whenever we trigger onlining of a single such memory block, we
> online *all* memory blocks belonging to that altmap, using a single
> online_pages() call.
>
> 5) We fail remove_memory() if it doesn't cover the same (altmap) range.
>
> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now.
>
> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking.
>
We can look at the possibility of using the altmap space reserved for a namespace (via option -M dev) for allocating struct page memory even with dax/kmem.
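Roughly along these lines, as a sketch only (base_pfn/free are how I read
struct vmem_altmap today, and res_start/res_size stand in for whatever the
namespace has reserved -- treat all of that as an assumption):

    /* hand the pre-reserved namespace area to the hotplug path as the
     * backing store for struct pages
     * (assumes <linux/memremap.h> and <linux/pfn.h>) */
    struct vmem_altmap altmap = {
            .base_pfn = PHYS_PFN(res_start),    /* start of the reserved area  */
            .free     = PHYS_PFN(res_size),     /* pages usable for the memmap */
    };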
-aneesh
* Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
2023-08-08 6:29 ` Aneesh Kumar K V
@ 2023-08-08 7:46 ` David Hildenbrand
0 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2023-08-08 7:46 UTC (permalink / raw)
To: Aneesh Kumar K V, Michal Hocko
Cc: linux-mm, akpm, mpe, linuxppc-dev, npiggin, christophe.leroy,
Oscar Salvador, Vishal Verma
On 08.08.23 08:29, Aneesh Kumar K V wrote:
> On 8/8/23 12:05 AM, David Hildenbrand wrote:
>> On 07.08.23 14:41, David Hildenbrand wrote:
>>> On 07.08.23 14:27, Michal Hocko wrote:
>>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>>>> [...]
>>>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>>>> different memory block sizes?
>>>>
>>>> In short. Yes. Slightly longer, a fixed size memory block semantic is
>>>> just standing in the way and I would even argue it is actively harmful.
>>>> Just have a look at the ridiculously small memory blocks on ppc. I do
>>>> understand that it makes some sense to be aligned to the memory model
>>>> (so sparsemem section aligned). In an ideal world, a memory hotplug v2
>>>> interface (if we ever go that path) should be physical memory range based.
>>>
>>> Yes, we discussed that a couple of times already (and so far nobody
>>> cared to implement any of that).
>>>
>>> Small memory block sizes are very beneficial for use cases like PPC
>>> dlpar, virtio-mem, hyperv-balloon, ... essentially in most virtual
>>> environments where you might want to add/remove memory in very small
>>> granularity. I don't see that changing any time soon. Rather the opposite.
>>>
>>> Small memory block sizes are suboptimal for large machines where you
>>> might never end up removing such memory (boot memory), or when dealing
>>> with devices that can only be removed in one piece (DIMM/kmem). We
>>> already have memory groups in place to model that.
>>>
>>> For the latter it might be beneficial to have memory blocks of larger
>>> size that correspond to the physical memory ranges. That might also make
>>> a memmap (re-)configuration easier.
>>>
>>> Not sure if that is standing in the way or is harmful, though.
>>>
>>
>> Just because I thought of something right now, I'll share it, maybe it makes sense.
>>
>> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin:
>>
>> 1) We create a single altmap at the beginning of the memory
>>
>> 2) We create the existing fixed-size memory block devices, but flag them
>> to be part of a single "altmap" unit.
>>
>> 3) Whenever we trigger offlining of a single such memory block, we
>> offline *all* memory blocks belonging to that altmap, essentially
>> using a single offline_pages() call and updating all memory block
>> states accordingly.
>>
>> 4) Whenever we trigger onlining of a single such memory block, we
>> online *all* memory blocks belonging to that altmap, using a single
>> online_pages() call.
>>
>> 5) We fail remove_memory() if it doesn't cover the same (altmap) range.
>>
>> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now.
>>
>> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking.
>>
>
> We can look at the possibility of using the altmap space reserved for a namespace (via option -M dev) for allocating struct page memory even with dax/kmem.
Right, an alternative would also be for the caller to pass in the
altmap. Individual memory blocks can then get onlined/offlined as-is.
One issue might be how to get that altmap considered online memory
(e.g., pfn_to_online_page(), kdump, ...). Right now, the altmap always
falls into an online section once the memory block is online, so
pfn_to_online_page() and especially online_section() hold as soon as
the altmap has reasonable content -- that is, once the memory block is
online and has initialized the memmap. Maybe that can be worked around,
just pointing it out.
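For illustration, the kind of pfn walker I have in mind (a minimal
sketch that only assumes the existing pfn_to_online_page() helper):

    /* a pfn walker only ever looks at pages whose section is marked online */
    static void walk_pfn_range(unsigned long start_pfn, unsigned long nr_pages)
    {
            unsigned long pfn;

            for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
                    struct page *page = pfn_to_online_page(pfn);

                    if (!page)
                            continue;   /* offline section: skipped entirely */
                    /* ... inspect the struct page ... */
            }
    }

If the caller-provided altmap lives in a range whose section is never
marked online, walkers like this (and kdump) would simply skip the memmap
backing store, which is the part that might need working around.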
--
Cheers,
David / dhildenb
Thread overview: 25+ messages
2023-08-01 4:41 [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 1/7] mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig Aneesh Kumar K.V
2023-08-01 4:41 ` [PATCH v7 5/7] powerpc/book3s64/memhotplug: Enable memmap on memory for radix Aneesh Kumar K.V
[not found] ` <20230801044116.10674-8-aneesh.kumar@linux.ibm.com>
2023-08-01 8:58 ` [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter Michal Hocko
[not found] ` <a32fe748-fa18-bd92-3a10-5da8dbad96e6@linux.ibm.com>
2023-08-01 10:50 ` Michal Hocko
2023-08-02 4:45 ` Aneesh Kumar K V
2023-08-02 15:50 ` Michal Hocko
2023-08-02 15:54 ` David Hildenbrand
2023-08-02 15:57 ` Aneesh Kumar K V
2023-08-02 16:02 ` Verma, Vishal L
2023-08-02 16:59 ` Michal Hocko
2023-08-03 8:52 ` Michal Hocko
2023-08-03 9:24 ` David Hildenbrand
2023-08-03 11:30 ` Michal Hocko
[not found] ` <d71a85b1-c0ea-6451-d65c-d7c5040caf77@linux.ibm.com>
2023-08-07 12:27 ` Michal Hocko
2023-08-07 12:41 ` David Hildenbrand
2023-08-07 18:35 ` David Hildenbrand
2023-08-08 6:09 ` Aneesh Kumar K V
2023-08-08 6:29 ` Aneesh Kumar K V
2023-08-08 7:46 ` David Hildenbrand
2023-08-07 12:54 ` David Hildenbrand
[not found] ` <20230801044116.10674-5-aneesh.kumar@linux.ibm.com>
2023-08-01 9:04 ` [PATCH v7 4/7] mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks Michal Hocko
2023-08-01 9:07 ` [PATCH v7 0/7] Add support for memmap on memory feature on ppc64 Michal Hocko
[not found] ` <20230801044116.10674-7-aneesh.kumar@linux.ibm.com>
2023-08-01 23:10 ` [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block Verma, Vishal L
2023-08-02 4:47 ` Aneesh Kumar K V