* [PATCH v3 0/5] implement "memmap on memory" feature on s390
From: Sumanth Korikkar @ 2023-11-27 8:20 UTC
To: linux-mm, Andrew Morton, David Hildenbrand
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Sumanth Korikkar, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

Hi All,

The patch series implements the "memmap on memory" feature on s390.

Patch 1 introduces the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. A new mhp_flag, MHP_OFFLINE_INACCESSIBLE, is introduced to ensure the altmap cannot be written when adding memory - before it is set online. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch.

Patch 2 allocates vmemmap pages from the self-contained memory range for s390. It allocates the memory map (struct page array) from the hotplugged memory range, rather than using system memory, by passing the altmap to the vmemmap functions.

Patch 3 removes unhandled memory notifier types on s390.

Patch 4 implements the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers on s390. The MEM_PREPARE_ONLINE memory notifier makes the memory block physically accessible via the sclp assign command. The notifier ensures self-contained memory maps are accessible and hence enables the "memmap on memory" feature on s390. The MEM_FINISH_OFFLINE memory notifier shifts the memory block to an inaccessible state via the sclp unassign command.

Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390.

Note: These patches are rebased on top of three fixes: mm: use vmem_altmap code without CONFIG_ZONE_DEVICE mm/memory_hotplug: fix error handling in add_memory_resource() mm/memory_hotplug: add missing mem_hotplug_lock

v3: * Added comments to MHP_OFFLINE_INACCESSIBLE as suggested by David. * Squashed three commits related to the new memory notifiers.

v2: * Fixes are integrated and hence removed from this patch series. Suggestions from David: * Add new flag MHP_OFFLINE_INACCESSIBLE to avoid accessing memory during the memory hotplug addition phase. * Avoid page_init_poison() on the memmap during the mhp addition phase, when the MHP_OFFLINE_INACCESSIBLE mhp_flag is passed in add_memory(). * Do not skip add_pages() in arch_add_memory(). Similarly, remove similar hacks in arch_remove_memory(). * Use the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE naming convention for the new memory notifiers. * Rearrange removal of the unused s390 memory notifier. * Necessary commit message changes.

Thank you

Sumanth Korikkar (5): mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers s390/mm: allocate vmemmap pages from self-contained memory range s390/sclp: remove unhandled memory notifier type s390/mm: implement MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers s390: enable MHP_MEMMAP_ON_MEMORY

arch/s390/Kconfig | 1 + arch/s390/mm/init.c | 3 -- arch/s390/mm/vmem.c | 62 +++++++++++++++++++--------------- drivers/base/memory.c | 21 ++++++++++-- drivers/s390/char/sclp_cmd.c | 31 ++++++++++++----- include/linux/memory.h | 2 ++ include/linux/memory_hotplug.h | 18 +++++++++- include/linux/memremap.h | 1 + mm/memory_hotplug.c | 30 ++++++++++++++-- mm/sparse.c | 3 +- 10 files changed, 127 insertions(+), 45 deletions(-)

-- 2.41.0
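To make the new hooks concrete, here is a minimal sketch of a memory-provider notifier reacting to the two new events. The helpers my_make_accessible() and my_make_inaccessible() are hypothetical stand-ins for whatever hypervisor interface a platform uses (on s390, patch 4 wires this up to the sclp assign/unassign commands):

static int my_mem_notifier(struct notifier_block *nb,
			   unsigned long action, void *data)
{
	struct memory_notify *arg = data;
	unsigned long start = PFN_PHYS(arg->start_pfn);
	unsigned long size = PFN_PHYS(arg->nr_pages);
	int rc = 0;

	switch (action) {
	case MEM_PREPARE_ONLINE:
		/* Make the range physically accessible before the
		 * "memmap on memory" initialization touches the altmap.
		 */
		rc = my_make_accessible(start, size);
		break;
	case MEM_FINISH_OFFLINE:
		/* The memmap has been torn down; the range may become
		 * inaccessible again.
		 */
		my_make_inaccessible(start, size);
		break;
	default:
		break;
	}
	return notifier_from_errno(rc);
}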
* [PATCH v3 1/5] mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: Sumanth Korikkar @ 2023-11-27 8:20 UTC
To: linux-mm, Andrew Morton, David Hildenbrand
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Sumanth Korikkar, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch.

Platforms such as x86 can support physical memory hotplug via ACPI. When there is physical memory hotplug, an ACPI event leads to the memory addition with the following callchain: acpi_memory_device_add() -> acpi_memory_enable_device() -> __add_memory()

After this, the hotplugged memory is physically accessible, and altmap support is prepared, before the "memmap on memory" initialization in memory_block_online() is called.

On s390, memory hotplug works in a different way. The available hotplug memory has to be defined upfront in the hypervisor, but it is made physically accessible only when the user sets it online via sysfs, currently in the MEM_GOING_ONLINE notifier. This is too late, because the "memmap on memory" initialization is performed before the MEM_GOING_ONLINE notifier is called.

During the memory hotplug addition phase, altmap support is prepared; during the memory onlining phase, s390 requires the memory to be physically accessible, and only then can the "memmap on memory" initialization be performed.

The memory provider will handle the new MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE notifications and make the memory accessible.

The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written (e.g., poisoned) when adding memory -- before it is set online. This allows for adding memory with an altmap that is not currently made available by a hypervisor. When onlining that memory, the hypervisor can be instructed to make that memory accessible via the new notifiers, and the onlining phase will not require any memory allocations, which is helpful in low-memory situations.

All architectures ignore unknown memory notifiers. Therefore, the introduction of these new notifiers does not result in any functional modifications across architectures.
Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> --- drivers/base/memory.c | 21 +++++++++++++++++++-- include/linux/memory.h | 2 ++ include/linux/memory_hotplug.h | 18 +++++++++++++++++- include/linux/memremap.h | 1 + mm/memory_hotplug.c | 30 ++++++++++++++++++++++++++++-- mm/sparse.c | 3 ++- 6 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 8a13babd826c..5c6b2af75db4 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -188,6 +188,7 @@ static int memory_block_online(struct memory_block *mem) unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; unsigned long nr_vmemmap_pages = 0; + struct memory_notify arg; struct zone *zone; int ret; @@ -197,6 +198,14 @@ static int memory_block_online(struct memory_block *mem) zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group, start_pfn, nr_pages); + arg.start_pfn = start_pfn; + arg.nr_pages = nr_pages; + mem_hotplug_begin(); + ret = memory_notify(MEM_PREPARE_ONLINE, &arg); + ret = notifier_to_errno(ret); + if (ret) + goto out_notifier; + /* * Although vmemmap pages have a different lifecycle than the pages * they describe (they remain until the memory is unplugged), doing @@ -207,9 +216,9 @@ static int memory_block_online(struct memory_block *mem) if (mem->altmap) nr_vmemmap_pages = mem->altmap->free; - mem_hotplug_begin(); if (nr_vmemmap_pages) { - ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone); + ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, + zone, mem->altmap->inaccessible); if (ret) goto out; } @@ -231,7 +240,11 @@ static int memory_block_online(struct memory_block *mem) nr_vmemmap_pages); mem->zone = zone; + mem_hotplug_done(); + return ret; out: + memory_notify(MEM_FINISH_OFFLINE, &arg); +out_notifier: mem_hotplug_done(); return ret; } @@ -244,6 +257,7 @@ static int memory_block_offline(struct memory_block *mem) unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; unsigned long nr_vmemmap_pages = 0; + struct memory_notify arg; int ret; if (!mem->zone) @@ -275,6 +289,9 @@ static int memory_block_offline(struct memory_block *mem) mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); mem->zone = NULL; + arg.start_pfn = start_pfn; + arg.nr_pages = nr_pages; + memory_notify(MEM_FINISH_OFFLINE, &arg); out: mem_hotplug_done(); return ret; diff --git a/include/linux/memory.h b/include/linux/memory.h index f53cfdaaaa41..de802994a8fa 100644 --- a/include/linux/memory.h +++ b/include/linux/memory.h @@ -96,6 +96,8 @@ int set_memory_block_size_order(unsigned int order); #define MEM_GOING_ONLINE (1<<3) #define MEM_CANCEL_ONLINE (1<<4) #define MEM_CANCEL_OFFLINE (1<<5) +#define MEM_PREPARE_ONLINE (1<<6) +#define MEM_FINISH_OFFLINE (1<<7) struct memory_notify { unsigned long start_pfn; diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 7d2076583494..ee00015575aa 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -106,6 +106,22 @@ typedef int __bitwise mhp_t; * implies the node id (nid). */ #define MHP_NID_IS_MGID ((__force mhp_t)BIT(2)) +/* + * The hotplugged memory is completely inaccessible while the memory is + * offline. 
The memory provider will handle MEM_PREPARE_ONLINE / + * MEM_FINISH_OFFLINE notifications and make the memory accessible. + * + * This flag is only relevant when used along with MHP_MEMMAP_ON_MEMORY, + * because the altmap cannot be written (e.g., poisoned) when adding + * memory -- before it is set online. + * + * This allows for adding memory with an altmap that is not currently + * made available by a hypervisor. When onlining that memory, the + * hypervisor can be instructed to make that memory available, and + * the onlining phase will not require any memory allocations, which is + * helpful in low-memory situations. + */ +#define MHP_OFFLINE_INACCESSIBLE ((__force mhp_t)BIT(3)) /* * Extended parameters for memory hotplug: @@ -154,7 +170,7 @@ extern void adjust_present_page_count(struct page *page, long nr_pages); /* VM interface that may be used by firmware interface */ extern int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, - struct zone *zone); + struct zone *zone, bool mhp_off_inaccessible); extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages); extern int online_pages(unsigned long pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group); diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 744c830f4b13..9837f3e6fb95 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -25,6 +25,7 @@ struct vmem_altmap { unsigned long free; unsigned long align; unsigned long alloc; + bool inaccessible; }; /* diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 7a5fc89a8652..ac7cfc09502d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1083,8 +1083,25 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, group->present_kernel_pages += nr_pages; } +static void page_init_poison_with_resched(unsigned long start_pfn, unsigned long nr_pages) +{ + const unsigned long end_pfn = start_pfn + nr_pages; + unsigned long pfn, cur_nr_pages; + + /* Poison struct pages because they are now uninitialized again. */ + for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { + cond_resched(); + + /* Select all remaining pages up to the next section boundary */ + cur_nr_pages = + min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); + page_init_poison(pfn_to_page(pfn), + sizeof(struct page) * cur_nr_pages); + } +} + int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, - struct zone *zone) + struct zone *zone, bool mhp_off_inaccessible) { unsigned long end_pfn = pfn + nr_pages; int ret, i; @@ -1092,7 +1109,14 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); if (ret) return ret; - + /* + * Memory block is accessible at this stage and hence poison the struct + * pages now. If the memory block is accessible during memory hotplug + * addition phase, then page poisoning is already performed in + * sparse_add_section().
+ */ + if (mhp_off_inaccessible) + page_init_poison_with_resched(pfn, nr_pages); move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE); for (i = 0; i < nr_pages; i++) @@ -1439,6 +1463,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags) if (mhp_flags & MHP_MEMMAP_ON_MEMORY) { if (mhp_supports_memmap_on_memory(size)) { mhp_altmap.free = memory_block_memmap_on_memory_pages(); + if (mhp_flags & MHP_OFFLINE_INACCESSIBLE) + mhp_altmap.inaccessible = true; params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL); if (!params.altmap) { ret = -ENOMEM; diff --git a/mm/sparse.c b/mm/sparse.c index 77d91e565045..3991c717b769 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -907,7 +907,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn, * Poison uninitialized struct pages in order to catch invalid flags * combinations. */ - page_init_poison(memmap, sizeof(struct page) * nr_pages); + if (!altmap || !altmap->inaccessible) + page_init_poison(memmap, sizeof(struct page) * nr_pages); ms = __nr_to_section(section_nr); set_section_nid(section_nr, nid); -- 2.41.0
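A brief usage sketch (not part of the patch): a memory provider that must not have its altmap touched while a block is offline would pass the new flag together with MHP_MEMMAP_ON_MEMORY when adding memory; the node id and size below are illustrative only:

	/* Add a standby memory block whose memmap lives on the block
	 * itself; the altmap must not be written until onlining.
	 */
	mhp_t flags = MHP_MEMMAP_ON_MEMORY | MHP_OFFLINE_INACCESSIBLE;
	int rc = __add_memory(nid, start, memory_block_size_bytes(), flags);

	if (rc)
		pr_err("adding memory failed: %d\n", rc);

When the user later sets the block online via sysfs, memory_block_online() raises MEM_PREPARE_ONLINE before touching the altmap, so the provider's notifier can make the range accessible first.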
* Re: [PATCH v3 1/5] mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: David Hildenbrand @ 2023-11-27 15:16 UTC
To: Sumanth Korikkar, linux-mm, Andrew Morton
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On 27.11.23 09:20, Sumanth Korikkar wrote:
> Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch.
[...]
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 7a5fc89a8652..ac7cfc09502d 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1083,8 +1083,25 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, > group->present_kernel_pages += nr_pages; > } > > +static void page_init_poison_with_resched(unsigned long start_pfn, unsigned long nr_pages) > +{ > + const unsigned long end_pfn = start_pfn + nr_pages; > + unsigned long pfn, cur_nr_pages; > + > + /* Poison struct pages because they are now uninitialized again. */ > + for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { > + cond_resched(); > + > + /* Select all remaining pages up to the next section boundary */ > + cur_nr_pages = > + min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); > + page_init_poison(pfn_to_page(pfn), > + sizeof(struct page) * cur_nr_pages); > + } > +} > + > int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, > - struct zone *zone) > + struct zone *zone, bool mhp_off_inaccessible) > { > unsigned long end_pfn = pfn + nr_pages; > int ret, i; > @@ -1092,7 +1109,14 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, > ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); > if (ret) > return ret; > - > + /* > + * Memory block is accessible at this stage and hence poison the struct > + * pages now. If the memory block is accessible during memory hotplug > + * addition phase, then page poisoning is already performed in > + * sparse_add_section(). > + */ > + if (mhp_off_inaccessible) > + page_init_poison_with_resched(pfn, nr_pages);

Can you elaborate why a simple page_init_poison() as for sparse_add_section() is insufficient?

Apart from that looks good.

Ideally, we'd be updating altmap->inaccessible as we online/offline memory. But then, we'd have to remember MHP_OFFLINE_INACCESSIBLE somehow differently.

-- Cheers, David / dhildenb
* Re: [PATCH v3 1/5] mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: Sumanth Korikkar @ 2023-11-28 7:54 UTC
To: David Hildenbrand
Cc: linux-mm, Andrew Morton, Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On Mon, Nov 27, 2023 at 04:16:18PM +0100, David Hildenbrand wrote:
> On 27.11.23 09:20, Sumanth Korikkar wrote:
[...]
> > int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, > > - struct zone *zone) > > + struct zone *zone, bool mhp_off_inaccessible) > > { > > unsigned long end_pfn = pfn + nr_pages; > > int ret, i; > > @@ -1092,7 +1109,14 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, > > ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); > > if (ret) > > return ret; > > - > > + /* > > + * Memory block is accessible at this stage and hence poison the struct > > + * pages now. If the memory block is accessible during memory hotplug > > + * addition phase, then page poisoning is already performed in > > + * sparse_add_section(). > > + */ > > + if (mhp_off_inaccessible) > > + page_init_poison_with_resched(pfn, nr_pages);
>
> Can you elaborate why a simple page_init_poison() as for sparse_add_section() is insufficient?

Looks like cond_resched() is not needed. page_init_poison() could be performed similarly to when adding new memory in sparse_add_section(). IIUC, as memory is onlined at memory block granularity, this cond_resched() would not be needed then.

Thanks
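In other words, the likely follow-up (a sketch of the direction agreed above, not the final patch) would drop the resched helper and poison the whole block's memmap in one call, mirroring sparse_add_section():

	/* Memory is onlined at memory block granularity, so nr_pages
	 * is bounded and no cond_resched() is required.
	 */
	if (mhp_off_inaccessible)
		page_init_poison(pfn_to_page(pfn),
				 sizeof(struct page) * nr_pages);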
* Re: [PATCH v3 1/5] mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: David Hildenbrand @ 2023-11-27 15:50 UTC
To: Sumanth Korikkar, linux-mm, Andrew Morton
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On 27.11.23 09:20, Sumanth Korikkar wrote:
[...]
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 8a13babd826c..5c6b2af75db4 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -188,6 +188,7 @@ static int memory_block_online(struct memory_block *mem) > unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); > unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; > unsigned long nr_vmemmap_pages = 0; > + struct memory_notify arg; > struct zone *zone; > int ret; > > @@ -197,6 +198,14 @@ static int memory_block_online(struct memory_block *mem) > zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group, > start_pfn, nr_pages); > > + arg.start_pfn = start_pfn; > + arg.nr_pages = nr_pages;

Thinking about it, it's possibly cleanest to send the altmap range along here. Later memory onlining notifiers will only be notified about the actual !altmap part.

arg.altmap_start_pfn = start_pfn; arg.altmap_nr_pages = nr_vmemmap_pages; arg.start_pfn = start_pfn + nr_vmemmap_pages; arg.nr_pages = nr_pages - nr_vmemmap_pages;

Use that for the two new notifiers only and document it.

Thoughts?

-- Cheers, David / dhildenb
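For concreteness, David's suggestion could translate into something like the following extension of struct memory_notify (the altmap_* field names follow his example above; this is a sketch of the proposal, not merged code):

	struct memory_notify {
		/* Altmap (vmemmap) part of the range; only filled in for
		 * MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE.
		 */
		unsigned long altmap_start_pfn;
		unsigned long altmap_nr_pages;
		/* The !altmap part that the other notifiers see. */
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid;
	};

memory_block_online() would then fill in altmap_start_pfn = start_pfn, altmap_nr_pages = nr_vmemmap_pages, start_pfn = start_pfn + nr_vmemmap_pages and nr_pages = nr_pages - nr_vmemmap_pages before raising MEM_PREPARE_ONLINE.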
* [PATCH v3 2/5] s390/mm: allocate vmemmap pages from self-contained memory range
From: Sumanth Korikkar @ 2023-11-27 8:20 UTC
To: linux-mm, Andrew Morton, David Hildenbrand
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Sumanth Korikkar, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

Allocate the memory map (struct page array) from the hotplugged memory range, rather than using system memory. The change addresses the issue where standby memory, when configured to be much larger than online memory, could lead to IPL failure due to memory map allocation from online memory. For example, 16MB of memory map allocation is needed for a memory block size of 1GB; when standby memory is configured to be much larger than online memory, these allocations could lead to IPL failure.

To address this issue, the solution involves introducing "memmap on memory" using the vmem_altmap structure on s390. Architectures that want to implement it should pass the altmap to the vmemmap_populate() function and its associated callchain. This enhancement is discussed in commit 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()").

Provide "memmap on memory" support for s390 by passing the altmap in vmemmap_populate() and its callchain. The allocation path is described as follows:
* When altmap is NULL in vmemmap_populate(), memory map allocation occurs using the existing vmemmap_alloc_block_buf().
* When altmap is not NULL in vmemmap_populate(), memory map allocation still uses vmemmap_alloc_block_buf(), but this function internally calls altmap_alloc_block_buf().

For deallocation, the process is outlined as follows:
* When altmap is NULL in vmemmap_free(), memory map deallocation happens through free_pages().
* When altmap is not NULL in vmemmap_free(), memory map deallocation occurs via vmem_altmap_free().

While memory map allocation is primarily handled through the self-contained memory map range, there might still be a small amount of system memory allocation required for vmemmap pagetables. To mitigate this impact, this feature will be limited to machines with EDAT1 support.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> --- arch/s390/mm/init.c | 3 --- arch/s390/mm/vmem.c | 62 +++++++++++++++++++++++++-------------------- 2 files changed, 35 insertions(+), 30 deletions(-) diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 43e612bc2bcd..8d9a60ccb777 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -281,9 +281,6 @@ int arch_add_memory(int nid, u64 start, u64 size, unsigned long size_pages = PFN_DOWN(size); int rc; - if (WARN_ON_ONCE(params->altmap)) - return -EINVAL; - if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)) return -EINVAL; diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 186a020857cf..eb100479f7be 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -33,8 +33,12 @@ static void __ref *vmem_alloc_pages(unsigned int order) return memblock_alloc(size, size); } -static void vmem_free_pages(unsigned long addr, int order) +static void vmem_free_pages(unsigned long addr, int order, struct vmem_altmap *altmap) { + if (altmap) { + vmem_altmap_free(altmap, 1 << order); + return; + } /* We don't expect boot memory to be removed ever. */ if (!slab_is_available() || WARN_ON_ONCE(PageReserved(virt_to_page((void *)addr)))) @@ -156,7 +160,8 @@ static bool vmemmap_unuse_sub_pmd(unsigned long start, unsigned long end) /* __ref: we'll only call vmemmap_alloc_block() via vmemmap_populate() */ static int __ref modify_pte_table(pmd_t *pmd, unsigned long addr, - unsigned long end, bool add, bool direct) + unsigned long end, bool add, bool direct, + struct vmem_altmap *altmap) { unsigned long prot, pages = 0; int ret = -ENOMEM; @@ -172,11 +177,11 @@ static int __ref modify_pte_table(pmd_t *pmd, unsigned long addr, if (pte_none(*pte)) continue; if (!direct) - vmem_free_pages((unsigned long) pfn_to_virt(pte_pfn(*pte)), 0); + vmem_free_pages((unsigned long)pfn_to_virt(pte_pfn(*pte)), get_order(PAGE_SIZE), altmap); pte_clear(&init_mm, addr, pte); } else if (pte_none(*pte)) { if (!direct) { - void *new_page = vmemmap_alloc_block(PAGE_SIZE, NUMA_NO_NODE); + void *new_page = vmemmap_alloc_block_buf(PAGE_SIZE, NUMA_NO_NODE, altmap); if (!new_page) goto out; @@ -213,7 +218,8 @@ static void try_free_pte_table(pmd_t *pmd, unsigned long start) /* __ref: we'll only call vmemmap_alloc_block() via vmemmap_populate() */ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, - unsigned long end, bool add, bool direct) + unsigned long end, bool add, bool direct, + struct vmem_altmap *altmap) { unsigned long next, prot, pages = 0; int ret = -ENOMEM; @@ -234,11 +240,11 @@ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, if (IS_ALIGNED(addr, PMD_SIZE) && IS_ALIGNED(next, PMD_SIZE)) { if (!direct) - vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE)); + vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE), altmap); pmd_clear(pmd); pages++; } else if (!direct && vmemmap_unuse_sub_pmd(addr, next)) { - vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE)); + vmem_free_pages(pmd_deref(*pmd), get_order(PMD_SIZE), altmap); pmd_clear(pmd); } continue; @@ -261,7 +267,7 @@ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, * page tables since vmemmap_populate gets * called for each section separately. 
*/ - new_page = vmemmap_alloc_block(PMD_SIZE, NUMA_NO_NODE); + new_page = vmemmap_alloc_block_buf(PMD_SIZE, NUMA_NO_NODE, altmap); if (new_page) { set_pmd(pmd, __pmd(__pa(new_page) | prot)); if (!IS_ALIGNED(addr, PMD_SIZE) || @@ -280,7 +286,7 @@ static int __ref modify_pmd_table(pud_t *pud, unsigned long addr, vmemmap_use_sub_pmd(addr, next); continue; } - ret = modify_pte_table(pmd, addr, next, add, direct); + ret = modify_pte_table(pmd, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -302,12 +308,12 @@ static void try_free_pmd_table(pud_t *pud, unsigned long start) for (i = 0; i < PTRS_PER_PMD; i++, pmd++) if (!pmd_none(*pmd)) return; - vmem_free_pages(pud_deref(*pud), CRST_ALLOC_ORDER); + vmem_free_pages(pud_deref(*pud), CRST_ALLOC_ORDER, NULL); pud_clear(pud); } static int modify_pud_table(p4d_t *p4d, unsigned long addr, unsigned long end, - bool add, bool direct) + bool add, bool direct, struct vmem_altmap *altmap) { unsigned long next, prot, pages = 0; int ret = -ENOMEM; @@ -347,7 +353,7 @@ static int modify_pud_table(p4d_t *p4d, unsigned long addr, unsigned long end, } else if (pud_large(*pud)) { continue; } - ret = modify_pmd_table(pud, addr, next, add, direct); + ret = modify_pmd_table(pud, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -370,12 +376,12 @@ static void try_free_pud_table(p4d_t *p4d, unsigned long start) if (!pud_none(*pud)) return; } - vmem_free_pages(p4d_deref(*p4d), CRST_ALLOC_ORDER); + vmem_free_pages(p4d_deref(*p4d), CRST_ALLOC_ORDER, NULL); p4d_clear(p4d); } static int modify_p4d_table(pgd_t *pgd, unsigned long addr, unsigned long end, - bool add, bool direct) + bool add, bool direct, struct vmem_altmap *altmap) { unsigned long next; int ret = -ENOMEM; @@ -394,7 +400,7 @@ static int modify_p4d_table(pgd_t *pgd, unsigned long addr, unsigned long end, goto out; p4d_populate(&init_mm, p4d, pud); } - ret = modify_pud_table(p4d, addr, next, add, direct); + ret = modify_pud_table(p4d, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -415,12 +421,12 @@ static void try_free_p4d_table(pgd_t *pgd, unsigned long start) if (!p4d_none(*p4d)) return; } - vmem_free_pages(pgd_deref(*pgd), CRST_ALLOC_ORDER); + vmem_free_pages(pgd_deref(*pgd), CRST_ALLOC_ORDER, NULL); pgd_clear(pgd); } static int modify_pagetable(unsigned long start, unsigned long end, bool add, - bool direct) + bool direct, struct vmem_altmap *altmap) { unsigned long addr, next; int ret = -ENOMEM; @@ -445,7 +451,7 @@ static int modify_pagetable(unsigned long start, unsigned long end, bool add, goto out; pgd_populate(&init_mm, pgd, p4d); } - ret = modify_p4d_table(pgd, addr, next, add, direct); + ret = modify_p4d_table(pgd, addr, next, add, direct, altmap); if (ret) goto out; if (!add) @@ -458,14 +464,16 @@ static int modify_pagetable(unsigned long start, unsigned long end, bool add, return ret; } -static int add_pagetable(unsigned long start, unsigned long end, bool direct) +static int add_pagetable(unsigned long start, unsigned long end, bool direct, + struct vmem_altmap *altmap) { - return modify_pagetable(start, end, true, direct); + return modify_pagetable(start, end, true, direct, altmap); } -static int remove_pagetable(unsigned long start, unsigned long end, bool direct) +static int remove_pagetable(unsigned long start, unsigned long end, bool direct, + struct vmem_altmap *altmap) { - return modify_pagetable(start, end, false, direct); + return modify_pagetable(start, end, false, direct, altmap); } /* @@ -474,7 +482,7 @@ static int 
remove_pagetable(unsigned long start, unsigned long end, bool direct) static int vmem_add_range(unsigned long start, unsigned long size) { start = (unsigned long)__va(start); - return add_pagetable(start, start + size, true); + return add_pagetable(start, start + size, true, NULL); } /* @@ -483,7 +491,7 @@ static int vmem_add_range(unsigned long start, unsigned long size) static void vmem_remove_range(unsigned long start, unsigned long size) { start = (unsigned long)__va(start); - remove_pagetable(start, start + size, true); + remove_pagetable(start, start + size, true, NULL); } /* @@ -496,9 +504,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, mutex_lock(&vmem_mutex); /* We don't care about the node, just use NUMA_NO_NODE on allocations */ - ret = add_pagetable(start, end, false); + ret = add_pagetable(start, end, false, altmap); if (ret) - remove_pagetable(start, end, false); + remove_pagetable(start, end, false, altmap); mutex_unlock(&vmem_mutex); return ret; } @@ -509,7 +517,7 @@ void vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap) { mutex_lock(&vmem_mutex); - remove_pagetable(start, end, false); + remove_pagetable(start, end, false, altmap); mutex_unlock(&vmem_mutex); } -- 2.41.0
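For readers less familiar with the altmap mechanism this patch plugs into, it works roughly like this (a simplified paraphrase of the generic helpers in mm/sparse-vmemmap.c, not s390 code):

	/* Carve the next nr_pfns for struct pages out of the reserved
	 * region at the start of the hotplugged block, instead of
	 * allocating from the buddy allocator.
	 */
	static unsigned long altmap_next(struct vmem_altmap *altmap,
					 unsigned long nr_pfns)
	{
		unsigned long allocated = altmap->alloc + altmap->align;
		unsigned long pfn = altmap->base_pfn + altmap->reserve +
				    allocated;

		if (allocated > altmap->free ||
		    nr_pfns > altmap->free - allocated)
			return ULONG_MAX;	/* region exhausted */
		altmap->alloc += nr_pfns;
		return pfn;
	}

Every vmemmap_alloc_block_buf(..., altmap) call in the new vmem.c code therefore consumes struct-page memory from the hotplugged range itself, and vmem_altmap_free() hands it back by simply decrementing altmap->alloc.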
* [PATCH v3 3/5] s390/sclp: remove unhandled memory notifier type
From: Sumanth Korikkar @ 2023-11-27 8:20 UTC
To: linux-mm, Andrew Morton, David Hildenbrand
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Sumanth Korikkar, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

Remove memory notifier types which are unhandled by s390. Unhandled memory notifier types are covered by the default case.

Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
drivers/s390/char/sclp_cmd.c | 3 --- 1 file changed, 3 deletions(-)

diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c index 11c428f4c7cf..355e63e44e95 100644 --- a/drivers/s390/char/sclp_cmd.c +++ b/drivers/s390/char/sclp_cmd.c @@ -340,9 +340,6 @@ static int sclp_mem_notifier(struct notifier_block *nb, if (contains_standby_increment(start, start + size)) rc = -EPERM; break; - case MEM_ONLINE: - case MEM_CANCEL_OFFLINE: - break; case MEM_GOING_ONLINE: rc = sclp_mem_change_state(start, size, 1); break; -- 2.41.0
* Re: [PATCH v3 3/5] s390/sclp: remove unhandled memory notifier type
From: David Hildenbrand @ 2023-11-27 15:02 UTC
To: Sumanth Korikkar, linux-mm, Andrew Morton
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On 27.11.23 09:20, Sumanth Korikkar wrote:
> Remove memory notifier types which are unhandled by s390. Unhandled memory notifier types are covered by the default case.
>
> Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>

Reviewed-by: David Hildenbrand <david@redhat.com>

-- Cheers, David / dhildenb
[parent not found: <20231127082023.2079810-5-sumanthk@linux.ibm.com>]
* Re: [PATCH v3 4/5] s390/mm: implement MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: David Hildenbrand @ 2023-11-27 15:11 UTC
To: Sumanth Korikkar, linux-mm, Andrew Morton
Cc: Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On 27.11.23 09:20, Sumanth Korikkar wrote:
> The MEM_PREPARE_ONLINE memory notifier makes the memory block physically accessible via the sclp assign command. The notifier ensures self-contained memory maps are accessible and hence enables the "memmap on memory" feature on s390.
>
> The MEM_FINISH_OFFLINE memory notifier shifts the memory block to an inaccessible state via the sclp unassign command.
>
> Implementation considerations:
> * When MHP_MEMMAP_ON_MEMORY is disabled, the system retains the old behavior. This means the memory map is allocated from default memory.
> * If MACHINE_HAS_EDAT1 is unavailable, MHP_MEMMAP_ON_MEMORY is automatically disabled. This ensures that vmemmap pagetables do not consume additional memory from the default memory allocator.
> * The MEM_GOING_ONLINE notifier has been modified to perform no operation, as MEM_PREPARE_ONLINE already executes the sclp assign command.
> * The MEM_CANCEL_ONLINE/MEM_OFFLINE notifier now performs no operation, as MEM_FINISH_OFFLINE already executes the sclp unassign command.
>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> ---
> drivers/s390/char/sclp_cmd.c | 28 ++++++++++++++++++++++------ > 1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c > index 355e63e44e95..30b829e4c052 100644 > --- a/drivers/s390/char/sclp_cmd.c > +++ b/drivers/s390/char/sclp_cmd.c > @@ -18,6 +18,7 @@ > #include <linux/mm.h> > #include <linux/mmzone.h> > #include <linux/memory.h> > +#include <linux/memory_hotplug.h> > #include <linux/module.h> > #include <asm/ctlreg.h> > #include <asm/chpid.h> > @@ -26,6 +27,7 @@ > #include <asm/sclp.h> > #include <asm/numa.h> > #include <asm/facility.h> > +#include <asm/page-states.h> > > #include "sclp.h" > > @@ -319,6 +321,7 @@ static bool contains_standby_increment(unsigned long start, unsigned long end) > static int sclp_mem_notifier(struct notifier_block *nb, > unsigned long action, void *data) > { > + struct memory_block *memory_block; > unsigned long start, size; > struct memory_notify *arg; > unsigned char id; > @@ -340,18 +343,29 @@ static int sclp_mem_notifier(struct notifier_block *nb, > if (contains_standby_increment(start, start + size)) > rc = -EPERM; > break; > - case MEM_GOING_ONLINE: > + case MEM_PREPARE_ONLINE: > + memory_block = find_memory_block(pfn_to_section_nr(arg->start_pfn)); > + if (!memory_block) { > + rc = -EINVAL; > + goto out; > + } > rc = sclp_mem_change_state(start, size, 1); > + if (rc || !memory_block->altmap) > + goto out; > + /* > + * Set CMMA state to nodat here, since the struct page memory > + * at the beginning of the memory block will not go through the > + * buddy allocator later. > + */ > + __arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);

Looking up the memory block and grabbing the altmap from there is a bit unfortunate.

Why can't we do that when adding the altmap?
Will the hypervisor scream at us?

... would we want to communicate any altmap start+size via the memory notifier instead?

-- Cheers, David / dhildenb
* Re: [PATCH v3 4/5] s390/mm: implement MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: Sumanth Korikkar @ 2023-11-27 16:12 UTC
To: David Hildenbrand
Cc: linux-mm, Andrew Morton, Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On Mon, Nov 27, 2023 at 04:11:05PM +0100, David Hildenbrand wrote:
[...]
> > @@ -340,18 +343,29 @@ static int sclp_mem_notifier(struct notifier_block *nb, > > if (contains_standby_increment(start, start + size)) > > rc = -EPERM; > > break; > > - case MEM_GOING_ONLINE: > > + case MEM_PREPARE_ONLINE: > > + memory_block = find_memory_block(pfn_to_section_nr(arg->start_pfn)); > > + if (!memory_block) { > > + rc = -EINVAL; > > + goto out; > > + } > > rc = sclp_mem_change_state(start, size, 1); > > + if (rc || !memory_block->altmap) > > + goto out; > > + /* > > + * Set CMMA state to nodat here, since the struct page memory > > + * at the beginning of the memory block will not go through the > > + * buddy allocator later. > > + */ > > + __arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
>
> Looking up the memory block and grabbing the altmap from there is a bit unfortunate.
>
> Why can't we do that when adding the altmap? Will the hypervisor scream at us?

Calling __arch_set_page_nodat() before making the memory block accessible will lead to a crash. Hence, we think this is the only safe location to place it.

> ... would we want to communicate any altmap start+size via the memory notifier instead?

Passing the start and size of the memory range via the memory notifier looks like the correct approach to me, as we try to make the specified range accessible.

If we want to pass the altmap size (nr_vmemmap_pages), then we might need a new field in struct memory_notify, which would avoid accessing memory_block->altmap->free in the notifier.

Do you want to take this approach instead?

If yes, then I could add a new field nr_vmemmap_pages in struct memory_notify and place it in the patch "introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers".

Thanks
* Re: [PATCH v3 4/5] s390/mm: implement MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
From: David Hildenbrand @ 2023-11-27 16:58 UTC
To: Sumanth Korikkar
Cc: linux-mm, Andrew Morton, Oscar Salvador, Michal Hocko, Aneesh Kumar K.V, Anshuman Khandual, Gerald Schaefer, Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390, LKML

On 27.11.23 17:12, Sumanth Korikkar wrote:
[...]
> Do you want to take this approach instead?
>
> If yes, then I could add a new field nr_vmemmap_pages in struct memory_notify and place it in the patch "introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers".

Yes, see my other mail. That's probably cleanest!

-- Cheers, David / dhildenb