* [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove [not found] <cover.1511433386.git.ar@linux.vnet.ibm.com> @ 2017-11-24 10:22 ` Andrea Reale [not found] ` <4e21a27570f665793debf167c8567c6752116d0a.1511433386.git.ar@linux.vnet.ibm.com> 1 sibling, 0 replies; 14+ messages in thread From: Andrea Reale @ 2017-11-24 10:22 UTC (permalink / raw) To: linux-arm-kernel Cc: linux-kernel, linux-mm, m.bielski, arunks, mark.rutland, scott.branden, will.deacon, qiuxishi, catalin.marinas, mhocko, rafael.j.wysocki, linux-acpi Resending the patch adding linux-acpi in CC, as suggested by Rafael. Everyone else: apologies for the noise. Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") introduced an assumption whereas when control reaches remove_memory the corresponding memory has been already offlined. In that case, the acpi_memhotplug was making sure that the assumption held. This assumption, however, is not necessarily true if offlining and removal are not done by the same "controller" (for example, when first offlining via sysfs). Removing this assumption for the generic remove_memory code and moving it in the specific acpi_memhotplug code. This is a dependency for the software-aided arm64 offlining and removal process. Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> --- drivers/acpi/acpi_memhotplug.c | 2 +- include/linux/memory_hotplug.h | 9 ++++++--- mm/memory_hotplug.c | 13 +++++++++---- 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 6b0d3ef..b0126a0 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) nid = memory_add_physaddr_to_nid(info->start_addr); acpi_unbind_memory_blocks(info); - remove_memory(nid, info->start_addr, info->length); + BUG_ON(remove_memory(nid, info->start_addr, info->length)); list_del(&info->list); kfree(info); } diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 58e110a..1a9c7b2 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -295,7 +295,7 @@ static inline bool movable_node_is_enabled(void) extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); extern void try_offline_node(int nid); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); -extern void remove_memory(int nid, u64 start, u64 size); +extern int remove_memory(int nid, u64 start, u64 size); #else static inline bool is_mem_section_removable(unsigned long pfn, @@ -311,7 +311,10 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) return -EINVAL; } -static inline void remove_memory(int nid, u64 start, u64 size) {} +static inline int remove_memory(int nid, u64 start, u64 size) +{ + return -EINVAL; +} #endif /* CONFIG_MEMORY_HOTREMOVE */ extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, @@ -323,7 +326,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); extern bool is_memblock_offlined(struct memory_block *mem); -extern void remove_memory(int nid, u64 start, u64 size); +extern int remove_memory(int nid, u64 start, u64 size); extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn); extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, unsigned long map_offset); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index d4b5f29..d5f15af 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1892,7 +1892,7 @@ EXPORT_SYMBOL(try_offline_node); * and online/offline operations before this call, as required by * try_offline_node(). */ -void __ref remove_memory(int nid, u64 start, u64 size) +int __ref remove_memory(int nid, u64 start, u64 size) { int ret; @@ -1908,18 +1908,23 @@ void __ref remove_memory(int nid, u64 start, u64 size) ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, check_memblock_offlined_cb); if (ret) - BUG(); + goto end_remove; + + ret = arch_remove_memory(start, size); + + if (ret) + goto end_remove; /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); memblock_free(start, size); memblock_remove(start, size); - arch_remove_memory(start, size); - try_offline_node(nid); +end_remove: mem_hotplug_done(); + return ret; } EXPORT_SYMBOL_GPL(remove_memory); #endif /* CONFIG_MEMORY_HOTREMOVE */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <4e21a27570f665793debf167c8567c6752116d0a.1511433386.git.ar@linux.vnet.ibm.com>]
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove [not found] ` <4e21a27570f665793debf167c8567c6752116d0a.1511433386.git.ar@linux.vnet.ibm.com> @ 2017-11-24 14:39 ` Rafael J. Wysocki 2017-11-24 14:49 ` Andrea Reale 2017-11-29 0:49 ` joeyli 1 sibling, 1 reply; 14+ messages in thread From: Rafael J. Wysocki @ 2017-11-24 14:39 UTC (permalink / raw) To: Andrea Reale Cc: linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Michal Hocko, Rafael Wysocki, ACPI Devel Maling List On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > Everyone else: apologies for the noise. > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > introduced an assumption whereas when control > reaches remove_memory the corresponding memory has been already > offlined. In that case, the acpi_memhotplug was making sure that > the assumption held. > This assumption, however, is not necessarily true if offlining > and removal are not done by the same "controller" (for example, > when first offlining via sysfs). > > Removing this assumption for the generic remove_memory code > and moving it in the specific acpi_memhotplug code. This is > a dependency for the software-aided arm64 offlining and removal > process. > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > --- > drivers/acpi/acpi_memhotplug.c | 2 +- > include/linux/memory_hotplug.h | 9 ++++++--- > mm/memory_hotplug.c | 13 +++++++++---- > 3 files changed, 16 insertions(+), 8 deletions(-) > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > index 6b0d3ef..b0126a0 100644 > --- a/drivers/acpi/acpi_memhotplug.c > +++ b/drivers/acpi/acpi_memhotplug.c > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > nid = memory_add_physaddr_to_nid(info->start_addr); > > acpi_unbind_memory_blocks(info); > - remove_memory(nid, info->start_addr, info->length); > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); Why does this have to be BUG_ON()? Is it really necessary to kill the system here? If it is, please add a comment describing why continuing is not an option here. > list_del(&info->list); > kfree(info); > } Thanks, Rafael -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-24 14:39 ` Rafael J. Wysocki @ 2017-11-24 14:49 ` Andrea Reale 2017-11-24 15:43 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: Andrea Reale @ 2017-11-24 14:49 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Michal Hocko, Rafael Wysocki, ACPI Devel Maling List Hi Rafael, On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: > On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > Everyone else: apologies for the noise. > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > introduced an assumption whereas when control > > reaches remove_memory the corresponding memory has been already > > offlined. In that case, the acpi_memhotplug was making sure that > > the assumption held. > > This assumption, however, is not necessarily true if offlining > > and removal are not done by the same "controller" (for example, > > when first offlining via sysfs). > > > > Removing this assumption for the generic remove_memory code > > and moving it in the specific acpi_memhotplug code. This is > > a dependency for the software-aided arm64 offlining and removal > > process. > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > --- > > drivers/acpi/acpi_memhotplug.c | 2 +- > > include/linux/memory_hotplug.h | 9 ++++++--- > > mm/memory_hotplug.c | 13 +++++++++---- > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > index 6b0d3ef..b0126a0 100644 > > --- a/drivers/acpi/acpi_memhotplug.c > > +++ b/drivers/acpi/acpi_memhotplug.c > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > acpi_unbind_memory_blocks(info); > > - remove_memory(nid, info->start_addr, info->length); > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > Why does this have to be BUG_ON()? Is it really necessary to kill the > system here? Actually, I hoped you would help me understand that: that BUG() call was introduced by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") in memory_hoptlug.c:remove_memory()). Just reading at that commit my understanding was that you were assuming that acpi_memory_remove_memory() have already done the job of offlining the target memory, so there would be a bug if that wasn't the case. In my case, that assumption did not hold and I found that it might not hold for other platforms that do not use ACPI. In fact, the purpose of this patch is to move this assumption out of the generic hotplug code and move it to ACPI code where it originated. Thanks, Andrea > If it is, please add a comment describing why continuing is not an option here. > > > list_del(&info->list); > > kfree(info); > > } > > Thanks, > Rafael > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-24 14:49 ` Andrea Reale @ 2017-11-24 15:43 ` Michal Hocko 2017-11-24 15:54 ` Andrea Reale 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2017-11-24 15:43 UTC (permalink / raw) To: Andrea Reale Cc: Rafael J. Wysocki, linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Rafael Wysocki, ACPI Devel Maling List On Fri 24-11-17 14:49:17, Andrea Reale wrote: > Hi Rafael, > > On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: > > On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > > Everyone else: apologies for the noise. > > > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > introduced an assumption whereas when control > > > reaches remove_memory the corresponding memory has been already > > > offlined. In that case, the acpi_memhotplug was making sure that > > > the assumption held. > > > This assumption, however, is not necessarily true if offlining > > > and removal are not done by the same "controller" (for example, > > > when first offlining via sysfs). > > > > > > Removing this assumption for the generic remove_memory code > > > and moving it in the specific acpi_memhotplug code. This is > > > a dependency for the software-aided arm64 offlining and removal > > > process. > > > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > > --- > > > drivers/acpi/acpi_memhotplug.c | 2 +- > > > include/linux/memory_hotplug.h | 9 ++++++--- > > > mm/memory_hotplug.c | 13 +++++++++---- > > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > > index 6b0d3ef..b0126a0 100644 > > > --- a/drivers/acpi/acpi_memhotplug.c > > > +++ b/drivers/acpi/acpi_memhotplug.c > > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > > > acpi_unbind_memory_blocks(info); > > > - remove_memory(nid, info->start_addr, info->length); > > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > > > Why does this have to be BUG_ON()? Is it really necessary to kill the > > system here? > > Actually, I hoped you would help me understand that: that BUG() call was introduced > by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > in memory_hoptlug.c:remove_memory()). > > Just reading at that commit my understanding was that you were assuming > that acpi_memory_remove_memory() have already done the job of offlining > the target memory, so there would be a bug if that wasn't the case. > > In my case, that assumption did not hold and I found that it might not > hold for other platforms that do not use ACPI. In fact, the purpose of > this patch is to move this assumption out of the generic hotplug code > and move it to ACPI code where it originated. remove_memory failure is basically impossible to handle AFAIR. The original code to BUG in remove_memory is ugly as hell and we do not want to spread that out of that function. Instead we really want to get rid of it. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-24 15:43 ` Michal Hocko @ 2017-11-24 15:54 ` Andrea Reale 2017-11-24 18:17 ` Michal Hocko 2017-11-27 15:20 ` Robin Murphy 0 siblings, 2 replies; 14+ messages in thread From: Andrea Reale @ 2017-11-24 15:54 UTC (permalink / raw) To: Michal Hocko Cc: Rafael J. Wysocki, linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Rafael Wysocki, ACPI Devel Maling List On Fri 24 Nov 2017, 16:43, Michal Hocko wrote: > On Fri 24-11-17 14:49:17, Andrea Reale wrote: > > Hi Rafael, > > > > On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: > > > On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > > > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > > > Everyone else: apologies for the noise. > > > > > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > > introduced an assumption whereas when control > > > > reaches remove_memory the corresponding memory has been already > > > > offlined. In that case, the acpi_memhotplug was making sure that > > > > the assumption held. > > > > This assumption, however, is not necessarily true if offlining > > > > and removal are not done by the same "controller" (for example, > > > > when first offlining via sysfs). > > > > > > > > Removing this assumption for the generic remove_memory code > > > > and moving it in the specific acpi_memhotplug code. This is > > > > a dependency for the software-aided arm64 offlining and removal > > > > process. > > > > > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > > > --- > > > > drivers/acpi/acpi_memhotplug.c | 2 +- > > > > include/linux/memory_hotplug.h | 9 ++++++--- > > > > mm/memory_hotplug.c | 13 +++++++++---- > > > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > > > index 6b0d3ef..b0126a0 100644 > > > > --- a/drivers/acpi/acpi_memhotplug.c > > > > +++ b/drivers/acpi/acpi_memhotplug.c > > > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > > > > > acpi_unbind_memory_blocks(info); > > > > - remove_memory(nid, info->start_addr, info->length); > > > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > > > > > Why does this have to be BUG_ON()? Is it really necessary to kill the > > > system here? > > > > Actually, I hoped you would help me understand that: that BUG() call was introduced > > by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > in memory_hoptlug.c:remove_memory()). > > > > Just reading at that commit my understanding was that you were assuming > > that acpi_memory_remove_memory() have already done the job of offlining > > the target memory, so there would be a bug if that wasn't the case. > > > > In my case, that assumption did not hold and I found that it might not > > hold for other platforms that do not use ACPI. In fact, the purpose of > > this patch is to move this assumption out of the generic hotplug code > > and move it to ACPI code where it originated. > > remove_memory failure is basically impossible to handle AFAIR. The > original code to BUG in remove_memory is ugly as hell and we do not want > to spread that out of that function. Instead we really want to get rid > of it. Today, BUG() is called even in the simple case where remove fails because the section we are removing is not offline. I cannot see any need to BUG() in such a case: an error code seems more than sufficient to me. This is why this patch removes the BUG() call when the "offline" check fails from the generic code. It moves it back to the ACPI call, where the assumption originated. Honestlly, I cannot tell if it makes sense to BUG() there: I have nothing against removing it from ACPI hotplug too, but I don't know enough to feel free to change the acpi semantics myself, so I moved it there to keep the original behavior unchanged for x86 code. In this arm64 hot-remove port, offline and remove are done in two separate steps, and is conceivable that an user tries erroneusly to remove some section that he forgot to offline first: in that case, with the patch, remove will just report an erro without BUGing. Is my reasoning flawed? Cheers, Andrea > -- > Michal Hocko > SUSE Labs > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-24 15:54 ` Andrea Reale @ 2017-11-24 18:17 ` Michal Hocko 2017-11-29 1:20 ` joeyli 2017-11-27 15:20 ` Robin Murphy 1 sibling, 1 reply; 14+ messages in thread From: Michal Hocko @ 2017-11-24 18:17 UTC (permalink / raw) To: Andrea Reale Cc: Rafael J. Wysocki, linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Rafael Wysocki, ACPI Devel Maling List On Fri 24-11-17 15:54:59, Andrea Reale wrote: > On Fri 24 Nov 2017, 16:43, Michal Hocko wrote: > > On Fri 24-11-17 14:49:17, Andrea Reale wrote: > > > Hi Rafael, > > > > > > On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: > > > > On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > > > > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > > > > Everyone else: apologies for the noise. > > > > > > > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > > > introduced an assumption whereas when control > > > > > reaches remove_memory the corresponding memory has been already > > > > > offlined. In that case, the acpi_memhotplug was making sure that > > > > > the assumption held. > > > > > This assumption, however, is not necessarily true if offlining > > > > > and removal are not done by the same "controller" (for example, > > > > > when first offlining via sysfs). > > > > > > > > > > Removing this assumption for the generic remove_memory code > > > > > and moving it in the specific acpi_memhotplug code. This is > > > > > a dependency for the software-aided arm64 offlining and removal > > > > > process. > > > > > > > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > > > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > > > > --- > > > > > drivers/acpi/acpi_memhotplug.c | 2 +- > > > > > include/linux/memory_hotplug.h | 9 ++++++--- > > > > > mm/memory_hotplug.c | 13 +++++++++---- > > > > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > > > > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > > > > index 6b0d3ef..b0126a0 100644 > > > > > --- a/drivers/acpi/acpi_memhotplug.c > > > > > +++ b/drivers/acpi/acpi_memhotplug.c > > > > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > > > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > > > > > > > acpi_unbind_memory_blocks(info); > > > > > - remove_memory(nid, info->start_addr, info->length); > > > > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > > > > > > > Why does this have to be BUG_ON()? Is it really necessary to kill the > > > > system here? > > > > > > Actually, I hoped you would help me understand that: that BUG() call was introduced > > > by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > in memory_hoptlug.c:remove_memory()). > > > > > > Just reading at that commit my understanding was that you were assuming > > > that acpi_memory_remove_memory() have already done the job of offlining > > > the target memory, so there would be a bug if that wasn't the case. > > > > > > In my case, that assumption did not hold and I found that it might not > > > hold for other platforms that do not use ACPI. In fact, the purpose of > > > this patch is to move this assumption out of the generic hotplug code > > > and move it to ACPI code where it originated. > > > > remove_memory failure is basically impossible to handle AFAIR. The > > original code to BUG in remove_memory is ugly as hell and we do not want > > to spread that out of that function. Instead we really want to get rid > > of it. > > Today, BUG() is called even in the simple case where remove fails > because the section we are removing is not offline. You cannot hotremove memory which is still online. This is what caller should enforce. This is too late to handle the failure. At least for ACPI. > I cannot see any need to > BUG() in such a case: an error code seems more than sufficient to me. I do not rememeber details but AFAIR ACPI is in a deferred (kworker) context here and cannot simply communicate error code down the road. I agree that we should be able to simply return an error but what is the actual error condition that might happen here? > This is why this patch removes the BUG() call when the "offline" check > fails from the generic code. As I've said we should simply get rid of BUG rather than move it around. > It moves it back to the ACPI call, where the assumption > originated. Honestlly, I cannot tell if it makes sense to BUG() there: > I have nothing against removing it from ACPI hotplug too, but > I don't know enough to feel free to change the acpi semantics myself, so I > moved it there to keep the original behavior unchanged for x86 code. Heh, yeah that is an easier path for sure. I would prefer sorting this out ;) Not that I would enforce that, though. My concern is that the previous hotplug development followed this "I do not understand exactly so I will simply put my on top of existing code" mantra and it ended up in a huge mess. > In this arm64 hot-remove port, offline and remove are done in two separate > steps, and is conceivable that an user tries erroneusly to remove some > section that he forgot to offline first: in that case, with the patch, > remove will just report an erro without BUGing. As I've said it is the caller to enforce that. > Is my reasoning flawed? I wouldn't say flawed but this is a low-level call that should already happen in a reasonable context. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-24 18:17 ` Michal Hocko @ 2017-11-29 1:20 ` joeyli 2017-11-30 9:47 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: joeyli @ 2017-11-29 1:20 UTC (permalink / raw) To: Michal Hocko Cc: Andrea Reale, Rafael J. Wysocki, linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Rafael Wysocki, ACPI Devel Maling List On Fri, Nov 24, 2017 at 07:17:41PM +0100, Michal Hocko wrote: > On Fri 24-11-17 15:54:59, Andrea Reale wrote: > > On Fri 24 Nov 2017, 16:43, Michal Hocko wrote: > > > On Fri 24-11-17 14:49:17, Andrea Reale wrote: > > > > Hi Rafael, > > > > > > > > On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: > > > > > On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > > > > > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > > > > > Everyone else: apologies for the noise. > > > > > > > > > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > > > > introduced an assumption whereas when control > > > > > > reaches remove_memory the corresponding memory has been already > > > > > > offlined. In that case, the acpi_memhotplug was making sure that > > > > > > the assumption held. > > > > > > This assumption, however, is not necessarily true if offlining > > > > > > and removal are not done by the same "controller" (for example, > > > > > > when first offlining via sysfs). > > > > > > > > > > > > Removing this assumption for the generic remove_memory code > > > > > > and moving it in the specific acpi_memhotplug code. This is > > > > > > a dependency for the software-aided arm64 offlining and removal > > > > > > process. > > > > > > > > > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > > > > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > > > > > --- > > > > > > drivers/acpi/acpi_memhotplug.c | 2 +- > > > > > > include/linux/memory_hotplug.h | 9 ++++++--- > > > > > > mm/memory_hotplug.c | 13 +++++++++---- > > > > > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > > > > > > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > > > > > index 6b0d3ef..b0126a0 100644 > > > > > > --- a/drivers/acpi/acpi_memhotplug.c > > > > > > +++ b/drivers/acpi/acpi_memhotplug.c > > > > > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > > > > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > > > > > > > > > acpi_unbind_memory_blocks(info); > > > > > > - remove_memory(nid, info->start_addr, info->length); > > > > > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > > > > > > > > > Why does this have to be BUG_ON()? Is it really necessary to kill the > > > > > system here? > > > > > > > > Actually, I hoped you would help me understand that: that BUG() call was introduced > > > > by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > > in memory_hoptlug.c:remove_memory()). > > > > > > > > Just reading at that commit my understanding was that you were assuming > > > > that acpi_memory_remove_memory() have already done the job of offlining > > > > the target memory, so there would be a bug if that wasn't the case. > > > > > > > > In my case, that assumption did not hold and I found that it might not > > > > hold for other platforms that do not use ACPI. In fact, the purpose of > > > > this patch is to move this assumption out of the generic hotplug code > > > > and move it to ACPI code where it originated. > > > > > > remove_memory failure is basically impossible to handle AFAIR. The > > > original code to BUG in remove_memory is ugly as hell and we do not want > > > to spread that out of that function. Instead we really want to get rid > > > of it. > > > > Today, BUG() is called even in the simple case where remove fails > > because the section we are removing is not offline. > > You cannot hotremove memory which is still online. This is what caller > should enforce. This is too late to handle the failure. At least for > ACPI. > The logic in acpi_scan_hot_remove() calls memory_subsys_offline(). If there doesn't have any error returns by memory_subsys_offline, then ACPI assumes all devices are offlined by subsystem (memory subsystem in this case). Then system moves to remove stage, ACPI calls acpi_memory_device_remove(). Here > > I cannot see any need to > > BUG() in such a case: an error code seems more than sufficient to me. > > I do not rememeber details but AFAIR ACPI is in a deferred (kworker) > context here and cannot simply communicate error code down the road. > I agree that we should be able to simply return an error but what is the > actual error condition that might happen here? > Currently acpi_bus_trim() didn't handle any return error. If subsystem returns error, then ACPI can only interrupt hot-remove process. > > This is why this patch removes the BUG() call when the "offline" check > > fails from the generic code. > > As I've said we should simply get rid of BUG rather than move it around. > As I remember that the original BUG() helped us to find out a bug about the offline state doesn't sync between memblock device with memory state. Something likes: mem->dev.offline != (mem->state == MEM_OFFLINE) So, the BUG() is useful to capture bug about state sync between device object and subsystem object. Thanks Joey Lee ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-29 1:20 ` joeyli @ 2017-11-30 9:47 ` Michal Hocko 0 siblings, 0 replies; 14+ messages in thread From: Michal Hocko @ 2017-11-30 9:47 UTC (permalink / raw) To: joeyli Cc: Andrea Reale, Rafael J. Wysocki, linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Rafael Wysocki, ACPI Devel Maling List On Wed 29-11-17 09:20:40, Joey Lee wrote: > On Fri, Nov 24, 2017 at 07:17:41PM +0100, Michal Hocko wrote: [...] > > You cannot hotremove memory which is still online. This is what caller > > should enforce. This is too late to handle the failure. At least for > > ACPI. > > > > The logic in acpi_scan_hot_remove() calls memory_subsys_offline(). If > there doesn't have any error returns by memory_subsys_offline, then ACPI > assumes all devices are offlined by subsystem (memory subsystem in this case). yes, that is what I meant by calling it caller responsibility > Then system moves to remove stage, ACPI calls acpi_memory_device_remove(). > Here > > > > I cannot see any need to > > > BUG() in such a case: an error code seems more than sufficient to me. > > > > I do not rememeber details but AFAIR ACPI is in a deferred (kworker) > > context here and cannot simply communicate error code down the road. > > I agree that we should be able to simply return an error but what is the > > actual error condition that might happen here? > > > > Currently acpi_bus_trim() didn't handle any return error. If subsystem > returns error, then ACPI can only interrupt hot-remove process. > > > > This is why this patch removes the BUG() call when the "offline" check > > > fails from the generic code. > > > > As I've said we should simply get rid of BUG rather than move it around. > > > > As I remember that the original BUG() helped us to find out a bug about the > offline state doesn't sync between memblock device with memory state. > Something likes: > mem->dev.offline != (mem->state == MEM_OFFLINE) > > So, the BUG() is useful to capture bug about state sync between device object > and subsystem object. BUG is a fatal condition under many contexts. And therefore not an appropriate error handling. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-24 15:54 ` Andrea Reale 2017-11-24 18:17 ` Michal Hocko @ 2017-11-27 15:20 ` Robin Murphy 2017-11-27 17:44 ` Andrea Reale 1 sibling, 1 reply; 14+ messages in thread From: Robin Murphy @ 2017-11-27 15:20 UTC (permalink / raw) To: Andrea Reale, Michal Hocko Cc: Mark Rutland, Rafael Wysocki, m.bielski, ACPI Devel Maling List, Rafael J. Wysocki, Catalin Marinas, scott.branden, Will Deacon, Linux Kernel Mailing List, Linux Memory Management List, arunks, qiuxishi, linux-arm-kernel@lists.infradead.org On 24/11/17 15:54, Andrea Reale wrote: > On Fri 24 Nov 2017, 16:43, Michal Hocko wrote: >> On Fri 24-11-17 14:49:17, Andrea Reale wrote: >>> Hi Rafael, >>> >>> On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: >>>> On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: >>>>> Resending the patch adding linux-acpi in CC, as suggested by Rafael. >>>>> Everyone else: apologies for the noise. >>>>> >>>>> Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") >>>>> introduced an assumption whereas when control >>>>> reaches remove_memory the corresponding memory has been already >>>>> offlined. In that case, the acpi_memhotplug was making sure that >>>>> the assumption held. >>>>> This assumption, however, is not necessarily true if offlining >>>>> and removal are not done by the same "controller" (for example, >>>>> when first offlining via sysfs). >>>>> >>>>> Removing this assumption for the generic remove_memory code >>>>> and moving it in the specific acpi_memhotplug code. This is >>>>> a dependency for the software-aided arm64 offlining and removal >>>>> process. >>>>> >>>>> Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> >>>>> Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> >>>>> --- >>>>> drivers/acpi/acpi_memhotplug.c | 2 +- >>>>> include/linux/memory_hotplug.h | 9 ++++++--- >>>>> mm/memory_hotplug.c | 13 +++++++++---- >>>>> 3 files changed, 16 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c >>>>> index 6b0d3ef..b0126a0 100644 >>>>> --- a/drivers/acpi/acpi_memhotplug.c >>>>> +++ b/drivers/acpi/acpi_memhotplug.c >>>>> @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) >>>>> nid = memory_add_physaddr_to_nid(info->start_addr); >>>>> >>>>> acpi_unbind_memory_blocks(info); >>>>> - remove_memory(nid, info->start_addr, info->length); >>>>> + BUG_ON(remove_memory(nid, info->start_addr, info->length)); >>>> >>>> Why does this have to be BUG_ON()? Is it really necessary to kill the >>>> system here? >>> >>> Actually, I hoped you would help me understand that: that BUG() call was introduced >>> by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") >>> in memory_hoptlug.c:remove_memory()). >>> >>> Just reading at that commit my understanding was that you were assuming >>> that acpi_memory_remove_memory() have already done the job of offlining >>> the target memory, so there would be a bug if that wasn't the case. >>> >>> In my case, that assumption did not hold and I found that it might not >>> hold for other platforms that do not use ACPI. In fact, the purpose of >>> this patch is to move this assumption out of the generic hotplug code >>> and move it to ACPI code where it originated. >> >> remove_memory failure is basically impossible to handle AFAIR. The >> original code to BUG in remove_memory is ugly as hell and we do not want >> to spread that out of that function. Instead we really want to get rid >> of it. > > Today, BUG() is called even in the simple case where remove fails > because the section we are removing is not offline. I cannot see any need to > BUG() in such a case: an error code seems more than sufficient to me. > This is why this patch removes the BUG() call when the "offline" check > fails from the generic code. > It moves it back to the ACPI call, where the assumption > originated. Honestlly, I cannot tell if it makes sense to BUG() there: > I have nothing against removing it from ACPI hotplug too, but > I don't know enough to feel free to change the acpi semantics myself, so I > moved it there to keep the original behavior unchanged for x86 code. > > In this arm64 hot-remove port, offline and remove are done in two separate > steps, and is conceivable that an user tries erroneusly to remove some > section that he forgot to offline first: in that case, with the patch, > remove will just report an erro without BUGing. The user can already kill the system by misusing the sysfs probe driver; should similar theoretical misuse of your sysfs remove driver really need to be all that different? > Is my reasoning flawed? Furthermore, even if your driver does want to enforce this, I don't see why it can't just do the equivalent of memory_subsys_offline() itself before even trying to call remove_memory(). Robin. > > Cheers, > Andrea > >> -- >> Michal Hocko >> SUSE Labs >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-27 15:20 ` Robin Murphy @ 2017-11-27 17:44 ` Andrea Reale 0 siblings, 0 replies; 14+ messages in thread From: Andrea Reale @ 2017-11-27 17:44 UTC (permalink / raw) To: Robin Murphy Cc: Michal Hocko, Mark Rutland, Rafael Wysocki, m.bielski, ACPI Devel Maling List, Rafael J. Wysocki, Catalin Marinas, scott.branden, Will Deacon, Linux Kernel Mailing List, Linux Memory Management List, arunks, qiuxishi, linux-arm-kernel@lists.infradead.org Hi again, On Mon 27 Nov 2017, 15:20, Robin Murphy wrote: > On 24/11/17 15:54, Andrea Reale wrote: > >On Fri 24 Nov 2017, 16:43, Michal Hocko wrote: > >>On Fri 24-11-17 14:49:17, Andrea Reale wrote: > >>>Hi Rafael, > >>> > >>>On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote: > >>>>On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > >>>>>Resending the patch adding linux-acpi in CC, as suggested by Rafael. > >>>>>Everyone else: apologies for the noise. > >>>>> > >>>>>Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > >>>>>introduced an assumption whereas when control > >>>>>reaches remove_memory the corresponding memory has been already > >>>>>offlined. In that case, the acpi_memhotplug was making sure that > >>>>>the assumption held. > >>>>>This assumption, however, is not necessarily true if offlining > >>>>>and removal are not done by the same "controller" (for example, > >>>>>when first offlining via sysfs). > >>>>> > >>>>>Removing this assumption for the generic remove_memory code > >>>>>and moving it in the specific acpi_memhotplug code. This is > >>>>>a dependency for the software-aided arm64 offlining and removal > >>>>>process. > >>>>> > >>>>>Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > >>>>>Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > >>>>>--- > >>>>> drivers/acpi/acpi_memhotplug.c | 2 +- > >>>>> include/linux/memory_hotplug.h | 9 ++++++--- > >>>>> mm/memory_hotplug.c | 13 +++++++++---- > >>>>> 3 files changed, 16 insertions(+), 8 deletions(-) > >>>>> > >>>>>diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > >>>>>index 6b0d3ef..b0126a0 100644 > >>>>>--- a/drivers/acpi/acpi_memhotplug.c > >>>>>+++ b/drivers/acpi/acpi_memhotplug.c > >>>>>@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > >>>>> nid = memory_add_physaddr_to_nid(info->start_addr); > >>>>> > >>>>> acpi_unbind_memory_blocks(info); > >>>>>- remove_memory(nid, info->start_addr, info->length); > >>>>>+ BUG_ON(remove_memory(nid, info->start_addr, info->length)); > >>>> > >>>>Why does this have to be BUG_ON()? Is it really necessary to kill the > >>>>system here? > >>> > >>>Actually, I hoped you would help me understand that: that BUG() call was introduced > >>>by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > >>>in memory_hoptlug.c:remove_memory()). > >>> > >>>Just reading at that commit my understanding was that you were assuming > >>>that acpi_memory_remove_memory() have already done the job of offlining > >>>the target memory, so there would be a bug if that wasn't the case. > >>> > >>>In my case, that assumption did not hold and I found that it might not > >>>hold for other platforms that do not use ACPI. In fact, the purpose of > >>>this patch is to move this assumption out of the generic hotplug code > >>>and move it to ACPI code where it originated. > >> > >>remove_memory failure is basically impossible to handle AFAIR. The > >>original code to BUG in remove_memory is ugly as hell and we do not want > >>to spread that out of that function. Instead we really want to get rid > >>of it. > > > >Today, BUG() is called even in the simple case where remove fails > >because the section we are removing is not offline. I cannot see any need to > >BUG() in such a case: an error code seems more than sufficient to me. > >This is why this patch removes the BUG() call when the "offline" check > >fails from the generic code. > >It moves it back to the ACPI call, where the assumption > >originated. Honestlly, I cannot tell if it makes sense to BUG() there: > >I have nothing against removing it from ACPI hotplug too, but > >I don't know enough to feel free to change the acpi semantics myself, so I > >moved it there to keep the original behavior unchanged for x86 code. > > > >In this arm64 hot-remove port, offline and remove are done in two separate > >steps, and is conceivable that an user tries erroneusly to remove some > >section that he forgot to offline first: in that case, with the patch, > >remove will just report an erro without BUGing. > > The user can already kill the system by misusing the sysfs probe driver; > should similar theoretical misuse of your sysfs remove driver really need to > be all that different? > > >Is my reasoning flawed? > > Furthermore, even if your driver does want to enforce this, I don't see why > it can't just do the equivalent of memory_subsys_offline() itself before > even trying to call remove_memory(). > > Robin. My whole point is that I do not see any good reason to kill the system when an hot-remove fails. My guess is that the original assumption is that - once a memory is successfully offlined - then hot remove should always succeed. Even if we assume that offlining and removal are always done in one single step (but then why expose the separate sysfs handle to offline without removing memory), I don't see that as a good excuse to kill the system: there is no critical kernel state being compromised AFAICT, so we can leave the system happily running with an hot remove that did not succeed. Thanks, Andrea > > > >Cheers, > >Andrea > > > >>-- > >>Michal Hocko > >>SUSE Labs > >>-- > >>To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > >>the body of a message to majordomo@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > > > > >_______________________________________________ > >linux-arm-kernel mailing list > >linux-arm-kernel@lists.infradead.org > >http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove [not found] ` <4e21a27570f665793debf167c8567c6752116d0a.1511433386.git.ar@linux.vnet.ibm.com> 2017-11-24 14:39 ` Rafael J. Wysocki @ 2017-11-29 0:49 ` joeyli 2017-11-29 1:52 ` joeyli 1 sibling, 1 reply; 14+ messages in thread From: joeyli @ 2017-11-29 0:49 UTC (permalink / raw) To: Andrea Reale Cc: linux-arm-kernel, linux-kernel, linux-mm, m.bielski, arunks, mark.rutland, scott.branden, will.deacon, qiuxishi, catalin.marinas, mhocko, rafael.j.wysocki, linux-acpi Hi Andrea, On Fri, Nov 24, 2017 at 10:22:35AM +0000, Andrea Reale wrote: > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > Everyone else: apologies for the noise. > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > introduced an assumption whereas when control > reaches remove_memory the corresponding memory has been already > offlined. In that case, the acpi_memhotplug was making sure that > the assumption held. > This assumption, however, is not necessarily true if offlining > and removal are not done by the same "controller" (for example, > when first offlining via sysfs). > > Removing this assumption for the generic remove_memory code > and moving it in the specific acpi_memhotplug code. This is > a dependency for the software-aided arm64 offlining and removal > process. > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > --- > drivers/acpi/acpi_memhotplug.c | 2 +- > include/linux/memory_hotplug.h | 9 ++++++--- > mm/memory_hotplug.c | 13 +++++++++---- > 3 files changed, 16 insertions(+), 8 deletions(-) > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > index 6b0d3ef..b0126a0 100644 > --- a/drivers/acpi/acpi_memhotplug.c > +++ b/drivers/acpi/acpi_memhotplug.c > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > nid = memory_add_physaddr_to_nid(info->start_addr); > > acpi_unbind_memory_blocks(info); > - remove_memory(nid, info->start_addr, info->length); > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > list_del(&info->list); > kfree(info); > } > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index 58e110a..1a9c7b2 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -295,7 +295,7 @@ static inline bool movable_node_is_enabled(void) > extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); > extern void try_offline_node(int nid); > extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); > -extern void remove_memory(int nid, u64 start, u64 size); > +extern int remove_memory(int nid, u64 start, u64 size); > > #else > static inline bool is_mem_section_removable(unsigned long pfn, > @@ -311,7 +311,10 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > return -EINVAL; > } > > -static inline void remove_memory(int nid, u64 start, u64 size) {} > +static inline int remove_memory(int nid, u64 start, u64 size) > +{ > + return -EINVAL; > +} > #endif /* CONFIG_MEMORY_HOTREMOVE */ > > extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, > @@ -323,7 +326,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > unsigned long nr_pages); > extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); > extern bool is_memblock_offlined(struct memory_block *mem); > -extern void remove_memory(int nid, u64 start, u64 size); > +extern int remove_memory(int nid, u64 start, u64 size); > extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn); > extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, > unsigned long map_offset); > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index d4b5f29..d5f15af 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1892,7 +1892,7 @@ EXPORT_SYMBOL(try_offline_node); > * and online/offline operations before this call, as required by > * try_offline_node(). > */ > -void __ref remove_memory(int nid, u64 start, u64 size) > +int __ref remove_memory(int nid, u64 start, u64 size) > { > int ret; > > @@ -1908,18 +1908,23 @@ void __ref remove_memory(int nid, u64 start, u64 size) > ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, > check_memblock_offlined_cb); > if (ret) > - BUG(); > + goto end_remove; > + > + ret = arch_remove_memory(start, size); > + > + if (ret) > + goto end_remove; The original code triggers BUG() when any memblock is not offlined. Why the new logic includes the result of arch_remove_memory()? But I agreed the we don't need BUG(). Returning a error is better. > > /* remove memmap entry */ > firmware_map_remove(start, start + size, "System RAM"); > memblock_free(start, size); > memblock_remove(start, size); > > - arch_remove_memory(start, size); > - > try_offline_node(nid); > > +end_remove: > mem_hotplug_done(); > + return ret; > } > EXPORT_SYMBOL_GPL(remove_memory); > #endif /* CONFIG_MEMORY_HOTREMOVE */ > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-29 0:49 ` joeyli @ 2017-11-29 1:52 ` joeyli 2017-12-04 11:28 ` Andrea Reale 0 siblings, 1 reply; 14+ messages in thread From: joeyli @ 2017-11-29 1:52 UTC (permalink / raw) To: Andrea Reale Cc: linux-arm-kernel, linux-kernel, linux-mm, m.bielski, arunks, mark.rutland, scott.branden, will.deacon, qiuxishi, catalin.marinas, mhocko, rafael.j.wysocki, linux-acpi On Wed, Nov 29, 2017 at 08:49:13AM +0800, joeyli wrote: > Hi Andrea, > > On Fri, Nov 24, 2017 at 10:22:35AM +0000, Andrea Reale wrote: > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > Everyone else: apologies for the noise. > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > introduced an assumption whereas when control > > reaches remove_memory the corresponding memory has been already > > offlined. In that case, the acpi_memhotplug was making sure that > > the assumption held. > > This assumption, however, is not necessarily true if offlining > > and removal are not done by the same "controller" (for example, > > when first offlining via sysfs). > > > > Removing this assumption for the generic remove_memory code > > and moving it in the specific acpi_memhotplug code. This is > > a dependency for the software-aided arm64 offlining and removal > > process. > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > --- > > drivers/acpi/acpi_memhotplug.c | 2 +- > > include/linux/memory_hotplug.h | 9 ++++++--- > > mm/memory_hotplug.c | 13 +++++++++---- > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > index 6b0d3ef..b0126a0 100644 > > --- a/drivers/acpi/acpi_memhotplug.c > > +++ b/drivers/acpi/acpi_memhotplug.c > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > acpi_unbind_memory_blocks(info); > > - remove_memory(nid, info->start_addr, info->length); > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > list_del(&info->list); > > kfree(info); > > } > > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > > index 58e110a..1a9c7b2 100644 > > --- a/include/linux/memory_hotplug.h > > +++ b/include/linux/memory_hotplug.h > > @@ -295,7 +295,7 @@ static inline bool movable_node_is_enabled(void) > > extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); > > extern void try_offline_node(int nid); > > extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); > > -extern void remove_memory(int nid, u64 start, u64 size); > > +extern int remove_memory(int nid, u64 start, u64 size); > > > > #else > > static inline bool is_mem_section_removable(unsigned long pfn, > > @@ -311,7 +311,10 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > > return -EINVAL; > > } > > > > -static inline void remove_memory(int nid, u64 start, u64 size) {} > > +static inline int remove_memory(int nid, u64 start, u64 size) > > +{ > > + return -EINVAL; > > +} > > #endif /* CONFIG_MEMORY_HOTREMOVE */ > > > > extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, > > @@ -323,7 +326,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > > unsigned long nr_pages); > > extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); > > extern bool is_memblock_offlined(struct memory_block *mem); > > -extern void remove_memory(int nid, u64 start, u64 size); > > +extern int remove_memory(int nid, u64 start, u64 size); > > extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn); > > extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, > > unsigned long map_offset); > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index d4b5f29..d5f15af 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -1892,7 +1892,7 @@ EXPORT_SYMBOL(try_offline_node); > > * and online/offline operations before this call, as required by > > * try_offline_node(). > > */ > > -void __ref remove_memory(int nid, u64 start, u64 size) > > +int __ref remove_memory(int nid, u64 start, u64 size) > > { > > int ret; > > > > @@ -1908,18 +1908,23 @@ void __ref remove_memory(int nid, u64 start, u64 size) > > ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, > > check_memblock_offlined_cb); > > if (ret) > > - BUG(); > > + goto end_remove; > > + > > + ret = arch_remove_memory(start, size); Should not include arch_remove_memory() to BUG(). > > + > > + if (ret) > > + goto end_remove; > > The original code triggers BUG() when any memblock is not offlined. Why > the new logic includes the result of arch_remove_memory()? > > But I agreed the we don't need BUG(). Returning a error is better. Actually, I lost one thing. The BUG() have caught a issue about the offline state doesn't sync between memory_block and device object. like: mem->dev.offline != (mem->state == MEM_OFFLINE) So, the BUG() is useful to capture state issue in memory subsystem. But, I understood your concern about the two steps offline/remove from userland. Maybe we should move the BUG() to somewhere but not just remove it. Or if we think that the BUG() is too intense, at least we should print out a error message, and ACPI should checks the return value from subsystem to interrupt memory-hotplug process. Thanks a lot! Joey Lee ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-11-29 1:52 ` joeyli @ 2017-12-04 11:28 ` Andrea Reale 2017-12-04 14:05 ` Rafael J. Wysocki 0 siblings, 1 reply; 14+ messages in thread From: Andrea Reale @ 2017-12-04 11:28 UTC (permalink / raw) To: joeyli Cc: linux-arm-kernel, linux-kernel, linux-mm, m.bielski, arunks, mark.rutland, scott.branden, will.deacon, qiuxishi, catalin.marinas, mhocko, rafael.j.wysocki, linux-acpi Hi Joey, and thanks for your comments. Response inline: On Wed 29 Nov 2017, 09:52, joeyli wrote: > On Wed, Nov 29, 2017 at 08:49:13AM +0800, joeyli wrote: > > Hi Andrea, > > > > On Fri, Nov 24, 2017 at 10:22:35AM +0000, Andrea Reale wrote: > > > Resending the patch adding linux-acpi in CC, as suggested by Rafael. > > > Everyone else: apologies for the noise. > > > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal") > > > introduced an assumption whereas when control > > > reaches remove_memory the corresponding memory has been already > > > offlined. In that case, the acpi_memhotplug was making sure that > > > the assumption held. > > > This assumption, however, is not necessarily true if offlining > > > and removal are not done by the same "controller" (for example, > > > when first offlining via sysfs). > > > > > > Removing this assumption for the generic remove_memory code > > > and moving it in the specific acpi_memhotplug code. This is > > > a dependency for the software-aided arm64 offlining and removal > > > process. > > > > > > Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com> > > > Signed-off-by: Maciej Bielski <m.bielski@linux.vnet.ibm.com> > > > --- > > > drivers/acpi/acpi_memhotplug.c | 2 +- > > > include/linux/memory_hotplug.h | 9 ++++++--- > > > mm/memory_hotplug.c | 13 +++++++++---- > > > 3 files changed, 16 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > > index 6b0d3ef..b0126a0 100644 > > > --- a/drivers/acpi/acpi_memhotplug.c > > > +++ b/drivers/acpi/acpi_memhotplug.c > > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) > > > nid = memory_add_physaddr_to_nid(info->start_addr); > > > > > > acpi_unbind_memory_blocks(info); > > > - remove_memory(nid, info->start_addr, info->length); > > > + BUG_ON(remove_memory(nid, info->start_addr, info->length)); > > > list_del(&info->list); > > > kfree(info); > > > } > > > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > > > index 58e110a..1a9c7b2 100644 > > > --- a/include/linux/memory_hotplug.h > > > +++ b/include/linux/memory_hotplug.h > > > @@ -295,7 +295,7 @@ static inline bool movable_node_is_enabled(void) > > > extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); > > > extern void try_offline_node(int nid); > > > extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); > > > -extern void remove_memory(int nid, u64 start, u64 size); > > > +extern int remove_memory(int nid, u64 start, u64 size); > > > > > > #else > > > static inline bool is_mem_section_removable(unsigned long pfn, > > > @@ -311,7 +311,10 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > > > return -EINVAL; > > > } > > > > > > -static inline void remove_memory(int nid, u64 start, u64 size) {} > > > +static inline int remove_memory(int nid, u64 start, u64 size) > > > +{ > > > + return -EINVAL; > > > +} > > > #endif /* CONFIG_MEMORY_HOTREMOVE */ > > > > > > extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, > > > @@ -323,7 +326,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > > > unsigned long nr_pages); > > > extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); > > > extern bool is_memblock_offlined(struct memory_block *mem); > > > -extern void remove_memory(int nid, u64 start, u64 size); > > > +extern int remove_memory(int nid, u64 start, u64 size); > > > extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn); > > > extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, > > > unsigned long map_offset); > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > > index d4b5f29..d5f15af 100644 > > > --- a/mm/memory_hotplug.c > > > +++ b/mm/memory_hotplug.c > > > @@ -1892,7 +1892,7 @@ EXPORT_SYMBOL(try_offline_node); > > > * and online/offline operations before this call, as required by > > > * try_offline_node(). > > > */ > > > -void __ref remove_memory(int nid, u64 start, u64 size) > > > +int __ref remove_memory(int nid, u64 start, u64 size) > > > { > > > int ret; > > > > > > @@ -1908,18 +1908,23 @@ void __ref remove_memory(int nid, u64 start, u64 size) > > > ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, > > > check_memblock_offlined_cb); > > > if (ret) > > > - BUG(); > > > + goto end_remove; > > > + > > > + ret = arch_remove_memory(start, size); > > Should not include arch_remove_memory() to BUG(). arch_remove_memory might also fail in some cases. In the arm64 implementation of this patchset, for example, it might fail in the (very rare) case when we would have to split a P[UM]D mapped section for removal (and we do not support that - see email thread here: https://lkml.org/lkml/2017/11/23/456). > > > + > > > + if (ret) > > > + goto end_remove; > > > > The original code triggers BUG() when any memblock is not offlined. Why > > the new logic includes the result of arch_remove_memory()? > > > > But I agreed the we don't need BUG(). Returning a error is better. > > Actually, I lost one thing. > > The BUG() have caught a issue about the offline state doesn't sync between > memory_block and device object. like: > mem->dev.offline != (mem->state == MEM_OFFLINE) > > So, the BUG() is useful to capture state issue in memory subsystem. But, I > understood your concern about the two steps offline/remove from userland. > > Maybe we should move the BUG() to somewhere but not just remove it. Or if > we think that the BUG() is too intense, at least we should print out a error > message, and ACPI should checks the return value from subsystem to > interrupt memory-hotplug process. In this patchset, BUG() is moved to acpi_memory_remove_memory(), the caller of arch_remove_memory(). However, I agree with Michal, that we should not BUG() here but rather halt the hotremove process and print some errors. Is there any state in ACPI that should be undone in case of hotremove errors or we can just stop the process "halfway"? > Thanks a lot! > Joey Lee Thanks, Andrea > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove 2017-12-04 11:28 ` Andrea Reale @ 2017-12-04 14:05 ` Rafael J. Wysocki 0 siblings, 0 replies; 14+ messages in thread From: Rafael J. Wysocki @ 2017-12-04 14:05 UTC (permalink / raw) To: Andrea Reale Cc: joeyli, linux-arm-kernel@lists.infradead.org, Linux Kernel Mailing List, Linux Memory Management List, m.bielski, arunks, Mark Rutland, scott.branden, Will Deacon, qiuxishi, Catalin Marinas, Michal Hocko, Rafael Wysocki, ACPI Devel Maling List On Mon, Dec 4, 2017 at 12:28 PM, Andrea Reale <ar@linux.vnet.ibm.com> wrote: > Hi Joey, > > and thanks for your comments. Response inline: > [cut] >> >> So, the BUG() is useful to capture state issue in memory subsystem. But, I >> understood your concern about the two steps offline/remove from userland. >> >> Maybe we should move the BUG() to somewhere but not just remove it. Or if >> we think that the BUG() is too intense, at least we should print out a error >> message, and ACPI should checks the return value from subsystem to >> interrupt memory-hotplug process. > > In this patchset, BUG() is moved to acpi_memory_remove_memory(), > the caller of arch_remove_memory(). However, I agree with Michal, that > we should not BUG() here but rather halt the hotremove process and print > some errors. > Is there any state in ACPI that should be undone in case of hotremove > errors or we can just stop the process "halfway"? I have to recall a couple of things before answering this question, so that may take some time. Thanks, Rafael -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-12-04 14:05 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <cover.1511433386.git.ar@linux.vnet.ibm.com>
2017-11-24 10:22 ` [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove Andrea Reale
[not found] ` <4e21a27570f665793debf167c8567c6752116d0a.1511433386.git.ar@linux.vnet.ibm.com>
2017-11-24 14:39 ` Rafael J. Wysocki
2017-11-24 14:49 ` Andrea Reale
2017-11-24 15:43 ` Michal Hocko
2017-11-24 15:54 ` Andrea Reale
2017-11-24 18:17 ` Michal Hocko
2017-11-29 1:20 ` joeyli
2017-11-30 9:47 ` Michal Hocko
2017-11-27 15:20 ` Robin Murphy
2017-11-27 17:44 ` Andrea Reale
2017-11-29 0:49 ` joeyli
2017-11-29 1:52 ` joeyli
2017-12-04 11:28 ` Andrea Reale
2017-12-04 14:05 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox