* [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
Mathieu Malaterre, David Hildenbrand, linux-kernel, Wei Yang,
Arun KS, Qian Cai, Wei Yang, Igor Mammedov, akpm, linuxppc-dev,
Dan Williams, linux-arm-kernel, Oscar Salvador
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
By converting start and size to page granularity, we actually ignore
unaligned parts within a page instead of properly bailing out with an
error.
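
To illustrate the problem, here is a minimal user-space sketch (not kernel
code). It assumes 4 KiB pages and a 128 MiB memory block size; old_check_ok()
and new_check_ok() are hypothetical names that only mirror the logic before
and after this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assumption: 4 KiB pages */
#define BLOCK_SZ   (128ULL << 20)     /* assumption: 128 MiB memory blocks */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Old logic: compares page frame numbers, so sub-page bits are dropped. */
static bool old_check_ok(uint64_t start, uint64_t size)
{
	uint64_t block_nr_pages = BLOCK_SZ >> PAGE_SHIFT;
	uint64_t nr_pages = size >> PAGE_SHIFT;
	uint64_t start_pfn = start >> PAGE_SHIFT;

	return nr_pages && IS_ALIGNED(start_pfn, block_nr_pages) &&
	       IS_ALIGNED(nr_pages, block_nr_pages);
}

/* New logic: compares byte values directly against the block size. */
static bool new_check_ok(uint64_t start, uint64_t size)
{
	return size && IS_ALIGNED(start, BLOCK_SZ) &&
	       IS_ALIGNED(size, BLOCK_SZ);
}
```

With start = size = BLOCK_SZ + 1, the old logic accepts the range because the
stray byte vanishes when shifting by PAGE_SHIFT, while the new logic rejects
it.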
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory_hotplug.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e096c987d261..762887b2358b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1051,16 +1051,11 @@ int try_online_node(int nid)
static int check_hotplug_memory_range(u64 start, u64 size)
{
- unsigned long block_sz = memory_block_size_bytes();
- u64 block_nr_pages = block_sz >> PAGE_SHIFT;
- u64 nr_pages = size >> PAGE_SHIFT;
- u64 start_pfn = PFN_DOWN(start);
-
/* memory range must be block size aligned */
- if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
- !IS_ALIGNED(nr_pages, block_nr_pages)) {
+ if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
+ !IS_ALIGNED(size, memory_block_size_bytes())) {
pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
- block_sz, start, size);
+ memory_block_size_bytes(), start, size);
return -EINVAL;
}
--
2.20.1
* [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
Vasily Gorbik, linux-sh, David Hildenbrand, Heiko Carstens,
linux-kernel, Wei Yang, Mike Rapoport, Martin Schwidefsky,
Igor Mammedov, akpm, linuxppc-dev, Dan Williams, linux-arm-kernel
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
don't forget arch_add_memory()/arch_remove_memory() when unlocking
support.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/init.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 14d1eae9fe43..d552e330fbcc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -226,6 +226,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
unsigned long size_pages = PFN_DOWN(size);
int rc;
+ if (WARN_ON_ONCE(restrictions->altmap))
+ return -EINVAL;
+
rc = vmem_add_mapping(start, size);
if (rc)
return rc;
--
2.20.1
* [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory()
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
Vasily Gorbik, linux-sh, David Hildenbrand, Heiko Carstens,
linux-kernel, Wei Yang, Mike Rapoport, Martin Schwidefsky,
Igor Mammedov, akpm, linuxppc-dev, Dan Williams, linux-arm-kernel
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
This will come in handy when handling errors after
arch_add_memory().
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/init.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index d552e330fbcc..14955e0a9fcf 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -243,12 +243,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
- /*
- * There is no hardware or firmware interface which could trigger a
- * hot memory remove on s390. So there is nothing that needs to be
- * implemented.
- */
- BUG();
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ __remove_pages(zone, start_pfn, nr_pages, altmap);
+ vmem_remove_mapping(start, size);
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
--
2.20.1
* [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: Mark Rutland, linux-s390, Ard Biesheuvel, linux-ia64, Yu Zhao,
Anshuman Khandual, linux-sh, Catalin Marinas, David Hildenbrand,
Will Deacon, linux-kernel, Wei Yang, Jun Yao, Chintan Pandya,
Igor Mammedov, akpm, Mike Rapoport, linuxppc-dev, Dan Williams,
linux-arm-kernel, Robin Murphy
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
A proper arch_remove_memory() implementation is on its way, which also
cleanly removes page tables in arch_add_memory() in case something goes
wrong.
As we want to use arch_remove_memory() in case something goes wrong
during memory hotplug after arch_add_memory() finished, let's add
a temporary hack that is sufficient until we get a proper
implementation that cleans up page table entries.
We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow-up
patches.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a1bfc4413982..e569a543c384 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
restrictions);
}
+#ifdef CONFIG_MEMORY_HOTREMOVE
+void arch_remove_memory(int nid, u64 start, u64 size,
+ struct vmem_altmap *altmap)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct zone *zone;
+
+ /*
+ * FIXME: Cleanup page tables (also in arch_add_memory() in case
+ * adding fails). Until then, this function should only be used
+ * during memory hotplug (adding memory), not for memory
+ * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
+ * unlocked yet.
+ */
+ zone = page_zone(pfn_to_page(start_pfn));
+ __remove_pages(zone, start_pfn, nr_pages, altmap);
+}
+#endif
#endif
--
2.20.1
* [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: linux-s390, linux-ia64, linux-sh, Greg Kroah-Hartman,
David Hildenbrand, linux-kernel, Wei Yang, Rafael J. Wysocki,
Igor Mammedov, akpm, linuxppc-dev, Dan Williams, linux-arm-kernel
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
We'll rework hotplug_memory_register() shortly, so it no longer consumes
a section.
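
The relation between a block id and the sections it covers can be sketched as
plain arithmetic. This is a user-space illustration only: SECTIONS_PER_BLOCK
is an assumed constant (the kernel computes sections_per_block at runtime),
and block_start_section()/block_end_section() are hypothetical helper names.
base_memory_block_id() mirrors the helper in drivers/base/memory.c.

```c
#define SECTIONS_PER_BLOCK 8	/* assumption, for illustration only */

/* Map a section number to the id of the memory block containing it. */
static int base_memory_block_id(int section_nr)
{
	return section_nr / SECTIONS_PER_BLOCK;
}

/* First section covered by a block (mem->start_section_nr). */
static int block_start_section(int block_id)
{
	return block_id * SECTIONS_PER_BLOCK;
}

/* Last section covered by a block (mem->end_section_nr). */
static int block_end_section(int block_id)
{
	return block_start_section(block_id) + SECTIONS_PER_BLOCK - 1;
}
```

Under these assumptions, section 19 lies in block 2, which spans sections 16
to 23, so passing the block id directly carries the same information as
passing any section of that block.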
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/base/memory.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f180427e48f4..f914fa6fe350 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
return ret;
}
-static int init_memory_block(struct memory_block **memory,
- struct mem_section *section, unsigned long state)
+static int init_memory_block(struct memory_block **memory, int block_id,
+ unsigned long state)
{
struct memory_block *mem;
unsigned long start_pfn;
- int scn_nr;
int ret = 0;
mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
- scn_nr = __section_nr(section);
- mem->start_section_nr =
- base_memory_block_id(scn_nr) * sections_per_block;
+ mem->start_section_nr = block_id * sections_per_block;
mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
mem->state = state;
start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
if (section_count == 0)
return 0;
- ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+ ret = init_memory_block(&mem, base_memory_block_id(base_section_nr),
+ MEM_ONLINE);
if (ret)
return ret;
mem->section_count = section_count;
@@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr)
*/
int hotplug_memory_register(int nid, struct mem_section *section)
{
+ int block_id = base_memory_block_id(__section_nr(section));
int ret = 0;
struct memory_block *mem;
@@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section *section)
mem->section_count++;
put_device(&mem->dev);
} else {
- ret = init_memory_block(&mem, section, MEM_OFFLINE);
+ ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
if (ret)
goto out;
mem->section_count++;
--
2.20.1
* [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: Oscar Salvador, Rich Felker, linux-ia64, Anshuman Khandual,
linux-sh, Peter Zijlstra, Dave Hansen, Heiko Carstens, Arun KS,
Wei Yang, Masahiro Yamada, Michal Hocko, Paul Mackerras,
H. Peter Anvin, Thomas Gleixner, Rafael J. Wysocki, Qian Cai,
linux-s390, Yoshinori Sato, David Hildenbrand, Mike Rapoport,
Ingo Molnar, Fenghua Yu, Pavel Tatashin, Vasily Gorbik,
Rob Herring, mike.travis@hpe.com, Nicholas Piggin, Alex Deucher,
Mark Brown, Borislav Petkov, Andy Lutomirski, Dan Williams,
Chris Wilson, linux-arm-kernel, Tony Luck, Baoquan He,
Andrew Banman, Mathieu Malaterre, Greg Kroah-Hartman,
linux-kernel, Logan Gunthorpe, Wei Yang, Martin Schwidefsky,
Igor Mammedov, akpm, linuxppc-dev, David S. Miller,
Kirill A. Shutemov
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
We want to improve error handling while adding memory by allowing
arch_remove_memory() and __remove_pages() to be used even if
CONFIG_MEMORY_HOTREMOVE is not set, e.g., to implement something like:
	arch_add_memory();
	rc = do_something();
	if (rc) {
		arch_remove_memory();
	}
We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
quite some dependencies for memory offlining.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/arm64/mm/mmu.c | 2 --
arch/ia64/mm/init.c | 2 --
arch/powerpc/mm/mem.c | 2 --
arch/s390/mm/init.c | 2 --
arch/sh/mm/init.c | 2 --
arch/x86/mm/init_32.c | 2 --
arch/x86/mm/init_64.c | 2 --
drivers/base/memory.c | 2 --
include/linux/memory.h | 2 --
include/linux/memory_hotplug.h | 2 --
mm/memory_hotplug.c | 2 --
mm/sparse.c | 6 ------
12 files changed, 28 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e569a543c384..9ccd7539f2d4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
restrictions);
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
@@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
-#endif
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d28e29103bdb..aae75fd7b810 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
-#endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index e885fe2aafcc..e4bc2dc3f593 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start_pfn, nr_pages, restrictions);
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
void __ref arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
pr_warn("Hash collision while resizing HPT\n");
}
#endif
-#endif /* CONFIG_MEMORY_HOTPLUG */
#ifndef CONFIG_NEED_MULTIPLE_NODES
void __init mem_topology_setup(void)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 14955e0a9fcf..ffb81fe95c77 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return rc;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
@@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
vmem_remove_mapping(start, size);
}
-#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 13c6a6bb5fd9..dfdbaa50946e 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif
-#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
@@ -440,5 +439,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
zone = page_zone(pfn_to_page(start_pfn));
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
-#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index f265a4316179..4068abb9427f 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -860,7 +860,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start_pfn, nr_pages, restrictions);
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
@@ -872,7 +871,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
-#endif
int kernel_set_to_readonly __read_mostly;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 693aaf28d5fe..8335ac6e1112 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1196,7 +1196,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
remove_pagetable(start, end, false, altmap);
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
static void __meminit
kernel_physical_mapping_remove(unsigned long start, unsigned long end)
{
@@ -1221,7 +1220,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
kernel_physical_mapping_remove(start, start + size);
}
-#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
static struct kcore_list kcore_vsyscall;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f914fa6fe350..ac17c95a5f28 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -727,7 +727,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
return ret;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
static void
unregister_memory(struct memory_block *memory)
{
@@ -766,7 +765,6 @@ void unregister_memory_section(struct mem_section *section)
out_unlock:
mutex_unlock(&mem_sysfs_mutex);
}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
/* return true if the memory block is offlined, otherwise, return false */
bool is_memblock_offlined(struct memory_block *mem)
diff --git a/include/linux/memory.h b/include/linux/memory.h
index e1dc1bb2b787..474c7c60c8f2 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
extern int register_memory_isolate_notifier(struct notifier_block *nb);
extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
int hotplug_memory_register(int nid, struct mem_section *section);
-#ifdef CONFIG_MEMORY_HOTREMOVE
extern void unregister_memory_section(struct mem_section *);
-#endif
extern int memory_dev_init(void);
extern int memory_notify(unsigned long val, void *v);
extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..2d4de313926d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
return movable_node_enabled;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
extern void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap);
extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages, struct vmem_altmap *altmap);
-#endif /* CONFIG_MEMORY_HOTREMOVE */
/*
* Do we want sysfs memblock files created. This will allow userspace to online
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 762887b2358b..4b9d2974f86c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -318,7 +318,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
return err;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
unsigned long start_pfn,
@@ -582,7 +581,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
set_zone_contiguous(zone);
}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
int set_online_page_callback(online_page_callback_t callback)
{
diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166949b5..d1d5e05f5b8d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
vmemmap_free(start, end, altmap);
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
static void free_map_bootmem(struct page *memmap)
{
unsigned long start = (unsigned long)memmap;
@@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
vmemmap_free(start, end, NULL);
}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
#else
static struct page *__kmalloc_section_memmap(void)
{
@@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
get_order(sizeof(struct page) * PAGES_PER_SECTION));
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
static void free_map_bootmem(struct page *memmap)
{
unsigned long maps_section_nr, removing_section_nr, i;
@@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
put_page_bootmem(page);
}
}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
/**
@@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
return ret;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
#ifdef CONFIG_MEMORY_FAILURE
static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
{
@@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
PAGES_PER_SECTION - map_offset);
free_section_usemap(memmap, usemap, altmap);
}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
#endif /* CONFIG_MEMORY_HOTPLUG */
--
2.20.1
* [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: Michal Hocko, linux-ia64, linux-sh, Wei Yang, Arun KS,
Ingo Molnar, linux-s390, David Hildenbrand, Pavel Tatashin,
mike.travis@hpe.com, Qian Cai, Dan Williams, linux-arm-kernel,
Oscar Salvador, Andrew Banman, Mathieu Malaterre,
Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki,
Igor Mammedov, akpm, linuxppc-dev
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
Only memory to be added to the buddy and to be onlined/offlined by
user space using /sys/devices/system/memory/... needs (and should have!)
memory block devices.
Factor out creation of memory block devices. Create all devices after
arch_add_memory() succeeded. We can later drop the want_memblock parameter,
because it is now effectively stale.
Memory can be onlined by user space only after memory block devices have
been added. This implies that memory is not visible to user space at
all before arch_add_memory() succeeded.
While at it:
- Use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
- Introduce find_memory_block_by_id() to search via block id
- Use find_memory_block_by_id() in init_memory_block() to catch
  duplicates
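
The create-then-roll-back loop described above can be sketched in user space
as follows. This is a simplified mock, not the kernel implementation: it has
no devices, no reference counting and no locking, and MAX_BLOCKS,
block_present[], init_block(), unregister_block() and create_blocks() are all
hypothetical names standing in for the memory block device machinery.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_BLOCKS 16			/* hypothetical table size */

static bool block_present[MAX_BLOCKS];	/* stands in for registered devices */

/* Register one block; refuse duplicates, like the -EEXIST check. */
static int init_block(int id)
{
	if (id < 0 || id >= MAX_BLOCKS)
		return -EINVAL;
	if (block_present[id])
		return -EEXIST;
	block_present[id] = true;
	return 0;
}

static void unregister_block(int id)
{
	block_present[id] = false;
}

/* Create blocks [start_id, end_id); undo all of them if one fails. */
static int create_blocks(int start_id, int end_id)
{
	int id, ret = 0;

	for (id = start_id; id != end_id; id++) {
		ret = init_block(id);
		if (ret)
			break;
	}
	if (ret) {
		int failed_id = id;

		/* Roll back only the blocks this call created. */
		for (id = start_id; id != failed_id; id++)
			unregister_block(id);
	}
	return ret;
}
```

The rollback stops at the block that failed, so a block that already existed
before the call (the cause of -EEXIST) is left untouched.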
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/base/memory.c | 82 +++++++++++++++++++++++++++---------------
include/linux/memory.h | 2 +-
mm/memory_hotplug.c | 15 ++++----
3 files changed, 63 insertions(+), 36 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index ac17c95a5f28..5a0370f0c506 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
return section_nr / sections_per_block;
}
+static inline int pfn_to_block_id(unsigned long pfn)
+{
+ return base_memory_block_id(pfn_to_section_nr(pfn));
+}
+
static int memory_subsys_online(struct device *dev);
static int memory_subsys_offline(struct device *dev);
@@ -582,10 +587,9 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
* A reference for the returned object is held and the reference for the
* hinted object is released.
*/
-struct memory_block *find_memory_block_hinted(struct mem_section *section,
- struct memory_block *hint)
+static struct memory_block *find_memory_block_by_id(int block_id,
+ struct memory_block *hint)
{
- int block_id = base_memory_block_id(__section_nr(section));
struct device *hintdev = hint ? &hint->dev : NULL;
struct device *dev;
@@ -597,6 +601,14 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
return to_memory_block(dev);
}
+struct memory_block *find_memory_block_hinted(struct mem_section *section,
+ struct memory_block *hint)
+{
+ int block_id = base_memory_block_id(__section_nr(section));
+
+ return find_memory_block_by_id(block_id, hint);
+}
+
/*
* For now, we have a linear search to go find the appropriate
* memory_block corresponding to a particular phys_index. If
@@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
unsigned long start_pfn;
int ret = 0;
+ mem = find_memory_block_by_id(block_id, NULL);
+ if (mem) {
+ put_device(&mem->dev);
+ return -EEXIST;
+ }
mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
@@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
return 0;
}
+static void unregister_memory(struct memory_block *memory)
+{
+ if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
+ return;
+
+ /* drop the ref. we got via find_memory_block() */
+ put_device(&memory->dev);
+ device_unregister(&memory->dev);
+}
+
/*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
+ * Create memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * will be initialized as offline.
*/
-int hotplug_memory_register(int nid, struct mem_section *section)
+int create_memory_block_devices(unsigned long start, unsigned long size)
{
- int block_id = base_memory_block_id(__section_nr(section));
- int ret = 0;
+ const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+ int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
struct memory_block *mem;
+ unsigned long block_id;
+ int ret = 0;
- mutex_lock(&mem_sysfs_mutex);
+ if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+ !IS_ALIGNED(size, memory_block_size_bytes())))
+ return -EINVAL;
- mem = find_memory_block(section);
- if (mem) {
- mem->section_count++;
- put_device(&mem->dev);
- } else {
+ mutex_lock(&mem_sysfs_mutex);
+ for (block_id = start_block_id; block_id != end_block_id; block_id++) {
ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
if (ret)
- goto out;
- mem->section_count++;
+ break;
+ mem->section_count = sections_per_block;
+ }
+ if (ret) {
+ end_block_id = block_id;
+ for (block_id = start_block_id; block_id != end_block_id;
+ block_id++) {
+ mem = find_memory_block_by_id(block_id, NULL);
+ mem->section_count = 0;
+ unregister_memory(mem);
+ }
}
-
-out:
mutex_unlock(&mem_sysfs_mutex);
return ret;
}
-static void
-unregister_memory(struct memory_block *memory)
-{
- BUG_ON(memory->dev.bus != &memory_subsys);
-
- /* drop the ref. we got via find_memory_block() */
- put_device(&memory->dev);
- device_unregister(&memory->dev);
-}
-
void unregister_memory_section(struct mem_section *section)
{
struct memory_block *mem;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 474c7c60c8f2..db3e8567f900 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
extern void unregister_memory_notifier(struct notifier_block *nb);
extern int register_memory_isolate_notifier(struct notifier_block *nb);
extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int create_memory_block_devices(unsigned long start, unsigned long size);
extern void unregister_memory_section(struct mem_section *);
extern int memory_dev_init(void);
extern int memory_notify(unsigned long val, void *v);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4b9d2974f86c..b1fde90bbf19 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -259,13 +259,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
return -EEXIST;
ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
- if (ret < 0)
- return ret;
-
- if (!want_memblock)
- return 0;
-
- return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+ return ret < 0 ? ret : 0;
}
/*
@@ -1107,6 +1101,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
if (ret < 0)
goto error;
+ /* create memory block devices after memory was added */
+ ret = create_memory_block_devices(start, size);
+ if (ret) {
+ arch_remove_memory(nid, start, size, NULL);
+ goto error;
+ }
+
if (new_node) {
/* If sysfs file of new node can't be created, cpu on the node
* can't be hot-added. There is no rollback way now.
--
2.20.1
* [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
From: David Hildenbrand @ 2019-05-27 11:11 UTC
To: linux-mm
Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
Pavel Tatashin, linux-sh, Mathieu Malaterre, Joonsoo Kim,
David Hildenbrand, linux-kernel, Wei Yang, Arun KS, Qian Cai,
Igor Mammedov, akpm, linuxppc-dev, Dan Williams, linux-arm-kernel
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
No longer needed, the callers of arch_add_memory() can handle this
manually.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/memory_hotplug.h | 8 --------
mm/memory_hotplug.c | 9 +++------
2 files changed, 3 insertions(+), 14 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2d4de313926d..2f1f87e13baa 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages, struct vmem_altmap *altmap);
-/*
- * Do we want sysfs memblock files created. This will allow userspace to online
- * and offline memory explicitly. Lack of this bit means that the caller has to
- * call move_pfn_range_to_zone to finish the initialization.
- */
-
-#define MHP_MEMBLOCK_API (1<<0)
-
/* reasonably generic interface to expand the physical pages */
extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b1fde90bbf19..9a92549ef23b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -251,7 +251,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
#endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
- struct vmem_altmap *altmap, bool want_memblock)
+ struct vmem_altmap *altmap)
{
int ret;
@@ -294,8 +294,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
}
for (i = start_sec; i <= end_sec; i++) {
- err = __add_section(nid, section_nr_to_pfn(i), altmap,
- restrictions->flags & MHP_MEMBLOCK_API);
+ err = __add_section(nid, section_nr_to_pfn(i), altmap);
/*
* EEXIST is finally dealt with by ioresource collision
@@ -1067,9 +1066,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
*/
int __ref add_memory_resource(int nid, struct resource *res)
{
- struct mhp_restrictions restrictions = {
- .flags = MHP_MEMBLOCK_API,
- };
+ struct mhp_restrictions restrictions = {};
u64 start, size;
bool new_node = false;
int ret;
--
2.20.1
* [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
To: linux-mm
Cc: Michal Hocko, linux-ia64, linux-sh, Wei Yang, Arun KS,
Ingo Molnar, Rafael J. Wysocki, linux-s390, David Hildenbrand,
Pavel Tatashin, mike.travis@hpe.com, Mark Brown, Jonathan Cameron,
Dan Williams, Chris Wilson, linux-arm-kernel, Oscar Salvador,
Andrew Banman, Mathieu Malaterre, Greg Kroah-Hartman,
linux-kernel, Alex Deucher, Igor Mammedov, akpm, linuxppc-dev,
David S. Miller
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().
This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().
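As a rough illustration of the range walk done by remove_memory_block_devices(), the user-space sketch below maps a byte range to memory block IDs with an exclusive end, as the loop in the patch does. The 4 KiB page size, the 128 MiB block size and every sketch_* name are assumptions made for this example only; the kernel obtains the block size from memory_block_size_bytes().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel determines these itself. */
#define SKETCH_PAGE_SHIFT      12                   /* 4 KiB pages */
#define SKETCH_BLOCK_BYTES     (128ULL << 20)       /* 128 MiB memory blocks */
#define SKETCH_PAGES_PER_BLOCK (SKETCH_BLOCK_BYTES >> SKETCH_PAGE_SHIFT)

/* Hypothetical stand-ins for PFN_DOWN() and pfn_to_block_id(). */
static uint64_t sketch_pfn_down(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

static uint64_t sketch_pfn_to_block_id(uint64_t pfn)
{
	return pfn / SKETCH_PAGES_PER_BLOCK;
}

/*
 * Store the first block ID covered by [start, start + size) in *first_id
 * and return the number of blocks covered. Returns -1 if start or size is
 * not aligned to the block size, mirroring the alignment check in the patch.
 */
static long sketch_block_range(uint64_t start, uint64_t size, uint64_t *first_id)
{
	uint64_t first, end;

	assert(first_id != NULL);
	if (start % SKETCH_BLOCK_BYTES || size % SKETCH_BLOCK_BYTES)
		return -1;

	first = sketch_pfn_to_block_id(sketch_pfn_down(start));
	/* The end ID is exclusive: it is the first block after the range. */
	end = sketch_pfn_to_block_id(sketch_pfn_down(start + size));

	*first_id = first;
	return (long)(end - first);
}
```

With these assumed sizes, a range starting at 256 MiB with a size of 384 MiB covers block IDs 2, 3 and 4, and an unaligned start is rejected.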
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/base/memory.c | 37 ++++++++++++++++++-------------------
drivers/base/node.c | 11 ++++++-----
include/linux/memory.h | 2 +-
include/linux/node.h | 6 ++----
mm/memory_hotplug.c | 5 +++--
5 files changed, 30 insertions(+), 31 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5a0370f0c506..f28efb0bf5c7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
return ret;
}
-void unregister_memory_section(struct mem_section *section)
+/*
+ * Remove memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * have to be offline.
+ */
+void remove_memory_block_devices(unsigned long start, unsigned long size)
{
+ const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+ const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
struct memory_block *mem;
+ int block_id;
- if (WARN_ON_ONCE(!present_section(section)))
+ if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+ !IS_ALIGNED(size, memory_block_size_bytes())))
return;
mutex_lock(&mem_sysfs_mutex);
-
- /*
- * Some users of the memory hotplug do not want/need memblock to
- * track all sections. Skip over those.
- */
- mem = find_memory_block(section);
- if (!mem)
- goto out_unlock;
-
- unregister_mem_sect_under_nodes(mem, __section_nr(section));
-
- mem->section_count--;
- if (mem->section_count == 0)
+ for (block_id = start_block_id; block_id != end_block_id; block_id++) {
+ mem = find_memory_block_by_id(block_id, NULL);
+ if (WARN_ON_ONCE(!mem))
+ continue;
+ mem->section_count = 0;
+ unregister_memory_block_under_nodes(mem);
unregister_memory(mem);
- else
- put_device(&mem->dev);
-
-out_unlock:
+ }
mutex_unlock(&mem_sysfs_mutex);
}
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..04fdfa99b8bc 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
return 0;
}
-/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
- unsigned long phys_index)
+/*
+ * Unregister memory block device under all nodes that it spans.
+ */
+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
{
NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
return -ENOMEM;
nodes_clear(*unlinked_nodes);
- sect_start_pfn = section_nr_to_pfn(phys_index);
- sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+ sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+ sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
int nid;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index db3e8567f900..f26a5417ec5d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
extern int register_memory_isolate_notifier(struct notifier_block *nb);
extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
int create_memory_block_devices(unsigned long start, unsigned long size);
-extern void unregister_memory_section(struct mem_section *);
+void remove_memory_block_devices(unsigned long start, unsigned long size);
extern int memory_dev_init(void);
extern int memory_notify(unsigned long val, void *v);
extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..02a29e71b175 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int register_mem_sect_under_node(struct memory_block *mem_blk,
void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
- unsigned long phys_index);
+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
extern int register_memory_node_under_compute_node(unsigned int mem_nid,
unsigned int cpu_nid,
@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
{
return 0;
}
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
- unsigned long phys_index)
+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
{
return 0;
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9a92549ef23b..82136c5b4c5f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -520,8 +520,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
if (WARN_ON_ONCE(!valid_section(ms)))
return;
- unregister_memory_section(ms);
-
scn_nr = __section_nr(ms);
start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
__remove_zone(zone, start_pfn);
@@ -1845,6 +1843,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
memblock_free(start, size);
memblock_remove(start, size);
+ /* remove memory block devices before removing memory */
+ remove_memory_block_devices(start, size);
+
arch_remove_memory(nid, start, size, NULL);
__release_memory_resource(start, size);
--
2.20.1
* [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
To: linux-mm
Cc: linux-s390, linux-ia64, linux-sh, Greg Kroah-Hartman, Mark Brown,
David Hildenbrand, linux-kernel, Wei Yang, Alex Deucher,
David S. Miller, Jonathan Cameron, Rafael J. Wysocki,
Igor Mammedov, akpm, Chris Wilson, linuxppc-dev, Dan Williams,
linux-arm-kernel, Oscar Salvador
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
We really don't want anything during memory hotunplug to fail.
We always pass a valid memory block device, so that check can go. Avoid
allocating memory and eventually failing. As we are always called under
lock, we can use a static piece of memory. This avoids having to put
the structure onto the stack and having to guess about the stack size
of callers.
Patch inspired by a patch from Oscar Salvador.
In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove. Memory block devices
with mixed nodes (added during boot) should be properly fenced off and never
removed.
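The pattern of replacing a per-call allocation with a static scratch buffer that is only touched under an already-held lock can be sketched in user space. The pthread mutex and the 64-bit mask below are hypothetical stand-ins for mem_sysfs_mutex and nodemask_t, and all sketch_* names are invented for this example; this is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock the caller already holds. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Count the distinct node IDs (each in 0..63) found in nids[].
 * Must be called with sketch_lock held: the scratch mask is static and
 * therefore shared between all callers. In exchange, nothing is allocated
 * and nothing large lives on the stack, so the function cannot fail.
 */
static int sketch_count_unique_nodes_locked(const int *nids, int n)
{
	static uint64_t seen;	/* static scratch space, protected by the lock */
	int unique = 0;

	seen = 0;		/* plays the role of nodes_clear() */
	for (int i = 0; i < n; i++) {
		uint64_t bit;

		assert(nids[i] >= 0 && nids[i] < 64);
		bit = 1ULL << nids[i];
		if (seen & bit)
			continue;	/* node already handled */
		seen |= bit;
		unique++;
	}
	return unique;
}

/* Convenience wrapper that takes the lock itself. */
static int sketch_count_unique_nodes(const int *nids, int n)
{
	int unique;

	pthread_mutex_lock(&sketch_lock);
	unique = sketch_count_unique_nodes_locked(nids, n);
	pthread_mutex_unlock(&sketch_lock);
	return unique;
}
```

The trade-off is that correctness now depends on every caller holding the lock, which is why the patch documents that requirement in the function comment.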
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/base/node.c | 18 +++++-------------
include/linux/node.h | 5 ++---
2 files changed, 7 insertions(+), 16 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04fdfa99b8bc..9be88fd05147 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
/*
* Unregister memory block device under all nodes that it spans.
+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
*/
-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
{
- NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
unsigned long pfn, sect_start_pfn, sect_end_pfn;
+ static nodemask_t unlinked_nodes;
- if (!mem_blk) {
- NODEMASK_FREE(unlinked_nodes);
- return -EFAULT;
- }
- if (!unlinked_nodes)
- return -ENOMEM;
- nodes_clear(*unlinked_nodes);
-
+ nodes_clear(unlinked_nodes);
sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
continue;
if (!node_online(nid))
continue;
- if (node_test_and_set(nid, *unlinked_nodes))
+ if (node_test_and_set(nid, unlinked_nodes))
continue;
sysfs_remove_link(&node_devices[nid]->dev.kobj,
kobject_name(&mem_blk->dev.kobj));
sysfs_remove_link(&mem_blk->dev.kobj,
kobject_name(&node_devices[nid]->dev.kobj));
}
- NODEMASK_FREE(unlinked_nodes);
- return 0;
}
int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 02a29e71b175..548c226966a2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int register_mem_sect_under_node(struct memory_block *mem_blk,
void *arg);
-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
extern int register_memory_node_under_compute_node(unsigned int mem_nid,
unsigned int cpu_nid,
@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
{
return 0;
}
-static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
{
- return 0;
}
static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
--
2.20.1
* [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
To: linux-mm
Cc: linux-s390, linux-ia64, linux-sh, David Hildenbrand, linux-kernel,
Wei Yang, Igor Mammedov, akpm, linuxppc-dev, Dan Williams,
linux-arm-kernel
In-Reply-To: <20190527111152.16324-1-david@redhat.com>
The parameter is unused, so let's drop it. Memory removal paths should
never care about zones. This is the job of memory offlining and will
require more refactorings.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/memory_hotplug.h | 2 +-
mm/memory_hotplug.c | 2 +-
mm/sparse.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2f1f87e13baa..1a4257c5f74c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
extern bool is_memblock_offlined(struct memory_block *mem);
extern int sparse_add_one_section(int nid, unsigned long start_pfn,
struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+extern void sparse_remove_one_section(struct mem_section *ms,
unsigned long map_offset, struct vmem_altmap *altmap);
extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 82136c5b4c5f..e48ec7b9dee2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -524,7 +524,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
__remove_zone(zone, start_pfn);
- sparse_remove_one_section(zone, ms, map_offset, altmap);
+ sparse_remove_one_section(ms, map_offset, altmap);
}
/**
diff --git a/mm/sparse.c b/mm/sparse.c
index d1d5e05f5b8d..1552c855d62a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
free_map_bootmem(memmap);
}
-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
- unsigned long map_offset, struct vmem_altmap *altmap)
+void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
+ struct vmem_altmap *altmap)
{
struct page *memmap = NULL;
unsigned long *usemap = NULL;
--
2.20.1
* Re: [PATCH v7 1/1] iommu: enhance IOMMU dma mode build options
From: Joerg Roedel @ 2019-05-27 14:21 UTC (permalink / raw)
To: Zhen Lei
Cc: linux-ia64, Sebastian Ott, linux-doc, Hanjun Guo, Heiko Carstens,
Paul Mackerras, H . Peter Anvin, linux-s390, Jonathan Corbet,
Jean-Philippe Brucker, x86, Ingo Molnar, Fenghua Yu, Will Deacon,
John Garry, linuxppc-dev, Borislav Petkov, Thomas Gleixner,
Gerald Schaefer, Tony Luck, David Woodhouse, linux-kernel, iommu,
Martin Schwidefsky, Robin Murphy
In-Reply-To: <20190520135947.14960-2-thunder.leizhen@huawei.com>
Hi Zhen Lei,
On Mon, May 20, 2019 at 09:59:47PM +0800, Zhen Lei wrote:
> arch/ia64/kernel/pci-dma.c | 2 +-
> arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
> arch/s390/pci/pci_dma.c | 2 +-
> arch/x86/kernel/pci-dma.c | 7 ++---
> drivers/iommu/Kconfig | 44 ++++++++++++++++++++++++++-----
> drivers/iommu/amd_iommu_init.c | 3 ++-
> drivers/iommu/intel-iommu.c | 2 +-
> drivers/iommu/iommu.c | 3 ++-
> 8 files changed, 48 insertions(+), 18 deletions(-)
This needs Acks from the arch maintainers of ia64, powerpc, s390 and
x86, at least.
It is easier for them if you split it up into the Kconfig change and
separate patches per arch and per iommu driver. Then collect the Acks on
the individual patches.
Thanks,
Joerg
* [Bug 203723] New: Build error: taking address of packed member of 'struct ftrace_graph_ent' may result in an unaligned pointer value
From: bugzilla-daemon @ 2019-05-27 14:36 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=203723
Bug ID: 203723
Summary: Build error: taking address of packed member of
'struct ftrace_graph_ent' may result in an unaligned
pointer value
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: 4.14.122
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: PPC-64
Assignee: platform_ppc-64@kernel-bugs.osdl.org
Reporter: jason@bluehome.net
Regression: No
Created attachment 282967
--> https://bugzilla.kernel.org/attachment.cgi?id=282967&action=edit
Build log
This error appears while building 4.14.122. I'm building with GCC 9.1 for
ppc64el.
make -f ./scripts/Makefile.build obj=arch/powerpc/kernel/trace
powerpc64le-linux-gcc -m64 -Wp,-MD,arch/powerpc/kernel/trace/.ftrace.o.d
-nostdinc -isystem
/home/jason/toolchain/bin/../lib/gcc/powerpc64le-linux/9.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-D__KERNEL__ -DCC_USING_MPROFILE_KERNEL -Iarch/powerpc -DHAVE_AS_ATHIGH=1 -Wall
-Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-fshort-wchar -Werror-implicit-function-declaration -Wno-format-security
-std=gnu89 -fno-PIE -msoft-float -pipe -Iarch/powerpc -mtraceback=no
-mabi=elfv2 -mcmodel=medium -mno-pointers-to-nested-functions -mcpu=power8
-mno-altivec -mno-vsx -funit-at-a-time -fno-dwarf2-cfi-asm -mno-string
-Wa,-maltivec -mlittle-endian -mno-strict-align -fno-delete-null-pointer-checks
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow
-Wno-int-in-bool-context -Wno-attribute-alias -O2
--param=allow-store-data-races=0 -DCC_HAVE_ASM_GOTO -Wframe-larger-than=2048
-fno-stack-protector -Wno-unused-but-set-variable -Wno-unused-const-variable
-fno-var-tracking-assignments -Wdeclaration-after-statement -Wno-pointer-sign
-Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants
-fmerge-constants -fno-stack-check -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types
-Werror=designated-init -Wno-packed-not-aligned -Werror -Werror
-DKBUILD_BASENAME='"ftrace"' -DKBUILD_MODNAME='"ftrace"' -c -o
arch/powerpc/kernel/trace/ftrace.o arch/powerpc/kernel/trace/ftrace.c
arch/powerpc/kernel/trace/ftrace.c: In function 'prepare_ftrace_return':
arch/powerpc/kernel/trace/ftrace.c:596:43: error: taking address of packed
member of 'struct ftrace_graph_ent' may result in an unaligned pointer value
[-Werror=address-of-packed-member]
596 | if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
| ^~~~~~~~~~~~
cc1: all warnings being treated as errors
scripts/Makefile.build:326: recipe for target
'arch/powerpc/kernel/trace/ftrace.o' failed
make[2]: *** [arch/powerpc/kernel/trace/ftrace.o] Error 1
scripts/Makefile.build:585: recipe for target 'arch/powerpc/kernel/trace'
failed
make[1]: *** [arch/powerpc/kernel/trace] Error 2
Makefile:1038: recipe for target 'arch/powerpc/kernel' failed
make: *** [arch/powerpc/kernel] Error 2
* [Bug 203725] New: Build error: 'init_module' specifies less restrictive attribute than its target 'rtas_flash_init': 'cold'
From: bugzilla-daemon @ 2019-05-27 14:42 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=203725
Bug ID: 203725
Summary: Build error: 'init_module' specifies less restrictive
attribute than its target 'rtas_flash_init': 'cold'
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: 4.19.46
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: PPC-64
Assignee: platform_ppc-64@kernel-bugs.osdl.org
Reporter: jason@bluehome.net
Regression: No
Created attachment 282969
--> https://bugzilla.kernel.org/attachment.cgi?id=282969&action=edit
Build log
I recently upgraded to GCC 9.1. This error appears while building 4.19.46. I'm
building with ppc64el. GCC 8.3 doesn't generate an error so that's my
workaround for now.
powerpc64le-linux-gcc -Wp,-MD,arch/powerpc/kernel/.rtas_flash.o.d -nostdinc
-isystem /home/jason/toolchain/bin/../lib/gcc/powerpc64le-linux/9.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -Iarch/powerpc
-DHAVE_AS_ATHIGH=1 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -fshort-wchar
-Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE
-DCC_HAVE_ASM_GOTO -mlittle-endian -m64 -msoft-float -pipe -Iarch/powerpc
-mtraceback=no -mabi=elfv2 -mcmodel=medium -mno-pointers-to-nested-functions
-mcpu=power8 -mtune=power9 -mno-altivec -mno-vsx -funit-at-a-time
-fno-dwarf2-cfi-asm -mno-string -Wa,-maltivec -Wa,-mpower4 -Wa,-many
-mno-strict-align -mlittle-endian -fno-delete-null-pointer-checks
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow
-Wno-int-in-bool-context -O2 --param=allow-store-data-races=0
-Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable
-Wno-unused-const-variable -fno-var-tracking-assignments -pg -mprofile-kernel
-Wdeclaration-after-statement -Wno-pointer-sign -Wno-stringop-truncation
-fno-strict-overflow -fno-merge-all-constants -fmerge-constants
-fno-stack-check -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types
-Werror=designated-init -fmacro-prefix-map=./= -Wno-packed-not-aligned -Werror
-DMODULE -mno-save-toc-indirect -mcmodel=large
-DKBUILD_BASENAME='"rtas_flash"' -DKBUILD_MODNAME='"rtas_flash"' -c -o
arch/powerpc/kernel/rtas_flash.o arch/powerpc/kernel/rtas_flash.c
In file included from arch/powerpc/kernel/rtas_flash.c:16:
./include/linux/module.h:133:6: error: 'init_module' specifies less restrictive
attribute than its target 'rtas_flash_init': 'cold'
[-Werror=missing-attributes]
133 | int init_module(void) __attribute__((alias(#initfn)));
| ^~~~~~~~~~~
arch/powerpc/kernel/rtas_flash.c:779:1: note: in expansion of macro
'module_init'
779 | module_init(rtas_flash_init);
| ^~~~~~~~~~~
arch/powerpc/kernel/rtas_flash.c:703:19: note: 'init_module' target declared
here
703 | static int __init rtas_flash_init(void)
| ^~~~~~~~~~~~~~~
In file included from arch/powerpc/kernel/rtas_flash.c:16:
./include/linux/module.h:139:7: error: 'cleanup_module' specifies less
restrictive attribute than its target 'rtas_flash_cleanup': 'cold'
[-Werror=missing-attributes]
139 | void cleanup_module(void) __attribute__((alias(#exitfn)));
| ^~~~~~~~~~~~~~
arch/powerpc/kernel/rtas_flash.c:780:1: note: in expansion of macro
'module_exit'
780 | module_exit(rtas_flash_cleanup);
| ^~~~~~~~~~~
arch/powerpc/kernel/rtas_flash.c:759:20: note: 'cleanup_module' target declared
here
759 | static void __exit rtas_flash_cleanup(void)
| ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
scripts/Makefile.build:309: recipe for target
'arch/powerpc/kernel/rtas_flash.o' failed
make[1]: *** [arch/powerpc/kernel/rtas_flash.o] Error 1
Makefile:1051: recipe for target 'arch/powerpc/kernel' failed
make: *** [arch/powerpc/kernel] Error 2
* Re: [PATCH] powerpc: Fix loading of kernel + initramfs with kexec_file_load()
From: Thiago Jung Bauermann @ 2019-05-27 20:14 UTC (permalink / raw)
To: Michael Ellerman
Cc: kexec, linuxppc-dev, linux-kernel, Mimi Zohar, AKASHI Takahiro
In-Reply-To: <459lBd53mCz9sBr@ozlabs.org>
Michael Ellerman <patch-notifications@ellerman.id.au> writes:
> On Wed, 2019-05-22 at 22:01:58 UTC, Thiago Jung Bauermann wrote:
>> Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
>> changed kexec_add_buffer() to skip searching for a memory location if
>> kexec_buf.mem is already set, and use the address that is there.
>>
>> In powerpc code we reuse a kexec_buf variable for loading both the kernel
>> and the initramfs by resetting some of the fields between those uses, but
>> not mem. This causes kexec_add_buffer() to try to load the kernel at the
>> same address where initramfs will be loaded, which is naturally rejected:
>>
>> # kexec -s -l --initrd initramfs vmlinuz
>> kexec_file_load failed: Invalid argument
>>
>> Setting the mem field before every call to kexec_add_buffer() fixes this
>> regression.
>>
>> Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
>> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
>> Reviewed-by: Dave Young <dyoung@redhat.com>
>
> Applied to powerpc fixes, thanks.
>
> https://git.kernel.org/powerpc/c/8b909e3548706cbebc0a676067b81aad
Thanks!!
--
Thiago Jung Bauermann
IBM Linux Technology Center
* [RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries
From: Shawn Anastasio @ 2019-05-27 22:55 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
Hello all,
This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of
configuring the resources and should obey alignment requirements.
The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
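To make the alignment problem concrete, here is a small user-space sketch that checks whether a BAR start address sits on a page boundary and computes the next aligned address. The 64 KiB page size, the example address and the sketch_* helper names are assumptions chosen for illustration only; they are not taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x10000ULL	/* assumed 64 KiB pages */

/* True if addr starts exactly on a page boundary. */
static bool sketch_is_page_aligned(uint64_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

Under these assumptions, a 4 KiB BAR placed at 0x10001000 starts in the middle of a page, while moving it up to 0x10010000 puts it on a page boundary, which is the kind of placement a pci=resource_alignment request is meant to produce.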
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use
with VFIO.
In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").
Feedback is appreciated.
Thanks,
Shawn
Shawn Anastasio (3):
PCI: Introduce pcibios_ignore_alignment_request
powerpc/64: Enable pcibios_after_init hook on ppc64
powerpc/pseries: Allow user-specified PCI resource alignment after
init
arch/powerpc/include/asm/machdep.h | 6 ++++--
arch/powerpc/kernel/pci-common.c | 9 +++++++++
arch/powerpc/kernel/pci_64.c | 4 ++++
arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
drivers/pci/pci.c | 9 +++++++--
5 files changed, 46 insertions(+), 4 deletions(-)
--
2.20.1
* [RESEND PATCH 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64
From: Shawn Anastasio @ 2019-05-27 22:55 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
In-Reply-To: <20190527225521.5884-1-shawn@anastas.io>
Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.
Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
arch/powerpc/include/asm/machdep.h | 3 +--
arch/powerpc/kernel/pci_64.c | 4 ++++
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void (*init)(void);
void (*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
/*
* optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char);
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
+ /* Call machine dependent post-init code */
+ if (ppc_md.pcibios_after_init)
+ ppc_md.pcibios_after_init();
+
return 0;
}
--
2.20.1
* [RESEND PATCH 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
From: Shawn Anastasio @ 2019-05-27 22:55 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
In-Reply-To: <20190527225521.5884-1-shawn@anastas.io>
On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.
Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.
In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
arch/powerpc/include/asm/machdep.h | 3 +++
arch/powerpc/kernel/pci-common.c | 9 +++++++++
arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
resource_size_t (*pcibios_default_alignment)(void);
+ /* Called when determining PCI resource alignment */
+ int (*pcibios_ignore_alignment_request)(void);
+
#ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
}
+int pcibios_ignore_alignment_request(void)
+{
+ if (ppc_md.pcibios_ignore_alignment_request)
+ return ppc_md.pcibios_ignore_alignment_request();
+
+ /* Fall back to default method of checking PCI_PROBE_ONLY */
+ return pci_has_flag(PCI_PROBE_ONLY);
+}
+
#ifdef CONFIG_PCI_IOV
resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
{
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..c6af2ed8ee0f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
int fwnmi_active; /* TRUE if an FWNMI handler is present */
+int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
static void pSeries_show_cpuinfo(struct seq_file *m)
{
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
}
#endif
+static void pseries_after_init(void)
+{
+ initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+ if (initial_pci_init_done)
+ /*
+ * Allow custom alignments after init for things
+ * like PCI hotplugging.
+ */
+ return 0;
+
+ return pci_has_flag(PCI_PROBE_ONLY);
+}
+
static void __init pSeries_setup_arch(void)
{
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+ ppc_md.pcibios_after_init = pseries_after_init;
+ ppc_md.pcibios_ignore_alignment_request =
+ pseries_ignore_alignment_request;
}
static void pseries_panic(char *str)
--
2.20.1
^ permalink raw reply related
* [RESEND PATCH 1/3] PCI: Introduce pcibios_ignore_alignment_request
From: Shawn Anastasio @ 2019-05-27 22:55 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
In-Reply-To: <20190527225521.5884-1-shawn@anastas.io>
Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.
The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
drivers/pci/pci.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
}
+int __weak pcibios_ignore_alignment_request(void)
+{
+ return pci_has_flag(PCI_PROBE_ONLY);
+}
+
#define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
- if (pci_has_flag(PCI_PROBE_ONLY)) {
+ if (pcibios_ignore_alignment_request()) {
align = 0;
- pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+ pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
--
2.20.1
^ permalink raw reply related
* Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype
From: Michael Ellerman @ 2019-05-28 1:15 UTC (permalink / raw)
To: Mathieu Malaterre
Cc: Rafael J. Wysocki, linux-s390, Len Brown, Mathieu Malaterre,
linux-pm, Heiko Carstens, linux-kernel, Paul Mackerras,
Pavel Machek, Martin Schwidefsky, linuxppc-dev
In-Reply-To: <20190524104418.17194-1-malat@debian.org>
Mathieu Malaterre <malat@debian.org> writes:
> The declaration for pfn_is_nosave is only available in
> kernel/power/power.h. Since this function can be overridden by arch code,
> expose it globally. Having a prototype avoids warnings
> (sometimes treated as errors with W=1) such as:
>
> arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
>
> This moves the declaration into a globally visible header file and adds
> the missing include to avoid a warning on powerpc. Also remove the
> duplicated prototypes since they are no longer required.
>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Mathieu Malaterre <malat@debian.org>
> ---
> v2: As suggested by Christophe, remove duplicate prototypes
>
> arch/powerpc/kernel/suspend.c | 1 +
> arch/s390/kernel/entry.h | 1 -
> include/linux/suspend.h | 1 +
> kernel/power/power.h | 2 --
> 4 files changed, 2 insertions(+), 3 deletions(-)
Looks fine to me.
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
cheers
^ permalink raw reply
* Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype
From: Michael Ellerman @ 2019-05-28 1:16 UTC (permalink / raw)
To: Rafael J. Wysocki, Mathieu Malaterre
Cc: Len Brown, linux-s390, linux-pm, Heiko Carstens, linux-kernel,
Paul Mackerras, Pavel Machek, Martin Schwidefsky, linuxppc-dev
In-Reply-To: <1929721.iDiXxTFbjN@kreacher>
"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
>> The declaration for pfn_is_nosave is only available in
>> kernel/power/power.h. Since this function can be overridden by arch code,
>> expose it globally. Having a prototype avoids warnings
>> (sometimes treated as errors with W=1) such as:
>>
>> arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
>>
>> This moves the declaration into a globally visible header file and adds
>> the missing include to avoid a warning on powerpc. Also remove the
>> duplicated prototypes since they are no longer required.
>>
>> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>> Signed-off-by: Mathieu Malaterre <malat@debian.org>
>> ---
>> v2: As suggested by Christophe, remove duplicate prototypes
>>
>> arch/powerpc/kernel/suspend.c | 1 +
>> arch/s390/kernel/entry.h | 1 -
>> include/linux/suspend.h | 1 +
>> kernel/power/power.h | 2 --
>> 4 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/power/power.h b/kernel/power/power.h
>> index 9e58bdc8a562..44bee462ff57 100644
>> --- a/kernel/power/power.h
>> +++ b/kernel/power/power.h
>> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
>> static inline void hibernate_image_size_init(void) {}
>> #endif /* !CONFIG_HIBERNATION */
>>
>> -extern int pfn_is_nosave(unsigned long);
>> -
>> #define power_attr(_name) \
>> static struct kobj_attribute _name##_attr = { \
>> .attr = { \
>>
>
> With an ACK from the powerpc maintainers, I could apply this one.
Sent.
cheers
^ permalink raw reply
* [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
To: linux-mm
Cc: Mark Rutland, Oscar Salvador, Rafael J. Wysocki, Michal Hocko,
linux-ia64, linux-sh, Peter Zijlstra, Dave Hansen, Heiko Carstens,
Wei Yang, Masahiro Yamada, Pavel Tatashin, Rich Felker, Arun KS,
Chintan Pandya, Ingo Molnar, Paul Mackerras, Qian Cai, linux-s390,
H. Peter Anvin, Yu Zhao, Baoquan He, Logan Gunthorpe,
David Hildenbrand, Mike Rapoport, Jun Yao, Ingo Molnar,
Catalin Marinas, Rob Herring, Fenghua Yu, Pavel Tatashin,
Vasily Gorbik, Anshuman Khandual, mike.travis@hpe.com,
Will Deacon, Robin Murphy, Nicholas Piggin, Martin Schwidefsky,
Mark Brown, Borislav Petkov, Andy Lutomirski, Jonathan Cameron,
Dan Williams, Chris Wilson, Joonsoo Kim, linux-arm-kernel,
Oscar Salvador, Tony Luck, Yoshinori Sato, Ard Biesheuvel,
Mathieu Malaterre, Greg Kroah-Hartman, Andrew Banman,
linux-kernel, Mike Rapoport, Thomas Gleixner, Wei Yang,
Alex Deucher, Igor Mammedov, akpm, linuxppc-dev, David S. Miller,
Kirill A. Shutemov
We only want memory block devices for memory to be onlined/offlined
(add/remove from the buddy). This is required so user space can
online/offline memory and kdump gets notified about newly onlined memory.
Let's factor out creation/removal of memory block devices. This helps
to further clean up arch_add_memory()/arch_remove_memory() and to make
implementation of new features easier - especially sub-section
memory hot add from Dan.
Anshuman Khandual is currently working on arch_remove_memory(). I added
a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
implementation", which is sufficient as a first step in the context of
this series. (We already don't clean up page tables in case anything
goes wrong.)
Did a quick sanity test with DIMM plug/unplug, making sure all devices
and sysfs links properly get added/removed. Compile tested on s390x and
x86-64.
Based on next/master.
Next refactoring on my list will be making sure that remove_memory()
will never deal with zones / access "struct pages". Any kind of zone
handling will have to be done when offlining system memory / before
removing device memory. I am thinking about "remove_pfn_range_from_zone()",
to undo everything "move_pfn_range_to_zone()" did.
v2 -> v3:
- Add "s390x/mm: Fail when an altmap is used for arch_add_memory()"
- Add "arm64/mm: Add temporary arch_remove_memory() implementation"
- Add "drivers/base/memory: Pass a block_id to init_memory_block()"
- Various changes to "mm/memory_hotplug: Create memory block devices
after arch_add_memory()" and "mm/memory_hotplug: Remove memory block
devices before arch_remove_memory()" due to switching from sections to
block_id's.
v1 -> v2:
- s390x/mm: Implement arch_remove_memory()
-- remove mapping after "__remove_pages"
David Hildenbrand (11):
mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
s390x/mm: Fail when an altmap is used for arch_add_memory()
s390x/mm: Implement arch_remove_memory()
arm64/mm: Add temporary arch_remove_memory() implementation
drivers/base/memory: Pass a block_id to init_memory_block()
mm/memory_hotplug: Allow arch_remove_pages() without
CONFIG_MEMORY_HOTREMOVE
mm/memory_hotplug: Create memory block devices after arch_add_memory()
mm/memory_hotplug: Drop MHP_MEMBLOCK_API
mm/memory_hotplug: Remove memory block devices before
arch_remove_memory()
mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
fail
mm/memory_hotplug: Remove "zone" parameter from
sparse_remove_one_section
arch/arm64/mm/mmu.c | 17 +++++
arch/ia64/mm/init.c | 2 -
arch/powerpc/mm/mem.c | 2 -
arch/s390/mm/init.c | 18 +++--
arch/sh/mm/init.c | 2 -
arch/x86/mm/init_32.c | 2 -
arch/x86/mm/init_64.c | 2 -
drivers/base/memory.c | 134 +++++++++++++++++++--------------
drivers/base/node.c | 27 +++----
include/linux/memory.h | 6 +-
include/linux/memory_hotplug.h | 12 +--
include/linux/node.h | 7 +-
mm/memory_hotplug.c | 44 +++++------
mm/sparse.c | 10 +--
14 files changed, 140 insertions(+), 145 deletions(-)
--
2.20.1
^ permalink raw reply
* [PATCH v2 1/3] PCI: Introduce pcibios_ignore_alignment_request
From: Shawn Anastasio @ 2019-05-28 1:54 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
In-Reply-To: <20190528015412.30521-1-shawn@anastas.io>
Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.
The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.
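
For readers outside the kernel tree, the "defer to the platform, otherwise fall back to PCI_PROBE_ONLY" decision can be sketched in plain userspace C. This is only an illustration under stated assumptions: the patch itself uses a __weak symbol, while the sketch emulates the override with a function pointer so that it builds anywhere. Every sketch_* name and the SKETCH_PCI_PROBE_ONLY value are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PCI_PROBE_ONLY flag bit. */
#define SKETCH_PCI_PROBE_ONLY 0x1u

/* Hypothetical hook table; a NULL hook means "no platform override". */
struct sketch_platform_ops {
	int (*ignore_alignment_request)(void);
};

/* Default policy: ignore alignment requests only when probe-only is set. */
static int sketch_default_ignore(unsigned int pci_flags)
{
	return (pci_flags & SKETCH_PCI_PROBE_ONLY) != 0;
}

/* Defer to the platform hook if one is installed, else use the default. */
static int sketch_ignore_alignment_request(const struct sketch_platform_ops *ops,
					   unsigned int pci_flags)
{
	assert(ops != NULL);
	if (ops->ignore_alignment_request)
		return ops->ignore_alignment_request();
	return sketch_default_ignore(pci_flags);
}

/* Example override: a platform that always honors alignment requests. */
static int sketch_always_honor(void)
{
	return 0;
}
```

With no hook installed the result simply mirrors the probe-only flag, which matches the existing behavior described above; installing sketch_always_honor makes the same call return 0 regardless of the flag.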
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
drivers/pci/pci.c | 9 +++++++--
include/linux/pci.h | 1 +
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
}
+int __weak pcibios_ignore_alignment_request(void)
+{
+ return pci_has_flag(PCI_PROBE_ONLY);
+}
+
#define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
- if (pci_has_flag(PCI_PROBE_ONLY)) {
+ if (pcibios_ignore_alignment_request()) {
align = 0;
- pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+ pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
int pcibios_alloc_irq(struct pci_dev *dev);
void pcibios_free_irq(struct pci_dev *dev);
resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);
#ifdef CONFIG_HIBERNATE_CALLBACKS
extern struct dev_pm_ops pcibios_pm_ops;
--
2.20.1
^ permalink raw reply related
* [PATCH v2 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
From: Shawn Anastasio @ 2019-05-28 1:54 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
In-Reply-To: <20190528015412.30521-1-shawn@anastas.io>
On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.
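
The gating described above can be modelled as a tiny state machine. The following is a userspace sketch, not the kernel code: the struct, the sketch_* helpers and the SKETCH_PCI_PROBE_ONLY value are hypothetical names chosen for illustration, and the state is passed explicitly instead of living in a file-scope variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PCI_PROBE_ONLY flag bit. */
#define SKETCH_PCI_PROBE_ONLY 0x1u

/* Hypothetical model of the platform state relevant to the decision. */
struct sketch_pseries_state {
	bool initial_pci_init_done;	/* set once firmware-managed devices are set up */
	unsigned int pci_flags;		/* models the global PCI flags */
};

/* Models the after-init hook: record that initial PCI setup has finished. */
static void sketch_after_init(struct sketch_pseries_state *st)
{
	assert(st != NULL);
	st->initial_pci_init_done = true;
}

/* Returns nonzero if a requested alignment should be ignored. */
static int sketch_pseries_ignore_alignment_request(const struct sketch_pseries_state *st)
{
	assert(st != NULL);
	/* After init, only hotplugged devices remain: honor the request. */
	if (st->initial_pci_init_done)
		return 0;
	/* During initial setup, keep the probe-only behavior. */
	return (st->pci_flags & SKETCH_PCI_PROBE_ONLY) != 0;
}
```

In this model, a device enumerated before sketch_after_init() runs sees the probe-only answer, while anything enumerated afterwards (the hotplug case) has its alignment request honored.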
Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.
In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
---
arch/powerpc/include/asm/machdep.h | 3 +++
arch/powerpc/kernel/pci-common.c | 9 +++++++++
arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
resource_size_t (*pcibios_default_alignment)(void);
+ /* Called when determining PCI resource alignment */
+ int (*pcibios_ignore_alignment_request)(void);
+
#ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
}
+int pcibios_ignore_alignment_request(void)
+{
+ if (ppc_md.pcibios_ignore_alignment_request)
+ return ppc_md.pcibios_ignore_alignment_request();
+
+ /* Fall back to default method of checking PCI_PROBE_ONLY */
+ return pci_has_flag(PCI_PROBE_ONLY);
+}
+
#ifdef CONFIG_PCI_IOV
resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
{
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..07f03be02afe 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
int fwnmi_active; /* TRUE if an FWNMI handler is present */
+static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
static void pSeries_show_cpuinfo(struct seq_file *m)
{
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
}
#endif
+static void pseries_after_init(void)
+{
+ initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+ if (initial_pci_init_done)
+ /*
+ * Allow custom alignments after init for things
+ * like PCI hotplugging.
+ */
+ return 0;
+
+ return pci_has_flag(PCI_PROBE_ONLY);
+}
+
static void __init pSeries_setup_arch(void)
{
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+ ppc_md.pcibios_after_init = pseries_after_init;
+ ppc_md.pcibios_ignore_alignment_request =
+ pseries_ignore_alignment_request;
}
static void pseries_panic(char *str)
--
2.20.1
^ permalink raw reply related
* [PATCH v2 0/3] Allow custom PCI resource alignment on pseries
From: Shawn Anastasio @ 2019-05-28 1:54 UTC (permalink / raw)
To: linux-pci, linuxppc-dev
Cc: sbobroff, linux-kernel, rppt, xyjxie, bhelgaas, paulus
Changes from v1 to v2:
- Fix function declaration warnings caught by sparse
Hello all,
This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of
configuring the resources and should obey alignment requirements.
The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
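
To make the sub-page BAR problem concrete, here is a small arithmetic sketch in userspace C. The numbers used in the note below (64 KiB pages, a BAR placed at 0x10001000) are assumptions picked for illustration and are not taken from the observed guest; the sketch_* helpers are hypothetical and not part of any kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is a nonzero power of two. */
static bool sketch_is_pow2(uint64_t v)
{
	return v && (v & (v - 1)) == 0;
}

/* True if a BAR starting at 'start' begins on a page boundary. */
static bool sketch_bar_page_aligned(uint64_t start, uint64_t page_size)
{
	assert(sketch_is_pow2(page_size));
	return (start & (page_size - 1)) == 0;
}

/* Round 'addr' up to the next multiple of 'align' (a power of two). */
static uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
	assert(sketch_is_pow2(align));
	return (addr + align - 1) & ~(align - 1);
}
```

With an assumed 64 KiB page size, a BAR at 0x10001000 is not page aligned, and rounding it up gives 0x10010000, which is. Requesting page alignment through pci=resource_alignment asks the kernel for this kind of placement.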
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use
with VFIO.
In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").
Feedback is appreciated.
Thanks,
Shawn
Shawn Anastasio (3):
PCI: Introduce pcibios_ignore_alignment_request
powerpc/64: Enable pcibios_after_init hook on ppc64
powerpc/pseries: Allow user-specified PCI resource alignment after
init
arch/powerpc/include/asm/machdep.h | 6 ++++--
arch/powerpc/kernel/pci-common.c | 9 +++++++++
arch/powerpc/kernel/pci_64.c | 4 ++++
arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
drivers/pci/pci.c | 9 +++++++--
include/linux/pci.h | 1 +
6 files changed, 47 insertions(+), 4 deletions(-)
--
2.20.1
^ permalink raw reply