linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [v4 00/15] complete deferred page initialization
@ 2017-08-02 20:38 Pavel Tatashin
  2017-08-02 20:38 ` [v4 01/15] x86/mm: reserve only exiting low pages Pavel Tatashin
                   ` (14 more replies)
  0 siblings, 15 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Changelog:
v3 - v2
- Rewrote code to zero sturct pages in __init_single_page() as
  suggested by Michal Hocko
- Added code to handle issues related to accessing struct page
  memory before they are initialized.

v2 - v3
- Addressed David Miller comments about one change per patch:
    * Splited changes to platforms into 4 patches
    * Made "do not zero vmemmap_buf" as a separate patch

v1 - v2
- Per request, added s390 to deferred "struct page" zeroing
- Collected performance data on x86 which proofs the importance to
  keep memset() as prefetch (see below).

SMP machines can benefit from the DEFERRED_STRUCT_PAGE_INIT config option,
which defers initializing struct pages until all cpus have been started so
it can be done in parallel.

However, this feature is sub-optimal, because the deferred page
initialization code expects that the struct pages have already been zeroed,
and the zeroing is done early in boot with a single thread only.  Also, we
access that memory and set flags before struct pages are initialized. All
of this is fixed in this patchset.

In this work we do the following:
- Never read access struct page until it was initialized
- Never set any fields in struct pages before they are initialized
- Zero struct page at the beginning of struct page initialization

Performance improvements on x86 machine with 8 nodes:
Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz

Single threaded struct page init: 7.6s/T improvement
Deferred struct page init: 10.2s/T improvement

Pavel Tatashin (15):
  x86/mm: reserve only exiting low pages
  x86/mm: setting fields in deferred pages
  sparc64/mm: setting fields in deferred pages
  mm: discard memblock data later
  mm: don't accessed uninitialized struct pages
  sparc64: simplify vmemmap_populate
  mm: defining memblock_virt_alloc_try_nid_raw
  mm: zero struct pages during initialization
  sparc64: optimized struct page zeroing
  x86/kasan: explicitly zero kasan shadow memory
  arm64/kasan: explicitly zero kasan shadow memory
  mm: explicitly zero pagetable memory
  mm: stop zeroing memory during allocation in vmemmap
  mm: optimize early system hash allocations
  mm: debug for raw alloctor

 arch/arm64/mm/kasan_init.c          |  32 ++++++++
 arch/sparc/include/asm/pgtable_64.h |  18 +++++
 arch/sparc/mm/init_64.c             |  31 +++-----
 arch/x86/kernel/setup.c             |   5 +-
 arch/x86/mm/init_64.c               |   9 ++-
 arch/x86/mm/kasan_init_64.c         |  29 +++++++
 include/linux/bootmem.h             |  11 +++
 include/linux/memblock.h            |  10 ++-
 include/linux/mm.h                  |   9 +++
 mm/memblock.c                       | 152 ++++++++++++++++++++++++++++--------
 mm/nobootmem.c                      |  16 ----
 mm/page_alloc.c                     |  29 ++++---
 mm/sparse-vmemmap.c                 |  10 ++-
 mm/sparse.c                         |   6 +-
 14 files changed, 279 insertions(+), 88 deletions(-)

--
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [v4 01/15] x86/mm: reserve only exiting low pages
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 02/15] x86/mm: setting fields in deferred pages Pavel Tatashin
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Struct pages are initialized by going through __init_single_page(). Since
the existing physical memory in memblock is represented in memblock.memory
list, struct page for every page from this list goes through
__init_single_page().

The second memblock list: memblock.reserved, manages the allocated memory.
The memory that won't be available to kernel allocator. So, every page from
this list goes through reserve_bootmem_region(), where certain struct page
fields are set, the assumption being that the struct pages have been
initialized beforehand.

In trim_low_memory_range() we unconditionally reserve memoryfrom PFN 0, but
memblock.memory might start at a later PFN. For example, in QEMU,
e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
so PFN 0 is not on memblock.memory (and hence isn't initialized via
__init_single_page) but is on memblock.reserved (and hence we set fields in
the uninitialized struct page).

Currently, the struct page memory is always zeroed during allocation,
which prevents this problem from being detected. But, if some asserts
provided by CONFIG_DEBUG_VM_PGFLAGS are tighten, this problem may become
visible in existing kernels.

In this patchset we will stop zeroing struct page memory during allocation.
Therefore, this bug must be fixed in order to avoid random assert failures
caused by CONFIG_DEBUG_VM_PGFLAGS triggers.

The fix is to reserve memory from the first existing PFN.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/x86/kernel/setup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3486d0498800..489cdc141bcb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
 
 static void __init trim_low_memory_range(void)
 {
-	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
+	unsigned long min_pfn = find_min_pfn_with_active_regions();
+	phys_addr_t base = min_pfn << PAGE_SHIFT;
+
+	memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
 }
 	
 /*
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 02/15] x86/mm: setting fields in deferred pages
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
  2017-08-02 20:38 ` [v4 01/15] x86/mm: reserve only exiting low pages Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 03/15] sparc64/mm: " Pavel Tatashin
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to first
initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled there is a case where we set some
fields prior to initializing:

        mem_init() {
                register_page_bootmem_info();
                free_all_bootmem();
                ...
        }

When register_page_bootmem_info() is called only non-deferred struct pages
are initialized. But, this function goes through some reserved pages which
might be part of the deferred, and thus are not yet initialized.

  mem_init
   register_page_bootmem_info
    register_page_bootmem_info_node
     get_page_bootmem
      .. setting fields here ..
      such as: page->freelist = (void *)type;

We end-up with similar issue as in the previous patch, where currently we
do not observe problem as memory is zeroed. But, if flag asserts are
changed we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page memory
during allocation, we must make sure that struct pages are properly
initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/x86/mm/init_64.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 136422d7d539..1e863baec847 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1165,12 +1165,17 @@ void __init mem_init(void)
 
 	/* clear_bss() already clear the empty_zero_page */
 
-	register_page_bootmem_info();
-
 	/* this will put all memory onto the freelists */
 	free_all_bootmem();
 	after_bootmem = 1;
 
+	/* Must be done after boot memory is put on freelist, because here we
+	 * might set fields in deferred struct pages that have not yet been
+	 * initialized, and free_all_bootmem() initializes all the reserved
+	 * deferred pages for us.
+	 */
+	register_page_bootmem_info();
+
 	/* Register memory areas for /proc/kcore */
 	kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR,
 			 PAGE_SIZE, KCORE_OTHER);
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 03/15] sparc64/mm: setting fields in deferred pages
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
  2017-08-02 20:38 ` [v4 01/15] x86/mm: reserve only exiting low pages Pavel Tatashin
  2017-08-02 20:38 ` [v4 02/15] x86/mm: setting fields in deferred pages Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 04/15] mm: discard memblock data later Pavel Tatashin
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to first
initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled there is a case where we set some
fields prior to initializing:

 mem_init() {
	 register_page_bootmem_info();
	 free_all_bootmem();
	 ...
 }

When register_page_bootmem_info() is called only non-deferred struct pages
are initialized. But, this function goes through some reserved pages which
might be part of the deferred, and thus are not yet initialized.

mem_init
register_page_bootmem_info
 register_page_bootmem_info_node
  get_page_bootmem
   .. setting fields here ..
   such as: page->freelist = (void *)type;

We end-up with similar issue as in the previous patch, where currently we
do not observe problem as memory is zeroed. But, if flag asserts are
changed we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page memory
during allocation, we must make sure that struct pages are properly
initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/sparc/mm/init_64.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 3c40ebd50f92..ba957b763c07 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2464,9 +2464,15 @@ void __init mem_init(void)
 {
 	high_memory = __va(last_valid_pfn << PAGE_SHIFT);
 
-	register_page_bootmem_info();
 	free_all_bootmem();
 
+	/* Must be done after boot memory is put on freelist, because here we
+	 * might set fields in deferred struct pages that have not yet been
+	 * initialized, and free_all_bootmem() initializes all the reserved
+	 * deferred pages for us.
+	 */
+	register_page_bootmem_info();
+
 	/*
 	 * Set up the zero page, mark it reserved, so that page count
 	 * is not manipulated when freeing the page from user ptes.
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 04/15] mm: discard memblock data later
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (2 preceding siblings ...)
  2017-08-02 20:38 ` [v4 03/15] sparc64/mm: " Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-03  4:29   ` kbuild test robot
  2017-08-02 20:38 ` [v4 05/15] mm: don't accessed uninitialized struct pages Pavel Tatashin
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

There is existing use after free bug when deferred struct pages are
enabled:

The memblock_add() allocates memory for the memory array if more than
128 entries are needed.  See comment in e820__memblock_setup():

  * The bootstrap memblock region count maximum is 128 entries
  * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
  * than that - so allow memblock resizing.

This memblock memory is freed here:
        free_low_memory_core_early()

We access the freed memblock.memory later in boot when deferred pages are
initialized in this path:

        deferred_init_memmap()
                for_each_mem_pfn_range()
                  __next_mem_pfn_range()
                    type = &memblock.memory;

One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been exceeded
at least on systems where deferred struct pages were enabled.

Another reason why we want this problem fixed in this patch series is,
in the next patch, we will need to access memblock.reserved from
deferred_init_memmap().

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 include/linux/memblock.h |  7 +++++--
 mm/memblock.c            | 38 +++++++++++++++++---------------------
 mm/nobootmem.c           | 16 ----------------
 mm/page_alloc.c          |  2 ++
 4 files changed, 24 insertions(+), 39 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 77d427974f57..c89d16c88512 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -61,9 +61,11 @@ extern int memblock_debug;
 #ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
 #define __init_memblock __meminit
 #define __initdata_memblock __meminitdata
+void memblock_discard(void);
 #else
 #define __init_memblock
 #define __initdata_memblock
+#define memblock_discard()
 #endif
 
 #define memblock_dbg(fmt, ...) \
@@ -74,8 +76,6 @@ phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
 					int nid, ulong flags);
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
 				   phys_addr_t size, phys_addr_t align);
-phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
-phys_addr_t get_allocated_memblock_memory_regions_info(phys_addr_t *addr);
 void memblock_allow_resize(void);
 int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
 int memblock_add(phys_addr_t base, phys_addr_t size);
@@ -110,6 +110,9 @@ void __next_mem_range_rev(u64 *idx, int nid, ulong flags,
 void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
 				phys_addr_t *out_end);
 
+void __memblock_free_early(phys_addr_t base, phys_addr_t size);
+void __memblock_free_late(phys_addr_t base, phys_addr_t size);
+
 /**
  * for_each_mem_range - iterate through memblock areas from type_a and not
  * included in type_b. Or just type_a if type_b is NULL.
diff --git a/mm/memblock.c b/mm/memblock.c
index 2cb25fe4452c..3a2707914064 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -285,31 +285,27 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
 }
 
 #ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-
-phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
-					phys_addr_t *addr)
-{
-	if (memblock.reserved.regions == memblock_reserved_init_regions)
-		return 0;
-
-	*addr = __pa(memblock.reserved.regions);
-
-	return PAGE_ALIGN(sizeof(struct memblock_region) *
-			  memblock.reserved.max);
-}
-
-phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info(
-					phys_addr_t *addr)
+/**
+ * Discard memory and reserved arrays if they were allocated
+ */
+void __init_memblock memblock_discard(void)
 {
-	if (memblock.memory.regions == memblock_memory_init_regions)
-		return 0;
+	phys_addr_t addr, size;
 
-	*addr = __pa(memblock.memory.regions);
+	if (memblock.reserved.regions != memblock_reserved_init_regions) {
+		addr = __pa(memblock.reserved.regions);
+		size = PAGE_ALIGN(sizeof(struct memblock_region) *
+				  memblock.reserved.max);
+		__memblock_free_late(addr, size);
+	}
 
-	return PAGE_ALIGN(sizeof(struct memblock_region) *
-			  memblock.memory.max);
+	if (memblock.memory.regions == memblock_memory_init_regions) {
+		addr = __pa(memblock.memory.regions);
+		size = PAGE_ALIGN(sizeof(struct memblock_region) *
+				  memblock.memory.max);
+		__memblock_free_late(addr, size);
+	}
 }
-
 #endif
 
 /**
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 36454d0f96ee..3637809a18d0 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -146,22 +146,6 @@ static unsigned long __init free_low_memory_core_early(void)
 				NULL)
 		count += __free_memory_core(start, end);
 
-#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-	{
-		phys_addr_t size;
-
-		/* Free memblock.reserved array if it was allocated */
-		size = get_allocated_memblock_reserved_regions_info(&start);
-		if (size)
-			count += __free_memory_core(start, start + size);
-
-		/* Free memblock.memory array if it was allocated */
-		size = get_allocated_memblock_memory_regions_info(&start);
-		if (size)
-			count += __free_memory_core(start, start + size);
-	}
-#endif
-
 	return count;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d30e914afb6..87fb35ac0b87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1584,6 +1584,8 @@ void __init page_alloc_init_late(void)
 	/* Reinit limits that are based on free pages after the kernel is up */
 	files_maxfiles_init();
 #endif
+	/* Discard memblock private memory */
+	memblock_discard();
 
 	for_each_populated_zone(zone)
 		set_zone_contiguous(zone);
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 05/15] mm: don't accessed uninitialized struct pages
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (3 preceding siblings ...)
  2017-08-02 20:38 ` [v4 04/15] mm: discard memblock data later Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 06/15] sparc64: simplify vmemmap_populate Pavel Tatashin
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

In deferred_init_memmap() where all deferred struct pages are initialized
we have a check like this:

    if (page->flags) {
            VM_BUG_ON(page_zone(page) != zone);
            goto free_range;
    }

This way we are checking if the current deferred page has already been
initialized. It works, because memory for struct pages has been zeroed, and
the only way flags are not zero if it went through __init_single_page()
before.  But, once we change the current behavior and won't zero the memory
in memblock allocator, we cannot trust anything inside "struct page"es
until they are initialized. This patch fixes this.

This patch defines a new accessor memblock_get_reserved_pfn_range()
which returns successive ranges of reserved PFNs.  deferred_init_memmap()
calls it to determine if a PFN and its struct page has already been
initialized.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 include/linux/memblock.h |  3 +++
 mm/memblock.c            | 54 ++++++++++++++++++++++++++++++++++++++++++------
 mm/page_alloc.c          | 11 +++++++++-
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index c89d16c88512..9d8dabedf5ba 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -321,6 +321,9 @@ int memblock_is_map_memory(phys_addr_t addr);
 int memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
 bool memblock_is_reserved(phys_addr_t addr);
 bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
+void memblock_get_reserved_pfn_range(unsigned long pfn,
+				     unsigned long *pfn_start,
+				     unsigned long *pfn_end);
 
 extern void __memblock_dump_all(void);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 3a2707914064..e6df054e3180 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1580,7 +1580,13 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 	memblock_cap_memory_range(0, max_addr);
 }
 
-static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
+/**
+ * Return index in regions array if addr is within the region. Otherwise
+ * return -1. If -1 is returned and *next_idx is not %NULL, sets it to the
+ * next region index or -1 if there is none.
+ */
+static int __init_memblock memblock_search(struct memblock_type *type,
+					   phys_addr_t addr, int *next_idx)
 {
 	unsigned int left = 0, right = type->cnt;
 
@@ -1595,22 +1601,26 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
 		else
 			return mid;
 	} while (left < right);
+
+	if (next_idx)
+		*next_idx = (right == type->cnt) ? -1 : right;
+
 	return -1;
 }
 
 bool __init memblock_is_reserved(phys_addr_t addr)
 {
-	return memblock_search(&memblock.reserved, addr) != -1;
+	return memblock_search(&memblock.reserved, addr, NULL) != -1;
 }
 
 bool __init_memblock memblock_is_memory(phys_addr_t addr)
 {
-	return memblock_search(&memblock.memory, addr) != -1;
+	return memblock_search(&memblock.memory, addr, NULL) != -1;
 }
 
 int __init_memblock memblock_is_map_memory(phys_addr_t addr)
 {
-	int i = memblock_search(&memblock.memory, addr);
+	int i = memblock_search(&memblock.memory, addr, NULL);
 
 	if (i == -1)
 		return false;
@@ -1622,7 +1632,7 @@ int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
 			 unsigned long *start_pfn, unsigned long *end_pfn)
 {
 	struct memblock_type *type = &memblock.memory;
-	int mid = memblock_search(type, PFN_PHYS(pfn));
+	int mid = memblock_search(type, PFN_PHYS(pfn), NULL);
 
 	if (mid == -1)
 		return -1;
@@ -1646,7 +1656,7 @@ int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
  */
 int __init_memblock memblock_is_region_memory(phys_addr_t base, phys_addr_t size)
 {
-	int idx = memblock_search(&memblock.memory, base);
+	int idx = memblock_search(&memblock.memory, base, NULL);
 	phys_addr_t end = base + memblock_cap_size(base, &size);
 
 	if (idx == -1)
@@ -1656,6 +1666,38 @@ int __init_memblock memblock_is_region_memory(phys_addr_t base, phys_addr_t size
 }
 
 /**
+ * memblock_get_reserved_pfn_range - search for the next reserved region
+ *
+ * @pfn: start searching from this pfn.
+ *
+ * RETURNS:
+ * [start_pfn, end_pfn), where start_pfn >= pfn. If none is found
+ * start_pfn, and end_pfn are both set to ULONG_MAX.
+ */
+void __init_memblock memblock_get_reserved_pfn_range(unsigned long pfn,
+						     unsigned long *start_pfn,
+						     unsigned long *end_pfn)
+{
+	struct memblock_type *type = &memblock.reserved;
+	int next_idx, idx;
+
+	idx = memblock_search(type, PFN_PHYS(pfn), &next_idx);
+	if (idx == -1 && next_idx == -1) {
+		*start_pfn = ULONG_MAX;
+		*end_pfn = ULONG_MAX;
+		return;
+	}
+
+	if (idx == -1) {
+		idx = next_idx;
+		*start_pfn = PFN_DOWN(type->regions[idx].base);
+	} else {
+		*start_pfn = pfn;
+	}
+	*end_pfn = PFN_DOWN(type->regions[idx].base + type->regions[idx].size);
+}
+
+/**
  * memblock_is_region_reserved - check if a region intersects reserved memory
  * @base: base of region to check
  * @size: size of region to check
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 87fb35ac0b87..99b9e2e06319 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1447,6 +1447,7 @@ static int __init deferred_init_memmap(void *data)
 	pg_data_t *pgdat = data;
 	int nid = pgdat->node_id;
 	struct mminit_pfnnid_cache nid_init_state = { };
+	unsigned long resv_start_pfn = 0, resv_end_pfn = 0;
 	unsigned long start = jiffies;
 	unsigned long nr_pages = 0;
 	unsigned long walk_start, walk_end;
@@ -1491,6 +1492,10 @@ static int __init deferred_init_memmap(void *data)
 			pfn = zone->zone_start_pfn;
 
 		for (; pfn < end_pfn; pfn++) {
+			if (pfn >= resv_end_pfn)
+				memblock_get_reserved_pfn_range(pfn,
+								&resv_start_pfn,
+								&resv_end_pfn);
 			if (!pfn_valid_within(pfn))
 				goto free_range;
 
@@ -1524,7 +1529,11 @@ static int __init deferred_init_memmap(void *data)
 				cond_resched();
 			}
 
-			if (page->flags) {
+			/*
+			 * Check if this page has already been initialized due
+			 * to being reserved during boot in memblock.
+			 */
+			if (pfn >= resv_start_pfn) {
 				VM_BUG_ON(page_zone(page) != zone);
 				goto free_range;
 			}
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 06/15] sparc64: simplify vmemmap_populate
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (4 preceding siblings ...)
  2017-08-02 20:38 ` [v4 05/15] mm: don't accessed uninitialized struct pages Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 07/15] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Remove duplicating code by using common functions
vmemmap_pud_populate and vmemmap_pgd_populate.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/sparc/mm/init_64.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index ba957b763c07..e18947e9ab6c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2567,30 +2567,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 	vstart = vstart & PMD_MASK;
 	vend = ALIGN(vend, PMD_SIZE);
 	for (; vstart < vend; vstart += PMD_SIZE) {
-		pgd_t *pgd = pgd_offset_k(vstart);
+		pgd_t *pgd = vmemmap_pgd_populate(vstart, node);
 		unsigned long pte;
 		pud_t *pud;
 		pmd_t *pmd;
 
-		if (pgd_none(*pgd)) {
-			pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+		if (!pgd)
+			return -ENOMEM;
 
-			if (!new)
-				return -ENOMEM;
-			pgd_populate(&init_mm, pgd, new);
-		}
-
-		pud = pud_offset(pgd, vstart);
-		if (pud_none(*pud)) {
-			pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
-
-			if (!new)
-				return -ENOMEM;
-			pud_populate(&init_mm, pud, new);
-		}
+		pud = vmemmap_pud_populate(pgd, vstart, node);
+		if (!pud)
+			return -ENOMEM;
 
 		pmd = pmd_offset(pud, vstart);
-
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
 			void *block = vmemmap_alloc_block(PMD_SIZE, node);
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 07/15] mm: defining memblock_virt_alloc_try_nid_raw
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (5 preceding siblings ...)
  2017-08-02 20:38 ` [v4 06/15] sparc64: simplify vmemmap_populate Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 08/15] mm: zero struct pages during initialization Pavel Tatashin
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

A new variant of memblock_virt_alloc_* allocations:
memblock_virt_alloc_try_nid_raw()
    - Does not zero the allocated memory
    - Does not panic if request cannot be satisfied

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 include/linux/bootmem.h | 11 ++++++++++
 mm/memblock.c           | 53 ++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index e223d91b6439..0a0a37f3e292 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 #define BOOTMEM_ALLOC_ANYWHERE		(~(phys_addr_t)0)
 
 /* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
+				      phys_addr_t min_addr,
+				      phys_addr_t max_addr, int nid);
 void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
 		phys_addr_t align, phys_addr_t min_addr,
 		phys_addr_t max_addr, int nid);
@@ -176,6 +179,14 @@ static inline void * __init memblock_virt_alloc(
 					    NUMA_NO_NODE);
 }
 
+static inline void * __init memblock_virt_alloc_raw(
+					phys_addr_t size,  phys_addr_t align)
+{
+	return memblock_virt_alloc_try_nid_raw(size, align, BOOTMEM_LOW_LIMIT,
+					    BOOTMEM_ALLOC_ACCESSIBLE,
+					    NUMA_NO_NODE);
+}
+
 static inline void * __init memblock_virt_alloc_nopanic(
 					phys_addr_t size, phys_addr_t align)
 {
diff --git a/mm/memblock.c b/mm/memblock.c
index e6df054e3180..bdf31f207fa4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1327,7 +1327,6 @@ static void * __init memblock_virt_alloc_internal(
 	return NULL;
 done:
 	ptr = phys_to_virt(alloc);
-	memset(ptr, 0, size);
 
 	/*
 	 * The min_count is set to 0 so that bootmem allocated blocks
@@ -1341,6 +1340,38 @@ static void * __init memblock_virt_alloc_internal(
 }
 
 /**
+ * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing
+ * memory and without panicking
+ * @size: size of memory block to be allocated in bytes
+ * @align: alignment of the region and block's size
+ * @min_addr: the lower bound of the memory region from where the allocation
+ *	  is preferred (phys address)
+ * @max_addr: the upper bound of the memory region from where the allocation
+ *	      is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
+ *	      allocate only from memory limited by memblock.current_limit value
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ *
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. Does not zero allocated memory, does not panic if request
+ * cannot be satisfied.
+ *
+ * RETURNS:
+ * Virtual address of allocated memory block on success, NULL on failure.
+ */
+void * __init memblock_virt_alloc_try_nid_raw(
+			phys_addr_t size, phys_addr_t align,
+			phys_addr_t min_addr, phys_addr_t max_addr,
+			int nid)
+{
+	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
+		     (u64)max_addr, (void *)_RET_IP_);
+
+	return memblock_virt_alloc_internal(size, align,
+					    min_addr, max_addr, nid);
+}
+
+/**
  * memblock_virt_alloc_try_nid_nopanic - allocate boot memory block
  * @size: size of memory block to be allocated in bytes
  * @align: alignment of the region and block's size
@@ -1351,8 +1382,8 @@ static void * __init memblock_virt_alloc_internal(
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
- * Public version of _memblock_virt_alloc_try_nid_nopanic() which provides
- * additional debug information (including caller info), if enabled.
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. This function zeroes the allocated memory.
  *
  * RETURNS:
  * Virtual address of allocated memory block on success, NULL on failure.
@@ -1362,11 +1393,17 @@ void * __init memblock_virt_alloc_try_nid_nopanic(
 				phys_addr_t min_addr, phys_addr_t max_addr,
 				int nid)
 {
+	void *ptr;
+
 	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
 		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
 		     (u64)max_addr, (void *)_RET_IP_);
-	return memblock_virt_alloc_internal(size, align, min_addr,
-					     max_addr, nid);
+
+	ptr = memblock_virt_alloc_internal(size, align,
+					   min_addr, max_addr, nid);
+	if (ptr)
+		memset(ptr, 0, size);
+	return ptr;
 }
 
 /**
@@ -1380,7 +1417,7 @@ void * __init memblock_virt_alloc_try_nid_nopanic(
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
- * Public panicking version of _memblock_virt_alloc_try_nid_nopanic()
+ * Public panicking version of memblock_virt_alloc_try_nid_nopanic()
  * which provides debug information (including caller info), if enabled,
  * and panics if the request can not be satisfied.
  *
@@ -1399,8 +1436,10 @@ void * __init memblock_virt_alloc_try_nid(
 		     (u64)max_addr, (void *)_RET_IP_);
 	ptr = memblock_virt_alloc_internal(size, align,
 					   min_addr, max_addr, nid);
-	if (ptr)
+	if (ptr) {
+		memset(ptr, 0, size);
 		return ptr;
+	}
 
 	panic("%s: Failed to allocate %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx\n",
 	      __func__, (u64)size, (u64)align, nid, (u64)min_addr,
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 08/15] mm: zero struct pages during initialization
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (6 preceding siblings ...)
  2017-08-02 20:38 ` [v4 07/15] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 09/15] sparc64: optimized struct page zeroing Pavel Tatashin
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Add struct page zeroing as a part of initialization of other fields in
__init_single_page().

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 include/linux/mm.h | 9 +++++++++
 mm/page_alloc.c    | 1 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..183ac5e733db 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -94,6 +94,15 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #endif
 
 /*
+ * On some architectures it is expensive to call memset() for small sizes.
+ * Those architectures should provide their own implementation of "struct page"
+ * zeroing by defining this macro in <asm/pgtable.h>.
+ */
+#ifndef mm_zero_struct_page
+#define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
+#endif
+
+/*
  * Default maximum number of active map areas, this limits the number of vmas
  * per mm struct. Users can overwrite this number by sysctl but there is a
  * problem.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 99b9e2e06319..debea7c0febb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,7 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+	mm_zero_struct_page(page);
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 09/15] sparc64: optimized struct page zeroing
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (7 preceding siblings ...)
  2017-08-02 20:38 ` [v4 08/15] mm: zero struct pages during initialization Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-03  5:15   ` kbuild test robot
  2017-08-02 20:38 ` [v4 10/15] x86/kasan: explicitly zero kasan shadow memory Pavel Tatashin
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Add an optimized mm_zero_struct_page(), so struct page's are zeroed without
calling memset(). We do eight regular stores, thus avoid cost of membar.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/sparc/include/asm/pgtable_64.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6fbd931f0570..23ad51ea5340 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -230,6 +230,24 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
 extern struct page *mem_map_zero;
 #define ZERO_PAGE(vaddr)	(mem_map_zero)
 
+/* This macro must be updated when the size of struct page changes,
+ * so use static assert to enforce the assumed size.
+ */
+#define	mm_zero_struct_page(pp)					\
+	do {							\
+		unsigned long *_pp = (void *)(pp);		\
+								\
+		BUILD_BUG_ON(sizeof(struct page) != 64);	\
+		_pp[0] = 0;					\
+		_pp[1] = 0;					\
+		_pp[2] = 0;					\
+		_pp[3] = 0;					\
+		_pp[4] = 0;					\
+		_pp[5] = 0;					\
+		_pp[6] = 0;					\
+		_pp[7] = 0;					\
+	} while (0)
+
 /* PFNs are real physical page numbers.  However, mem_map only begins to record
  * per-page information starting at pfn_base.  This is to handle systems where
  * the first physical page in the machine is at some huge physical address,
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 10/15] x86/kasan: explicitly zero kasan shadow memory
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (8 preceding siblings ...)
  2017-08-02 20:38 ` [v4 09/15] sparc64: optimized struct page zeroing Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 11/15] arm64/kasan: " Pavel Tatashin
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

To optimize the performance of struct page initialization,
vmemmap_populate() will no longer zero memory.

We must explicitly zero the memory that is allocated by vmemmap_populate()
for kasan, as this memory does not go through struct page initialization
path.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/x86/mm/kasan_init_64.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 02c9d7553409..7d06cf0b0b6e 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -84,6 +84,28 @@ static struct notifier_block kasan_die_notifier = {
 };
 #endif
 
+/*
+ * Memory that was allocated by vmemmap_populate is not zeroed, so we must
+ * zero it here explicitly.
+ */
+static void
+zero_vemmap_populated_memory(void)
+{
+	u64 i, start, end;
+
+	for (i = 0; i < E820_MAX_ENTRIES && pfn_mapped[i].end; i++) {
+		void *kaddr_start = pfn_to_kaddr(pfn_mapped[i].start);
+		void *kaddr_end = pfn_to_kaddr(pfn_mapped[i].end);
+
+		start = (u64)kasan_mem_to_shadow(kaddr_start);
+		end = (u64)kasan_mem_to_shadow(kaddr_end);
+		memset((void *)start, 0, end - start);
+	}
+	start = (u64)kasan_mem_to_shadow(_stext);
+	end = (u64)kasan_mem_to_shadow(_end);
+	memset((void *)start, 0, end - start);
+}
+
 void __init kasan_early_init(void)
 {
 	int i;
@@ -156,6 +178,13 @@ void __init kasan_init(void)
 		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO);
 		set_pte(&kasan_zero_pte[i], pte);
 	}
+
+	/*
+	 * vmemmap_populate does not zero the memory, so we need to zero it
+	 * explicitly
+	 */
+	zero_vemmap_populated_memory();
+
 	/* Flush TLBs again to be sure that write protection applied. */
 	__flush_tlb_all();
 
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 11/15] arm64/kasan: explicitly zero kasan shadow memory
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (9 preceding siblings ...)
  2017-08-02 20:38 ` [v4 10/15] x86/kasan: explicitly zero kasan shadow memory Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 12/15] mm: explicitly zero pagetable memory Pavel Tatashin
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

To optimize the performance of struct page initialization,
vmemmap_populate() will no longer zero memory.

We must explicitly zero the memory that is allocated by vmemmap_populate()
for kasan, as this memory does not go through struct page initialization
path.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/arm64/mm/kasan_init.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 81f03959a4ab..a57104bc54b8 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -135,6 +135,31 @@ static void __init clear_pgds(unsigned long start,
 		set_pgd(pgd_offset_k(start), __pgd(0));
 }
 
+/*
+ * Memory that was allocated by vmemmap_populate is not zeroed, so we must
+ * zero it here explicitly.
+ */
+static void
+zero_vemmap_populated_memory(void)
+{
+	struct memblock_region *reg;
+	u64 start, end;
+
+	for_each_memblock(memory, reg) {
+		start = __phys_to_virt(reg->base);
+		end = __phys_to_virt(reg->base + reg->size);
+
+		if (start >= end)
+			break;
+
+		memset((void *)start, 0, end - start);
+	}
+
+	start = (u64)kasan_mem_to_shadow(_stext);
+	end = (u64)kasan_mem_to_shadow(_end);
+	memset((void *)start, 0, end - start);
+}
+
 void __init kasan_init(void)
 {
 	u64 kimg_shadow_start, kimg_shadow_end;
@@ -205,6 +230,13 @@ void __init kasan_init(void)
 			pfn_pte(sym_to_pfn(kasan_zero_page), PAGE_KERNEL_RO));
 
 	memset(kasan_zero_page, 0, PAGE_SIZE);
+
+	/*
+	 * vmemmap_populate does not zero the memory, so we need to zero it
+	 * explicitly
+	 */
+	zero_vemmap_populated_memory();
+
 	cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
 
 	/* At this point kasan is fully initialized. Enable error messages */
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 12/15] mm: explicitly zero pagetable memory
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (10 preceding siblings ...)
  2017-08-02 20:38 ` [v4 11/15] arm64/kasan: " Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-02 20:38 ` [v4 13/15] mm: stop zeroing memory during allocation in vmemmap Pavel Tatashin
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Soon vmemmap_alloc_block() will no longer zero the block, so zero memory
at its call sites for everything except struct pages.  Struct page memory
is zero'd by struct page initialization.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 mm/sparse-vmemmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c50b1a14d55e..d40c721ab19f 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -191,6 +191,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
 		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
+		memset(p, 0, PAGE_SIZE);
 		pmd_populate_kernel(&init_mm, pmd, p);
 	}
 	return pmd;
@@ -203,6 +204,7 @@ pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node)
 		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
+		memset(p, 0, PAGE_SIZE);
 		pud_populate(&init_mm, pud, p);
 	}
 	return pud;
@@ -215,6 +217,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
 		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
+		memset(p, 0, PAGE_SIZE);
 		p4d_populate(&init_mm, p4d, p);
 	}
 	return p4d;
@@ -227,6 +230,7 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
 		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
+		memset(p, 0, PAGE_SIZE);
 		pgd_populate(&init_mm, pgd, p);
 	}
 	return pgd;
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 13/15] mm: stop zeroing memory during allocation in vmemmap
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (11 preceding siblings ...)
  2017-08-02 20:38 ` [v4 12/15] mm: explicitly zero pagetable memory Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-03  4:46   ` kbuild test robot
  2017-08-02 20:38 ` [v4 14/15] mm: optimize early system hash allocations Pavel Tatashin
  2017-08-02 20:38 ` [v4 15/15] mm: debug for raw alloctor Pavel Tatashin
  14 siblings, 1 reply; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Replace allocators in sprase-vmemmap to use the non-zeroing version. So,
we will get the performance improvement by zeroing the memory in parallel
when struct pages are zeroed.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 mm/sparse-vmemmap.c | 6 +++---
 mm/sparse.c         | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d40c721ab19f..3b646b5ce1b6 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -41,7 +41,7 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long align,
 				unsigned long goal)
 {
-	return memblock_virt_alloc_try_nid(size, align, goal,
+	return memblock_virt_alloc_try_nid_raw(size, align, goal,
 					    BOOTMEM_ALLOC_ACCESSIBLE, node);
 }
 
@@ -56,11 +56,11 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 
 		if (node_state(node, N_HIGH_MEMORY))
 			page = alloc_pages_node(
-				node, GFP_KERNEL | __GFP_ZERO | __GFP_RETRY_MAYFAIL,
+				node, GFP_KERNEL | __GFP_RETRY_MAYFAIL,
 				get_order(size));
 		else
 			page = alloc_pages(
-				GFP_KERNEL | __GFP_ZERO | __GFP_RETRY_MAYFAIL,
+				GFP_KERNEL | __GFP_RETRY_MAYFAIL,
 				get_order(size));
 		if (page)
 			return page_address(page);
diff --git a/mm/sparse.c b/mm/sparse.c
index 7b4be3fd5cac..0e315766ad11 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -441,9 +441,9 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	}
 
 	size = PAGE_ALIGN(size);
-	map = memblock_virt_alloc_try_nid(size * map_count,
-					  PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
-					  BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
+	map = memblock_virt_alloc_try_nid_raw(size * map_count,
+					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
+					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
 	if (map) {
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 14/15] mm: optimize early system hash allocations
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (12 preceding siblings ...)
  2017-08-02 20:38 ` [v4 13/15] mm: stop zeroing memory during allocation in vmemmap Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  2017-08-03  4:29   ` kbuild test robot
  2017-08-02 20:38 ` [v4 15/15] mm: debug for raw alloctor Pavel Tatashin
  14 siblings, 1 reply; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

Clients can call alloc_large_system_hash() with flag: HASH_ZERO to specify
that memory that was allocated for system hash needs to be zeroed,
otherwise the memory does not need to be zeroed, and client will initialize
it.

If memory does not need to be zero'd, call the new
memblock_virt_alloc_raw() interface, and thus improve the boot performance.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 mm/page_alloc.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index debea7c0febb..623e2f7634e7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7350,18 +7350,17 @@ void *__init alloc_large_system_hash(const char *tablename,
 
 	log2qty = ilog2(numentries);
 
-	/*
-	 * memblock allocator returns zeroed memory already, so HASH_ZERO is
-	 * currently not used when HASH_EARLY is specified.
-	 */
 	gfp_flags = (flags & HASH_ZERO) ? GFP_ATOMIC | __GFP_ZERO : GFP_ATOMIC;
 	do {
 		size = bucketsize << log2qty;
-		if (flags & HASH_EARLY)
-			table = memblock_virt_alloc_nopanic(size, 0);
-		else if (hashdist)
+		if (flags & HASH_EARLY) {
+			if (flags & HASH_ZERO)
+				table = memblock_virt_alloc_nopanic(size, 0);
+			else
+				table = memblock_virt_alloc_raw(size, 0);
+		} else if (hashdist) {
 			table = __vmalloc(size, gfp_flags, PAGE_KERNEL);
-		else {
+		} else {
 			/*
 			 * If bucketsize is not a power-of-two, we may free
 			 * some pages at the end of hash table which
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [v4 15/15] mm: debug for raw alloctor
  2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
                   ` (13 preceding siblings ...)
  2017-08-02 20:38 ` [v4 14/15] mm: optimize early system hash allocations Pavel Tatashin
@ 2017-08-02 20:38 ` Pavel Tatashin
  14 siblings, 0 replies; 20+ messages in thread
From: Pavel Tatashin @ 2017-08-02 20:38 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	linux-arm-kernel, x86, kasan-dev, borntraeger, heiko.carstens,
	davem, willy, mhocko

When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that is
returned by memblock_virt_alloc_try_nid_raw() to ones to ensure that no
places excpect zeroed memory.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 mm/memblock.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index bdf31f207fa4..b6f90e75946c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1363,12 +1363,19 @@ void * __init memblock_virt_alloc_try_nid_raw(
 			phys_addr_t min_addr, phys_addr_t max_addr,
 			int nid)
 {
+	void *ptr;
+
 	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
 		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
 		     (u64)max_addr, (void *)_RET_IP_);
 
-	return memblock_virt_alloc_internal(size, align,
-					    min_addr, max_addr, nid);
+	ptr = memblock_virt_alloc_internal(size, align,
+					   min_addr, max_addr, nid);
+#ifdef CONFIG_DEBUG_VM
+	if (ptr && size > 0)
+		memset(ptr, 0xff, size);
+#endif
+	return ptr;
 }
 
 /**
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [v4 14/15] mm: optimize early system hash allocations
  2017-08-02 20:38 ` [v4 14/15] mm: optimize early system hash allocations Pavel Tatashin
@ 2017-08-03  4:29   ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2017-08-03  4:29 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390, linux-arm-kernel, x86, kasan-dev, borntraeger,
	heiko.carstens, davem, willy, mhocko

[-- Attachment #1: Type: text/plain, Size: 3972 bytes --]

Hi Pavel,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.13-rc3]
[cannot apply to next-20170802]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/complete-deferred-page-initialization/20170803-081025
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: mips-allmodconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=mips 

All error/warnings (new ones prefixed by >>):

   mm/page_alloc.c: In function 'alloc_large_system_hash':
>> mm/page_alloc.c:7369:13: error: implicit declaration of function 'memblock_virt_alloc_raw' [-Werror=implicit-function-declaration]
        table = memblock_virt_alloc_raw(size, 0);
                ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/page_alloc.c:7369:11: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
        table = memblock_virt_alloc_raw(size, 0);
              ^
   cc1: some warnings being treated as errors

vim +/memblock_virt_alloc_raw +7369 mm/page_alloc.c

  7328	
  7329			/* limit to 1 bucket per 2^scale bytes of low memory */
  7330			if (scale > PAGE_SHIFT)
  7331				numentries >>= (scale - PAGE_SHIFT);
  7332			else
  7333				numentries <<= (PAGE_SHIFT - scale);
  7334	
  7335			/* Make sure we've got at least a 0-order allocation.. */
  7336			if (unlikely(flags & HASH_SMALL)) {
  7337				/* Makes no sense without HASH_EARLY */
  7338				WARN_ON(!(flags & HASH_EARLY));
  7339				if (!(numentries >> *_hash_shift)) {
  7340					numentries = 1UL << *_hash_shift;
  7341					BUG_ON(!numentries);
  7342				}
  7343			} else if (unlikely((numentries * bucketsize) < PAGE_SIZE))
  7344				numentries = PAGE_SIZE / bucketsize;
  7345		}
  7346		numentries = roundup_pow_of_two(numentries);
  7347	
  7348		/* limit allocation size to 1/16 total memory by default */
  7349		if (max == 0) {
  7350			max = ((unsigned long long)nr_all_pages << PAGE_SHIFT) >> 4;
  7351			do_div(max, bucketsize);
  7352		}
  7353		max = min(max, 0x80000000ULL);
  7354	
  7355		if (numentries < low_limit)
  7356			numentries = low_limit;
  7357		if (numentries > max)
  7358			numentries = max;
  7359	
  7360		log2qty = ilog2(numentries);
  7361	
  7362		gfp_flags = (flags & HASH_ZERO) ? GFP_ATOMIC | __GFP_ZERO : GFP_ATOMIC;
  7363		do {
  7364			size = bucketsize << log2qty;
  7365			if (flags & HASH_EARLY) {
  7366				if (flags & HASH_ZERO)
  7367					table = memblock_virt_alloc_nopanic(size, 0);
  7368				else
> 7369					table = memblock_virt_alloc_raw(size, 0);
  7370			} else if (hashdist) {
  7371				table = __vmalloc(size, gfp_flags, PAGE_KERNEL);
  7372			} else {
  7373				/*
  7374				 * If bucketsize is not a power-of-two, we may free
  7375				 * some pages at the end of hash table which
  7376				 * alloc_pages_exact() automatically does
  7377				 */
  7378				if (get_order(size) < MAX_ORDER) {
  7379					table = alloc_pages_exact(size, gfp_flags);
  7380					kmemleak_alloc(table, size, 1, gfp_flags);
  7381				}
  7382			}
  7383		} while (!table && size > PAGE_SIZE && --log2qty);
  7384	
  7385		if (!table)
  7386			panic("Failed to allocate %s hash table\n", tablename);
  7387	
  7388		pr_info("%s hash table entries: %ld (order: %d, %lu bytes)\n",
  7389			tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size);
  7390	
  7391		if (_hash_shift)
  7392			*_hash_shift = log2qty;
  7393		if (_hash_mask)
  7394			*_hash_mask = (1 << log2qty) - 1;
  7395	
  7396		return table;
  7397	}
  7398	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 47112 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [v4 04/15] mm: discard memblock data later
  2017-08-02 20:38 ` [v4 04/15] mm: discard memblock data later Pavel Tatashin
@ 2017-08-03  4:29   ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2017-08-03  4:29 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390, linux-arm-kernel, x86, kasan-dev, borntraeger,
	heiko.carstens, davem, willy, mhocko

[-- Attachment #1: Type: text/plain, Size: 1983 bytes --]

Hi Pavel,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.13-rc3 next-20170802]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/complete-deferred-page-initialization/20170803-081025
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: tile-allmodconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=tile 

All errors (new ones prefixed by >>):

   mm/page_alloc.c: In function 'page_alloc_init_late':
>> mm/page_alloc.c:1588:2: error: implicit declaration of function 'memblock_discard'
   cc1: some warnings being treated as errors

vim +/memblock_discard +1588 mm/page_alloc.c

  1567	
  1568	void __init page_alloc_init_late(void)
  1569	{
  1570		struct zone *zone;
  1571	
  1572	#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
  1573		int nid;
  1574	
  1575		/* There will be num_node_state(N_MEMORY) threads */
  1576		atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
  1577		for_each_node_state(nid, N_MEMORY) {
  1578			kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
  1579		}
  1580	
  1581		/* Block until all are initialised */
  1582		wait_for_completion(&pgdat_init_all_done_comp);
  1583	
  1584		/* Reinit limits that are based on free pages after the kernel is up */
  1585		files_maxfiles_init();
  1586	#endif
  1587		/* Discard memblock private memory */
> 1588		memblock_discard();
  1589	
  1590		for_each_populated_zone(zone)
  1591			set_zone_contiguous(zone);
  1592	}
  1593	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49784 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [v4 13/15] mm: stop zeroing memory during allocation in vmemmap
  2017-08-02 20:38 ` [v4 13/15] mm: stop zeroing memory during allocation in vmemmap Pavel Tatashin
@ 2017-08-03  4:46   ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2017-08-03  4:46 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390, linux-arm-kernel, x86, kasan-dev, borntraeger,
	heiko.carstens, davem, willy, mhocko

[-- Attachment #1: Type: text/plain, Size: 3710 bytes --]

Hi Pavel,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.13-rc3]
[cannot apply to next-20170802]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/complete-deferred-page-initialization/20170803-081025
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All error/warnings (new ones prefixed by >>):

   mm/sparse.c: In function 'sparse_mem_maps_populate_node':
>> mm/sparse.c:444:8: error: implicit declaration of function 'memblock_virt_alloc_try_nid_raw' [-Werror=implicit-function-declaration]
     map = memblock_virt_alloc_try_nid_raw(size * map_count,
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> mm/sparse.c:444:6: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     map = memblock_virt_alloc_try_nid_raw(size * map_count,
         ^
   cc1: some warnings being treated as errors

vim +/memblock_virt_alloc_try_nid_raw +444 mm/sparse.c

   406	
   407	#ifndef CONFIG_SPARSEMEM_VMEMMAP
   408	struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
   409	{
   410		struct page *map;
   411		unsigned long size;
   412	
   413		map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
   414		if (map)
   415			return map;
   416	
   417		size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
   418		map = memblock_virt_alloc_try_nid(size,
   419						  PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
   420						  BOOTMEM_ALLOC_ACCESSIBLE, nid);
   421		return map;
   422	}
   423	void __init sparse_mem_maps_populate_node(struct page **map_map,
   424						  unsigned long pnum_begin,
   425						  unsigned long pnum_end,
   426						  unsigned long map_count, int nodeid)
   427	{
   428		void *map;
   429		unsigned long pnum;
   430		unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
   431	
   432		map = alloc_remap(nodeid, size * map_count);
   433		if (map) {
   434			for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
   435				if (!present_section_nr(pnum))
   436					continue;
   437				map_map[pnum] = map;
   438				map += size;
   439			}
   440			return;
   441		}
   442	
   443		size = PAGE_ALIGN(size);
 > 444		map = memblock_virt_alloc_try_nid_raw(size * map_count,
   445						      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
   446						      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
   447		if (map) {
   448			for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
   449				if (!present_section_nr(pnum))
   450					continue;
   451				map_map[pnum] = map;
   452				map += size;
   453			}
   454			return;
   455		}
   456	
   457		/* fallback */
   458		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
   459			struct mem_section *ms;
   460	
   461			if (!present_section_nr(pnum))
   462				continue;
   463			map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
   464			if (map_map[pnum])
   465				continue;
   466			ms = __nr_to_section(pnum);
   467			pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
   468			       __func__);
   469			ms->section_mem_map = 0;
   470		}
   471	}
   472	#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
   473	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 45971 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [v4 09/15] sparc64: optimized struct page zeroing
  2017-08-02 20:38 ` [v4 09/15] sparc64: optimized struct page zeroing Pavel Tatashin
@ 2017-08-03  5:15   ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2017-08-03  5:15 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390, linux-arm-kernel, x86, kasan-dev, borntraeger,
	heiko.carstens, davem, willy, mhocko

[-- Attachment #1: Type: text/plain, Size: 4664 bytes --]

Hi Pavel,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.13-rc3 next-20170802]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/complete-deferred-page-initialization/20170803-081025
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sparc64-allmodconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from mm/page_alloc.c:17:
   mm/page_alloc.c: In function '__init_single_page':
>> include/linux/compiler.h:542:38: error: call to '__compiletime_assert_1171' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct page) != 64
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:525:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:542:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:70:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
     ^~~~~~~~~~~~~~~~
>> arch/sparc/include/asm/pgtable_64.h:240:3: note: in expansion of macro 'BUILD_BUG_ON'
      BUILD_BUG_ON(sizeof(struct page) != 64); \
      ^~~~~~~~~~~~
>> mm/page_alloc.c:1171:2: note: in expansion of macro 'mm_zero_struct_page'
     mm_zero_struct_page(page);
     ^~~~~~~~~~~~~~~~~~~

vim +/__compiletime_assert_1171 +542 include/linux/compiler.h

c361d3e5 Daniel Santos 2013-02-21  519  
9a8ab1c3 Daniel Santos 2013-02-21  520  #define __compiletime_assert(condition, msg, prefix, suffix)		\
9a8ab1c3 Daniel Santos 2013-02-21  521  	do {								\
9a8ab1c3 Daniel Santos 2013-02-21  522  		bool __cond = !(condition);				\
9a8ab1c3 Daniel Santos 2013-02-21  523  		extern void prefix ## suffix(void) __compiletime_error(msg); \
9a8ab1c3 Daniel Santos 2013-02-21  524  		if (__cond)						\
9a8ab1c3 Daniel Santos 2013-02-21  525  			prefix ## suffix();				\
9a8ab1c3 Daniel Santos 2013-02-21  526  		__compiletime_error_fallback(__cond);			\
9a8ab1c3 Daniel Santos 2013-02-21  527  	} while (0)
9a8ab1c3 Daniel Santos 2013-02-21  528  
9a8ab1c3 Daniel Santos 2013-02-21  529  #define _compiletime_assert(condition, msg, prefix, suffix) \
9a8ab1c3 Daniel Santos 2013-02-21  530  	__compiletime_assert(condition, msg, prefix, suffix)
9a8ab1c3 Daniel Santos 2013-02-21  531  
9a8ab1c3 Daniel Santos 2013-02-21  532  /**
9a8ab1c3 Daniel Santos 2013-02-21  533   * compiletime_assert - break build and emit msg if condition is false
9a8ab1c3 Daniel Santos 2013-02-21  534   * @condition: a compile-time constant condition to check
9a8ab1c3 Daniel Santos 2013-02-21  535   * @msg:       a message to emit if condition is false
9a8ab1c3 Daniel Santos 2013-02-21  536   *
9a8ab1c3 Daniel Santos 2013-02-21  537   * In tradition of POSIX assert, this macro will break the build if the
9a8ab1c3 Daniel Santos 2013-02-21  538   * supplied condition is *false*, emitting the supplied error message if the
9a8ab1c3 Daniel Santos 2013-02-21  539   * compiler has support to do so.
9a8ab1c3 Daniel Santos 2013-02-21  540   */
9a8ab1c3 Daniel Santos 2013-02-21  541  #define compiletime_assert(condition, msg) \
9a8ab1c3 Daniel Santos 2013-02-21 @542  	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
9a8ab1c3 Daniel Santos 2013-02-21  543  

:::::: The code at line 542 was first introduced by commit
:::::: 9a8ab1c39970a4938a72d94e6fd13be88a797590 bug.h, compiler.h: introduce compiletime_assert & BUILD_BUG_ON_MSG

:::::: TO: Daniel Santos <daniel.santos@pobox.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 51272 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-08-03  5:15 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-02 20:38 [v4 00/15] complete deferred page initialization Pavel Tatashin
2017-08-02 20:38 ` [v4 01/15] x86/mm: reserve only exiting low pages Pavel Tatashin
2017-08-02 20:38 ` [v4 02/15] x86/mm: setting fields in deferred pages Pavel Tatashin
2017-08-02 20:38 ` [v4 03/15] sparc64/mm: " Pavel Tatashin
2017-08-02 20:38 ` [v4 04/15] mm: discard memblock data later Pavel Tatashin
2017-08-03  4:29   ` kbuild test robot
2017-08-02 20:38 ` [v4 05/15] mm: don't accessed uninitialized struct pages Pavel Tatashin
2017-08-02 20:38 ` [v4 06/15] sparc64: simplify vmemmap_populate Pavel Tatashin
2017-08-02 20:38 ` [v4 07/15] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
2017-08-02 20:38 ` [v4 08/15] mm: zero struct pages during initialization Pavel Tatashin
2017-08-02 20:38 ` [v4 09/15] sparc64: optimized struct page zeroing Pavel Tatashin
2017-08-03  5:15   ` kbuild test robot
2017-08-02 20:38 ` [v4 10/15] x86/kasan: explicitly zero kasan shadow memory Pavel Tatashin
2017-08-02 20:38 ` [v4 11/15] arm64/kasan: " Pavel Tatashin
2017-08-02 20:38 ` [v4 12/15] mm: explicitly zero pagetable memory Pavel Tatashin
2017-08-02 20:38 ` [v4 13/15] mm: stop zeroing memory during allocation in vmemmap Pavel Tatashin
2017-08-03  4:46   ` kbuild test robot
2017-08-02 20:38 ` [v4 14/15] mm: optimize early system hash allocations Pavel Tatashin
2017-08-03  4:29   ` kbuild test robot
2017-08-02 20:38 ` [v4 15/15] mm: debug for raw alloctor Pavel Tatashin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).