Re: [v3 0/9] parallelized "struct page" zeroing

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Michal Hocko <mhocko@kernel.org>
To: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, borntraeger@de.ibm.com,
	heiko.carstens@de.ibm.com, davem@davemloft.net
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
Date: Tue, 9 May 2017 20:12:34 +0200	[thread overview]
Message-ID: <20170509181234.GA4397@dhcp22.suse.cz> (raw)
In-Reply-To: <1494003796-748672-1-git-send-email-pasha.tatashin@oracle.com>

On Fri 05-05-17 13:03:07, Pavel Tatashin wrote:
> Changelog:
> 	v2 - v3
> 	- Addressed David's comments about one change per patch:
> 		* Splited changes to platforms into 4 patches
> 		* Made "do not zero vmemmap_buf" as a separate patch
> 	v1 - v2
> 	- Per request, added s390 to deferred "struct page" zeroing
> 	- Collected performance data on x86 which proofs the importance to
> 	  keep memset() as prefetch (see below).
> 
> When deferred struct page initialization feature is enabled, we get a
> performance gain of initializing vmemmap in parallel after other CPUs are
> started. However, we still zero the memory for vmemmap using one boot CPU.
> This patch-set fixes the memset-zeroing limitation by deferring it as well.

I like the idea of postponing the zeroing from the allocation to the
init time. To be honest the improvement looks much larger than I would
expect (Btw. this should be a part of the changelog rather than a
outside link).

The implementation just looks too large to what I would expect. E.g. do
we really need to add zero argument to the large part of the memblock
API? Wouldn't it be easier to simply export memblock_virt_alloc_internal
(or its tiny wrapper memblock_virt_alloc_core) and move the zeroing
outside to its 2 callers? A completely untested scratched version at the
end of the email.

Also it seems that this is not 100% correct either as it only cares
about VMEMMAP while DEFERRED_STRUCT_PAGE_INIT might be enabled also for
SPARSEMEM. This would suggest that we would zero out pages twice,
right?

A similar concern would go to the memory hotplug patch which will
fall back to the slab/page allocator IIRC. On the other hand
__init_single_page is shared with the hotplug code so again we would
initialize 2 times.

So I suspect more changes are needed. I will have a closer look tomorrow.

>  arch/powerpc/mm/init_64.c |    4 +-
>  arch/s390/mm/vmem.c       |    5 ++-
>  arch/sparc/mm/init_64.c   |   26 +++++++----------------
>  arch/x86/mm/init_64.c     |    3 +-
>  include/linux/bootmem.h   |    3 ++
>  include/linux/mm.h        |   15 +++++++++++--
>  mm/memblock.c             |   46 ++++++++++++++++++++++++++++++++++++------
>  mm/page_alloc.c           |    3 ++
>  mm/sparse-vmemmap.c       |   48 +++++++++++++++++++++++++++++---------------
>  9 files changed, 103 insertions(+), 50 deletions(-)


The bootmem API change mentioned above.

 include/linux/bootmem.h |  3 +++
 mm/memblock.c           | 41 ++++++++++++++++++++++++++---------------
 mm/sparse-vmemmap.c     |  2 +-
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 962164d36506..c9a08463d9a8 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 #define BOOTMEM_ALLOC_ANYWHERE		(~(phys_addr_t)0)
 
 /* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void * memblock_virt_alloc_core(phys_addr_t size, phys_addr_t align,
+				phys_addr_t min_addr, phys_addr_t max_addr,
+				int nid);
 void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
 		phys_addr_t align, phys_addr_t min_addr,
 		phys_addr_t max_addr, int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index b049c9b2dba8..eab7da94f873 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1271,8 +1271,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  *
  * The memory block is aligned on SMP_CACHE_BYTES if @align == 0.
  *
- * The phys address of allocated boot memory block is converted to virtual and
- * allocated memory is reset to 0.
+ * The function has to be zeroed out explicitly.
  *
  * In addition, function sets the min_count to 0 using kmemleak_alloc for
  * allocated boot memory block, so that it is never reported as leaks.
@@ -1280,15 +1279,18 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  * RETURNS:
  * Virtual address of allocated memory block on success, NULL on failure.
  */
-static void * __init memblock_virt_alloc_internal(
+static inline void * __init memblock_virt_alloc_internal(
 				phys_addr_t size, phys_addr_t align,
 				phys_addr_t min_addr, phys_addr_t max_addr,
-				int nid)
+				int nid, void *caller)
 {
 	phys_addr_t alloc;
 	void *ptr;
 	ulong flags = choose_memblock_flags();
 
+	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
+		     (u64)max_addr, caller);
 	if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
 		nid = NUMA_NO_NODE;
 
@@ -1334,7 +1336,6 @@ static void * __init memblock_virt_alloc_internal(
 	return NULL;
 done:
 	ptr = phys_to_virt(alloc);
-	memset(ptr, 0, size);
 
 	/*
 	 * The min_count is set to 0 so that bootmem allocated blocks
@@ -1347,6 +1348,14 @@ static void * __init memblock_virt_alloc_internal(
 	return ptr;
 }
 
+void * __init memblock_virt_alloc_core(phys_addr_t size, phys_addr_t align,
+				phys_addr_t min_addr, phys_addr_t max_addr,
+				int nid)
+{
+	return memblock_virt_alloc_internal(size, align, min_addr, max_addr, nid,
+			(void *)_RET_IP_);
+}
+
 /**
  * memblock_virt_alloc_try_nid_nopanic - allocate boot memory block
  * @size: size of memory block to be allocated in bytes
@@ -1369,11 +1378,14 @@ void * __init memblock_virt_alloc_try_nid_nopanic(
 				phys_addr_t min_addr, phys_addr_t max_addr,
 				int nid)
 {
-	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
-		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
-		     (u64)max_addr, (void *)_RET_IP_);
-	return memblock_virt_alloc_internal(size, align, min_addr,
-					     max_addr, nid);
+	void *ptr;
+
+	ptr = memblock_virt_alloc_internal(size, align, min_addr,
+					     max_addr, nid, (void *)_RET_IP_);
+	if (ptr)
+		memset(ptr, 0, size);
+
+	return ptr;
 }
 
 /**
@@ -1401,13 +1413,12 @@ void * __init memblock_virt_alloc_try_nid(
 {
 	void *ptr;
 
-	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
-		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
-		     (u64)max_addr, (void *)_RET_IP_);
 	ptr = memblock_virt_alloc_internal(size, align,
-					   min_addr, max_addr, nid);
-	if (ptr)
+					   min_addr, max_addr, nid, (void *)_RET_IP_);
+	if (ptr) {
+		memset(ptr, 0, size);
 		return ptr;
+	}
 
 	panic("%s: Failed to allocate %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx\n",
 	      __func__, (u64)size, (u64)align, nid, (u64)min_addr,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a56c3989f773..4e060f0f9fe5 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -41,7 +41,7 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long align,
 				unsigned long goal)
 {
-	return memblock_virt_alloc_try_nid(size, align, goal,
+	return memblock_virt_alloc_core(size, align, goal,
 					    BOOTMEM_ALLOC_ACCESSIBLE, node);
 }
 
-- 
Michal Hocko
SUSE Labs

next prev parent reply	other threads:[~2017-05-09 18:12 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-05 17:03 [v3 0/9] parallelized "struct page" zeroing Pavel Tatashin
2017-05-05 17:03 ` [v3 1/9] sparc64: simplify vmemmap_populate Pavel Tatashin
2017-05-05 17:03 ` [v3 2/9] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
2017-05-05 17:03 ` [v3 3/9] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
2017-05-13 19:17   ` kbuild test robot
2017-05-05 17:03 ` [v3 4/9] mm: do not zero vmemmap_buf Pavel Tatashin
2017-05-05 17:03 ` [v3 5/9] mm: zero struct pages during initialization Pavel Tatashin
2017-05-05 17:03 ` [v3 6/9] sparc64: teach sparc not to zero struct pages memory Pavel Tatashin
2017-05-05 17:03 ` [v3 7/9] x86: teach x86 " Pavel Tatashin
2017-05-05 17:03 ` [v3 8/9] powerpc: teach platforms " Pavel Tatashin
2017-05-05 17:03 ` [v3 9/9] s390: " Pavel Tatashin
2017-05-08 11:36   ` Heiko Carstens
2017-05-15 18:24     ` Pasha Tatashin
2017-05-15 23:17       ` Heiko Carstens
2017-05-16  0:33         ` Pasha Tatashin
2017-05-09 18:12 ` Michal Hocko [this message]
2017-05-09 18:54   ` [v3 0/9] parallelized "struct page" zeroing Pasha Tatashin
2017-05-10  7:24     ` Michal Hocko
2017-05-10 13:42       ` Pasha Tatashin
2017-05-10 14:57         ` Michal Hocko
2017-05-10 15:01           ` Pasha Tatashin
2017-05-10 15:20             ` David Miller
2017-05-11 20:47             ` Pasha Tatashin
2017-05-11 20:59               ` Pasha Tatashin
2017-05-12 16:57                 ` David Miller
2017-05-12 17:24                   ` Pasha Tatashin
2017-05-12 17:37                     ` David Miller
2017-05-16 23:50                       ` Benjamin Herrenschmidt
2017-05-12 16:56               ` David Miller
2017-05-10 15:19           ` David Miller
2017-05-10 17:17             ` Matthew Wilcox
2017-05-10 18:00               ` David Miller
2017-05-10 21:11                 ` Matthew Wilcox
2017-05-11  8:05             ` Michal Hocko
2017-05-11 14:35               ` David Miller
2017-05-15 18:12   ` Pasha Tatashin
2017-05-15 19:38     ` Michal Hocko
2017-05-15 20:44       ` Pasha Tatashin
2017-05-16  8:36         ` Michal Hocko
2017-05-26 16:45           ` Pasha Tatashin
2017-05-29 11:53             ` Michal Hocko
2017-05-30 17:16               ` Pasha Tatashin
2017-05-31 16:31                 ` Michal Hocko
2017-05-31 16:51                   ` David Miller
2017-06-01  3:35                   ` Pasha Tatashin
2017-06-01  8:46                     ` Michal Hocko

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:962164d3650 dfblob:c9a08463d9a dfblob:b049c9b2dba
dfblob:eab7da94f87 dfblob:a56c3989f77 dfblob:4e060f0f9fe )
 OR (
bs:"Re: [v3 0/9] parallelized "
bs:"struct page"
bs:" zeroing" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170509181234.GA4397@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=borntraeger@de.ibm.com \
    --cc=davem@davemloft.net \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=pasha.tatashin@oracle.com \
    --cc=sparclinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).