From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>, Nathan Zimmer <nzimmer@sgi.com>,
	Daniel Rahn <drahn@suse.com>, Davidlohr Bueso <dbueso@suse.com>,
	Dave Hansen <dave.hansen@intel.com>, Tom Vaden <tom.vaden@hp.com>,
	Scott Norton <scott.norton@hp.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/14] Parallel memory initialisation
Date: Thu, 16 Apr 2015 09:46:09 +0100
Message-ID: <20150416084609.GM14842@suse.de>
In-Reply-To: <20150416002501.e9615db6.akpm@linux-foundation.org>
On Thu, Apr 16, 2015 at 12:25:01AM -0700, Andrew Morton wrote:
> On Mon, 13 Apr 2015 11:16:52 +0100 Mel Gorman <mgorman@suse.de> wrote:
>
> > Memory initialisation
>
> I wish we didn't call this "memory initialization". Because memory
> initialization is memset(), and that isn't what we're doing here.
>
> Installation? Bringup?
>
It's about linking the struct pages to their physical page frames, so
how about "Parallel struct page initialisation"?
> > had been identified as one of the reasons why large
> > machines take a long time to boot. Patches were posted a long time ago
> > that attempted to move deferred initialisation into the page allocator
> > paths. This was rejected on the grounds it should not be necessary to hurt
> > the fast paths to parallelise initialisation. This series reuses much of
> > the work from that time but defers the initialisation of memory to kswapd
> > so that one thread per node initialises memory local to that node. The
> > issue is that on the machines I tested with, memory initialisation was not
> > a major contributor to boot times. I'm posting the RFC to both review the
> > series and see if it actually helps users of very large machines.
> >
> > ...
> >
> > 15 files changed, 507 insertions(+), 98 deletions(-)
>
> Sadface at how large and complex this is.
The vast bulk of the complexity is in one patch "mm: meminit: Initialise
remaining memory in parallel with kswapd" which is
 mm/internal.h   |   6 +++++
 mm/mm_init.c    |   1 +
 mm/page_alloc.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 mm/vmscan.c     |   6 +++--
 4 files changed, 123 insertions(+), 6 deletions(-)
Most of that is a fairly straightforward walk through zones and PFNs with
bounds checking. Much of the remaining complexity is in helpers, which are
very similar to existing helpers (but not suitable for sharing code), and
in optimisations. The optimisations in later patches cut the parallel
struct page initialisation time by 80%.
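
To give a feel for the shape of that walk, here is a minimal userspace
sketch. It is not the code from the patch: the sketch_* names, the
first_deferred field and the flat PFN-indexed mem_map array are all
assumptions made for illustration, and it ignores holes, pageblocks and
the buddy allocator entirely.

```c
/*
 * Userspace sketch only: a simplified model of one thread walking the
 * zones of its node and initialising the deferred PFNs. All names here
 * are hypothetical and not taken from the series.
 */
#include <assert.h>
#include <stddef.h>

struct sketch_page {
	int nid;		/* node the page frame belongs to */
	int initialised;	/* set once the struct page is linked up */
};

struct sketch_zone {
	unsigned long start_pfn;	/* first PFN in the zone */
	unsigned long end_pfn;		/* one past the last PFN */
	unsigned long first_deferred;	/* PFNs below this were set up at boot */
};

/*
 * Initialise the deferred part of every zone on one node. mem_map is
 * indexed by PFN and holds nr_pfns entries. Returns the number of
 * struct pages initialised by this call.
 */
static unsigned long sketch_init_node(int nid,
				      const struct sketch_zone *zones,
				      size_t nr_zones,
				      struct sketch_page *mem_map,
				      unsigned long nr_pfns)
{
	unsigned long done = 0;

	for (size_t i = 0; i < nr_zones; i++) {
		unsigned long pfn = zones[i].first_deferred;
		unsigned long end = zones[i].end_pfn;

		assert(zones[i].start_pfn <= zones[i].end_pfn);

		/* Bounds checking: stay inside the zone and inside mem_map. */
		if (pfn < zones[i].start_pfn)
			pfn = zones[i].start_pfn;
		if (end > nr_pfns)
			end = nr_pfns;

		for (; pfn < end; pfn++) {
			if (mem_map[pfn].initialised)
				continue;
			mem_map[pfn].nid = nid;
			mem_map[pfn].initialised = 1;
			done++;
		}
	}

	return done;
}
```

In this model one such call would be made per node, each from its own
thread, and no thread ever looks at PFNs outside its own zones.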
> I'd hoped the way we were
> going to do this was by bringing up a bit of memory to get booted up,
> then later on we just fake a bunch of memory hot-add operations. So
> the new code would be pretty small and quite high-level.
That ends up being very complex but of a very different shape. We would
still have to prevent the sections being initialised, similar to what this
series already does, except the zone boundaries are lower. It's not as
simple as faking mem= because we want local memory on each node during
initialisation.
Later, after device_init when sysfs is set up, we would then have to walk
all possible sections to discover pluggable memory and hot-add them.
However, when doing so, we would want to first discover which node each
section is local to and ideally skip over the ones that are not local to
the thread doing the work. This means all threads have to scan all
sections, whereas with this approach each thread can walk within its own
PFN range. Hot-add also adds pages one at a time, which is slow, although
obviously that part could be addressed.
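
The scanning cost difference can be sketched with a toy model. Again this
is purely illustrative: struct sketch_section and both functions are
hypothetical, and real memory hot-add does far more than flip a flag.

```c
/*
 * Userspace sketch only: compares how many sections a thread examines
 * under the two approaches. All names are hypothetical.
 */
#include <assert.h>
#include <stddef.h>

struct sketch_section {
	int nid;	/* node the section is local to */
	int online;	/* set once the section has been brought up */
};

/*
 * Hot-add style: the thread for node nid scans every possible section,
 * skipping the ones that are not local. Returns the number of sections
 * examined and stores the number brought online in *onlined.
 */
static size_t sketch_hotadd_scan(int nid, struct sketch_section *secs,
				 size_t nr, size_t *onlined)
{
	size_t examined = 0;

	*onlined = 0;
	for (size_t i = 0; i < nr; i++) {
		examined++;
		if (secs[i].nid != nid || secs[i].online)
			continue;
		secs[i].online = 1;
		(*onlined)++;
	}

	return examined;
}

/*
 * Range style: the thread already knows its node covers [start, end)
 * and only walks that range.
 */
static size_t sketch_range_walk(struct sketch_section *secs,
				size_t start, size_t end, size_t *onlined)
{
	size_t examined = 0;

	assert(start <= end);
	*onlined = 0;
	for (size_t i = start; i < end; i++) {
		examined++;
		if (secs[i].online)
			continue;
		secs[i].online = 1;
		(*onlined)++;
	}

	return examined;
}
```

With N nodes of equal size, each scanning thread in the first variant
examines roughly N times as many sections as it brings online, while the
second examines only its own.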
This would be harder to co-ordinate as kswapd is up and running before
the memory hot-add structures are finalised so it would need either a
semaphore or different threads to do the initialisation. The user-visible
impact is then that early in boot, the total amount of memory appears to
be rapidly increasing instead of this approach where the amount of free
memory is increasing.
Conceptually it's straightforward but the details end up being a lot
more complex than this approach.
--
Mel Gorman
SUSE Labs