From: Ingo Molnar <mingo@kernel.org>
To: Nathan Zimmer <nzimmer@sgi.com>
Cc: holt@sgi.com, travis@sgi.com, rob@landley.net,
tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
yinghai@kernel.org, akpm@linux-foundation.org,
gregkh@linuxfoundation.org, x86@kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [RFC 2/2] x86_64, mm: Reinsert the absent memory
Date: Sun, 23 Jun 2013 11:28:40 +0200
Message-ID: <20130623092840.GB13445@gmail.com>
In-Reply-To: <1371831934-156971-3-git-send-email-nzimmer@sgi.com>
* Nathan Zimmer <nzimmer@sgi.com> wrote:
> The memory we set aside in the previous patch needs to be reinserted.
> We start this process via late_initcall so we will have multiple cpus to do
> the work.
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> ---
> arch/x86/kernel/e820.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++++
> drivers/base/memory.c | 83 +++++++++++++++++++++++++++++++
> include/linux/memory.h | 5 ++
> 3 files changed, 217 insertions(+)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 3752dc5..d31039d 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -23,6 +23,7 @@
>
> #ifdef CONFIG_DELAY_MEM_INIT
> #include <linux/memory.h>
> +#include <linux/delay.h>
> #endif
>
> #include <asm/e820.h>
> @@ -397,6 +398,22 @@ static u64 min_region_size; /* min size of region to slice from */
> static u64 pre_region_size; /* multiply bsize for node low memory */
> static u64 post_region_size; /* multiply bsize for node high memory */
>
> +static unsigned long add_absent_work_start_time;
> +static unsigned long add_absent_work_stop_time;
> +static unsigned int add_absent_job_count;
> +static atomic_t add_absent_work_count;
> +
> +struct absent_work {
> + struct work_struct work;
> + struct absent_work *next;
> + atomic_t busy;
> + int cpu;
> + int node;
> + int index;
> +};
> +static DEFINE_PER_CPU(struct absent_work, absent_work);
> +static struct absent_work *first_absent_work;
That's 4.5 GB/sec initialization speed - that feels a bit slow and the
boot time effect should be felt on smaller 'a couple of gigabytes' desktop
boxes as well. Do we know exactly where the 2 hours of boot time on a 32
TB system is spent?
While you cannot profile the boot process (yet), you could try your
delayed patch and run a "perf record -g" call-graph profiling of the
late-time initialization routines. What does 'perf report' show?
Delayed initialization makes sense, I guess, because 32 TB is a lot of
memory - I'm just wondering whether there is some low-hanging fruit left
in the mem init code; that code is certainly not optimized for
performance.
Plus, with a struct page size of around 64 bytes (?) and 4K pages, 32 TB
of RAM has 512 GB of struct page arrays alone. Initializing those will
take quite some time as well - and I suspect they are allocated via
zeroing them first. If that memset() exists then getting rid of it might
be a good move as well.
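The 512 GB figure can be checked with a small userspace sketch. It assumes 4 KB base pages and takes the 64-byte struct page size as the guess stated above, not as a measured value; the function and macro names are made up for illustration.

```c
#include <stdint.h>

/* Assumptions: 4 KB base pages; 64 bytes per struct page is an estimate. */
#define BASE_PAGE_SIZE   4096ULL
#define STRUCT_PAGE_SIZE 64ULL

/* Number of whole base pages covering ram_bytes of RAM. */
static uint64_t nr_base_pages(uint64_t ram_bytes)
{
	return ram_bytes / BASE_PAGE_SIZE;
}

/* Bytes of struct page metadata needed to describe ram_bytes of RAM. */
static uint64_t struct_page_bytes(uint64_t ram_bytes)
{
	return nr_base_pages(ram_bytes) * STRUCT_PAGE_SIZE;
}
```

Calling struct_page_bytes(32ULL << 40) for 32 TB gives 2^39 bytes, i.e. 512 GB, or 1/64th of RAM under these assumptions.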
Yet another thing to consider would be to implement an initialization
speedup of almost 3 orders of magnitude (512x): initialize at large page
(2MB) granularity and delay the initialization of the 4K granular struct
pages until they are needed [while still allocating them up front] -
which I suspect is a good chunk of the overhead? That way we could
initialize in 2MB steps and cut the 2 hour bootup of 32 TB of RAM to
about 14 seconds...
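The numbers above can be reproduced with a back-of-the-envelope sketch. It assumes that init time scales linearly with the number of initialization steps, which is a simplification and not something measured in this thread; the helper names are hypothetical.

```c
#include <stdint.h>

#define SMALL_PAGE_SIZE 4096ULL        /* 4K base page */
#define LARGE_PAGE_SIZE (2ULL << 20)   /* 2MB large page */

/* How many 4K steps are replaced by a single 2MB step. */
static uint64_t init_step_ratio(void)
{
	return LARGE_PAGE_SIZE / SMALL_PAGE_SIZE;
}

/* Observed init rate in GB/sec for ram_bytes initialized in 'seconds'. */
static double init_rate_gb_per_sec(uint64_t ram_bytes, double seconds)
{
	return (double)ram_bytes / (double)(1ULL << 30) / seconds;
}

/*
 * Estimated init time at 2MB granularity, assuming the cost is
 * proportional to the number of steps (an assumption, not a measurement).
 */
static double coarse_init_seconds(double fine_seconds)
{
	return fine_seconds / (double)init_step_ratio();
}
```

With 32 TB and 7200 seconds, init_rate_gb_per_sec() comes out at roughly 4.5 GB/sec, and coarse_init_seconds(7200.0) is 7200 / 512, a little over 14 seconds.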
[ The cost would be one more branch in the buddy allocator, to detect
not-yet-initialized 2 MB chunks as we encounter them. Acceptable I
think. ]
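To make the "one more branch" idea concrete, here is a toy userspace model of on-demand initialization. It is not the kernel's buddy allocator and none of these names exist in the kernel; the struct layout, the tiny chunk count and the missing locking are all simplifying assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_CHUNK 512   /* 2MB chunk / 4K page */
#define NR_CHUNKS       8     /* toy-sized "machine" */

/* Hypothetical stand-in for struct page. */
struct toy_page {
	unsigned long flags;
	int refcount;
};

/* One 2MB chunk: descriptors are allocated up front, initialized lazily. */
struct toy_chunk {
	bool initialized;
	struct toy_page pages[PAGES_PER_CHUNK];
};

static struct toy_chunk chunks[NR_CHUNKS];
static unsigned int chunks_initialized;

/* Boot-time pass: touches one flag per 2MB chunk, not every 4K page. */
static void toy_boot_init(void)
{
	for (size_t i = 0; i < NR_CHUNKS; i++)
		chunks[i].initialized = false;
	chunks_initialized = 0;
}

/* Deferred pass: initialize all 4K descriptors of one chunk. */
static void toy_init_chunk(struct toy_chunk *c)
{
	for (size_t i = 0; i < PAGES_PER_CHUNK; i++) {
		c->pages[i].flags = 0;
		c->pages[i].refcount = 0;
	}
	c->initialized = true;
	chunks_initialized++;
}

/* Look up a page descriptor, initializing its chunk on first touch. */
static struct toy_page *toy_get_page(size_t pfn)
{
	struct toy_chunk *c;

	if (pfn >= (size_t)NR_CHUNKS * PAGES_PER_CHUNK)
		return NULL;

	c = &chunks[pfn / PAGES_PER_CHUNK];
	if (!c->initialized)	/* the one extra branch */
		toy_init_chunk(c);

	return &c->pages[pfn % PAGES_PER_CHUNK];
}
```

After toy_boot_init(), the first toy_get_page() on any pfn inside a chunk pays the 512-descriptor init cost once; later lookups in the same chunk only pay the branch. A real implementation would also have to deal with concurrency and with code that walks struct pages without going through the allocator.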
Thanks,
Ingo