From: Ingo Molnar <mingo@kernel.org>
To: Mike Travis <travis@sgi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Nathan Zimmer <nzimmer@sgi.com>,
	holt@sgi.com, rob@landley.net, tglx@linutronix.de,
	mingo@redhat.com, yinghai@kernel.org, akpm@linux-foundation.org,
	gregkh@linuxfoundation.org, x86@kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: [RFC] Transparent on-demand memory setup initialization embedded in the (GFP) buddy allocator
Date: Wed, 26 Jun 2013 11:22:48 +0200
Message-ID: <20130626092248.GB27025@gmail.com>
In-Reply-To: <51C9E6CD.5080508@sgi.com>

(Changed the subject, to make it more apparent what we are talking about.)

* Mike Travis <travis@sgi.com> wrote:

> On 6/25/2013 11:43 AM, H. Peter Anvin wrote:
> > On 06/25/2013 10:22 AM, Mike Travis wrote:
> >>
> >> On 6/25/2013 12:38 AM, Ingo Molnar wrote:
> >>>
> >>> * Nathan Zimmer <nzimmer@sgi.com> wrote:
> >>>
> >>>> On Sun, Jun 23, 2013 at 11:28:40AM +0200, Ingo Molnar wrote:
> >>>>>
> >>>>> That's 4.5 GB/sec initialization speed - that feels a bit slow and the
> >>>>> boot time effect should be felt on smaller 'a couple of gigabytes'
> >>>>> desktop boxes as well. Do we know exactly where the 2 hours of boot
> >>>>> time on a 32 TB system is spent?
> >>>>
> >>>> There are other several spots that could be improved on a large system
> >>>> but memory initialization is by far the biggest.
> >>>
> >>> My feeling is that deferred/on-demand initialization triggered from the
> >>> buddy allocator is the better long term solution.
> >>
> >> I haven't caught up with all of Nathan's changes yet (just
> >> got back from vacation), but there was an option to either
> >> start the memory insertion on boot, or trigger it later
> >> using the /sys/.../memory interface. There is also a monitor
> >> program that calculates the memory insertion rate. This was
> >> extremely useful to determine how changes in the kernel
> >> affected the rate.
> >>
> >
> > Sorry, I *totally* did not follow that comment. It seemed like a
> > complete non-sequitur?
> >
> > -hpa
>
> It was I who was not following the question. I'm still reverting
> back to "work mode".
>
> [There is more code in a separate patch that Nate has not sent
> yet that instructs the kernel to start adding memory as early
> as possible, or not. That way you can start the insertion process
> later and monitor it's progress to determine how changes in the
> kernel affect that process. It is controlled by a separate
> CONFIG option.]

So, just to repeat (and expand upon) the solution that hpa and I are
suggesting: it's not based on /sys, delayed initialization lists or any
similar (essentially memory hot plug based) approach.

It's a transparent on-demand initialization scheme based on only
initializing the very early memory setup in 1GB (or 2MB) steps, not in 4K
steps as we do today.

Any subsequent split-up initialization is done on-demand, in alloc_pages()
et al., initializing a batch of 512 (or 1024) struct page heads when an
uninitialized portion is first encountered.

This leaves the principal logic of early init largely untouched: we still
have the same amount of RAM during and after bootup, except that on 32 TB
systems we don't spend ~2 hours initializing 8,589,934,592 page heads.

This scheme could be implemented by introducing a new PG_initialized flag,
checked by an unlikely() branch in alloc_pages(); when the flag is clear,
that branch triggers the on-demand initialization of the pages.

[ It could probably be made zero-cost for the post-initialization state:
  we already check a bunch of rare PG_ flags, one more flag would not
  introduce any new branch in the page allocation hot path. ]
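
A minimal userspace sketch of the on-demand idea, just to make the control
flow concrete. Everything in it is hypothetical: struct page_head, struct
zone_model, alloc_page_model() and the bump-style allocation are stand-ins
invented for illustration, not kernel code, and the 'initialized' field only
plays the role of the proposed PG_initialized flag. A batch size of 512 page
heads is assumed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: none of these names exist in the kernel. */
#define BATCH_PAGES 512u	/* page heads set up per on-demand step */

#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

struct page_head {
	bool initialized;	/* stands in for the proposed PG_initialized */
	unsigned int refcount;
};

struct zone_model {
	struct page_head *pages;
	size_t nr_pages;
	size_t next_free;	/* trivial bump allocator, not a buddy allocator */
	size_t nr_init_batches;	/* how often the slow path ran */
};

/* "Early init": reserve the page heads but do not set any of them up. */
static int zone_model_setup(struct zone_model *z, size_t nr_pages)
{
	z->pages = calloc(nr_pages, sizeof(*z->pages));
	if (!z->pages)
		return -1;
	z->nr_pages = nr_pages;
	z->next_free = 0;
	z->nr_init_batches = 0;
	return 0;
}

/* Slow path: initialize the whole aligned batch that contains idx. */
static void init_batch(struct zone_model *z, size_t idx)
{
	size_t start = idx - (idx % BATCH_PAGES);
	size_t end = start + BATCH_PAGES;

	assert(idx < z->nr_pages);
	if (end > z->nr_pages)
		end = z->nr_pages;
	for (size_t i = start; i < end; i++) {
		z->pages[i].refcount = 0;
		z->pages[i].initialized = true;
	}
	z->nr_init_batches++;
}

/* Allocation path: one rarely taken branch triggers on-demand init. */
static struct page_head *alloc_page_model(struct zone_model *z)
{
	struct page_head *page;

	if (z->next_free >= z->nr_pages)
		return NULL;
	page = &z->pages[z->next_free];
	if (unlikely(!page->initialized))
		init_batch(z, z->next_free);
	z->next_free++;
	page->refcount = 1;
	return page;
}
```

In this model, setting up 1300 page heads and allocating once runs the slow
path a single time and marks page heads 0..511 as initialized; the following
511 allocations only take the fast path, and the 513th allocation triggers
the second batch.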

It's a technically different solution from what was submitted in this
thread.

Cons:

 - it works after bootup, via GFP. If done in a simple fashion it adds one
   more branch to the GFP fastpath. [ If done a bit more cleverly it can
   merge into an existing unlikely() branch and become essentially
   zero-cost for the fastpath. ]

 - it adds an initialization non-determinism to GFP, to the tune of
   initializing ~512 page heads when a region of RAM is first used.

 - initialization is done when memory is needed - not during or shortly
   after bootup. This (slightly) increases first-use overhead. [ I don't
   think this factor is significant - and I think we'll quickly see
   speedups to initialization, once the overhead becomes more easily
   measurable. ]
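
One way the "merge into an existing unlikely() branch" idea could look,
sketched in userspace C. The flag names and bit positions are invented for
the example and do not reflect the kernel's real PG_ flag layout. The sketch
also assumes that the new bit is stored with inverted sense (set while the
page head is still uninitialized), so that it can join a mask of bits that
are normally all clear:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's PG_ layout. */
enum {
	PGM_LOCKED        = 1u << 0,
	PGM_DIRTY         = 1u << 1,
	PGM_WRITEBACK     = 1u << 2,
	PGM_UNINITIALIZED = 1u << 3,	/* inverted sense of PG_initialized */
};

/* Rare bits the allocation path is assumed to test already ... */
#define PGM_RARE_MASK_OLD (PGM_LOCKED | PGM_DIRTY | PGM_WRITEBACK)
/* ... plus the new bit: still a single AND and a single branch. */
#define PGM_RARE_MASK_NEW (PGM_RARE_MASK_OLD | PGM_UNINITIALIZED)

/* Fast-path test: true if any rare bit is set and the slow path must run. */
static bool needs_slow_path(uint32_t flags)
{
	return (flags & PGM_RARE_MASK_NEW) != 0;
}

/* Only evaluated inside the slow path, so it costs the fast path nothing. */
static bool needs_init(uint32_t flags)
{
	return (flags & PGM_UNINITIALIZED) != 0;
}
```

Once a page head has been initialized and the bit cleared, needs_slow_path()
performs the same single mask test it would have performed without the new
flag; the extra decision is made only after the rare branch has been taken.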

Pros:

 - it's transparent to the boot process. ('free' shows the same full
   amount of RAM all the time, there are no weird effects of RAM coming
   online asynchronously. You see all the RAM you have - etc.)

 - it helps the boot time of every single Linux system, not just large RAM
   ones. On a smallish, 4GB system memory init can take up precious
   hundreds of milliseconds, so this is a practical issue.

 - it spreads initialization overhead to later portions of the system's
   lifetime: when there's typically more idle time and more parallelism
   available.

 - initialization overhead, because it's a natural part of first-time
   memory allocation with this scheme, becomes more measurable (and thus
   more prominently optimized) than any deferred lists processed in the
   background.

 - as an added bonus it probably speeds up your use case even more than the
   patches you are providing: on a 32 TB system the primary initialization
   would only have to enumerate memory, allocate page heads and buddy
   bitmaps, and initialize the 1GB granular page heads: there are only
   32768 of them.
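
The page head counts quoted above can be checked with a few lines of C.
page_heads() is a helper name made up for this example, and binary units
(1 TB = 2^40 bytes) are assumed:

```c
#include <assert.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)
#define TiB (UINT64_C(1) << 40)

/* Number of page heads needed to cover ram_bytes at a given granularity. */
static uint64_t page_heads(uint64_t ram_bytes, uint64_t granule_bytes)
{
	assert(granule_bytes != 0);
	return ram_bytes / granule_bytes;
}
```

Under these assumptions page_heads(32 * TiB, 4 * KiB) is 8,589,934,592 and
page_heads(32 * TiB, GiB) is 32768, matching the figures in the text; at 2MB
granularity the same system would need 16,777,216 page heads.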

So unless I overlooked some factor this scheme would be unconditional
goodness for everyone.

Thanks,

	Ingo