mkinitrd unification across distributions
From: John Reiser <jreiser-Po6cBsTGB2ZWk0Htik3J/w@public.gmane.org>
To: initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: building initramfs is slow
Date: Thu, 18 Aug 2011 11:18:33 -0700
Message-ID: <4E4D5779.6090209@bitwagon.com>

Building an initramfs is unreasonably slow.  On Fedora 16,
dracut-011 takes almost a minute when installing a new kernel:
   real   56s
   user   20s   \__   dracut is CPU bound, not I/O bound.
   sys    31s   /
The final "gzip -9" takes 12 seconds, and the cpio 1 second,
which leaves 43 seconds for the rest of dracut.  That's a
factor of 3 or 4 too long.  The output initramfs is 14.9MB
(41MB unzipped) and contains 1619 files, including 367 .ko
kernel modules.

Running
   strace -o strace.out -f -e trace=execve dracut test.img
and applying some text processing to strace.out shows
  12518 SIGCHLD   (processes terminated)
So dracut fondles each file with an average of (12518 / 1619)
= 7.7 processes.  No wonder building an initramfs is slow!

Again in strace.out:
   8917 execve    (address-space images instantiated)
and taking (#SIGCHLD - #execve) gives:
   3601 fork-and-no-exec  (shell builtins that need a process)
because there is almost no chaining of execve without a fork.

The sorted histogram of execve begins:
   3803 execve("/bin/egrep"
   1343 execve("/bin/cp"
    858 execve("/lib64/ld-linux-x86-64.so.2"
    760 execve("/usr/bin/ldd"
    375 execve("/sbin/modinfo"
    359 execve("/bin/chmod"
    344 execve("/bin/rm"
    341 execve("/sbin/modprobe"
    256 execve("/bin/mkdir"
    222 execve("/bin/readlink"
    100 execve("/bin/cat"
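
The counts and the histogram above can be reproduced with text
processing roughly like the following.  This is a sketch, not the exact
commands behind the numbers above: the function names are made up for
illustration, and the patterns assume the pid-prefixed lines that
"strace -f -o" writes.

```shell
#!/bin/bash
# Hypothetical helpers.  The input file is assumed to come from:
#   strace -o strace.out -f -e trace=execve dracut test.img

# Count SIGCHLD deliveries (one per terminated child process).
count_sigchld() {
    grep -c -- '--- SIGCHLD' "$1"
}

# Count execve calls (resumed-call lines do not contain "execve(").
count_execve() {
    grep -c 'execve(' "$1"
}

# Histogram of executed paths, most frequent first.
execve_histogram() {
    sed -n 's/^[0-9]* *\(execve("[^"]*"\).*/\1/p' "$1" |
        sort | uniq -c | sort -rn
}
```

Typical use would be "execve_histogram strace.out | head".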

These data, and a glance at the source of dracut, suggest
considering the bash shell regexp operator "[[ string =~ pattern ]]"
and the expansion substitution operator "${parameter/pattern/string}"
to replace most instances of egrep.
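
As an illustration of the kind of replacement I mean (a sketch only:
the function names and the patterns are invented here and are not taken
from the dracut source), both operators run inside the shell and need
no fork or execve:

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Builtin test instead of:  echo "$1" | egrep -q '\.ko(\.gz)?$'
is_kernel_module() {
    local re='\.ko(\.gz)?$'     # keep the regexp in a variable, unquoted below
    [[ $1 =~ $re ]]
}

# Builtin substitution instead of piping the name through sed or tr.
normalize_module_name() {
    local name=${1##*/}         # drop the directory part
    name=${name/%.ko*/}         # drop a trailing ".ko", ".ko.gz", ...
    printf '%s\n' "${name//-/_}"   # replace every "-" with "_"
}
```

For example, "normalize_module_name /some/dir/snd-hda-intel.ko.gz"
prints "snd_hda_intel".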

The uses of cp, ldd, chmod, and modinfo should be investigated for
the possibility of batching more than one file at a time.  Operating
inside one directory at a time keeps each pathname short, which
effectively removes the threat of exceeding the kernel's size limit
(ARG_MAX) on the arglist to execve.
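
A minimal sketch of such batching for cp, assuming all the files live
in one source directory and the destination is an absolute path (the
function name is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: copy many files from one directory with a
# single cp process instead of one cp per file.
# Usage: copy_batch SRCDIR DESTDIR FILE...
# DESTDIR must be absolute, because cp runs after "cd SRCDIR".
copy_batch() {
    local srcdir=$1 destdir=$2
    shift 2
    mkdir -p "$destdir" &&
        ( cd "$srcdir" && cp -p -- "$@" "$destdir"/ )
}
```

The arglist then holds only short basenames, and the cost is one mkdir
and one cp per directory rather than one cp per file.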

Using pipelines (possibly including bash's "while read fname ; do")
to filter streamed lists of filenames can reduce overhead significantly
in contrast to "for fname in ...; do <<execve>>".  A pipeline may also
introduce effective parallelism.
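
For example, a filter stage written with "while read" uses only shell
builtins per filename.  This is a sketch under my own assumptions: the
function name and the "*.ko" test are illustrative, not dracut code.

```shell
#!/bin/bash
# Hypothetical filter: read filenames on stdin, one per line, and
# pass through only those that look like kernel modules.
filter_modules() {
    local fname
    while IFS= read -r fname; do
        if [[ $fname == *.ko || $fname == *.ko.gz ]]; then
            printf '%s\n' "$fname"
        fi
    done
}
```

It would sit in the middle of a pipeline, with some producer of
filenames on the left and the consumer on the right, so that the
stages can run concurrently.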

"sort --unique" (or "sort -u") handily removes duplicates.
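
One place where this could pay off is collecting shared libraries: ldd
accepts several files in one invocation, and the combined output can
be reduced to a list of unique paths.  A sketch, assuming the usual
"name => /path (address)" lines in ldd output (the function name is
hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: read ldd output on stdin and print each
# resolved library path once.
unique_libs() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}
```

Usage would be along the lines of "ldd /bin/cp /bin/rm | unique_libs".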

In most cases "cat filename |" should be replaced with ordinary
redirection "< filename", and similarly "$(cat filename)" should
be "$(< filename)".  If SELinux denies the access when dracut (etc.)
opens the file itself but allows it for /bin/cat, then cat must stay,
and a comment explaining why is REQUIRED.
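
The two replacements look like this in practice (a sketch; the
function names are made up for illustration):

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Instead of:  cat "$1" | head -n 1
first_line() {
    local line
    IFS= read -r line < "$1"    # plain redirection, no cat process
    printf '%s\n' "$line"
}

# Instead of:  contents=$(cat "$1")
slurp() {
    local contents
    contents=$(< "$1")          # bash reads the file itself, no execve
    printf '%s\n' "$contents"
}
```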

Yes, I'm going to work on it.
