Buildroot Archive on lore.kernel.org
From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook
Date: Sun, 15 Feb 2015 17:59:51 +0100
Message-ID: <20150215165951.GC4211@free.fr>
In-Reply-To: <1423171200-24583-3-git-send-email-thomas.petazzoni@free-electrons.com>

Thomas, All,

On 2015-02-05 22:19 +0100, Thomas Petazzoni spake thusly:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
> later be used to determine the size contribution of each package to
> the target root filesystem.
> 
> Note that in order to detect if a file installed by one package is
> later overridden by another package, we calculate the md5 of installed
> files and compare them at each installation of a new package.
> 
> This commit also adds a Config.in option to enable the collection of
> this data, as calculating the md5 of all installed files at the
> beginning and end of the installation of each package can be
> considered a time-consuming process which maybe some users will not be
> willing to suffer from.

Well, I'd like to challenge that assertion, so I did a pretty "big" build:
Kodi on the RPi, with all Kodi addons enabled, plus a few additional
useful packages (connman, dropbear and the like).

The config has about 148 packages (make show-targets |wc -w), and takes
roughly 1h and 20min on my machine.

Because I did not time md5sum after each package was installed, I simply
timed the md5sum at the end, on a completely-populated target/. That
gives a pretty good upper-bound of the overhead each package would incur
(i.e. the very first packages would be so much faster as there are far
fewer files installed).

    $ du -hs target/
    247M    target/

    $ find target -type f |wc -l
    5150

    $ tar cf - target/ |wc -c
    242923520

    $ tar cf - target/ |time md5sum
    1d393aaf76ef6a7a462519f4b8b861e7  -
    0.36user 0.03system 0:00.41elapsed 96%CPU (0avgtext+0avgdata 748maxresident)k
    0inputs+0outputs (0major+233minor)pagefaults 0swaps

    $ date '+%s.%N'; \
      find target -type f -print0 2>/dev/null \
      | xargs -0 md5sum >/dev/null 2>&1; \
      date '+%s.%N'
    1424018960.577764719
    1424018960.994814894

So, the overhead of md5sum-ing each file independently (on a cache-hot
target/) is less than 0.5s (only ever-so-slightly more than md5sum-ing
the whole tarball).
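
For the record, the elapsed time between the two date stamps above can be
computed as in the sketch below. It assumes a shell with 64-bit integer
arithmetic and relies on '%N' always printing nine digits; the variable
names are only illustrative.

```shell
# Timestamps copied from the run above (date '+%s.%N' before and after).
start=1424018960.577764719
end=1424018960.994814894

# '%N' always prints nine digits, so dropping the dot yields integer
# nanoseconds, which avoids floating-point rounding altogether.
start_ns=$(echo "$start" | tr -d '.')
end_ns=$(echo "$end" | tr -d '.')

elapsed_ms=$(( (end_ns - start_ns) / 1000000 ))
echo "elapsed: ${elapsed_ms} ms"
```

That prints "elapsed: 417 ms", in line with the 0:00.41elapsed reported for
the tarball variant.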

Yes, 0.5s. Half-a-second. ;-)

That would give an upper-bound of the overhead for the whole build of
about two and a half minutes (148 packages * 2 runs * 0.5s = 148s). Out
of a 1h 20min build.
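
Spelled out, that estimate is the following back-of-the-envelope
calculation. It is a sketch using only the figures quoted in this mail; the
variable names are made up for the illustration, and 0.5s per run is the
worst case measured on the fully-populated target/.

```shell
packages=148            # from 'make show-targets | wc -w' above
runs_per_package=2      # one md5sum pass before and one after install-target
run_ms=500              # worst-case cost of a single pass, in milliseconds
build_s=$(( 80 * 60 ))  # the 1h 20min build, in seconds

overhead_s=$(( packages * runs_per_package * run_ms / 1000 ))

# awk is only used for the fractional percentage.
percent=$(awk -v o="$overhead_s" -v b="$build_s" \
    'BEGIN { printf "%.1f", 100 * o / b }')

echo "overhead: ${overhead_s}s of ${build_s}s (${percent}%)"
```

That prints "overhead: 148s of 4800s (3.1%)", and the real figure would be
lower since the early packages see a much smaller target/.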

Yes, md5 is a very fast hash. For reference, hashing a 512MiB blob takes
less than a second.

I believe this overhead is negligible and we should unconditionally
enable that feature.

> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  Config.in              |  9 +++++++++
>  package/pkg-generic.mk | 36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/Config.in b/Config.in
> index f5b6c73..58a5085 100644
> --- a/Config.in
> +++ b/Config.in
> @@ -613,6 +613,15 @@ config BR2_COMPILER_PARANOID_UNSAFE_PATH
>  	  toolchain (through gcc and binutils patches) and external
>  	  toolchain backends (through the external toolchain wrapper).
>  
> +config BR2_COLLECT_FILE_SIZE_STATS
> +	bool "collect statistics about installed file size"
> +	help
> +	  Enable this option to let Buildroot collect data about the
> +	  installed files. When this option is enabled, you will be
> +	  able to use the 'size-stats' make target, which will
> +	  generate a graph and CSV files giving statistics about the
> +	  installed size of each file and each package.
> +
>  endmenu
>  
>  endmenu
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 1b09955..db35a87 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,42 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Hooks to collect statistics about installed files
> +ifeq ($(BR2_COLLECT_FILE_SIZE_STATS),y)
> +
> +# This hook will be called before the target installation of a
> +# package. We store in a file named $(1).filelist_before the list of
> +# files currently installed in the target. Note that the MD5 is also
> +# stored, in order to identify if the files are overwritten.
> +define step_pkg_size_start
> +	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
> +		$(BUILD_DIR)/$(1).filelist_before

Why don't you store that in the package's $(@D) ?

I don't really care, but if we're going to use $(BUILD_DIR) to store
temporary files, it might be time we introduce a better location
(probably something like BR2_TMP_DIR=$(BUILD_DIR)/.tmp/ )

We already have some temporary stuff written in there, and I find it
ugly (yes, I added some myself!).

Note: not related to your changes, of course, just prompted by them.

> +endef
> +
> +# This hook will be called after the target installation of a
> +# package. We store in a file named $(1).filelist_after the list
> +# of files (and their MD5) currently installed in the target. We then
> +# do a diff with the $(1).filelist_before to compute the list of
> +# files installed by this package.
> +define step_pkg_size_end
> +	(cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \
> +		$(BUILD_DIR)/$(1).filelist_after
> +	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
> +		while read hash file ; do \
> +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> +		done
> +	$(RM) -f $(BUILD_DIR)/$(1).filelist_before \
> +		$(BUILD_DIR)/$(1).filelist_after
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef

When I introduced the instrumentation hooks, I did not envision they
would be used like that, directly as Makefile code.

What I expected is that we would be using scripts (python, shell,
whatever!) somewhere in support/, that would do their own filtering.
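
To illustrate, here is a minimal sketch of what such a script could look
like, assuming it is called with the same three arguments the hooks get
(start|end, step name, package name) and that TARGET_DIR and BUILD_DIR are
available in its environment. The function name pkg_size_hook is
hypothetical, the logic is simply lifted from the patch, and the second
half is just a scratch-directory demo. It relies on GNU find/xargs.

```shell
# Hypothetical hook body; in Buildroot it would live in a script under
# support/ and be called as: <script> <start|end> <step> <package>.
# Assumes TARGET_DIR and BUILD_DIR are set by the caller.
pkg_size_hook() {
    when=$1 step=$2 pkg=$3

    # The script does its own filtering instead of Makefile code.
    [ "$step" = "install-target" ] || return 0

    case "$when" in
    start)
        (cd "$TARGET_DIR" && find . -type f -print0 | xargs -0 -r md5sum) \
            | LC_ALL=C sort > "$BUILD_DIR/$pkg.filelist_before"
        ;;
    end)
        (cd "$TARGET_DIR" && find . -type f -print0 | xargs -0 -r md5sum) \
            | LC_ALL=C sort > "$BUILD_DIR/$pkg.filelist_after"
        # Lines only in the "after" list are new or overwritten files.
        LC_ALL=C comm -13 "$BUILD_DIR/$pkg.filelist_before" \
                          "$BUILD_DIR/$pkg.filelist_after" \
            | while read -r hash file; do
                  echo "$pkg,$file"
              done >> "$BUILD_DIR/packages-file-list.txt"
        rm -f "$BUILD_DIR/$pkg.filelist_before" \
              "$BUILD_DIR/$pkg.filelist_after"
        ;;
    esac
}

# Demo in a scratch directory: "foo" installs ./a, then "bar" overwrites
# ./a and adds ./b.
work=$(mktemp -d)
TARGET_DIR=$work/target
BUILD_DIR=$work/build
mkdir -p "$TARGET_DIR" "$BUILD_DIR"

pkg_size_hook start install-target foo
echo one > "$TARGET_DIR/a"
pkg_size_hook end install-target foo

pkg_size_hook start build bar            # ignored: not install-target
pkg_size_hook start install-target bar
echo two > "$TARGET_DIR/a"
echo new > "$TARGET_DIR/b"
pkg_size_hook end install-target bar

# Sorted only to get a stable order for display.
result=$(LC_ALL=C sort "$BUILD_DIR/packages-file-list.txt")
echo "$result"
rm -rf "$work"
```

In the demo, ./a ends up listed for both foo and bar, which is exactly the
overwrite case the md5 is there to catch.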

It's pretty fascinating how we all differ in reasoning! :-)

Regards,
Yann E. MORIN.

> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +endif
> +
>  # User-supplied script
>  ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
>  define step_user
> -- 
> 2.1.0
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'
