Buildroot Archive on lore.kernel.org
From: Peter Korsgaard <peter@korsgaard.com>
To: Grant Edwards <grant.b.edwards@gmail.com>
Cc: buildroot@busybox.net,  buildroot@uclibc.org
Subject: Re: [Buildroot] Large number of duplicate files in sdk
Date: Thu, 21 Nov 2024 09:18:56 +0100
Message-ID: <87ldxdrocv.fsf@dell.be.48ers.dk>
In-Reply-To: <vhlr9b$svh$1@ciao.gmane.io> (Grant Edwards's message of "Wed, 20 Nov 2024 23:27:39 -0000 (UTC)")

>>>>> "Grant" == Grant Edwards <grant.b.edwards@gmail.com> writes:

 > On 2024-11-20, Grant Edwards <grant.b.edwards@gmail.com> wrote:
 >> When I do a "make sdk" (using 2024.02.6), the resulting tarball
 >> contains tons of duplicate files.  I'm using an external Linaro ARM
 >> toolchain.  With a fairly bare-bones package selection, the sdk tarball
 >> generated by buildroot appears to be about 40% duplicate files by
 >> size (about 20% by count).

 > It's actually a bit worse than that. My app wasn't finding files that
 > were duplicated more than once.

 > Running a simple de-dupe utility on output/host reduced disk space
 > from 928MB to 556MB.

 > That app is pretty conservative: it only links files that have the
 > same name, the same parent directory name, and differing top
 > directory names.

Some duplicates are expected, e.g. we have a number of packages that can
be built for both the host and the target (e.g. python3), so if your SDK
has both the host and target variants enabled, there will be some
duplicated files.

As a comparison, I have an SDK built with 2024.02.8 and a
Buildroot-generated external toolchain where the SDK .tar.gz is 227MB.
Extracted, it looks like this:

du -hs
741M    .

fdupes -rm .
4179 duplicate files (in 3982 sets), occupying 100.6 megabytes
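
For anyone without fdupes at hand, here is a rough Python sketch of the same
kind of summary. It is not how fdupes works internally; it just groups
regular files by size and SHA-256, and counts paths that share an inode only
once. The names find_duplicate_sets and summarize are made up for this
example.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_sets(root, min_size=1):
    """Return a list of (size, [paths]) for byte-identical regular files."""
    by_size = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_size < min_size:
                continue  # skip symlinks, special files and small files
            inode = (st.st_dev, st.st_ino)
            if inode in seen_inodes:
                continue  # already hard-linked: no extra disk space used
            seen_inodes.add(inode)
            by_size[st.st_size].append(path)

    sets = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for group in by_hash.values():
            if len(group) > 1:
                sets.append((size, sorted(group)))
    return sets


def summarize(sets):
    """Return (extra_copies, set_count, wasted_bytes) for the given sets."""
    extra_copies = sum(len(paths) - 1 for _size, paths in sets)
    wasted = sum(size * (len(paths) - 1) for size, paths in sets)
    return extra_copies, len(sets), wasted
```

Calling summarize(find_duplicate_sets(".")) in the extracted SDK should give
numbers of the same order as the fdupes summary, though the exact counts may
differ since the two do not necessarily count sets the same way.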

Focusing on the big files I see:

fdupes -rS -G $(( 1000 * 1024 )) .
1467305 bytes each:
./opt/ext-toolchain/share/man/man1/aarch64-buildroot-linux-gnu-g++.1
./opt/ext-toolchain/share/man/man1/aarch64-buildroot-linux-gnu-gcc.1

3334288 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/bin/ld.bfd
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/bin/ld

5427708 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/lib/libc.a
./aarch64-buildroot-linux-gnu/sysroot/usr/lib/libc.a

6316256 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/lib/libstdc++.a
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/lib64/libstdc++.a
./aarch64-buildroot-linux-gnu/sysroot/usr/lib/libstdc++.a

1582810 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/lib/libm-2.38.a
./aarch64-buildroot-linux-gnu/sysroot/usr/lib/libm-2.38.a

2892864 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/lib/libstdc++.so.6.0.30
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/lib64/libstdc++.so.6.0.30
./aarch64-buildroot-linux-gnu/sysroot/usr/lib/libstdc++.so.6.0.30

3386286 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/share/i18n/locales/iso14651_t1_common
./aarch64-buildroot-linux-gnu/sysroot/usr/share/i18n/locales/iso14651_t1_common

1111538 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/share/i18n/locales/iso14651_t1_pinyin
./aarch64-buildroot-linux-gnu/sysroot/usr/share/i18n/locales/iso14651_t1_pinyin

4523291 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/usr/share/i18n/locales/cns11643_stroke
./aarch64-buildroot-linux-gnu/sysroot/usr/share/i18n/locales/cns11643_stroke

2148440 bytes each:
./opt/ext-toolchain/aarch64-buildroot-linux-gnu/sysroot/lib/libc.so.6
./aarch64-buildroot-linux-gnu/sysroot/lib/libc.so.6


So it is mainly the copy we make of the external toolchain into host/. I
think we could be smarter and use hard links instead of actually copying
the files, or perhaps run hardlink before creating the SDK tarball:

hardlink .
Mode:                     real
Method:                   sha256
Files:                    15298
Linked:                   2903 files
Compared:                 0 xattrs
Compared:                 14922 files
Saved:                    72.68 MiB
Duration:                 0.855040 seconds

du -hs
661M    .
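
To make the idea concrete, here is a minimal Python sketch of what such a
pass before creating the tarball could look like. This is only an
illustration, not existing Buildroot code and not how the hardlink utility
is implemented. The function name hardlink_duplicates and the
".hardlink-tmp" suffix are invented for the example, and unlike hardlink's
defaults it ignores timestamps: it only requires the same device, size,
mode and owner before comparing contents.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(root):
    """Replace byte-identical regular files under root with hard links.

    Returns an estimate of the number of bytes saved.
    """
    candidates = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                continue  # leave symlinks, special and empty files alone
            # Hard links share metadata, so only consider files that
            # already agree on device, size, mode and ownership.
            key = (st.st_dev, st.st_size, st.st_mode, st.st_uid, st.st_gid)
            candidates[key].append(path)

    saved = 0
    for key, paths in candidates.items():
        if len(paths) < 2:
            continue
        first_with_digest = {}
        for path in sorted(paths):
            keeper = first_with_digest.setdefault(file_digest(path), path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first copy, or already linked to it
            frees_space = os.lstat(path).st_nlink == 1
            # Create the link under a temporary name, then rename it over
            # the duplicate so the path never goes missing.
            tmp = path + ".hardlink-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, path)
            if frees_space:
                saved += key[1]
    return saved
```

Since tar stores hard-linked files only once, running something like this on
host/ before creating the archive should shrink the tarball as well as the
extracted tree, provided it is extracted onto a filesystem that supports hard
links.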

-- 
Bye, Peter Korsgaard
