From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH 1/1] size-stats: don't count hard links
Date: Sat, 15 Oct 2016 08:17:50 +0200
Message-ID: <20161015081750.75b59fb2@free-electrons.com>
In-Reply-To: <1476489930-10456-1-git-send-email-fhunleth@troodon-software.com>

Hello,

On Fri, 14 Oct 2016 20:05:30 -0400, Frank Hunleth wrote:
> This change adds inode tracking to the size-stats script so that hard
> links don't cause files to be double counted. This has a significant
> effect on the size computation for some packages. For example, git has
> around a dozen hard links to a large file. Before this change, git would
> weigh in at about 170 MB with the total filesystem size reported as
> 175 MB. The actual rootfs.ext2 size was around 16 MB. With the change,
> the git package registers at 10.5 MB with a total filesystem size of
> 15.8 MB.
> 
> Signed-off-by: Frank Hunleth <fhunleth@troodon-software.com>

Thanks a lot for this change! This is definitely something that needs
to be handled.

> -def add_file(filesdict, relpath, abspath, pkg):
> +def add_file(filesdict, seeninodes, relpath, abspath, pkg):
>      if not os.path.exists(abspath):
>          return
>      if os.path.islink(abspath):
>          return
> -    sz = os.stat(abspath).st_size
> +    if relpath in filesdict:
> +        return

I'm not sure why this test is being added, or at least why it's related
to the inode tracking.

> @@ -97,10 +113,11 @@ def build_package_size(filesdict, builddir):
>              if not frelpath in filesdict:
>                  print("WARNING: %s is not part of any package" % frelpath)
>                  pkg = "unknown"
> +                sz = os.path.getsize(fpath)

So for files not belonging to packages, we do not track inodes?

Maybe we should instead have our own filesize() helper function that
takes care of returning the right size if we have never seen this
inode, or 0 if we have already seen it. It could then be used in both
places.
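
To illustrate, such a helper could look roughly like the untested
sketch below. The seeninodes set and the (st_dev, st_ino) key are
assumptions made for illustration, not code taken from the patch:

```python
import os


def filesize(seeninodes, path):
    """Return the size of path the first time its inode is seen, else 0.

    seeninodes is a set owned by the caller (hypothetical name). It is
    keyed on (st_dev, st_ino) because inode numbers are only unique
    within a single filesystem.
    """
    st = os.stat(path)
    key = (st.st_dev, st.st_ino)
    if key in seeninodes:
        # Another hard link to the same data was already counted.
        return 0
    seeninodes.add(key)
    return st.st_size
```

Both add_file() and the "unknown" package case in
build_package_size() could then call it with the same set, so that
files outside of any package get the same treatment.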

Another concern is that some files will now be reported as having a
size of 0, which is not entirely correct. This does not matter at all
for the per-package graph or CSV file, but it is a bit more annoying
for the per-file CSV file: a user inspecting this CSV file will wonder
what those zero-size files are. Another option is to divide the size
of the file by the number of hard links, and spread the size over the
different hard links. But that is not very nice either, as the size
reported in the CSV will not match the visible size of the file.
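
For reference, the "spread the size" variant could be sketched as
follows. This is untested, the spread_filesize name is made up, and it
assumes that all the hard links live inside the target directory
(st_nlink counts every link on the filesystem, wherever it is):

```python
import os


def spread_filesize(path):
    """Return the size of path divided by its number of hard links.

    Summing this value over every link to the same inode gives the
    real size back, up to floating-point rounding. Hypothetical
    helper, not part of the size-stats script.
    """
    st = os.stat(path)
    # float() keeps the division exact-ish on Python 2 as well.
    return st.st_size / float(st.st_nlink)
```

With this, a 90-byte file that has three hard links would show up as
three entries of 30 bytes each in the per-file CSV.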

So, maybe we should just leave it like you propose, unless others have
a better idea about this.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
