From: Andy Lutomirski <luto@myrealbox.com>
To: linux-kernel@vger.kernel.org, andi@firstfloor.org,
	kernel1@cyberdogtech.com
Subject: Re: A little coding style nugget of joy
Date: Wed, 19 Sep 2007 17:22:46 -0400
Message-ID: <46F19326.1040503@myrealbox.com>
In-Reply-To: <p73odfyvar1.fsf@bingen.suse.de>

Andi Kleen wrote:
> Matt LaPlante <kernel1@cyberdogtech.com> writes:
> 
>> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
>>
>> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
>> Bytes saved by removing said whitespace: 151809
> 
> You don't actually save anything on disk on most file systems
> (essentially everything except reiserfs on current Linux)
> because all files are rounded to block size (normally 4K) 
> 
> Same in page cache.

This is a terrible assumption in general (i.e., when filesize % blocksize
is close to uniformly distributed).  If you remove one byte from a file
stored in blocks of size B, the file shrinks by a whole block only when
its last block held exactly that one byte, which happens with probability
1/B.  So you save B bytes with probability 1/B and zero bytes with
probability 1-1/B, and the expected number of bytes saved is B*1/B = 1.
Since expectation is linear, if you remove x bytes, the expected number
of bytes saved is x (even if more than one byte is removed per file).
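The expected-value argument can be checked exactly by enumerating every possible value of filesize % blocksize instead of sampling. This is a minimal sketch under the same assumptions as the argument: space is allocated in whole blocks and the residue is uniformly distributed. The function names `blocks` and `expected_bytes_saved` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def blocks(size: int, block: int) -> int:
    """Whole blocks needed to store `size` bytes (ceiling division)."""
    return -(-size // block)


def expected_bytes_saved(removed: int, block: int = 4096) -> Fraction:
    """Exact expected on-disk saving when `removed` bytes are trimmed from a
    file whose size modulo `block` is uniformly distributed.

    Assumption: the file is large enough that trimming never makes its size
    negative; the base size below is chosen to guarantee that.
    """
    base = block * (removed // block + 1)  # hypothetical base size > removed
    total = 0
    for r in range(block):  # every residue is equally likely
        size = base + r
        total += (blocks(size, block) - blocks(size - removed, block)) * block
    return Fraction(total, block)  # average over the `block` residues


print(expected_bytes_saved(1))     # saves 4096 bytes in 1 of 4096 cases
print(expected_bytes_saved(16))
print(expected_bytes_saved(5000))  # also holds when removing more than a block
```

Each call prints the number of bytes removed (1, 16 and 5000), matching the claim that the expected saving equals x regardless of how the removed bytes fall relative to block boundaries.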

In my tree, about half of the files have size >= 4k, so the assumption 
is probably not _that_ far off the mark.

Alternatively, about 16 bytes are removed per file on average, and there
are 11 files whose size exceeds a multiple of 4k by at most 16 bytes, so
each of those could drop a whole 4k block.  Since 11 * 4k = 44k, it's not
at all unreasonable that we'd save 40-50k.
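The back-of-the-envelope count above can be reproduced with a small calculation of the actual blocks freed per file. This is a sketch with made-up file sizes: the "tree" below is hypothetical and only mimics the situation described (11 files sitting just past a 4k boundary, the rest mid-block, 16 bytes trimmed from each); it is not data from a real kernel tree. `blocks` and `bytes_saved` are hypothetical helper names.

```python
def blocks(size: int, block: int = 4096) -> int:
    """Whole blocks needed to store `size` bytes (ceiling division)."""
    return -(-size // block)


def bytes_saved(files, block: int = 4096) -> int:
    """On-disk bytes freed for an iterable of (original_size, bytes_removed)."""
    return sum((blocks(size, block) - blocks(size - removed, block)) * block
               for size, removed in files)


# Hypothetical tree: 11 files that are 8 bytes past a 4k boundary, plus 89
# files whose last block is about half full; 16 bytes trimmed from each.
near = [(k * 4096 + 8, 16) for k in range(1, 12)]
far = [(k * 4096 + 2000, 16) for k in range(1, 90)]

print(bytes_saved(near + far))  # 45056, i.e. 11 * 4096 bytes (44k)
```

Only the files near a boundary contribute; trimming the mid-block files frees nothing on disk, which is exactly the effect the quoted objection describes, while the few boundary cases still add up to tens of kilobytes.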

> 
> And in tar files bzip2/gzip is very good at compacting them.

That's true.

--Andy
