public inbox for linux-kernel@vger.kernel.org
* A little coding style nugget of joy
@ 2007-09-19 16:34 Matt LaPlante
  2007-09-19 17:13 ` Andi Kleen
  2007-09-20  9:20 ` Pádraig Brady
  0 siblings, 2 replies; 8+ messages in thread
From: Matt LaPlante @ 2007-09-19 16:34 UTC (permalink / raw)
  To: linux-kernel

Since everyone loves random statistics, here are a few gems to give you a break from your busy day:

Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
Bytes saved by removing said whitespace: 151809
Lines in the (unified) diff: 455437
Size of the diff: 15M
People brave enough to submit the patch: ~0
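
For anyone who wants to reproduce numbers like the first two, here is a minimal Python sketch. It is not the script used for the figures above. It assumes "trailing whitespace" means spaces and tabs directly before the newline, and the helper name trailing_whitespace_stats is made up for illustration.

```python
from typing import Tuple


def trailing_whitespace_stats(text: str) -> Tuple[int, int]:
    """Return (lines with trailing whitespace, bytes saved by stripping it).

    Assumption: only spaces and tabs count as trailing whitespace. Both are
    one byte each in ASCII/UTF-8, so character counts equal byte counts.
    """
    lines_affected = 0
    bytes_saved = 0
    for line in text.split("\n"):
        stripped = line.rstrip(" \t")
        if stripped != line:
            lines_affected += 1
            bytes_saved += len(line) - len(stripped)
    return lines_affected, bytes_saved


# Hypothetical three-line source fragment: one trailing space, two trailing tabs.
sample = "int a; \nint b;\t\t\nint c;\n"
print(trailing_whitespace_stats(sample))
```

Running this over every file in a tree and summing the two values would give totals comparable to the ones listed, assuming the same definition of whitespace.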

Take care. :)

-
Matt

* Re: A little coding style nugget of joy
  2007-09-19 16:34 A little coding style nugget of joy Matt LaPlante
@ 2007-09-19 17:13 ` Andi Kleen
  2007-09-19 21:22   ` Andy Lutomirski
  2007-09-20  9:20 ` Pádraig Brady
  1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-09-19 17:13 UTC (permalink / raw)
  To: Matt LaPlante; +Cc: linux-kernel

Matt LaPlante <kernel1@cyberdogtech.com> writes:

> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
> 
> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
> Bytes saved by removing said whitespace: 151809

You don't actually save anything on disk on most file systems
(essentially everything except reiserfs on current Linux)
because all files are rounded up to the block size (normally 4K).

Same in page cache.

And in tar files bzip2/gzip is very good at compacting them.

> Lines in the (unified) diff: 455437
> Size of the diff: 15M
> People brave enough to submit the patch: ~0

Many kernel maintainers automatically remove trailing whitespace on any new
lines these days, so as the kernel keeps changing it should eventually all
disappear, except in essentially dead code.

-Andi

* Re: A little coding style nugget of joy
  2007-09-19 17:13 ` Andi Kleen
@ 2007-09-19 21:22   ` Andy Lutomirski
  2007-09-19 21:30     ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2007-09-19 21:22 UTC (permalink / raw)
  To: linux-kernel, andi, kernel1

Andi Kleen wrote:
> Matt LaPlante <kernel1@cyberdogtech.com> writes:
> 
>> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
>>
>> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
>> Bytes saved by removing said whitespace: 151809
> 
> You don't actually save anything on disk on most file systems
> (essentially everything except reiserfs on current Linux)
> because all files are rounded to block size (normally 4K) 
> 
> Same in page cache.

This is a terrible assumption in general (i.e. if filesize % blocksize 
is close to uniformly distributed).  If you remove one byte and the data 
is stored with blocksize B, then you either save zero bytes with 
probability 1-1/B or you save B bytes with probability 1/B.  The 
expected number of bytes saved is B*1/B=1.  Since expectation is linear, 
if you remove x bytes, the expected number of bytes saved is x (even if 
there is more than one byte removed per file).

In my tree, about half of the files have size >= 4k, so the assumption 
is probably not _that_ far off the mark.

Alternatively, there are an average of about 16 bytes removed per file,
and there are 11 files which are <= 16 bytes short of a 4k boundary, so
it's not at all unreasonable that we'd save 40-50k.
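
A small Python sketch of the expectation argument, under the stated assumption that filesize % blocksize is uniformly distributed. The helper names (allocated, expected_saving) and the 10-block base size are illustrative only, and a real filesystem (tail packing, inline data) may not round this simply.

```python
def allocated(size: int, block: int = 4096) -> int:
    """Bytes occupied on disk if every file is rounded up to whole blocks."""
    return -(-size // block) * block  # ceiling division, then scale back up


def expected_saving(removed: int, block: int = 4096) -> float:
    """Average on-disk saving from removing `removed` bytes from a file,
    taken over every possible value of filesize % blocksize (uniform case).
    """
    base = 10 * block  # hypothetical file size; only needs to exceed `removed`
    total = sum(
        allocated(base + r, block) - allocated(base + r - removed, block)
        for r in range(block)
    )
    return total / block


print(expected_saving(1))   # one byte removed
print(expected_saving(16))  # roughly the per-file average mentioned above
```

With B = 4096 these come out to 1.0 and 16.0: most files save nothing, a few save a whole block, and the average equals the number of bytes removed.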

> 
> And in tar files bzip2/gzip is very good at compacting them.

That's true.

--Andy

* Re: A little coding style nugget of joy
  2007-09-19 21:22   ` Andy Lutomirski
@ 2007-09-19 21:30     ` Andi Kleen
  2007-09-19 21:39       ` Andrew Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-09-19 21:30 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-kernel, andi, kernel1

> This is a terrible assumption in general (i.e. if filesize % blocksize 
> is close to uniformly distributed).  If you remove one byte and the data 
> is stored with blocksize B, then you either save zero bytes with 
> probability 1-1/B or you save B bytes with probability 1/B.  The 
> expected number of bytes saved is B*1/B=1.  Since expectation is linear, 
> if you remove x bytes, the expected number of bytes saved is x (even if 
> there is more than one byte removed per file).

You didn't calculate the probability of actually saving a full block 
or not (that's the only thing that matters). I assumed it's relatively
small and can be ignored in practice since the amount of end white
space is negligible compared to total file size.

-Andi

* Re: A little coding style nugget of joy
  2007-09-19 21:30     ` Andi Kleen
@ 2007-09-19 21:39       ` Andrew Lutomirski
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Lutomirski @ 2007-09-19 21:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, kernel1

On 9/19/07, Andi Kleen <andi@firstfloor.org> wrote:
> > This is a terrible assumption in general (i.e. if filesize % blocksize
> > is close to uniformly distributed).  If you remove one byte and the data
> > is stored with blocksize B, then you either save zero bytes with
> > probability 1-1/B or you save B bytes with probability 1/B.  The
> > expected number of bytes saved is B*1/B=1.  Since expectation is linear,
> > if you remove x bytes, the expected number of bytes saved is x (even if
> > there is more than one byte removed per file).
>
> You didn't calculate the probability of actually saving a full block
> or not (that's the only thing that matters). I assumed it's relatively
> small and can be ignored in practice since the amount of end white
> space is negligible compared to total file size.

Sure I did.  It's roughly 1/B per byte removed ( = 1/4096 ).

--Andy

* Re: A little coding style nugget of joy
  2007-09-19 16:34 A little coding style nugget of joy Matt LaPlante
  2007-09-19 17:13 ` Andi Kleen
@ 2007-09-20  9:20 ` Pádraig Brady
  2007-09-20 10:11   ` Robert P. J. Day
  1 sibling, 1 reply; 8+ messages in thread
From: Pádraig Brady @ 2007-09-20  9:20 UTC (permalink / raw)
  To: Matt LaPlante; +Cc: linux-kernel

Matt LaPlante wrote:
> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
> 
> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
> Bytes saved by removing said whitespace: 151809
> Lines in the (unified) diff: 455437
> Size of the diff: 15M
> People brave enough to submit the patch: ~0

It's gradually getting better so:
http://lwn.net/2001/1129/a/whitespace.php3

* Re: A little coding style nugget of joy
  2007-09-20  9:20 ` Pádraig Brady
@ 2007-09-20 10:11   ` Robert P. J. Day
  2007-09-20 14:04     ` Scott Preece
  0 siblings, 1 reply; 8+ messages in thread
From: Robert P. J. Day @ 2007-09-20 10:11 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: Matt LaPlante, linux-kernel

On Thu, 20 Sep 2007, Pádraig Brady wrote:

> Matt LaPlante wrote:
> > Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
> >
> > Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
> > Bytes saved by removing said whitespace: 151809
> > Lines in the (unified) diff: 455437
> > Size of the diff: 15M
> > People brave enough to submit the patch: ~0
>
> It's gradually getting better so:
> http://lwn.net/2001/1129/a/whitespace.php3

and you wouldn't *believe* how much space you can save by getting rid
of all that annoying indentation.  and don't even get me *started* on
those comments ...

rday
-- 
========================================================================
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca
========================================================================

* Re: A little coding style nugget of joy
  2007-09-20 10:11   ` Robert P. J. Day
@ 2007-09-20 14:04     ` Scott Preece
  0 siblings, 0 replies; 8+ messages in thread
From: Scott Preece @ 2007-09-20 14:04 UTC (permalink / raw)
  To: Robert P. J. Day; +Cc: Pádraig Brady, Matt LaPlante, linux-kernel

On 9/20/07, Robert P. J. Day <rpjday@mindspring.com> wrote:
> On Thu, 20 Sep 2007, Pádraig Brady wrote:
>
> > Matt LaPlante wrote:
> > > Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
> > >
> > > Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
> > > Bytes saved by removing said whitespace: 151809
> > > Lines in the (unified) diff: 455437
> > > Size of the diff: 15M
> > > People brave enough to submit the patch: ~0
> >
> > It's gradually getting better so:
> > http://lwn.net/2001/1129/a/whitespace.php3
>
> and you wouldn't *believe* how much space you can save by getting rid
> of all that annoying indentation.  and don't even get me *started* on
> those comments ...
>
> rday
---

I think you're on to something here. If we stored the files with all
the non-meaningful whitespace (including non-meaningful newlines)
removed, not only would we save disk space, but we would also
eliminate significant amounts of developer time and LKML bandwidth
currently expended on arguing about formatting. Everybody could just
run things through indent with whatever formatting they preferred.
Might make diffs ugly, though...

scott
-- 
scott preece
