From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>,
Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] xdiff-interface.c (buffer_is_binary): Remove buffer size limitation
Date: Tue, 4 Dec 2007 01:00:39 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0712040054280.27959@racer.site>
In-Reply-To: <alpine.LFD.0.9999.0712031559480.8458@woody.linux-foundation.org>
Hi,
On Mon, 3 Dec 2007, Linus Torvalds wrote:
> On Tue, 4 Dec 2007, Dmitry V. Levin wrote:
> >
> > Average file size in the linux-2.6.23.9 kernel tree is 10944 bytes,
>
> Don't do "average" sizes. That's an almost totally meaningless number.
>
> "Average" makes sense if you have some kind of gaussian distribution or
> similar.
To expand on that: a Gaussian is symmetric with unbounded support, so it
cannot be the proper distribution for anything that is non-negative.
I see so many mis-applications of statistics/probability theory in my day
job that I cannot resist pointing people to the Poisson distribution here
(in whose context "average" actually makes kind of sense).
But back to the problem: if you have a truly binary file, then _every_
byte (absent further information, of course) has a probability of 1/256 of
being 0.
Which means that even if a binary file is unusual enough to have that
property for only half of the first 8192 bytes, the chance that none of
those 4096 bytes is 0 is (255/256)^4096, so the current test succeeds
with a probability of 1 - (255/256)^4096, or roughly 1 - 1.1e-7.
I fail to see how this test can possibly fail for the average case.
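As a quick sanity check of the argument above, here is a minimal Python
sketch. It assumes the test amounts to "is there a 0 byte within the first
8192 bytes", and that the bytes are independent and uniformly distributed;
looks_binary and prob_no_nul are hypothetical names for illustration, not
functions from git itself.

```python
def looks_binary(buf: bytes, window: int = 8192) -> bool:
    """Illustrative stand-in for the test: a NUL byte within the first `window` bytes."""
    return b"\0" in buf[:window]


def prob_no_nul(n: int) -> float:
    """Probability that n independent, uniformly random bytes contain no NUL."""
    return (255 / 256) ** n


if __name__ == "__main__":
    for n in (4096, 8192):
        miss = prob_no_nul(n)
        print(f"n={n}: P(no NUL) = {miss:.3e}, P(test succeeds) = 1 - {miss:.3e}")
```

Under these assumptions the miss probability for 4096 random bytes comes
out at roughly 1.1e-7, and it shrinks further when all 8192 bytes are random.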
So if it fails only for special cases, we are probably (in the common, not
the mathematical, sense) better off asking those people encountering them
to add git-attributes for the files.
IMHO that is not asking for too much.
Ciao,
Dscho