From: Dmitry Potapov <dpotapov@gmail.com>
To: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Cc: Steffen Prohaska <prohaska@zib.de>, Dmitry Potapov <dpotapov@gmail.com>
Subject: [PATCH] treat any file with NUL as binary
Date: Tue, 15 Jan 2008 17:28:29 +0300 [thread overview]
Message-ID: <1200407309-10992-1-git-send-email-dpotapov@gmail.com> (raw)
There are two heuristics in Git to detect whether a file is binary
or text. One in xdiff-interface.c relied on existing NUL byte at
the beginning. However, convert.c used a different heuristic, which
relied that the number of non-printable symbols is less than 1%.
Due to difference in approaches whether a file is binary or not,
it was possible that a file that diff treats as binary will not be
treated as text by CRLF conversation. This is very confusing for
a user who seeing that 'git diff' shows file as binary expects it
to be added as binary.
This patch makes is_binary to consider any file that contains at
least one NUL character as binary.
---
Junio,
I believe that the current behavior where 'git diff' shows me a file
as binary and then adds it as text with crlf conversation is a bug.
Though, it is not very likely to happen, it still possible cases where
a binary file contains large amount of text. For instance, a tar file
of text files can be such a file. Probably, word processor that store
text in binary format may also generate a file with more 99% printable
characters. So, such files will be considered as text by current convert
heuristic. Still such files are considered by diff due present of a NUL
character. This is very confusing for a user to see 'git diff' saying
that a file is binary and then having it converted as text. Because I
don't think that any real text file (especially one that requires CRLF
conversation) may contain NUL character, I believe this change should
improve binary heuristic and avoid user confusion.
So, please, consider it for inclusion as a bug fix.
convert.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/convert.c b/convert.c
index 5adef4f..a51da1f 100644
--- a/convert.c
+++ b/convert.c
@@ -17,8 +17,8 @@
#define CRLF_INPUT 2
struct text_stat {
- /* CR, LF and CRLF counts */
- unsigned cr, lf, crlf;
+ /* NUL, CR, LF and CRLF counts */
+ unsigned nul, cr, lf, crlf;
/* These are just approximations! */
unsigned printable, nonprintable;
@@ -51,6 +51,9 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
case '\b': case '\t': case '\033': case '\014':
stats->printable++;
break;
+ case 0:
+ stats->nul++;
+ /* fall through */
default:
stats->nonprintable++;
}
@@ -66,6 +69,8 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
static int is_binary(unsigned long size, struct text_stat *stats)
{
+ if (stats->nul)
+ return 1;
if ((stats->printable >> 7) < stats->nonprintable)
return 1;
/*
--
1.5.4.rc3.3.g0321b
next reply other threads:[~2008-01-15 14:29 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-15 14:28 Dmitry Potapov [this message]
2008-01-15 21:03 ` [PATCH] treat any file with NUL as binary Steffen Prohaska
2008-01-15 23:11 ` Junio C Hamano
2008-01-16 1:13 ` Dmitry Potapov
2008-01-16 1:16 ` Junio C Hamano
2008-01-16 1:21 ` Junio C Hamano
2008-01-16 1:59 ` Dmitry Potapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1200407309-10992-1-git-send-email-dpotapov@gmail.com \
--to=dpotapov@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=prohaska@zib.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).