From: Dmitry Potapov <dpotapov@gmail.com>
To: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Cc: Steffen Prohaska <prohaska@zib.de>, Dmitry Potapov <dpotapov@gmail.com>
Subject: [PATCH] treat any file with NUL as binary
Date: Tue, 15 Jan 2008 17:28:29 +0300 [thread overview]
Message-ID: <1200407309-10992-1-git-send-email-dpotapov@gmail.com> (raw)
There are two heuristics in Git to detect whether a file is binary
or text. One in xdiff-interface.c relied on existing NUL byte at
the beginning. However, convert.c used a different heuristic, which
relied that the number of non-printable symbols is less than 1%.
Due to difference in approaches whether a file is binary or not,
it was possible that a file that diff treats as binary will not be
treated as text by CRLF conversation. This is very confusing for
a user who seeing that 'git diff' shows file as binary expects it
to be added as binary.
This patch makes is_binary to consider any file that contains at
least one NUL character as binary.
---
Junio,
I believe that the current behavior where 'git diff' shows me a file
as binary and then adds it as text with crlf conversation is a bug.
Though, it is not very likely to happen, it still possible cases where
a binary file contains large amount of text. For instance, a tar file
of text files can be such a file. Probably, word processor that store
text in binary format may also generate a file with more 99% printable
characters. So, such files will be considered as text by current convert
heuristic. Still such files are considered by diff due present of a NUL
character. This is very confusing for a user to see 'git diff' saying
that a file is binary and then having it converted as text. Because I
don't think that any real text file (especially one that requires CRLF
conversation) may contain NUL character, I believe this change should
improve binary heuristic and avoid user confusion.
So, please, consider it for inclusion as a bug fix.
convert.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/convert.c b/convert.c
index 5adef4f..a51da1f 100644
--- a/convert.c
+++ b/convert.c
@@ -17,8 +17,8 @@
#define CRLF_INPUT 2
struct text_stat {
- /* CR, LF and CRLF counts */
- unsigned cr, lf, crlf;
+ /* NUL, CR, LF and CRLF counts */
+ unsigned nul, cr, lf, crlf;
/* These are just approximations! */
unsigned printable, nonprintable;
@@ -51,6 +51,9 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
case '\b': case '\t': case '\033': case '\014':
stats->printable++;
break;
+ case 0:
+ stats->nul++;
+ /* fall through */
default:
stats->nonprintable++;
}
@@ -66,6 +69,8 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
static int is_binary(unsigned long size, struct text_stat *stats)
{
+ if (stats->nul)
+ return 1;
if ((stats->printable >> 7) < stats->nonprintable)
return 1;
/*
--
1.5.4.rc3.3.g0321b
next reply other threads:[~2008-01-15 14:29 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-15 14:28 Dmitry Potapov [this message]
2008-01-15 21:03 ` [PATCH] treat any file with NUL as binary Steffen Prohaska
2008-01-15 23:11 ` Junio C Hamano
2008-01-16 1:13 ` Dmitry Potapov
2008-01-16 1:16 ` Junio C Hamano
2008-01-16 1:21 ` Junio C Hamano
2008-01-16 1:59 ` Dmitry Potapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1200407309-10992-1-git-send-email-dpotapov@gmail.com \
--to=dpotapov@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=prohaska@zib.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.