git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Potapov <dpotapov@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Steffen Prohaska <prohaska@zib.de>,
	Dmitry Potapov <dpotapov@gmail.com>
Subject: [PATCH] treat any file with NUL as binary
Date: Wed, 16 Jan 2008 04:59:12 +0300	[thread overview]
Message-ID: <1200448752-16024-1-git-send-email-dpotapov@gmail.com> (raw)
In-Reply-To: <7vr6gibm56.fsf@gitster.siamese.dyndns.org>

There are two heuristics in Git to detect whether a file is binary
or text. One in xdiff-interface.c (which is taken from GNU diff)
relies on existence of the NUL byte at the beginning. However,
convert.c used a different heuristic, which relied on the percent
of non-printable symbols (less than 1% for text files).

Due to differences in detection whether a file is binary or not,
it was possible that a file that diff treats as binary could be
treated as text by CRLF conversion. This is very confusing for a
user who sees that 'git diff' shows the file as binary expects it
to be added as binary.

This patch makes is_binary to consider any file that contains at
least one NUL character as binary, to ensure that the heuristics
used for CRLF conversion is tighter than what is used by diff.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---
 convert.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/convert.c b/convert.c
index 5adef4f..a51da1f 100644
--- a/convert.c
+++ b/convert.c
@@ -17,8 +17,8 @@
 #define CRLF_INPUT	2
 
 struct text_stat {
-	/* CR, LF and CRLF counts */
-	unsigned cr, lf, crlf;
+	/* NUL, CR, LF and CRLF counts */
+	unsigned nul, cr, lf, crlf;
 
 	/* These are just approximations! */
 	unsigned printable, nonprintable;
@@ -51,6 +51,9 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 			case '\b': case '\t': case '\033': case '\014':
 				stats->printable++;
 				break;
+			case 0:
+				stats->nul++;
+				/* fall through */
 			default:
 				stats->nonprintable++;
 			}
@@ -66,6 +69,8 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 static int is_binary(unsigned long size, struct text_stat *stats)
 {
 
+	if (stats->nul)
+		return 1;
 	if ((stats->printable >> 7) < stats->nonprintable)
 		return 1;
 	/*
-- 
1.5.3.5

      reply	other threads:[~2008-01-16  1:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-15 14:28 [PATCH] treat any file with NUL as binary Dmitry Potapov
2008-01-15 21:03 ` Steffen Prohaska
2008-01-15 23:11 ` Junio C Hamano
2008-01-16  1:13   ` Dmitry Potapov
2008-01-16  1:16     ` Junio C Hamano
2008-01-16  1:21 ` Junio C Hamano
2008-01-16  1:59   ` Dmitry Potapov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1200448752-16024-1-git-send-email-dpotapov@gmail.com \
    --to=dpotapov@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=prohaska@zib.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).