From: Jeff King <peff@peff.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Andrzej Borucki <borucki.andrzej@gmail.com>,
	git-for-windows <git-for-windows@googlegroups.com>,
	git@vger.kernel.org
Subject: Re: [git-for-windows] How is detected binary files?
Date: Tue, 1 Dec 2015 19:49:21 -0500
Message-ID: <20151202004921.GC28197@sigill.intra.peff.net>
In-Reply-To: <alpine.DEB.1.00.1511271510520.1686@s15462909.onlinehome-server.info>

On Fri, Nov 27, 2015 at 03:14:58PM +0100, Johannes Schindelin wrote:

> On Wed, 25 Nov 2015, Andrzej Borucki wrote:
> 
> > How does Git detect that a file is binary? The detection must be
> > reliable, because line breaks must not be changed in binary files.
> > Binary files can contain byte 0 (zero), but:
> > - 16-bit UTF can also contain zero bytes
> > - short binary files may not contain any zero byte
> 
> It would probably be better to direct this question to the general Git
> mailing list (you reached the Git for Windows one, and this issue is not
> specific to Windows).
> 
> To answer your question, a NUL byte within the first 8000 bytes is indeed
> taken as an indicator that a file is binary.
> 
> If you use UTF-16, you will need to mark your files as such explicitly
> (Git does not handle UTF-16 internally).
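The NUL-byte rule described above can be approximated in a few lines of shell. This is a sketch under stated assumptions, not Git's implementation. `looks_binary` is a hypothetical helper name. It checks only for a NUL byte within the first 8000 bytes, and it ignores `.gitattributes` settings (such as `binary` or `text`) that override Git's guess. The three sample files illustrate both points raised in the question: UTF-16 text trips the check, while a short binary file that contains no NUL slips through.

```shell
#!/bin/sh
# Sketch of the NUL-byte heuristic: treat a file as "binary" if a NUL byte
# appears within its first 8000 bytes. looks_binary is a hypothetical helper,
# not a Git command, and it ignores .gitattributes overrides.
looks_binary() {
    # Keep only the NUL bytes from the first 8000 bytes and count them
    # (tr -d ' ' strips the padding some wc implementations add).
    nuls=$(head -c 8000 "$1" | tr -dc '\000' | wc -c | tr -d ' ')
    [ "$nuls" -gt 0 ]
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/ascii.txt"              # plain ASCII text
printf 'h\000i\000\n\000' > "$tmp/utf16le.txt"   # "hi\n" encoded as UTF-16LE
printf '\377\376\001' > "$tmp/short.bin"         # short binary data, no NUL

for f in ascii.txt utf16le.txt short.bin; do
    if looks_binary "$tmp/$f"; then
        echo "$f: binary"
    else
        echo "$f: text"
    fi
done
rm -rf "$tmp"
```

Running it should print `ascii.txt: text`, `utf16le.txt: binary` and `short.bin: text`.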

I'm not sure if it is a good idea to treat UTF-16 as text. The rest of
the diff (headers, etc.) will all be in ASCII, so one or the other is
going to be mojibake.

You can get readable diffs by textconv-ing them to an ASCII-superset
encoding like UTF-8. Something like:

    echo 'myfile diff=utf16' >.gitattributes
    git config diff.utf16.textconv 'iconv -f utf16 -t utf8'

but of course the resulting patches cannot be applied, and you may miss
any changes that do not make it through the encoding (e.g., using
different bytes to represent the same code point).
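The last caveat can be reproduced with `iconv` alone. The two byte strings below encode the same text, "hi", as UTF-16 little-endian and big-endian, each preceded by a byte-order mark. The raw bytes differ, but the UTF-16 to UTF-8 conversion that the textconv filter above performs (spelled here with the canonical encoding names `UTF-16`/`UTF-8`) turns both into identical output, so the converted text has no difference to show. This sketch assumes GNU libc's `iconv`, whose `UTF-16` decoder honors and strips the BOM; other `iconv` implementations may behave differently.

```shell
#!/bin/sh
# The same text ("hi") as two different UTF-16 byte sequences, converted the
# way a UTF-16 -> UTF-8 textconv filter would convert them.
# Assumes GNU libc iconv, which detects and strips the byte-order mark.
le=$(printf '\377\376h\000i\000' | iconv -f UTF-16 -t UTF-8)   # BOM + little-endian
be=$(printf '\376\377\000h\000i' | iconv -f UTF-16 -t UTF-8)   # BOM + big-endian

# The raw bytes differ...
printf '\377\376h\000i\000' | od -An -tx1
printf '\376\377\000h\000i' | od -An -tx1

# ...but the converted text is identical.
echo "LE -> $le"
echo "BE -> $be"
[ "$le" = "$be" ] && echo "converted text is identical"
```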

-Peff

