* Re: [git-for-windows] How is detected binary files?
From: Johannes Schindelin @ 2015-11-27 14:14 UTC (permalink / raw)
To: Andrzej Borucki; +Cc: git-for-windows, git
Hi Andrzej,
On Wed, 25 Nov 2015, Andrzej Borucki wrote:
> How does Git detect that a file is binary? The detection must be safe,
> because Git is not allowed to change line breaks in binary files.
> Binary files can contain the byte 0 (zero), but:
> - 16-bit UTF (UTF-16) text can also contain zero bytes
> - short binary files may contain no zero byte at all
It would probably be better to direct this question to the general Git
mailing list (you reached the Git for Windows one, and this issue is not
specific to Windows).
To answer your question, a NUL byte within the first 8000 bytes is indeed
considered as an indicator for binary files.
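
A rough sketch of that heuristic as a shell function, for illustration only. It is not Git's actual code; the name looks_binary is made up here, and it assumes a head that supports -c.

```shell
# Sketch only: mimics the "NUL byte in the first 8000 bytes" check.
# looks_binary FILE exits 0 if the check would call FILE binary, 1 otherwise.
looks_binary() (
    # Size of the inspected prefix (at most 8000 bytes).
    total=$(( $(head -c 8000 < "$1" | wc -c) ))
    # Size of the same prefix once every NUL byte is deleted.
    without_nul=$(( $(head -c 8000 < "$1" | tr -d '\000' | wc -c) ))
    # Any difference means at least one NUL byte was present.
    [ "$total" -ne "$without_nul" ]
)
```

Both caveats from the question follow directly: a UTF-16 text file contains NUL bytes and is reported as binary, while a short binary file without any NUL byte is reported as text.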
If you use UTF-16, you will need to mark your files as such explicitly
(Git does not handle UTF-16 internally).
As for short binary files, you will also have to mark those explicitly.
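
One way to do such marking is through gitattributes. The sketch below is an assumption about what "explicitly" could look like in practice: the helper name mark_explicitly and both path patterns are hypothetical. The -text attribute turns off line-ending conversion for a path, and the built-in binary macro stands for -diff -merge -text.

```shell
# Hypothetical paths; adjust the patterns to your repository.
# Appends two explicit markings to the attributes file given as $1.
mark_explicitly() {
    cat >> "$1" <<'EOF'
*.utf16.txt -text
tiny.dat binary
EOF
}

# Typical use from the top of a working tree:
#   mark_explicitly .gitattributes
#   git check-attr -a -- tiny.dat
```

With these entries in place, Git no longer has to guess for the matching paths.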
Ciao,
Johannes
* Re: [git-for-windows] How is detected binary files?
From: Jeff King @ 2015-12-02 0:49 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Andrzej Borucki, git-for-windows, git
On Fri, Nov 27, 2015 at 03:14:58PM +0100, Johannes Schindelin wrote:
> On Wed, 25 Nov 2015, Andrzej Borucki wrote:
>
> > How does Git detect that a file is binary? The detection must be safe,
> > because Git is not allowed to change line breaks in binary files.
> > Binary files can contain the byte 0 (zero), but:
> > - 16-bit UTF (UTF-16) text can also contain zero bytes
> > - short binary files may contain no zero byte at all
>
> It would probably be better to direct this question to the general Git
> mailing list (you reached the Git for Windows one, and this issue is not
> specific to Windows).
>
> To answer your question, a NUL byte within the first 8000 bytes is indeed
> considered as an indicator for binary files.
>
> If you use UTF-16, you will need to mark your files as such explicitly
> (Git does not handle UTF-16 internally).
I'm not sure it is a good idea to treat UTF-16 as text. The rest of
the diff (headers, etc.) will all be in ASCII, so one or the other is
going to be mojibake.
You can get readable diffs by textconv-ing them to an ASCII-superset
encoding like UTF-8. Something like:
    echo 'myfile diff=utf16' >.gitattributes
    git config diff.utf16.textconv 'iconv -f utf16 -t utf8'
but of course the resulting patches cannot be applied, and you may miss
any changes that do not make it through the encoding (e.g., using
different bytes to represent the same code point).
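
The last caveat can be reproduced without Git. The sketch below assumes an iconv that honors the byte order mark when decoding UTF-16; the function name textconv_utf16 and the file names are made up, and the encoding names are spelled UTF-16 and UTF-8 rather than the utf16/utf8 aliases used above.

```shell
# Sketch only: imitates what the textconv filter would hand to diff.
textconv_utf16() {
    iconv -f UTF-16 -t UTF-8 < "$1"
}

tmp=$(mktemp -d)
# The text "hi\n" as UTF-16 little-endian and big-endian,
# each starting with its byte order mark.
printf '\377\376h\000i\000\n\000' > "$tmp/le.txt"
printf '\376\377\000h\000i\000\n' > "$tmp/be.txt"

# The stored bytes differ ...
cmp -s "$tmp/le.txt" "$tmp/be.txt" || echo "raw bytes differ"
# ... but after conversion there is nothing left for a diff to show.
if [ "$(textconv_utf16 "$tmp/le.txt")" = "$(textconv_utf16 "$tmp/be.txt")" ]; then
    echo "converted text is identical"
fi
```

A change from one byte order to the other would therefore be invisible in a textconv-based diff, even though the blob itself changed.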
-Peff