From: Jeff King <peff@peff.net>
To: Drew Northup <drew.northup@maine.edu>
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH resend] Do not create commits whose message contains NUL
Date: Tue, 3 Jan 2012 15:03:31 -0500
Message-ID: <20120103200331.GG20926@sigill.intra.peff.net>
In-Reply-To: <1325435251.4752.104.camel@drew-northup.unet.maine.edu>
On Sun, Jan 01, 2012 at 11:27:31AM -0500, Drew Northup wrote:
> I had already started experimenting with automatically detecting decent
> UTF-16 a long while back so that compatible platforms could handle it
> appropriately in terms of creating diffs and dealing with newline
> munging between platforms. There is no 100% sure-fire check for UTF-16
> if you don't already suspect it is possibly UTF-16. If we really want to
> check for possible UTF-16 specifically I can scrape out the check I
> wrote up and send it along.
I also looked into this recently. You can generally detect UTF-16 by the
BOM at the beginning of the file (which will also tell you the
endian-ness). I did a simple test by integrating it into the check for
binary-ness during diffs. However, as I recall, the result wasn't
particularly useful. Some of the diff code wasn't happy with the
embedded NUL bytes (i.e., there is code that assumes that NUL is the end
of a string). Not to mention that ascii newline (0x0a) can appear as
part of other characters in a wide encoding like utf-16. And since git
outputs straight ascii for all of the diff boilerplate, you end up with
a mish-mash of utf-16 and ascii (this is OK with utf-8, of course,
because utf-8 is a superset of ascii).
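A rough sketch of both points, for illustration only. The helper name utf16_bom is hypothetical (nothing in git is called that), and it only looks at the first two bytes, so UTF-16 without a BOM is reported as "none". The last command shows the newline problem: U+010A encoded as UTF-16BE is the byte pair 0x01 0x0a, and a byte-oriented tool counts that second byte as a line ending.

```shell
# Hypothetical helper: report the byte order implied by a leading UTF-16 BOM.
utf16_bom() {
    # Dump the first two bytes of the file as hex, e.g. "fffe" or "feff".
    bom=$(od -An -tx1 -N2 "$1" | tr -d '[:space:]')
    case "$bom" in
        fffe) echo little-endian ;;   # FF FE
        feff) echo big-endian ;;      # FE FF
        *)    echo none ;;            # no BOM; says nothing about the encoding
    esac
}

# U+010A in UTF-16BE is the bytes 01 0a; wc -l counts the 0a as one newline.
printf '\001\012' | wc -l
```

Something like "utf16_bom foo.txt" would then print the detected byte order. A check this simple cannot tell BOM-less UTF-16 from arbitrary binary data, which matches the caveat quoted above.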
If anything, I think you would want to do something like "textconv" to
convert the utf-16 into utf-8, then diff that. Git won't do it
automatically based on encoding, but if you know the filenames of the
utf-16 files in your repository, you can do something like:
  echo 'foo.txt diff=utf16' >.gitattributes
  git config diff.utf16.textconv 'iconv -f utf16 -t utf8'
and get readable diffs. Of course you couldn't use that diff to apply a
patch, though.
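To see roughly what the textconv filter hands to the diff machinery, the conversion can be tried by hand. This is only a sketch: utf16_sample is a made-up helper that emits the bytes of "hi" plus a newline as UTF-16LE with a BOM, and it assumes an iconv that accepts the UTF-16 and UTF-8 encoding names.

```shell
# Hypothetical sample data: BOM (FF FE), then "h", "i", "\n" as 16-bit
# little-endian code units.
utf16_sample() {
    printf '\377\376h\000i\000\n\000'
}

# The same conversion the textconv setting runs; the BOM selects the byte
# order and the result is plain UTF-8 text.
utf16_sample | iconv -f UTF-16 -t UTF-8
```

The converted text is what gets diffed, which is why the resulting patch does not describe the real UTF-16 bytes stored in the repository.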
I strongly suspect that not many people are really using git for utf-16
files. Git treats them as binary, which makes them unpleasant for
anything except simple storage.
> The is_utf8 check was not written to detect 100% valid UTF-8 per-se. It
> seems to me that it was written as part of the "is this a binary or not"
> check in the add/commit path.
We shouldn't care about binary file content at all in the add or commit
code paths. I would guess we do only if you are using auto-crlf (but
then, I don't think we care about utf8 in that case, only whether line
endings should be converted or not).
We do check that the commit message itself is utf8, but only to generate
a warning that you should set i18n.commitEncoding.
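As a stand-in for that kind of check (this is not git's own is_utf8 code, just an approximation using iconv), a message can be tested by round-tripping it through a UTF-8 to UTF-8 conversion and looking at the exit status. The function name looks_like_utf8 is hypothetical.

```shell
# Hypothetical helper: succeed if stdin decodes cleanly as UTF-8.
# Relies on iconv exiting non-zero when it hits an invalid byte sequence.
looks_like_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "café" encoded as UTF-8 (c3 a9) passes.
printf 'caf\303\251\n' | looks_like_utf8 && echo "valid UTF-8"

# The same word in Latin-1 (a lone e9 byte) does not.
printf 'caf\351\n' | looks_like_utf8 || echo "not UTF-8; consider setting i18n.commitEncoding"
```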
-Peff