From: Jakub Narebski <jnareb@gmail.com>
To: Drew Northup <drew.northup@maine.edu>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	Git mailing list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC] Print diffs of UTF-16 to console / patches to email as UTF-8...?
Date: Fri, 22 Oct 2010 21:18:06 +0200
Message-ID: <201010222118.09212.jnareb@gmail.com>
In-Reply-To: <1287770805.819.7.camel@drew-northup.unet.maine.edu>

On Fri, 22 Oct 2010, Drew Northup wrote:
> On Fri, 2010-10-22 at 10:48 -0700, Jakub Narebski wrote:
> > Drew Northup <drew.northup@maine.edu> writes:
> 
> > > Well I shall plumb the documentation again.... just in case. I'm not
> > > holding my breath that it will do what I (and frankly a fair number of
> > > other people) want. We just want version control that treats text like
> > > text. FULL STOP. Why isn't UTF-16 text???????
> > 
> > If you are asking why Git detects files with text in UTF-16 / UCS-2 as
> > binary, it is because Git (re)uses the same heuristic as e.g. GNU
> > diff (and probably also the -T file test in Perl), and one of the
> > heuristics is that if a file contains a NUL ("\0") character, then it
> > is most probably binary (because legacy C programs for text would have
> > trouble with NUL characters).
> > 
> > That probably doesn't help you any...
> 
> I did find that already. I still have not decided on the correct place to
> shoehorn in Unicode detection, but I'll be sure to do that before I
> bother anybody else with it. I already wrote code to detect (reasonably)
> valid UTF-16 (if it isn't obviously valid then I'll just as soon deal
> with it as binary data, so as to avoid a foot-shooting exercise).
> My main motivation here has been to get some feedback as I write stuff
> so as to not waste a lot of time during writing something that could be
> done better. 
>
> (As opposed to not done at all, which is the feeling I'm getting from a
> few people around here...)

Git has good support for different encodings in commit messages (which
are always text, as opposed to file contents, which might be binary or
text).

You specify which encoding you use for commit messages with
i18n.commitEncoding (defaults to 'utf-8'); if it is something other than
utf-8, it gets saved in the 'encoding' header.  If your terminal uses an
encoding different from i18n.commitEncoding, you can specify that with
i18n.logOutputEncoding.
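
To make the commit message case concrete, here is a rough sketch in
Python (not Git's actual C code) of what that re-encoding amounts to.
The function name and its parameters are made up for illustration; the
only things taken from Git are the 'encoding' header and the two config
variables named above.

```python
from typing import Optional


def reencode_message(raw: bytes,
                     encoding_header: Optional[str],
                     log_output_encoding: str = "utf-8") -> bytes:
    """Convert a stored commit message to the encoding the terminal expects.

    raw                 -- message bytes as stored in the commit object
    encoding_header     -- value of the 'encoding' header, or None if absent
                           (an absent header is taken to mean utf-8)
    log_output_encoding -- stands in for i18n.logOutputEncoding
    """
    stored_encoding = encoding_header or "utf-8"
    text = raw.decode(stored_encoding)
    return text.encode(log_output_encoding)


# A message committed with i18n.commitEncoding = ISO-8859-2,
# shown on a UTF-8 terminal.
stored = "Zażółć gęślą jaźń".encode("iso-8859-2")
shown = reencode_message(stored, "iso-8859-2", "utf-8")
```

This works only because the commit records which encoding was used;
file contents have no such header, which is why blobs are the harder
case.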

For file contents, the only existing support for different encodings is
in git-gui.  You declare the encoding that a file uses via .gitattributes
(the `encoding` attribute).  You specify what output encoding git-gui
(Tcl/Tk) uses with the `gui.encoding` config variable.

I guess that what you would need to implement for diffs, 'git show <file>',
etc. is respecting the `encoding` gitattribute, plus a way to tell Git
which encoding the console uses, e.g. i18n.blobOutputEncoding (or
something like that).
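
As a rough sketch of the idea (Python rather than Git's C, and only an
illustration): the first function mimics the NUL-byte heuristic
discussed earlier in this thread, the second shows what honouring the
`encoding` attribute before output might look like.  The names
looks_binary and blob_for_console are invented here,
i18n.blobOutputEncoding does not exist (it is only the name floated
above), and the 8000-byte limit is my recollection of how much of the
buffer Git inspects, so treat it as an assumption.

```python
from typing import Optional

# Assumption: Git only looks at the beginning of the buffer.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """NUL-byte heuristic: any "\\0" near the start means 'binary'."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def blob_for_console(data: bytes,
                     attr_encoding: Optional[str],
                     output_encoding: str = "utf-8") -> bytes:
    """Transcode blob contents for display if an encoding is declared.

    attr_encoding   -- value of the `encoding` gitattribute, or None
    output_encoding -- stands in for the hypothetical
                       i18n.blobOutputEncoding
    """
    if attr_encoding is None:
        return data
    try:
        text = data.decode(attr_encoding)
    except (UnicodeDecodeError, LookupError):
        # Not valid in the declared encoding (or unknown encoding name):
        # hand the bytes back untouched so they are treated as before.
        return data
    return text.encode(output_encoding)


utf16_blob = "hello\n".encode("utf-16")      # BOM + two bytes per character
converted = blob_for_console(utf16_blob, "utf-16")
```

With this, looks_binary(utf16_blob) is true because every ASCII
character carries a zero byte in UTF-16, while looks_binary(converted)
is false, so the usual text diff machinery could run on the converted
contents.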

HTH
-- 
Jakub Narebski
Poland
