git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Nguyen Thai Ngoc Duy" <pclouds@gmail.com>,
	"Ævar Arnfjörð" <avarab@gmail.com>
Subject: Re: [PATCH 00/22] Refactor to accept NUL in commit messages
Date: Thu, 27 Oct 2011 11:13:03 -0700
Message-ID: <20111027181303.GF1967@sigill.intra.peff.net>
In-Reply-To: <7vvcrd411x.fsf@alter.siamese.dyndns.org>

On Tue, Oct 25, 2011 at 07:07:38AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I mean, besides the obvious that UTF-16 is ...
> 
> Yes, you could, besides the obvious. But that obvious reason makes it
> sufficiently different that it may not be so outrageous to draw the line
> between it and all the others.

Yeah, and I'm OK with that. It's just not a satisfying answer to give
Windows people who think UTF-16 is a good idea. But at the very least,
it's still unicode. It should be lossless for them to convert to utf8
and back if they want.

Speaking of which, I've been looking at handling diffing of utf-16
files. Right now we generally just consider them binary, which sucks.
It's easy to identify them by BOM in the is_buffer_binary() code, but
that's only part of it. We do an OK job of diffing them, except that:

  1. The BOM makes some diffs a little noisier.

  2. We split lines on 0x0a. But this byte can appear in other code
     points, like 0x010a (Ċ), or the entire 0x0a* code point range (the
     entire Gurmukhi charset).
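
The second point can be reproduced outside of git. What follows is a
minimal Python sketch, not git's actual line-splitting code: it encodes
a short string as UTF-16LE with a BOM and compares splitting on the raw
byte 0x0a against splitting after decoding. The sample strings are made
up for illustration.

```python
# "aĊb\nc" as UTF-16LE, with a BOM in front (sample data, not from git).
text = "a\u010ab\nc"
data = b"\xff\xfe" + text.encode("utf-16-le")

# Split on the raw byte 0x0a, as a byte-oriented line splitter would.
# U+010A is encoded as the bytes 0a 01, so it gets cut in half.
byte_lines = data.split(b"\x0a")

# Decode first (the "utf-16" codec consumes the BOM), then split on the
# real newline character.
real_lines = data.decode("utf-16").split("\n")

# A Gurmukhi letter (U+0A05) has 0x0a as its high byte: 05 0a in UTF-16LE.
gurmukhi_pieces = "\u0a05".encode("utf-16-le").split(b"\x0a")

print(len(byte_lines), len(real_lines), len(gurmukhi_pieces))
```

The byte-level split yields three pieces for a two-line input, and it
also breaks a single Gurmukhi letter in two.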

I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
to utf8, and then display them in utf8. Is that too gross for us to
consider?
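
A rough sketch of that idea in Python, for illustration only; this is
not a patch and not how git implements anything. The helper name
reencode_to_utf8 is hypothetical. It checks the UTF-32 BOMs before the
UTF-16 ones because the UTF-32LE BOM (ff fe 00 00) begins with the
UTF-16LE BOM (ff fe). As a consequence, a UTF-16LE buffer whose first
character is NUL would be misdetected as UTF-32LE, which this sketch
does not try to handle.

```python
import codecs

# Longer BOMs first, so UTF-32LE is not mistaken for UTF-16LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def reencode_to_utf8(buf: bytes) -> bytes:
    """Return buf as UTF-8 if it starts with a UTF-16/32 BOM.

    Hypothetical helper: buffers without a recognized BOM, or whose
    contents do not decode cleanly, are returned unchanged.
    """
    for bom, encoding in BOMS:
        if buf.startswith(bom):
            try:
                # Drop the BOM, decode, and re-encode as UTF-8.
                return buf[len(bom):].decode(encoding).encode("utf-8")
            except UnicodeDecodeError:
                return buf
    return buf


sample = codecs.BOM_UTF16_LE + "a\u010ab\n".encode("utf-16-le")
converted = reencode_to_utf8(sample)
```

After conversion the only 0x0a byte left in the sample is the real
newline, so byte-oriented line splitting works again.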

You can kind-of implement this outside of git using textconv. But you
have to manually mark each file as utf-16, as there's no way to trigger
an alternative diff driver on something like a BOM.

I'm really not clear on how people with utf-16 files work. Even if we
did treat utf-16 like text, the _rest_ of git is outputting ascii, so
it's not like their terminals are utf-16. But we do have projects on
github with utf-16 and utf-32 encodings.

-Peff
