git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Drew Northup <drew.northup@maine.edu>,
	Git mailing list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC] Print diffs of UTF-16 to console / patches to email as UTF-8...?
Date: Fri, 22 Oct 2010 10:58:35 -0700 (PDT)
Message-ID: <m3wrpajek6.fsf@localhost.localdomain>
In-Reply-To: <20101022173055.GA11923@burratino>

Jonathan Nieder <jrnieder@gmail.com> writes:

> Drew Northup wrote:
> 
> > Please forgive me for being offended that UTF-16 text is not "generic"
> > enough.
> 
> First some words of explanation.
> 
> By "generic" I did not mean ubiquitous, unbranded, popular, or some
> other almost-synonym.  What I actually meant is that it is not obvious
> what to do with UTF-16.  Should it be converted to UTF-8 for output?
> Should it always be normalized when added to the index, so that
> switching between canonically equivalent sequences does not result
> in spurious diffs?  Should the byte-for-byte representation be
> faithfully preserved, even when it is not valid UTF-16?
> 
> When in such a situation, often a good approach is the following:
> take care of mechanism first, then policy.  So the first thing to do
> is to make sure that the code is _capable_ of what people are trying
> to do; then one can try various configurations and see what is most
> convenient; and finally, one can make sure the program behaves in an
> intuitive way by setting a reasonable default.
> 
> So by "generic" I meant those mechanisms that can be used in the
> context of multiple policies.

It would be nice if there were a way (perhaps steerable via
gitattributes) to choose whether Git treats a file as a sequence of
bytes (as it does now) or as a sequence of characters (probably like
Perl 6, i.e. as a sequence of graphemes), though this would require
specifying the encoding (and normalization) used.
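
To illustrate the difference between the two views, here is a rough
Python sketch (not Git code; the helper names are made up for this
example).  It takes two canonically equivalent spellings of the same
character, shows that they differ when compared as bytes but agree
once normalized, and converts UTF-16 content to UTF-8 for output.
The choice of NFC and of UTF-16LE is only an assumption made for the
sake of the example.

```python
import unicodedata

# Two canonically equivalent spellings of "e with acute accent".
composed = "\u00e9"     # U+00E9, one precomposed code point
decomposed = "e\u0301"  # U+0065 followed by U+0301 (combining acute)


def same_as_bytes(a, b, encoding="utf-16-le"):
    """Byte-for-byte comparison: the 'sequence of bytes' view."""
    return a.encode(encoding) == b.encode(encoding)


def same_as_characters(a, b, form="NFC"):
    """Comparison after normalization: the 'sequence of characters' view."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def transcode(data, source="utf-16", target="utf-8"):
    """Re-encode raw file content, e.g. UTF-16 input to UTF-8 output."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    print(same_as_bytes(composed, decomposed))       # byte view
    print(same_as_characters(composed, decomposed))  # character view
    print(transcode(composed.encode("utf-16")))      # UTF-8 bytes
```

A byte-oriented diff would report a change between the two spellings,
while a character-oriented one (with normalization) would not, which
is why both the encoding and the normalization form would have to be
specified.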

Wishful thinking
-- 
Jakub Narebski
Poland
ShadeHawk on #git

