From: Karsten Blees <karsten.blees@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH] Documentation/i18n.txt: clarify character encoding support
Date: Mon, 15 Jun 2015 12:08:33 +0200
Message-ID: <557EA421.5050706@gmail.com>
In-Reply-To: <xmqqmw01ltid.fsf@gitster.dls.corp.google.com>

On 15.06.2015 at 02:12, Junio C Hamano wrote:
> Karsten Blees <karsten.blees@gmail.com> writes:
>
>> diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
>> index e9a1d5d..e5f6233 100644
>> --- a/Documentation/i18n.txt
>> +++ b/Documentation/i18n.txt
>> @@ -1,18 +1,28 @@
>> -At the core level, Git is character encoding agnostic.
>> -
>> - - The pathnames recorded in the index and in the tree objects
>> - are treated as uninterpreted sequences of non-NUL bytes.
>> - What readdir(2) returns are what are recorded and compared
>> - with the data Git keeps track of, which in turn are expected
>> - to be what lstat(2) and creat(2) accepts. There is no such
>> - thing as pathname encoding translation.
>> +Git is to some extent character encoding agnostic.
>
> I do not think the removal of the text makes much sense here unless
> you add the equivalent to the new text below.
>
>> - The contents of the blob objects are uninterpreted sequences
>> of bytes. There is no encoding translation at the core
>> level.
>>
>> - - The commit log messages are uninterpreted sequences of non-NUL
>> - bytes.
>> + - Pathnames are encoded in UTF-8 normalization form C. This
>
> That is true only on some systems like OSX (with HFS+) and Windows,
> no? BSDs in general and Linux do not do any such mangling IIRC.

Modern Unices don't need any such mangling because UTF-8 NFC should
be the default system encoding. I'm not sure about the BSDs, but it
has been the default on all major Linux distros for more than 10 years.

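To make the normalization point concrete, here is a minimal Python sketch
(not part of the patch; the file name is made up for illustration). It
shows that the same visible name has different UTF-8 byte sequences in
NFC and NFD, which matters as long as path names are compared byte-wise:

```python
import unicodedata

# Hypothetical file name: "é" as one precomposed code point (NFC)
# versus "e" followed by a combining acute accent (NFD).
nfc_name = unicodedata.normalize("NFC", "caf\u00e9.txt")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9.txt")

# Both names render identically but encode to different bytes.
nfc_bytes = nfc_name.encode("utf-8")
nfd_bytes = nfd_name.encode("utf-8")

print(nfc_bytes)               # b'caf\xc3\xa9.txt'
print(nfd_bytes)               # b'cafe\xcc\x81.txt'
print(nfc_bytes == nfd_bytes)  # False

# Normalizing the NFD form back to NFC makes the names equal again.
same_after_nfc = unicodedata.normalize("NFC", nfd_name) == nfc_name
print(same_after_nfc)          # True
```

A file system that hands back the NFD form for a name that was created
in NFC would therefore look like a rename to a purely byte-based
comparison.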
> I
> am OK with mangling described as a notable oddball to warn users,
> though; i.e. not as a norm as your new text suggests but as an
> exception.
>
I would guess that non-UTF-8 Unices (or file systems) are the oddball
case, which is why I described them last. But I could be wrong.
>> + platforms. If file system APIs don't use UTF-8 (which may be
>> + file system specific), it is recommended to stick to pure
>> + ASCII file names.
>
> Hmph, who endorsed such a recommendation? It is recommended to
> stick to whatever naming scheme that would not cause troubles to
> project participants. If your participants all want to (and can)
> use ISO-8859-1, we do not discourage them from doing so.
>
ISO-8859-x file names may be fine if you won't ever need to:

 - use gitweb, JGit, gitk, git-gui...
 - exchange repos with "normal" (UTF-8) Unices, Mac and Windows systems
 - publish your work on a git hosting service (and expect file and
   ref names to show up correctly in the web interface)
 - store the repo on Unicode-based file systems (JFS, Joliet, UDF,
   exFAT, NTFS, HFS, CIFS...)

These restrictions are not that obvious when you start a new git
project, and while converting file names after the fact is possible
(e.g. using the recodetree script we shipped with Git for Windows
1.7.10), it will destroy history.

Thus I think we should strongly discourage users from using anything
but UTF-8.
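
As a rough illustration of how a project could spot problematic names
before they end up in history, here is a small Python sketch. The helper
`classify_name` and the sample names are hypothetical, not something Git
ships; it only assumes that names are available as raw bytes:

```python
import unicodedata

def classify_name(raw: bytes) -> str:
    """Hypothetical helper: classify a raw file name given as bytes."""
    if raw.isascii():
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # E.g. ISO-8859-x bytes, which are usually not valid UTF-8.
        return "not-utf8"
    if unicodedata.normalize("NFC", text) == text:
        return "utf8-nfc"
    return "utf8-not-nfc"

# Made-up sample names in different encodings / normalization forms.
samples = [
    b"README",
    "caf\u00e9".encode("utf-8"),       # precomposed, NFC
    "cafe\u0301".encode("utf-8"),      # decomposed, NFD
    "caf\u00e9".encode("iso-8859-1"),  # single byte 0xE9
]
results = [classify_name(s) for s in samples]
print(results)
# ['ascii', 'utf8-nfc', 'utf8-not-nfc', 'not-utf8']
```

Anything other than "ascii" or "utf8-nfc" would be a candidate for the
interoperability problems listed above.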