From: Karsten Blees <karsten.blees@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH] Documentation/i18n.txt: clarify character encoding support
Date: Mon, 15 Jun 2015 12:08:33 +0200
Message-ID: <557EA421.5050706@gmail.com>
In-Reply-To: <xmqqmw01ltid.fsf@gitster.dls.corp.google.com>

On 15.06.2015 at 02:12, Junio C Hamano wrote:
> Karsten Blees <karsten.blees@gmail.com> writes:
>
>> diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
>> index e9a1d5d..e5f6233 100644
>> --- a/Documentation/i18n.txt
>> +++ b/Documentation/i18n.txt
>> @@ -1,18 +1,28 @@
>> -At the core level, Git is character encoding agnostic.
>> -
>> - - The pathnames recorded in the index and in the tree objects
>> - are treated as uninterpreted sequences of non-NUL bytes.
>> - What readdir(2) returns are what are recorded and compared
>> - with the data Git keeps track of, which in turn are expected
>> - to be what lstat(2) and creat(2) accepts. There is no such
>> - thing as pathname encoding translation.
>> +Git is to some extent character encoding agnostic.
>
> I do not think the removal of the text makes much sense here unless
> you add the equivalent to the new text below.
>
>> - The contents of the blob objects are uninterpreted sequences
>> of bytes. There is no encoding translation at the core
>> level.
>>
>> - - The commit log messages are uninterpreted sequences of non-NUL
>> - bytes.
>> + - Pathnames are encoded in UTF-8 normalization form C. This
>
> That is true only on some systems like OSX (with HFS+) and Windows,
> no? BSDs in general and Linux do not do any such mangling IIRC.

Modern Unices don't need any such mangling, because UTF-8 NFC should
be the default system encoding. I'm not sure about the BSDs, but it
has been the default on all major Linux distros for more than 10 years.

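Here is why the normalization form matters at all. As the quoted text says, Git compares pathnames as uninterpreted byte sequences. Two names that look identical but use different Unicode normalization forms are therefore different paths. A minimal Python sketch, with a file name made up for illustration:

```python
import unicodedata

# Hypothetical file name "café.txt", written with a precomposed U+00E9.
name = "caf\u00e9.txt"

# NFC keeps "é" as a single code point; NFD splits it into "e" followed
# by U+0301 (COMBINING ACUTE ACCENT). Encoded as UTF-8, the bytes differ.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc)         # b'caf\xc3\xa9.txt'
print(nfd)         # b'cafe\xcc\x81.txt'
print(nfc == nfd)  # False: compared byte-wise, these are two different paths
```

If one system hands Git the NFD spelling and another the NFC spelling of the same name, a byte-agnostic comparison treats them as two distinct files.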
> I
> am OK with mangling described as a notable oddball to warn users,
> though; i.e. not as a norm as your new text suggests but as an
> exception.
>
I would guess that non-UTF-8 Unices (or file systems) are the oddball
case, which is why I described them last. But I could be wrong.
>> + platforms. If file system APIs don't use UTF-8 (which may be
>> + file system specific), it is recommended to stick to pure
>> + ASCII file names.
>
> Hmph, who endorsed such a recommendation? It is recommended to
> stick to whatever naming scheme that would not cause troubles to
> project participants. If your participants all want to (and can)
> use ISO-8859-1, we do not discourage them from doing so.
>
ISO-8859-x file names may be fine as long as you never need to:
- use gitweb, JGit, gitk, git-gui...
- exchange repos with "normal" (UTF-8) Unices, Mac and Windows systems
- publish your work on a Git hosting service (and expect file and
  ref names to show up correctly in the web interface)
- store the repo on Unicode-based file systems (JFS, Joliet, UDF,
  exFAT, NTFS, HFS+, CIFS...)
These restrictions are not that obvious when you start a new git
project, and while converting file names after the fact is possible
(e.g. using the recodetree script we shipped with Git for Windows
1.7.10), it will destroy history.
Thus I think we should strongly discourage users from using anything
but UTF-8.