From: Thomas Singer <thomas.singer@syntevo.com>
To: Johannes Sixt <j.sixt@viscovery.net>
Cc: git@vger.kernel.org
Subject: Re: non-US-ASCII file names (e.g. Hiragana) on Windows
Date: Tue, 01 Dec 2009 13:08:39 +0100
Message-ID: <4B150747.2030900@syntevo.com>
In-Reply-To: <4B14E934.9090304@viscovery.net>

Johannes Sixt wrote:
> Thomas Singer wrote:
>> Is it a limitation of German Windows that Far East characters are not
>> supported (while they work fine on Japanese Windows)? Are there different
>> (msys)Git versions available, or is this a configuration issue?
>
> It is a matter of configuration.
>
> Since 8 bits are not sufficient to support the Japanese alphabet in addition
> to the German alphabet, programs that are not Unicode-aware, such as git,
> have to decide which alphabet they support. The decision is
> made by picking a "codepage".
>
> On German Windows, your console uses codepage 850. The filenames
> (which are actually stored as Unicode) are converted to bytes according to
> codepage 850 *before* git sees them. If your filenames contain Hiragana,
> those characters are replaced by the "unknown character" marker because
> codepage 850 has no place for them.
>
> However, you can install Japanese language support on German Windows. Then
> you can change your console to codepage 932:
>
> chcp 932
>
> When you run git from *this* console, the Hiragana characters in the
> filenames are converted to cp932 before git sees them. The resulting byte
> sequence is different from the one in cp850, but git can see that the file
> exists and was modified, and you can 'git add' it.
>
> But if you have files with umlauts, they will not be recognized anymore
> because umlauts have no place in cp932.
>
> In neither case can you exchange the repository with Linux if your
> locale there is set to UTF-8, because neither byte sequence (umlauts
> from cp850 or Hiragana from cp932) is a valid UTF-8 sequence, let alone
> one that produces the expected glyphs.
>
> Corollary: Stick to ASCII file names.
>
> There have been suggestions to switch the console to codepage 65001
> (UTF-8), but I have never seen a success report. I'm not saying it does
> not work, though.

Thanks for the detailed explanation. I know the difference between bytes
and characters and the *encoding* needed to convert from one to the other, but
I did not know how Git handles it. I'm quite surprised that, as I
understand you, msysGit (or Git in general?) cannot handle all
characters (i.e. Unicode) at the same time. I expected it to be better
than older tools such as SVN.
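A small Java sketch can make the codepage effect concrete (Java, since that is where we invoke Git from). It only shows how Java itself encodes the same strings; it says nothing about what the Windows console or msysGit does internally. It assumes a JDK that ships the extended charsets, in which cp850 is named "IBM850" and Windows cp932 is named "windows-31j". The class and helper names are made up for this illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class. Assumes the JDK's extended charsets are installed,
// otherwise Charset.forName throws UnsupportedCharsetException.
public class CodepageDemo {

    // Java names for the two Windows code pages discussed above.
    static final Charset CP850 = Charset.forName("IBM850");      // German console
    static final Charset CP932 = Charset.forName("windows-31j"); // Japanese

    /** True if every character of s has a byte representation in cs. */
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    /** Lossy encode/decode: getBytes substitutes the charset's replacement byte ('?' for IBM850). */
    static String lossyRoundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    /** True if the bytes decode as UTF-8 without malformed or unmappable input. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String hiragana = "\u3072\u3089\u304c\u306a"; // the word "hiragana" written in Hiragana
        String umlauts = "\u00e4\u00f6\u00fc";        // a, o, u with umlaut

        System.out.println("Hiragana encodable in cp850: " + canEncode(hiragana, CP850));
        System.out.println("Hiragana after cp850 round trip: " + lossyRoundTrip(hiragana, CP850));
        System.out.println("Hiragana encodable in cp932: " + canEncode(hiragana, CP932));
        System.out.println("Umlauts encodable in cp850: " + canEncode(umlauts, CP850));

        // Neither code page's bytes form valid UTF-8, so a UTF-8 locale cannot read them.
        System.out.println("cp932 Hiragana bytes are valid UTF-8: "
            + isValidUtf8(hiragana.getBytes(CP932)));
        System.out.println("cp850 umlaut bytes are valid UTF-8: "
            + isValidUtf8(umlauts.getBytes(CP850)));
    }
}
```

Running main should report that Hiragana cannot be encoded in cp850 (it collapses to "????") but can be in cp932, and that neither resulting byte sequence is valid UTF-8, which matches the explanation quoted above.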

BTW, we are invoking the Git executable from Java. Is there automatically a
console "around" Git in that case? Should we invoke a shell script (which sets
the console's code page) instead of the Git executable directly?
--
Tom