From: Thomas Singer <thomas.singer@syntevo.com>
To: Johannes Sixt <j.sixt@viscovery.net>
Cc: git@vger.kernel.org
Subject: Re: non-US-ASCII file names (e.g. Hiragana) on Windows
Date: Tue, 01 Dec 2009 13:08:39 +0100
Message-ID: <4B150747.2030900@syntevo.com>
In-Reply-To: <4B14E934.9090304@viscovery.net>

Johannes Sixt wrote:
> Thomas Singer schrieb:
>> Is it a German Windows limitation, that far-east characters are not
>> supported on it (but work fine on a Japanese Windows), are there different
>> (msys)Git versions available or is this a configuration issue?
>
> It is a matter of configuration.
>
> Since 8 bits are not sufficient to support Japanese alphabet in addition
> to the German alphabet, programs that are not Unicode aware -- such as git
> -- have to make a decision which alphabet they support. The decision is
> made by picking a "codepage".
>
> On German Windows, you are in codepage 850 (in the console). The filenames
> (that actually are in Unicode) are converted to bytes according to
> codepage 850 *before* git sees them. If your filenames contain Hiragana,
> they are substituted by the "unknown character" marker because there is no
> place for them in codepage 850.
>
> However, you can install Japanese language support on German Windows. Then
> you can change your console to codepage 932:
>
> chcp 932
>
> When you run git from *this* console, Hiragana in the filenames are
> converted to cp932 before git sees them. The resulting byte sequence is
> different from the one in cp850, but git will be able to see that the file
> exists and was modified, and you can 'git add' it.
>
> But if you have files with umlauts, they will not be recognized anymore
> because umlauts have no place in cp932.
>
> In neither case can you exchange the repository with Linux if you have
> your locale set to UTF-8 on Linux, because neither byte sequence (umlauts
> from cp850 or Hiragana from cp932) are valid UTF-8 sequences, let alone
> result in the expected glyphs.
>
> Corollary: Stick to ASCII file names.
>
> There have been suggestions to switch the console to codepage 65001
> (UTF-8), but I have never heard of success reports. I'm not saying it does
> not work, though.

Thanks for the detailed explanation. I know the difference between bytes
and characters, and that an *encoding* is needed to convert from one to the
other, but I did not know how Git handles it. I'm quite surprised that, if I
understand you correctly, msys-Git (or Git in general?) is not able to handle
all characters (i.e. Unicode) at the same time. I expected it to do better
than older tools such as SVN.
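
To make the quoted explanation concrete, here is a minimal sketch in Python
(not Git code, and not how Windows performs the conversion internally) of
what happens when a Unicode file name is squeezed into a single codepage.
The helper names `to_codepage` and `is_valid_utf8` are made up for this
illustration, and the "?" substitution comes from Python's `errors="replace"`
handler, which is only an approximation of the "unknown character" marker
described above.

```python
# Illustration only: simulate converting Unicode file names to a codepage
# before a byte-oriented program gets to see them.

HIRAGANA_A = "\u3042"   # Hiragana letter A
A_UMLAUT = "\u00e4"     # Latin small letter a with diaeresis


def to_codepage(name: str, codepage: str) -> bytes:
    """Encode a name in the given codepage; unencodable characters become b'?'."""
    return name.encode(codepage, errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hiragana has no place in cp850, so the character is lost.
hiragana_cp850 = to_codepage(HIRAGANA_A, "cp850")
# In cp932 the same character survives as a two-byte sequence.
hiragana_cp932 = to_codepage(HIRAGANA_A, "cp932")
# The umlaut behaves the other way round.
umlaut_cp850 = to_codepage(A_UMLAUT, "cp850")
umlaut_cp932 = to_codepage(A_UMLAUT, "cp932")

for label, data in [
    ("Hiragana in cp850", hiragana_cp850),
    ("Hiragana in cp932", hiragana_cp932),
    ("umlaut in cp850", umlaut_cp850),
    ("umlaut in cp932", umlaut_cp932),
]:
    print(f"{label}: {data!r} valid UTF-8: {is_valid_utf8(data)}")
```

The two sequences that do preserve the character (Hiragana in cp932, umlaut
in cp850) are not valid UTF-8, which matches the point about exchanging the
repository with a UTF-8 Linux system.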

BTW, we are invoking the Git executable from Java. Is there automatically a
console "around" Git in that case? Should we invoke a shell script (which sets
the console's code page) instead of calling the Git executable directly?

--
Tom