git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: Thomas Singer <thomas.singer@syntevo.com>
Cc: Johannes Sixt <j.sixt@viscovery.net>, git@vger.kernel.org
Subject: Re: non-US-ASCII file names (e.g. Hiragana) on Windows
Date: Tue, 01 Dec 2009 09:24:58 -0800 (PST)
Message-ID: <m3k4x6na81.fsf@localhost.localdomain>
In-Reply-To: <4B150747.2030900@syntevo.com>

Thomas Singer <thomas.singer@syntevo.com> writes:

> Johannes Sixt wrote:
>> Thomas Singer schrieb:
>>>
>>> Is it a German Windows limitation, that far-east characters are not
>>> supported on it (but work fine on a Japanese Windows), are there different
>>> (msys)Git versions available or is this a configuration issue?
>> 
>> It is a matter of configuration.
>> 
>> Since 8 bits are not sufficient to support Japanese alphabet in addition
>> to the German alphabet, programs that are not Unicode aware -- such as git
>> -- have to make a decision which alphabet they support. The decision is
>> made by picking a "codepage".
>> 
>> On German Windows, you are in codepage 850 (in the console). The filenames
>> (that actually are in Unicode) are converted to bytes according to
>> codepage 850 *before* git sees them. If your filenames contain Hiragana,
>> they are substituted by the "unknown character" marker because there is no
>> place for them in codepage 850.
[...]

>> Corollary: Stick to ASCII file names.
>> 
>> There have been suggestions to switch the console to codepage 65001
>> (UTF-8), but I have never heard of success reports. I'm not saying it does
>> not work, though.
> 
> Thanks for the detailed explanation. I know the differences between bytes
> and characters and the needed *encoding* to convert from one to another, but
> I did not know how Git handles it. I'm quite surprised, that -- as I
> understand you -- msys-Git (or Git at all?) is not able to handle all
> characters (aka unicode) at the same time. I expected it would be better
> than older tools, e.g. SVN.

The problem is not with Git, as Git is (currently) agnostic with
respect to filename encoding; for Git, filenames are opaque
NUL ('\0') terminated binary data.  There is some infrastructure to
convert between filename encodings and to handle other filename
quirks (like case-insensitivity), though...
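
To make "opaque binary data" concrete, here is a rough sketch in
Python (my own illustration, not code from Git; the file name is made
up).  The same name as a user sees it becomes two different byte
strings depending on the encoding, and a tool that only compares bytes
has no way to tell that they were meant to be the same name:

```python
# Sketch only: one visible name, two different byte strings.
# The name "Grüße.txt" is a hypothetical example.
name = "Grüße.txt"

as_utf8 = name.encode("utf-8")
as_latin1 = name.encode("iso-8859-1")

print(as_utf8)    # b'Gr\xc3\xbc\xc3\x9fe.txt'
print(as_latin1)  # b'Gr\xfc\xdfe.txt'

# A byte-wise comparison, which is all an encoding-agnostic tool can
# do, treats these as two unrelated names.
print(as_utf8 == as_latin1)  # False

# Neither form contains a NUL byte, so both fit in a NUL-terminated
# string.
print(b"\x00" in as_utf8 or b"\x00" in as_latin1)  # False
```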

The problem is with the MS Windows *console*, from which you invoke
git commands, and which translates from the filename encoding used by
the filesystem to the encoding / codepage used by the console.
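
The effect Johannes described can be imitated with a small Python
sketch.  This is only an approximation under my own assumptions:
Python's "cp850" codec with errors="replace" stands in for whatever
conversion Windows actually performs, and the helper name and file
names are made up for the example.

```python
def through_codepage(filename: str, codepage: str = "cp850") -> str:
    """Encode a Unicode name into a codepage and decode it again.

    Characters without a slot in the codepage are replaced by '?',
    which mimics the "unknown character" marker.
    """
    raw = filename.encode(codepage, errors="replace")
    return raw.decode(codepage)

german = "Übung.txt"
hiragana = "ひらがな.txt"

# German umlauts have a place in codepage 850 and survive the trip.
print(through_codepage(german))             # Übung.txt

# Hiragana has no place in codepage 850, so the name is destroyed.
print(through_codepage(hiragana))           # ????.txt

# A UTF-8 "codepage" can represent both names.
print(through_codepage(hiragana, "utf-8"))  # ひらがな.txt
```

Once the name has been reduced to "????.txt", the original characters
cannot be recovered from the bytes the program receives.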

> BTW, we are invoking the Git executable from Java. Is there automatically a
> console "around" Git? Should we invoke a shell-script (which sets the
> console's code page) instead of the Git executable directly?

If you use Git from Java, why don't you just use JGit (www.jgit.org),
which is a Git implementation in Java?

-- 
Jakub Narebski
Poland
