From: Jakub Narebski <jnareb@gmail.com>
To: Thomas Singer <thomas.singer@syntevo.com>
Cc: Johannes Sixt <j.sixt@viscovery.net>, git@vger.kernel.org
Subject: Re: non-US-ASCII file names (e.g. Hiragana) on Windows
Date: Tue, 01 Dec 2009 09:24:58 -0800 (PST)
Message-ID: <m3k4x6na81.fsf@localhost.localdomain>
In-Reply-To: <4B150747.2030900@syntevo.com>

Thomas Singer <thomas.singer@syntevo.com> writes:

> Johannes Sixt wrote:
>> Thomas Singer schrieb:
>>>
>>> Is it a German Windows limitation, that far-east characters are not
>>> supported on it (but work fine on a Japanese Windows), are there different
>>> (msys)Git versions available or is this a configuration issue?
>> 
>> It is a matter of configuration.
>> 
>> Since 8 bits are not sufficient to support Japanese alphabet in addition
>> to the German alphabet, programs that are not Unicode aware -- such as git
>> -- have to make a decision which alphabet they support. The decision is
>> made by picking a "codepage".
>> 
>> On German Windows, you are in codepage 850 (in the console). The filenames
>>  (that actually are in Unicode) are converted to bytes according to
>> codepage 850 *before* git sees them. If your filenames contain Hiragana,
>> they are substituted by the "unknown character" marker because there is no
>> place for them in codepage 850.
[...]

>> Corollary: Stick to ASCII file names.
>> 
>> There have been suggestions to switch the console to codepage 65001
>> (UTF-8), but I have never heard of success reports. I'm not saying it does
>> not work, though.
> 
> Thanks for the detailed explanation. I know the differences between bytes
> and characters and the needed *encoding* to convert from one to another, but
> I did not know how Git handles it. I'm quite surprised, that -- as I
> understand you -- msys-Git (or Git at all?) is not able to handle all
> characters (aka unicode) at the same time. I expected it would be better
> than older tools, e.g. SVN.

The problem is not with Git, as Git is (currently) agnostic with
respect to filename encoding; for Git, filenames are opaque NUL ('\0')
terminated binary data.  There is some infrastructure to convert
between filename encodings and to handle other filename quirks (like
case-insensitivity), though...
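
To illustrate what "opaque binary data" means in practice, here is a
small Python sketch (not Git's actual code, which is C; the filename is
a made-up example).  The same name encoded as UTF-8 and as Shift_JIS
gives two unrelated byte strings, so a tool that only compares bytes
sees two different paths:

```python
# Illustration only: mimics a byte-wise comparison of path names.
name = "ひらがな.txt"  # hypothetical filename

utf8_bytes = name.encode("utf-8")
sjis_bytes = name.encode("shift_jis")

# A byte-wise comparison knows nothing about encodings, so the two
# encodings of the same name do not match.
same_path = utf8_bytes == sjis_bytes
print(same_path)

# Neither encoding contains a NUL byte, so both can be stored as
# NUL-terminated strings.
print(b"\0" in utf8_bytes, b"\0" in sjis_bytes)
```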

The problem is with the MS Windows *console*, from which you invoke git
commands, and which translates from the filename encoding used by
the filesystem to the encoding / codepage used by the console.
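
The kind of lossy conversion described above can be sketched in Python.
This is only an approximation: the helper name is hypothetical, and
Python's "replace" error handler stands in for whatever substitution
Windows actually performs.  The point is that codepage 850 has room for
the German letters but not for Hiragana:

```python
def to_console_codepage(name: str, codepage: str = "cp850") -> bytes:
    """Hypothetical helper: encode a Unicode name into an 8-bit codepage."""
    # errors="replace" substitutes b"?" for every character that the
    # codepage cannot represent.
    return name.encode(codepage, errors="replace")

german = to_console_codepage("Grüße.txt")
hiragana = to_console_codepage("ひらがな.txt")

print(german.decode("cp850"))    # German letters survive the round trip
print(hiragana.decode("cp850"))  # Hiragana have become "?" placeholders
```

Once the name has been reduced to placeholders, the program receiving
the bytes has no way to recover the original characters.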

> BTW, we are invoking the Git executable from Java. Is there automatically a
> console "around" Git? Should we invoke a shell-script (which sets the
> console's code page) instead of the Git executable directly?

If you use Git from Java, why don't you just use JGit (www.jgit.org),
which is a Git implementation in Java?

-- 
Jakub Narebski
Poland
