* non-ascii filenames issue
@ 2009-04-05 9:36 Gregory Petrosyan
2009-04-05 9:54 ` Teemu Likonen
0 siblings, 1 reply; 11+ messages in thread
From: Gregory Petrosyan @ 2009-04-05 9:36 UTC (permalink / raw)
To: git
gregory@home:~$ git --version
git version 1.6.2.2.404.ge96f3
gregory@home:~$ mkdir git-test
gregory@home:~$ cd git-test
gregory@home:~/git-test$ touch файл
gregory@home:~/git-test$ ls -a
. .. файл
gregory@home:~/git-test$ git init
Initialized empty Git repository in /home/gregory/git-test/.git/
gregory@home:~/git-test$ git add .
gregory@home:~/git-test$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: "\321\204\320\260\320\271\320\273"
#
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"файл" should be here instead
This is on Ubuntu Jaunty beta, with latest git built from source.
Please CC me, I am not subscribed.
Gregory
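The escaped form in the status output above is the file name's UTF-8 bytes written as octal escapes. A minimal sketch of that quoting, assuming a simplified version of the rules (the helper name quote_path is hypothetical, and git's real quoting handles more cases than this):

```python
def quote_path(raw: bytes) -> str:
    """Approximate C-style path quoting as seen in the status output.

    Assumption: bytes outside printable ASCII become 3-digit octal
    escapes, and the result is wrapped in double quotes only if
    something had to be escaped.
    """
    simple = {0x22: '\\"', 0x5C: "\\\\", 0x0A: "\\n", 0x09: "\\t"}
    pieces = []
    needs_quotes = False
    for byte in raw:
        if byte in simple:
            pieces.append(simple[byte])
            needs_quotes = True
        elif byte < 0x20 or byte >= 0x7F:
            # Non-printable or non-ASCII byte: emit as \NNN (octal).
            pieces.append("\\%03o" % byte)
            needs_quotes = True
        else:
            pieces.append(chr(byte))
    text = "".join(pieces)
    return '"' + text + '"' if needs_quotes else text


# The UTF-8 bytes of the Cyrillic name from the report.
print(quote_path("файл".encode("utf-8")))
```

Each Cyrillic letter takes two bytes in UTF-8, which is why four letters turn into eight escapes.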
* Re: non-ascii filenames issue
2009-04-05 9:36 non-ascii filenames issue Gregory Petrosyan
@ 2009-04-05 9:54 ` Teemu Likonen
2009-04-05 10:01 ` Gregory Petrosyan
0 siblings, 1 reply; 11+ messages in thread
From: Teemu Likonen @ 2009-04-05 9:54 UTC (permalink / raw)
To: Gregory Petrosyan; +Cc: git
On 2009-04-05 13:36 (+0400), Gregory Petrosyan wrote:
> # Changes to be committed:
> # (use "git rm --cached <file>..." to unstage)
> #
> # new file: "\321\204\320\260\320\271\320\273"
> #
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> "файл" should be here instead
It can be fixed with this command:
git config --global core.quotepath false
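When the setting cannot be changed (for example when reading someone else's log), the quoted form can be decoded by hand. A rough sketch, assuming the name really is UTF-8 and that only octal and a few single-character escapes occur (unquote_git_path is a hypothetical helper, not part of git):

```python
import re

# Single-character escapes and the byte values they stand for.
_SIMPLE = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
           '"': 34, "\\": 92}


def unquote_git_path(quoted: str, encoding: str = "utf-8") -> str:
    """Turn a quoted path such as "\\321\\204" back into readable text."""
    if not (len(quoted) >= 2 and quoted[0] == '"' and quoted[-1] == '"'):
        return quoted  # not quoted, nothing to do
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode(encoding)
            i += 1
            continue
        octal = re.match(r"[0-7]{3}", body[i + 1:i + 4])
        if octal:
            out.append(int(octal.group(), 8))  # \NNN is one raw byte
            i += 4
        else:
            out.append(_SIMPLE[body[i + 1]])
            i += 2
    # The decode step is where the encoding assumption matters.
    return out.decode(encoding)


decoded = unquote_git_path(r'"\321\204\320\260\320\271\320\273"')
```

If the bytes are not valid in the assumed encoding, the final decode raises UnicodeDecodeError, which is exactly the ambiguity discussed later in this thread.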
* Re: non-ascii filenames issue
2009-04-05 9:54 ` Teemu Likonen
@ 2009-04-05 10:01 ` Gregory Petrosyan
2009-04-05 10:51 ` John Tapsell
0 siblings, 1 reply; 11+ messages in thread
From: Gregory Petrosyan @ 2009-04-05 10:01 UTC (permalink / raw)
To: Teemu Likonen; +Cc: git
On Sun, Apr 05, 2009 at 12:54:28PM +0300, Teemu Likonen wrote:
> On 2009-04-05 13:36 (+0400), Gregory Petrosyan wrote:
>
> > # Changes to be committed:
> > # (use "git rm --cached <file>..." to unstage)
> > #
> > # new file: "\321\204\320\260\320\271\320\273"
> > #
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > "файл" should be here instead
>
> It can be fixed with command:
>
> git config --global core.quotepath false
Thanks! That works. Does it make sense to set it to "false" by default?
Gregory
* Re: non-ascii filenames issue
2009-04-05 10:01 ` Gregory Petrosyan
@ 2009-04-05 10:51 ` John Tapsell
2009-04-05 16:23 ` Jay Soffian
2009-04-06 7:28 ` Peter Krefting
0 siblings, 2 replies; 11+ messages in thread
From: John Tapsell @ 2009-04-05 10:51 UTC (permalink / raw)
To: Teemu Likonen, git
2009/4/5 Gregory Petrosyan <gregory.petrosyan@gmail.com>:
> On Sun, Apr 05, 2009 at 12:54:28PM +0300, Teemu Likonen wrote:
>> On 2009-04-05 13:36 (+0400), Gregory Petrosyan wrote:
>>
>> > # Changes to be committed:
>> > # (use "git rm --cached <file>..." to unstage)
>> > #
>> > # new file: "\321\204\320\260\320\271\320\273"
>> > #
>> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> > "файл" should be here instead
>>
>> It can be fixed with command:
>>
>> git config --global core.quotepath false
>
> Thanks! That works. Does it make sense to set it to "false" by default?
Unfortunately not, because for some absolutely crazy reason, there is
no way at all to tell what encoding the string is in. It never
occurred to anyone that it might actually be useful to be able to read
the filename in an unambiguous way. The result is this sort of mess.
Just wait until you try to checkout that file on a new filesystem with
a different encoding. Or try to checkout that file in Windows. It's
like git decided to step backwards 30 years.
John
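The checkout problem described above can be illustrated without git at all. The sketch below assumes three encodings commonly used for Russian text and shows that one name maps to three unrelated byte strings, so bytes written under one encoding are misread under another:

```python
name = "файл"

# The same four letters under three encodings.
as_utf8 = name.encode("utf-8")
as_koi8r = name.encode("koi8_r")
as_cp1251 = name.encode("cp1251")

# Number of distinct byte strings for the one name.
distinct = len({as_utf8, as_koi8r, as_cp1251})

# Reading UTF-8 bytes as KOI8-R "succeeds" but yields different text.
misread = as_utf8.decode("koi8_r")


def decodes_as_utf8(raw: bytes) -> bool:
    """Report whether raw happens to be well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Nothing in the byte string itself records which encoding produced it, which is the point of the complaint.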
* Re: non-ascii filenames issue
2009-04-05 10:51 ` John Tapsell
@ 2009-04-05 16:23 ` Jay Soffian
2009-04-05 19:29 ` Junio C Hamano
2009-04-06 7:28 ` Peter Krefting
1 sibling, 1 reply; 11+ messages in thread
From: Jay Soffian @ 2009-04-05 16:23 UTC (permalink / raw)
To: John Tapsell; +Cc: Teemu Likonen, git
On Sun, Apr 5, 2009 at 6:51 AM, John Tapsell <johnflux@gmail.com> wrote:
> Unfortunately not, because for some absolutely crazy reason
Bzzt. http://article.gmane.org/gmane.comp.version-control.git/50830
And, as always, patches welcomed.
j.
* Re: non-ascii filenames issue
2009-04-05 16:23 ` Jay Soffian
@ 2009-04-05 19:29 ` Junio C Hamano
2009-04-05 20:22 ` Jay Soffian
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2009-04-05 19:29 UTC (permalink / raw)
To: Jay Soffian; +Cc: John Tapsell, Teemu Likonen, git
Jay Soffian <jaysoffian@gmail.com> writes:
> On Sun, Apr 5, 2009 at 6:51 AM, John Tapsell <johnflux@gmail.com> wrote:
>> Unfortunately not, because for some absolutely crazy reason
>
> Bzzt. http://article.gmane.org/gmane.comp.version-control.git/50830
I do not think the message gives enough information on the issue, as "a
pathname is a slash separated sequence of path components terminated with
a NUL, and a path component is an uninterpreted sequence of bytes
excluding NUL and slash" is simply a UNIX tradition the original git
design took as _given_, so the "some absolutely crazy reason" comment does
not even deserve refuting.
There is _no_ reason, crazy or otherwise. If you start from "a pathname
is an uninterpreted sequence of bytes" tradition, it is a design parameter
and "how things are", and you simply do not argue with them. And the
message you quoted doesn't, either.
Side note: I am not saying that we should not ever change that
particular design parameter. I am just explaining why 50830 is
not a good counterargument to quote against the "some absolutely
crazy reason" accusation.
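The "uninterpreted sequence of bytes" model quoted above can be sketched in a few lines. This is only an illustration of the tradition, with a hypothetical helper name, and it is not how git itself parses paths:

```python
from typing import List


def split_path(path: bytes) -> List[bytes]:
    """Split a pathname into components using only byte-level rules.

    The only special bytes are NUL (terminator, so not allowed inside)
    and slash (separator). Everything else is passed through untouched,
    with no attempt to decode it as text.
    """
    if b"\0" in path:
        raise ValueError("NUL cannot appear inside a pathname")
    return [component for component in path.split(b"/") if component]


# Two spellings of the "same" name are simply two different paths here.
utf8_parts = split_path(b"docs/" + "файл".encode("utf-8"))
koi8_parts = split_path(b"docs/" + "файл".encode("koi8_r"))
```

Under this model neither path is wrong, and nothing says they were meant to be the same file.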
> And, as always, patches welcomed.
Before patches, you need a sound design and justification.
At least you need to consider the following (the early ones are easier):
- Do we unify them to some canonical encoding internally and do the
  matching in the canonical space? What's the internal representation
  (presumably UTF-8)?
- How should a user tell git the pathname conversion rules between the
  internal representation and the filesystem representation? A
  per-repository config variable?
- How should this interact with patch+apply dataflow (including "rebase"
  without -i/-m)? Should pathnames in diffs be in canonical form?
- How should this interact with case-challenged and/or Unicode-corrupting
  filesystems such as NTFS and HFS+ whose creat(), readdir(), and
  stat() contradict each other?
- What should happen when the pathname in the canonical representation
  recorded in the history cannot be externalized on a particular
  filesystem? Does it degrade gracefully and give some escape hatch,
  and if so how?
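To make the first, second and last questions concrete, here is a rough sketch of one possible answer, not a proposal for git. It assumes UTF-8 as the canonical form and a caller-supplied fs_encoding; both function names and the parameter are hypothetical:

```python
from typing import Optional

CANONICAL = "utf-8"  # assumed internal representation


def to_canonical(raw: bytes, fs_encoding: str) -> bytes:
    """Convert a name read from the filesystem into canonical bytes."""
    return raw.decode(fs_encoding).encode(CANONICAL)


def to_filesystem(canonical: bytes, fs_encoding: str) -> Optional[bytes]:
    """Convert a canonical name for checkout.

    Returns None when the name cannot be represented in fs_encoding;
    what to do then is the open "escape hatch" question.
    """
    try:
        return canonical.decode(CANONICAL).encode(fs_encoding)
    except UnicodeEncodeError:
        return None


stored = to_canonical("файл".encode("koi8_r"), "koi8_r")
on_windows_1251 = to_filesystem(stored, "cp1251")
on_latin1 = to_filesystem(stored, "latin-1")  # Cyrillic is not in Latin-1
```

The sketch deliberately leaves out the harder items in the list, such as diffs, case folding and filesystems that rewrite names on their own.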
* Re: non-ascii filenames issue
2009-04-05 19:29 ` Junio C Hamano
@ 2009-04-05 20:22 ` Jay Soffian
0 siblings, 0 replies; 11+ messages in thread
From: Jay Soffian @ 2009-04-05 20:22 UTC (permalink / raw)
To: Junio C Hamano; +Cc: John Tapsell, Teemu Likonen, git
> I do not think the message gives enough information on the issue
Of course you are correct. I was perturbed by John's message, but your
thoughtful reply is much more beneficial than my silly link. Thank you
for providing the level-headed response as always.
j.
* Re: non-ascii filenames issue
2009-04-05 10:51 ` John Tapsell
2009-04-05 16:23 ` Jay Soffian
@ 2009-04-06 7:28 ` Peter Krefting
2009-04-06 9:12 ` Johannes Schindelin
2009-04-07 8:26 ` demerphq
1 sibling, 2 replies; 11+ messages in thread
From: Peter Krefting @ 2009-04-06 7:28 UTC (permalink / raw)
To: John Tapsell; +Cc: Teemu Likonen, git
John Tapsell:
> Unfortunately not, because for some absolutely crazy reason, there is no
> way at all to tell what encoding the string is in. It never occurred to
> anyone that it might actually be useful to be able to read the filename in
> an unambiguous way.
It comes from the Unix tradition, unfortunately, that file names are just a
stream of bytes, instead of a stream of characters mapped to a byte
sequence. The "stream of bytes" thing worked back when everyone used ASCII,
but as soon as other character encodings were used (i.e., back in the 1970s or
so), that assumption broke.
> The result is this sort of mess. Just wait until you try to checkout that
> file on a new filesystem with a different encoding. Or try to checkout
> that file in Windows. It's like git decided to step backwards 30 years.
Since most people on Linux nowadays probably are running in a UTF-8-based
locale, I tried introducing some (very incomplete) patches for the Windows
port to make this assumption, to allow Windows users to make use of
non-ASCII file names (Windows uses Unicode strings for file names). Mac OS
uses (semi-decomposed) UTF-8 strings, so it should also be able to make use
of this.
Unfortunately, there seems to be quite some resistance towards deciding on
a platform- and language-independent way of storing file names in Git, but
rather just going the "Unix" way and making it someone else's problem. I find
this a bit sad.
--
\\// Peter - http://www.softwolves.pp.se/
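The "decomposed" remark about Mac OS refers to Unicode normalization: the same visible name can be stored precomposed (NFC) or decomposed (NFD), and the two differ at the byte level. A small sketch using Python's standard unicodedata module, assuming plain NFD as a stand-in for the variant the Mac filesystem actually uses:

```python
import unicodedata

name = "файл"

nfc = unicodedata.normalize("NFC", name)  # precomposed letters
nfd = unicodedata.normalize("NFD", name)  # base letter + combining mark

# The third letter splits into a base letter plus a combining breve,
# so the decomposed form is one code point longer.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A byte-for-byte comparison treats the two forms as different files, while same_name treats them as one, so even "UTF-8 everywhere" needs a rule about normalization.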
* Re: non-ascii filenames issue
2009-04-06 7:28 ` Peter Krefting
@ 2009-04-06 9:12 ` Johannes Schindelin
2009-04-06 22:33 ` Dmitry Potapov
2009-04-07 8:26 ` demerphq
1 sibling, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2009-04-06 9:12 UTC (permalink / raw)
To: Peter Krefting; +Cc: Teemu Likonen, git
Hi,
On Mon, 6 Apr 2009, Peter Krefting wrote:
> It comes from the Unix tradition, unfortunately, that file names are
> just a stream of bytes, instead of a stream of characters mapped to a
> byte sequence.
How is that different from .txt not having a defined locale?
Really, please, do not add to the non-information.
> Since most people on Linux nowadays probably are running in a
> UTF-8-based locale, I tried introducing some (very incomplete) patches
> for the Windows port to make this assumption, to allow Windows users to
> make use of non-ASCII file names (Windows uses Unicode strings for file
> names). Mac OS uses (semi-decomposed) UTF-8 strings, so it should also
> be able to make use of this.
Most Russian programmers I know do not run in a UTF-8 locale.
> Unfortunately, there seems to be quite some resistance towards deciding
> on a platform- and language-independent way of storing file names in
> Git, but rather just going the "Unix" way and making it someone else's
> problem. I find this a bit sad.
I find it a bit unfair that you say that, after many people participated
in that very informative thread, and after I tried to work with you
personally on getting the stuff into 4msysgit.git.
Actually, not just only a bit.
Ciao,
Dscho
* Re: non-ascii filenames issue
2009-04-06 9:12 ` Johannes Schindelin
@ 2009-04-06 22:33 ` Dmitry Potapov
0 siblings, 0 replies; 11+ messages in thread
From: Dmitry Potapov @ 2009-04-06 22:33 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Peter Krefting, Teemu Likonen, git
On Mon, Apr 06, 2009 at 11:12:35AM +0200, Johannes Schindelin wrote:
>
> Most Russian programmers I know do not run in a UTF-8 locale.
Actually, on Linux, people are gradually switching to UTF-8 from koi8-r,
but on Windows MSCRT does not support UTF-8, so you do not have much choice
here but to use Windows-1251. BTW, the upcoming Cygwin 1.7 is going to
have UTF-8 as the default locale. So, IMHO, UTF-8 is the only reasonable
choice for internal file name representation...
Dmitry
* Re: non-ascii filenames issue
2009-04-06 7:28 ` Peter Krefting
2009-04-06 9:12 ` Johannes Schindelin
@ 2009-04-07 8:26 ` demerphq
1 sibling, 0 replies; 11+ messages in thread
From: demerphq @ 2009-04-07 8:26 UTC (permalink / raw)
To: Peter Krefting; +Cc: John Tapsell, Git
2009/4/6 Peter Krefting <peter@softwolves.pp.se>:
> John Tapsell:
>
>> Unfortunately not, because for some absolutely crazy reason, there is no
>> way at all to tell what encoding the string is in. It never occurred to
>> anyone that it might actually be useful to be able to read the filename in
>> an unambiguous way.
>
> It comes from the Unix tradition, unfortunately, that file names are just a
> stream of bytes, instead of a stream of characters mapped to a byte
> sequence. The "stream of bytes" thing worked back when everyone used ASCII,
> but as soon as other character encodings were used (i.e., back in the 1970s or
> so), that assumption broke.
Those interested in this subject may find the following document on
the creation of UTF-8 interesting.
http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"