* non-ascii filenames issue
From: Gregory Petrosyan @ 2009-04-05 9:36 UTC
To: git

gregory@home:~$ git --version
git version 1.6.2.2.404.ge96f3
gregory@home:~$ mkdir git-test
gregory@home:~$ cd git-test
gregory@home:~/git-test$ touch файл
gregory@home:~/git-test$ ls -a
.  ..  файл
gregory@home:~/git-test$ git init
Initialized empty Git repository in /home/gregory/git-test/.git/
gregory@home:~/git-test$ git add .
gregory@home:~/git-test$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file: "\321\204\320\260\320\271\320\273"
#
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  "файл" should be here instead

This is on Ubuntu Jaunty beta, with latest git built from source.

Please CC me, I am not subscribed.

Gregory
* Re: non-ascii filenames issue
From: Teemu Likonen @ 2009-04-05 9:54 UTC
To: Gregory Petrosyan; +Cc: git

On 2009-04-05 13:36 (+0400), Gregory Petrosyan wrote:

> # Changes to be committed:
> #   (use "git rm --cached <file>..." to unstage)
> #
> #       new file: "\321\204\320\260\320\271\320\273"
> #
>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                   "файл" should be here instead

It can be fixed with the command:

    git config --global core.quotepath false
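For readers wondering what the quoted form in the status output stands for: each `\NNN` is the octal value of one raw byte of the file name. The following is a minimal sketch, not Git's own code, that reverses those escapes. The helper name `unquote_octal_path` is hypothetical, it handles only three-digit octal escapes (not `\t`, `\n` or escaped quotes), and it assumes the bytes are UTF-8, which the repository itself does not record.

```python
import re

# Matches one backslash followed by exactly three octal digits, e.g. \321.
_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")


def unquote_octal_path(quoted: str, encoding: str = "utf-8") -> str:
    # Hypothetical helper: turns a quoted path from the status output
    # back into readable text. Only octal escapes are handled.
    # Drop the surrounding double quotes if present.
    if len(quoted) >= 2 and quoted[0] == '"' and quoted[-1] == '"':
        quoted = quoted[1:-1]
    # Each escape stands for one raw byte of the file name.
    raw = _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8)]),
        quoted.encode("ascii"),
    )
    # The bytes carry no encoding label; UTF-8 is an assumption here.
    return raw.decode(encoding)
```

Calling `unquote_octal_path(r'"\321\204\320\260\320\271\320\273"')` returns `файл`, the name created with `touch` in the original report. Setting `core.quotepath` to `false` simply makes Git print those bytes unescaped and leaves their interpretation to the terminal.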
* Re: non-ascii filenames issue
From: Gregory Petrosyan @ 2009-04-05 10:01 UTC
To: Teemu Likonen; +Cc: git

On Sun, Apr 05, 2009 at 12:54:28PM +0300, Teemu Likonen wrote:
> On 2009-04-05 13:36 (+0400), Gregory Petrosyan wrote:
>
> > # Changes to be committed:
> > #   (use "git rm --cached <file>..." to unstage)
> > #
> > #       new file: "\321\204\320\260\320\271\320\273"
> > #
> >                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >                   "файл" should be here instead
>
> It can be fixed with the command:
>
>     git config --global core.quotepath false

Thanks! That works. Does it make sense to set it to "false" by default?

Gregory
* Re: non-ascii filenames issue
From: John Tapsell @ 2009-04-05 10:51 UTC
To: Teemu Likonen, git

2009/4/5 Gregory Petrosyan <gregory.petrosyan@gmail.com>:
> On Sun, Apr 05, 2009 at 12:54:28PM +0300, Teemu Likonen wrote:
>> On 2009-04-05 13:36 (+0400), Gregory Petrosyan wrote:
>>
>> > # Changes to be committed:
>> > #   (use "git rm --cached <file>..." to unstage)
>> > #
>> > #       new file: "\321\204\320\260\320\271\320\273"
>> > #
>> >                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >                   "файл" should be here instead
>>
>> It can be fixed with the command:
>>
>>     git config --global core.quotepath false
>
> Thanks! That works. Does it make sense to set it to "false" by default?

Unfortunately not, because for some absolutely crazy reason, there is no way at all to tell what encoding the string is in. It never occurred to anyone that it might actually be useful to be able to read the filename in an unambiguous way.

The result is this sort of mess. Just wait until you try to check out that file on a new filesystem with a different encoding. Or try to check out that file in Windows. It's like git decided to step backwards 30 years.

John
* Re: non-ascii filenames issue
From: Jay Soffian @ 2009-04-05 16:23 UTC
To: John Tapsell; +Cc: Teemu Likonen, git

On Sun, Apr 5, 2009 at 6:51 AM, John Tapsell <johnflux@gmail.com> wrote:
> Unfortunately not, because for some absolutely crazy reason

Bzzt. http://article.gmane.org/gmane.comp.version-control.git/50830

And, as always, patches welcomed.

j.
* Re: non-ascii filenames issue
From: Junio C Hamano @ 2009-04-05 19:29 UTC
To: Jay Soffian; +Cc: John Tapsell, Teemu Likonen, git

Jay Soffian <jaysoffian@gmail.com> writes:

> On Sun, Apr 5, 2009 at 6:51 AM, John Tapsell <johnflux@gmail.com> wrote:
>> Unfortunately not, because for some absolutely crazy reason
>
> Bzzt. http://article.gmane.org/gmane.comp.version-control.git/50830

I do not think the message gives enough information on the issue, as "a pathname is a slash separated sequence of path components terminated with a NUL, and a path component is an uninterpreted sequence of bytes excluding NUL and slash" is simply a UNIX tradition the original git design took as _given_, so the "some absolutely crazy reason" comment does not even deserve refuting.

There is _no_ reason, crazy or otherwise. If you start from the "a pathname is an uninterpreted sequence of bytes" tradition, it is a design parameter and "how things are", and you simply do not argue with it. And the message you quoted doesn't, either.

    Side note: I am not saying that we should not ever change that
    particular design parameter. I am just explaining why 50830 is not a
    good counterargument to quote against the "some absolutely crazy
    reason" accusation.

> And, as always, patches welcomed.

Before patches, you need a sound design and justification. At least you need to consider the following (the early ones are easier):

 - Do we unify them to some canonical encoding internally and do the matching in the canonical space? What's the internal representation (presumably UTF-8)?

 - How should a user tell git the pathname conversion rules between the internal representation and the filesystem representation? A config variable per repository?

 - How should this interact with patch+apply dataflow (including "rebase" without -i/-m)? Should pathnames in diffs be in canonical form?

 - How should this interact with case-challenged and/or unicode-corrupting filesystems such as NTFS and HFSplus whose creat(), readdir(), and stat() contradict each other?

 - What should happen when the pathname in the canonical representation recorded in the history cannot be externalized on a particular filesystem? Does it gracefully degenerate and give some escape hatch, and if so how?
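The questions about matching "in the canonical space" and about filesystems that alter names can be made concrete with Unicode normalization. This is only an illustration under stated assumptions, not a proposal from the thread: it assumes UTF-8 names, picks NFC as the canonical form arbitrarily, and the helper `same_name` is hypothetical. It shows that a name handed back in decomposed form (as a filesystem that decomposes names might do) no longer matches the stored bytes unless both sides are normalized first.

```python
import unicodedata

# The file name from the original report; "й" (U+0439) has a decomposition.
name = "файл"

# Composed form: one code point per letter.
composed = unicodedata.normalize("NFC", name)
# Decomposed form: "й" becomes "и" (U+0438) plus a combining breve (U+0306).
decomposed = unicodedata.normalize("NFD", name)

# As uninterpreted byte sequences the two spellings differ.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    # Hypothetical matching rule: compare in one canonical form.
    # NFC is an arbitrary choice made for this sketch.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here `composed_bytes == decomposed_bytes` is false while `same_name(composed, decomposed)` is true, which is the gap between "a pathname is an uninterpreted sequence of bytes" and matching on characters.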
* Re: non-ascii filenames issue
From: Jay Soffian @ 2009-04-05 20:22 UTC
To: Junio C Hamano; +Cc: John Tapsell, Teemu Likonen, git

> I do not think the message gives enough information on the issue

Of course you are correct. I was perturbed by John's message, but your thoughtful reply is much more beneficial than my silly link. Thank you for providing the level-headed response as always.

j.
* Re: non-ascii filenames issue
From: Peter Krefting @ 2009-04-06 7:28 UTC
To: John Tapsell; +Cc: Teemu Likonen, git

John Tapsell:

> Unfortunately not, because for some absolutely crazy reason, there is no way at all to tell what encoding the string is in. It never occurred to anyone that it might actually be useful to be able to read the filename in an unambiguous way.

It comes from the Unix tradition, unfortunately, that file names are just a stream of bytes, instead of a stream of characters mapped to a byte sequence. The "stream of bytes" thing worked back when everyone used ASCII, but as soon as other character encodings were used (i.e., back in the 1970s or so), that assumption broke.

> The result is this sort of mess. Just wait until you try to check out that file on a new filesystem with a different encoding. Or try to check out that file in Windows. It's like git decided to step backwards 30 years.

Since most people on Linux nowadays probably are running in a UTF-8-based locale, I tried introducing some (very incomplete) patches for the Windows port to make this assumption, to allow Windows users to make use of non-ASCII file names (Windows uses Unicode strings for file names). Mac OS uses (semi-decomposed) UTF-8 strings, so it should also be able to make use of this.

Unfortunately, there seems to be quite some resistance towards deciding on a platform- and language-independent way of storing file names in Git, in favour of just going the "Unix" way and making it someone else's problem. I find this a bit sad.

-- 
\\// Peter - http://www.softwolves.pp.se/
* Re: non-ascii filenames issue
From: Johannes Schindelin @ 2009-04-06 9:12 UTC
To: Peter Krefting; +Cc: Teemu Likonen, git

Hi,

On Mon, 6 Apr 2009, Peter Krefting wrote:

> It comes from the Unix tradition, unfortunately, that file names are just a stream of bytes, instead of a stream of characters mapped to a byte sequence.

How is that different from .txt not having a defined locale?

Really, please, do not add to the non-information.

> Since most people on Linux nowadays probably are running in a UTF-8-based locale, I tried introducing some (very incomplete) patches for the Windows port to make this assumption, to allow Windows users to make use of non-ASCII file names (Windows uses Unicode strings for file names). Mac OS uses (semi-decomposed) UTF-8 strings, so it should also be able to make use of this.

Most Russian programmers I know do not run in a UTF-8 locale.

> Unfortunately, there seems to be quite some resistance towards deciding on a platform- and language-independent way of storing file names in Git, in favour of just going the "Unix" way and making it someone else's problem. I find this a bit sad.

I find it a bit unfair that you say that, after many people participated in that very informative thread, and after I tried to work with you personally on getting the stuff into 4msysgit.git.

Actually, not just only a bit.

Ciao,
Dscho
* Re: non-ascii filenames issue
From: Dmitry Potapov @ 2009-04-06 22:33 UTC
To: Johannes Schindelin; +Cc: Peter Krefting, Teemu Likonen, git

On Mon, Apr 06, 2009 at 11:12:35AM +0200, Johannes Schindelin wrote:
>
> Most Russian programmers I know do not run in a UTF-8 locale.

Actually, on Linux, people are gradually switching to UTF-8 from koi8-r, but on Windows MSVCRT does not support UTF-8, so you do not have much choice here but to use Windows-1251. BTW, the upcoming Cygwin 1.7 is going to have UTF-8 as the default locale. So, IMHO, UTF-8 is the only reasonable choice for internal file name representation...

Dmitry
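The encodings named above (koi8-r, Windows-1251, UTF-8) make the ambiguity John complained about easy to demonstrate. The sketch below is an illustration only: `decode_attempts` is a hypothetical helper, and the choice of candidate encodings is an assumption taken from this message. It takes the bytes a koi8-r system would store for the file name and tries to read them back under each encoding.

```python
def decode_attempts(raw: bytes, encodings=("utf-8", "koi8-r", "cp1251")) -> dict:
    # Hypothetical helper: try each candidate encoding on the same bytes.
    # A value of None means the bytes are not valid in that encoding.
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# The file name as a koi8-r locale would store it: four single bytes.
koi8_bytes = "файл".encode("koi8-r")

# Only the koi8-r entry gives the original name back. The bytes are not
# valid UTF-8, and cp1251 decodes them without error into a different word.
attempts = decode_attempts(koi8_bytes)
```

The cp1251 case is the troublesome one: nothing fails, the name is just silently wrong, and the stored bytes alone give no hint which reading was intended.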
* Re: non-ascii filenames issue
From: demerphq @ 2009-04-07 8:26 UTC
To: Peter Krefting; +Cc: John Tapsell, Git

2009/4/6 Peter Krefting <peter@softwolves.pp.se>:
> John Tapsell:
>
>> Unfortunately not, because for some absolutely crazy reason, there is no way at all to tell what encoding the string is in. It never occurred to anyone that it might actually be useful to be able to read the filename in an unambiguous way.
>
> It comes from the Unix tradition, unfortunately, that file names are just a stream of bytes, instead of a stream of characters mapped to a byte sequence. The "stream of bytes" thing worked back when everyone used ASCII, but as soon as other character encodings were used (i.e., back in the 1970s or so), that assumption broke.

Those interested in this subject may find the following document on the creation of UTF-8 interesting.

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"