* Moving a directory into another fails
@ 2006-07-26 15:00 Jon Smirl
  2006-07-26 22:34 ` Nicolas Vilz
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Smirl @ 2006-07-26 15:00 UTC (permalink / raw)
  To: git

I cloned a git project. Then, in the original, I used mkdir to create a
new directory and git mv to move an existing directory into it. I then
used cg diff to generate a patch for the move.

When I use cg patch to apply this patch to the cloned tree, it fails.
This seems to be a problem in the git code, not cg. It is not picking
up the creation of the new intervening subdirectory correctly.

I just synced and this does not work in the current code.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Moving a directory into another fails
  2006-07-26 15:00 Moving a directory into another fails Jon Smirl
@ 2006-07-26 22:34 ` Nicolas Vilz
  2006-07-26 23:03   ` Jon Smirl
  0 siblings, 1 reply; 24+ messages in thread
From: Nicolas Vilz @ 2006-07-26 22:34 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git

On Wed, Jul 26, 2006 at 11:00:48AM -0400, Jon Smirl wrote:
> I cloned a git project. Then in the original I did mkdir for a new
> directory and use git mv to move an existing directory into it. I then
> used cg diff to generate a patch for the move.
>
> When I use cg patch to apply this patch to the cloned tree it fails.
> This seems to be a problem in the git code, not cg. It is not picking
> up the creation of the new intervening subdirectory correctly.
>
> I just synced and this does not work in the current code.

I tried to reproduce your scenario, and before that I set up a test
repository.

(1) mkdir git_test
(2) cd git_test
(3) git init-db
(4) vim test.txt # fill in some bogus text
(5) mkdir testing
(6) cd testing
(7) vim test1.txt # again, fill in some bogus text
(8) cd ..
(9) cg add test.txt testing/test1.txt
(10) cg commit -C # just give a fancy commit message...
(11) cd ..
(12) mkdir bare_git
(13) cd bare_git
(14) mkdir git_test.git
(15) GIT_DIR=git_test.git git init-db
(16) cd ../git_test
(17) git push ../bare_git/git_test.git --all
(18) cd ../
(19) git clone bare_git/git_test.git git_test2
(20) cd git_test
(21) mkdir blah_test
(22) git mv testing/ blah_test/
(23) cg diff > ../mkdir_patch.diff
(24) cd ..
(25) cd git_test2/
(26) cg patch < ../mkdir_patch.diff

From the last one (26) I get

patching file blah_test/testing/test1.txt
patching file testing/test1.txt
touch: cannot touch `testing/test1.txt': No such file or directory
Adding file blah_test/testing/test1.txt
Removing file testing/test1.txt

but the result is correct. There is no testing directory in here
anymore, and inside blah_test there is my testing dir with the file
test1.txt in it...

Did I miss something?

I use cogito-0.17.3 with git version 1.4.1 (obviously without that
recently rewritten git-mv...)

Nicolas
* Re: Moving a directory into another fails
  2006-07-26 22:34 ` Nicolas Vilz
@ 2006-07-26 23:03   ` Jon Smirl
  2006-07-26 23:25     ` Nicolas Vilz
  2006-07-28  1:43     ` Petr Baudis
  0 siblings, 2 replies; 24+ messages in thread
From: Jon Smirl @ 2006-07-26 23:03 UTC (permalink / raw)
  To: Nicolas Vilz; +Cc: git

This is a simpler sequence:

cg clone git foo
cg clone git foo1
cd foo
mkdir zzz
git mv gitweb zzz
cg diff >patch
cd ../foo1
cg patch <../foo/patch

It fails with these errors. We have determined that git apply of the
patch is ok and that this is a bug in cg patch.

[jonsmirl@jonsmirl foo1]$ cg patch <../foo/patch
mv: cannot move `gitweb/README' to `zzz/gitweb/README': No such file or directory
mv: cannot move `gitweb/gitweb.cgi' to `zzz/gitweb/gitweb.cgi': No such file or directory
mv: cannot move `gitweb/gitweb.css' to `zzz/gitweb/gitweb.css': No such file or directory
mv: cannot stat `"gitweb/test/M\\303\\244rchen"': No such file or directory
mv: cannot move `gitweb/test/file with spaces' to `zzz/gitweb/test/file with spaces': No such file or directory
mv: cannot move `gitweb/test/file+plus+sign' to `zzz/gitweb/test/file+plus+sign': No such file or directory
patch: **** Only garbage was found in the patch input.
Removing file gitweb/README
Adding file zzz/gitweb/README
error: zzz/gitweb/README: does not exist and --remove not passed
fatal: Unable to process file zzz/gitweb/README
cg-add: warning: not all items could have been added
Removing file gitweb/gitweb.cgi
Adding file zzz/gitweb/gitweb.cgi
error: zzz/gitweb/gitweb.cgi: does not exist and --remove not passed
fatal: Unable to process file zzz/gitweb/gitweb.cgi
cg-add: warning: not all items could have been added
Removing file gitweb/gitweb.css
Adding file zzz/gitweb/gitweb.css
error: zzz/gitweb/gitweb.css: does not exist and --remove not passed
fatal: Unable to process file zzz/gitweb/gitweb.css
cg-add: warning: not all items could have been added
Removing file "gitweb/test/Märchen"
Adding file "zzz/gitweb/test/Märchen"
error: "zzz/gitweb/test/Märchen": does not exist and --remove not passed
fatal: Unable to process file "zzz/gitweb/test/Märchen"
cg-add: warning: not all items could have been added
Removing file gitweb/test/file with spaces
Adding file zzz/gitweb/test/file with spaces
error: zzz/gitweb/test/file with spaces: does not exist and --remove not passed
fatal: Unable to process file zzz/gitweb/test/file with spaces
cg-add: warning: not all items could have been added
Removing file gitweb/test/file+plus+sign
Adding file zzz/gitweb/test/file+plus+sign
error: zzz/gitweb/test/file+plus+sign: does not exist and --remove not passed
fatal: Unable to process file zzz/gitweb/test/file+plus+sign
cg-add: warning: not all items could have been added
[jonsmirl@jonsmirl foo1]$

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: Moving a directory into another fails
  2006-07-26 23:03   ` Jon Smirl
@ 2006-07-26 23:25     ` Nicolas Vilz
  2006-07-28  1:43     ` Petr Baudis
  1 sibling, 0 replies; 24+ messages in thread
From: Nicolas Vilz @ 2006-07-26 23:25 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git

On Wed, Jul 26, 2006 at 07:03:30PM -0400, Jon Smirl wrote:
> This is a simpler sequence
>
> cg clone git foo
> cg clone git foo1
> cd foo
> mkdir zzz
> git mv gitweb zzz
> cg diff >patch
> cg ../foo1
> cg patch <../foo/patch
>
> Fails with these errors. We have determined that git apply patch is ok
> and this is a bug in cg patch.

Well, perhaps I should react faster and I shouldn't pause my fetchmail
for 2 or 3 hours... this is bad for this list :)

You are kind of fast :)

Nicolas
* Re: Moving a directory into another fails
  2006-07-26 23:03   ` Jon Smirl
  2006-07-26 23:25     ` Nicolas Vilz
@ 2006-07-28  1:43     ` Petr Baudis
  2006-12-04 18:19       ` Stefan Pfetzing
  1 sibling, 1 reply; 24+ messages in thread
From: Petr Baudis @ 2006-07-28 1:43 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Nicolas Vilz, git

Dear diary, on Thu, Jul 27, 2006 at 01:03:30AM CEST, I got a letter
where Jon Smirl <jonsmirl@gmail.com> said that...
> This is a simpler sequence
>
> cg clone git foo
> cg clone git foo1
> cd foo
> mkdir zzz
> git mv gitweb zzz
> cg diff >patch
> cg ../foo1
> cg patch <../foo/patch

Even simpler one:

	mkdir zzz
	cg-mv gitweb zzz
	cg-diff | cg-patch -R

(which would even undo the mess supposing that it worked properly)

> [jonsmirl@jonsmirl foo1]$ cg patch <../foo/patch
> mv: cannot move `gitweb/README' to `zzz/gitweb/README': No such file
> or directory

Oops. Thanks, fixed with this:

diff --git a/cg-patch b/cg-patch
index cc82f1f..923df0e 100755
--- a/cg-patch
+++ b/cg-patch
@@ -145,6 +145,8 @@ redzone_border()
 			echo "$file1: rename destination $file2 already exists, NOT RENAMING" >&2
 			return
 		fi
+		# FIXME: Remove stale empty directories related to $mvfrom
+		case $mvto in */*) mkdir -p "${mvto%/*}";; esac
 		mv "$mvfrom" "$mvto"
 	fi
 	if [ "$op" = "delete" -o "$op" = "rename" ]; then

> mv: cannot stat `"gitweb/test/M\\303\\244rchen"': No such file or directory

Junio, how am I supposed to unmangle this *censored* stuff?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam
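The two lines added by the cg-patch fix above rely on the shell expansion `${mvto%/*}`, which strips the last path component, and on `mkdir -p`, which creates any missing parents. Here is a minimal sketch of that behaviour outside cg-patch. The function name `move_with_parents` and the scratch directory are assumptions made for illustration and are not part of Cogito.

```shell
#!/bin/sh
# Hypothetical helper wrapping the two lines from the fix above.
move_with_parents() {
    mvfrom=$1
    mvto=$2
    # Only a destination containing a slash has a parent directory to create;
    # ${mvto%/*} removes the shortest trailing "/component".
    case $mvto in */*) mkdir -p "${mvto%/*}";; esac
    mv "$mvfrom" "$mvto"
}

# Reproduce the failing case from the thread in a scratch directory:
# zzz/gitweb/ does not exist yet when the rename is applied.
demo=$(mktemp -d)
mkdir -p "$demo/gitweb"
echo readme > "$demo/gitweb/README"
( cd "$demo" && move_with_parents gitweb/README zzz/gitweb/README )
```

Without the `mkdir -p` step, the `mv` fails with "No such file or directory", as in the log above. As the FIXME in the patch notes, the now-empty source directory is left behind.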
* Re: Re: Moving a directory into another fails
  2006-07-28  1:43     ` Petr Baudis
@ 2006-12-04 18:19       ` Stefan Pfetzing
  2006-12-04 18:56         ` Jakub Narebski
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Pfetzing @ 2006-12-04 18:19 UTC (permalink / raw)
  To: git

Hi Folks,

2006/7/28, Petr Baudis <pasky@suse.cz>:
> > mv: cannot stat `"gitweb/test/M\\303\\244rchen"': No such file or directory

Since when is this file in the official git.git tree? It's quite
problematic when used on HFS+ because it uses UTF-16 internally IMHO.

Git always thinks there is a new file in my git.git clone.

--- snip ---
dreamind@paris:~/src/git% git status
# Untracked files:
#   (use "git add" to add to commit)
#
#       gitweb/test/Märchen
nothing to commit
--- snap ---

bye

dreamind

-- 
http://www.dreamind.de/
* Re: Re: Moving a directory into another fails
  2006-12-04 18:19       ` Stefan Pfetzing
@ 2006-12-04 18:56         ` Jakub Narebski
  2006-12-04 19:03           ` Johannes Schindelin
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-04 18:56 UTC (permalink / raw)
  To: git

Stefan Pfetzing wrote:

> 2006/7/28, Petr Baudis <pasky@suse.cz>:
>>>
>>> mv: cannot stat `"gitweb/test/M\\303\\244rchen"': No such file or directory
>>>
> since when is this file in the official git.git tree?

Since merging in gitweb, Sat Jun 10, 2006.

> Its quite problematic when used on HFS+ because it uses UTF-16 internally IMHO.
>
> Git always thinks there is a new file in my git.git clone.

The problem is that git tries to be content-agnostic, and that includes
being encoding-agnostic.

I personally think that because the same repository might be deployed
on different systems with different file name encodings (and this is
not something you have control over, contrary to commit/tag message
encoding and the encoding of file contents), git should acquire a
core.filesystemEncoding configuration variable. It would convert
between the filesystem encoding used in the working directory (and
perhaps the index) and the UTF-8 encoding used in the repository (in
tree objects, and perhaps the index).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Re: Moving a directory into another fails
  2006-12-04 18:56         ` Jakub Narebski
@ 2006-12-04 19:03           ` Johannes Schindelin
  2006-12-04 19:10             ` Jakub Narebski
  0 siblings, 1 reply; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-04 19:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Mon, 4 Dec 2006, Jakub Narebski wrote:

> [...] git should acquire core.filesystemEncoding configuration variable
> which would encode from filesystem encoding used in working directory
> and perhaps index to UTF-8 encoding used in repository (in tree objects)
> and perhaps index.

So, you want to pull in all thinkable encodings? Of course, you could rely
on libiconv, adding yet another dependency to git. (Yes, I know, mailinfo
uses it already. But I never use mailinfo, so I do not need libiconv.)

Ciao,
Dscho
* Re: Re: Moving a directory into another fails
  2006-12-04 19:03           ` Johannes Schindelin
@ 2006-12-04 19:10             ` Jakub Narebski
  2006-12-04 19:10               ` Johannes Schindelin
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-04 19:10 UTC (permalink / raw)
  To: git

Johannes Schindelin wrote:

> On Mon, 4 Dec 2006, Jakub Narebski wrote:
>
>> [...] git should acquire core.filesystemEncoding configuration variable
>> which would encode from filesystem encoding used in working directory
>> and perhaps index to UTF-8 encoding used in repository (in tree objects)
>> and perhaps index.
>
> So, you want to pull in all thinkable encodings? Of course, you could rely
> on libiconv, adding yet another dependency to git. (Yes, I know, mailinfo
> uses it already. But I never use mailinfo, so I do not need libiconv.)

A conditional dependency. If you don't have libiconv, this feature wouldn't
be available.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Re: Moving a directory into another fails
  2006-12-04 19:10             ` Jakub Narebski
@ 2006-12-04 19:10               ` Johannes Schindelin
  2006-12-04 19:37                 ` Jakub Narebski
  2006-12-04 20:26                 ` Linus Torvalds
  0 siblings, 2 replies; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-04 19:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Mon, 4 Dec 2006, Jakub Narebski wrote:

> Johannes Schindelin wrote:
>
> > On Mon, 4 Dec 2006, Jakub Narebski wrote:
> >
> >> [...] git should acquire core.filesystemEncoding configuration variable
> >> which would encode from filesystem encoding used in working directory
> >> and perhaps index to UTF-8 encoding used in repository (in tree objects)
> >> and perhaps index.
> >
> > So, you want to pull in all thinkable encodings? Of course, you could rely
> > on libiconv, adding yet another dependency to git. (Yes, I know, mailinfo
> > uses it already. But I never use mailinfo, so I do not need libiconv.)
>
> A conditional dependency. If you don't have libiconv, this feature wouldn't
> be available.

You are speaking as somebody compiling git from source. We are a minority.

Ciao,
Dscho
* Re: Re: Moving a directory into another fails
  2006-12-04 19:10               ` Johannes Schindelin
@ 2006-12-04 19:37                 ` Jakub Narebski
       [not found]                   ` <7617FA7E-D49A-4A4C-B033-C2CB20623F5F@wf227.com>
  2006-12-04 20:26                 ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-04 19:37 UTC (permalink / raw)
  To: git

Johannes Schindelin wrote:

> On Mon, 4 Dec 2006, Jakub Narebski wrote:
>
>> Johannes Schindelin wrote:
>>
>>> On Mon, 4 Dec 2006, Jakub Narebski wrote:
>>>
>>>> [...] git should acquire core.filesystemEncoding configuration variable
>>>> which would encode from filesystem encoding used in working directory
>>>> and perhaps index to UTF-8 encoding used in repository (in tree objects)
>>>> and perhaps index.
>>>
>>> So, you want to pull in all thinkable encodings? Of course, you could rely
>>> on libiconv, adding yet another dependency to git. (Yes, I know, mailinfo
>>> uses it already. But I never use mailinfo, so I do not need libiconv.)
>>
>> A conditional dependency. If you don't have libiconv, this feature wouldn't
>> be available.
>
> You are speaking as somebody compiling git from source. We are a minority.

Usually iconv is in libc.

  # Define NEEDS_LIBICONV if linking with libc is not enough (Darwin).

Hmm... perhaps not that usually. The uname based configuration in
Makefile (not the test based configuration provided by autoconf
generated ./configure script) sets NEEDS_LIBICONV for: Darwin, SunOS 5.8,
Cygwin, FreeBSD and OpenBSD, some versions of NetBSD, AIX.

And HFS+ is on MacOS X / Darwin, without iconv in libc...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
[parent not found: <7617FA7E-D49A-4A4C-B033-C2CB20623F5F@wf227.com>]
* Re: Moving a directory into another fails
       [not found]                   ` <7617FA7E-D49A-4A4C-B033-C2CB20623F5F@wf227.com>
@ 2006-12-04 21:01                     ` Johannes Schindelin
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-04 21:01 UTC (permalink / raw)
  To: Wolfgang Fischer; +Cc: Jakub Narebski, git

Hi,

thank you, Wolfgang, for back Cc'ing me (Jakub never does that...), but I
am Cc'ing the git list here, also. Hope both of you don't mind.

On Mon, 4 Dec 2006, Wolfgang Fischer wrote:

> On 04.12.2006, at 20:37, Jakub Narebski wrote:
>
> > And HFS+ is on MacOS X / Darwin, without iconv in libc...
>
> And, what is even worse, is the fact that HFS+ uses an encoding, which is not
> represented in libiconv.
>
> If you CREATE a file, you can use UTF8-NFC
> (Normalization-Form-Composed), but if you later READDIR a directory, you
> will get the very same name back in the encoding used by HFS+, which is
> UTF8-NFD Normalization-Form-Decomposed. The difference is noticeable for
> some non-ASCII characters like e.g.
>
> LATIN SMALL LETTER A WITH DIAERESIS U+00E4 or U+0061 U+0308 in Unicode.
>
> If you need a sane backward mapping, one has to use some CoreFoundation
> interface, for which I removed the details out of my brain, in order to
> reclaim that memory area (garbage collection!). But I can help you with
> some details and probably code, if you really need that conversion
> direction.

Yes. When my iBook was still alive, I saw that problem, too: writing and
reading filenames were completely different issues.

Worse, you can experience the same on USB-Sticks when accessing them with
different OSes. For example, when checking out a git repo on a stick with
Linux, and then calling git-status on the same stick with Windows XP, you
see an issue with the file "Märchen", like you did on MacOSX.

So, please, please, please do not try to be smart about filename encodings
in git, but just DO NOT USE ANYTHING BUT ASCII IN FILENAMES IF THE
REPOSITORY IS GOING TO BE PUT ON DIFFERENT OPERATING SYSTEMS/FILE SYSTEMS.

(Wow, the Caps Lock key is _not_ dead after all. I must have been infected
by Linus...)

Ciao,
Dscho
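The composed/decomposed difference Wolfgang describes can be made concrete at the byte level. This is a small sketch, assuming only a POSIX shell with `printf` and `od`. The variable names are illustrative, and the octal escapes are the UTF-8 encodings of U+00E4 (`\303\244`) and of U+0061 U+0308 (`a\314\210`).

```shell
#!/bin/sh
# "Märchen" spelled two ways; both render identically on screen.
nfc=$(printf 'M\303\244rchen')    # composed:   U+00E4
nfd=$(printf 'Ma\314\210rchen')   # decomposed: U+0061 U+0308

# A plain string comparison sees two different names.
if [ "$nfc" = "$nfd" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi

# Dump both spellings so the differing bytes are visible.
printf '%s' "$nfc" | od -An -t x1
printf '%s' "$nfd" | od -An -t x1
```

The composed form is 8 bytes and the decomposed form is 9. A tool that compares names byte for byte, as git does, treats a file created under one spelling and listed under the other as a different, untracked file.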
* Re: Re: Moving a directory into another fails
  2006-12-04 19:10               ` Johannes Schindelin
  2006-12-04 19:37                 ` Jakub Narebski
@ 2006-12-04 20:26                 ` Linus Torvalds
  2006-12-04 20:51                   ` Linus Torvalds
                                     ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: Linus Torvalds @ 2006-12-04 20:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, git

On Mon, 4 Dec 2006, Johannes Schindelin wrote:
>
> On Mon, 4 Dec 2006, Jakub Narebski wrote:
>
> > Johannes Schindelin wrote:
> >
> > > On Mon, 4 Dec 2006, Jakub Narebski wrote:
> > >
> > >> [...] git should acquire core.filesystemEncoding configuration variable
> > >> which would encode from filesystem encoding used in working directory
> > >> and perhaps index to UTF-8 encoding used in repository (in tree objects)
> > >> and perhaps index.
> > >
> > > So, you want to pull in all thinkable encodings? Of course, you could rely
> > > on libiconv, adding yet another dependency to git. (Yes, I know, mailinfo
> > > uses it already. But I never use mailinfo, so I do not need libiconv.)
> >
> > A conditional dependency. If you don't have libiconv, this feature wouldn't
> > be available.
>
> You are speaking as somebody compiling git from source. We are a minority.

You guys are ignoring the _real_ problem.

It has nothing at all to do with dependencies on external packages. The
REAL problem is that if you do locale-dependent trees and other git
objects, git will STOP WORKING.

A filename in a tree object _has_ to be seen as a pure 8-bit character
stream. They _have_ to be compared with "memcmp()", and they have to sort
the same way and mean EXACTLY the same thing for everybody.

If a filesystem cannot represent that name AS THAT BYTE SEQUENCE then the
filesystem is broken. No ifs, buts, maybes about it. I'm sorry, but that's
how it is.

This is _exactly_ the same issue as case independence. Git does not ignore
case, and it really CANNOT ignore case. Ignoring case would cause horrible
and deep problems, and it has nothing to do with dependencies on libraries
(although it _would_ get much much worse from locale settings, and again
having different locales compare the same name differently because case
rules are different).

So it really boils down to one thing: git saves a byte stream. Not text.

This is true for all levels of the git archive. It's true for blob
content, it's true for filenames in trees, and it is true for commits. The
commit message is actually somewhat easier (because we have nothing to
"compare" it to afterwards in the checked-out tree), so the commit message
is the _one_ thing we can kind of play games with, but even there, once
it's done, it's done, and it's just a stream of bytes.
* Re: Re: Moving a directory into another fails
  2006-12-04 20:26                 ` Linus Torvalds
@ 2006-12-04 20:51                   ` Linus Torvalds
  2006-12-04 20:54                   ` Shawn Pearce
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2006-12-04 20:51 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, git

On Mon, 4 Dec 2006, Linus Torvalds wrote:
>
> If a filesystem cannot represent that name AS THAT BYTE SEQUENCE then the
> filesystem is broken. No ifs, buts, maybes about it. I'm sorry, but that's
> how it is.

Btw, what this means in practice is that when git creates a file with a
certain sequence of bytes, then

 (a) readdir had better return _that_ sequence of bytes, or git will see
     it as something else.

 (b) opening it with that same sequence of bytes had better work.

This does not mean that a filesystem may not internally use some other
encoding. It just means that the filesystem - when converting back and
forth between the internal encoding and the one it shows to user space -
had better convert back to the exact same thing.

Also, note that for most projects, even a broken filesystem doesn't
actually matter - it's enough that the filesystem gets the conversions
right for the particular set of names in a particular project. So any
project that just has 7-bit filenames will obviously never even see any
issues at all, even if the filesystem it runs on then does something
strange with 8-bit filenames.

This is one reason why UNIX's "everything is a stream of bytes" is so
important, and why programs should generally work with byte streams, not
"wide strings" or similar. It's the only way that you can reliably work
across different locales. Use wide strings and locale-specific stuff
_only_ for actually showing users something on the tty, for example.
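Conditions (a) and (b) above can be probed on a given filesystem with a short round-trip check. This is only a sketch: the function name `name_roundtrips` is made up for illustration, the scratch directory comes from `mktemp -d`, and the glob `*` stands in for readdir (so names beginning with a dot are not covered). The outcome depends on the filesystem it runs on, so no result is claimed here for the non-ASCII case.

```shell
#!/bin/sh
# Hypothetical check: create a file under the given name in a scratch
# directory, list the directory, and compare the listed name byte for byte.
# Returns 0 if the name came back unchanged, 1 if it changed,
# 2 if the scratch directory or file could not be created.
name_roundtrips() {
    dir=$(mktemp -d) || return 2
    if ! : > "$dir/$1"; then
        rm -rf "$dir"
        return 2
    fi
    # The glob is expanded from the directory listing, not from our argument.
    listed=$(cd "$dir" && for f in *; do printf '%s' "$f"; done)
    rm -rf "$dir"
    [ "$listed" = "$1" ]
}

# Try the composed spelling of the name from the thread.
if name_roundtrips "$(printf 'M\303\244rchen')"; then
    echo "filesystem handed back the same bytes"
else
    echo "filesystem changed the bytes (or the file could not be created)"
fi
```

On a filesystem that stores names as opaque bytes the check passes. On one that normalizes names on read, as described for HFS+ earlier in the thread, it would be expected to fail for the composed spelling.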
* Re: Re: Moving a directory into another fails
  2006-12-04 20:26                 ` Linus Torvalds
  2006-12-04 20:51                   ` Linus Torvalds
@ 2006-12-04 20:54                   ` Shawn Pearce
  2006-12-04 20:56                   ` Jakub Narebski
  2006-12-04 21:05                   ` Johannes Schindelin
  3 siblings, 0 replies; 24+ messages in thread
From: Shawn Pearce @ 2006-12-04 20:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Jakub Narebski, git

Linus Torvalds <torvalds@osdl.org> wrote:
> You guys are ignoring the _real_ problem.
>
> It has nothing at all to do with dependencies on external packages. The
> REAL problem is that if you do locale-dependent trees and other git
> objects, git will STOP WORKING.

Yes! In jgit I assumed all tree entry names were encoded in UTF-8.
Then I later learned they aren't. Foolish me.

As Linus points out, it's a HUGE problem that the caller of
git-write-tree gets to decide what encoding should be used for
that tree. Especially if someone else wants to use a different
encoding for the same filename (think ISO-8859-1 vs. UTF-8)!

I'd rather just force the tree entry names to be encoded in UTF-8
always, as it's compact for most western texts (which many filenames
are), and at least degrades to supporting the non-western texts.

A per-project setting is essentially impossible as we have no
such concept today, and a per-repository setting (like
i18n.commitEncoding) lets two different users encode the same
filename differently, which means two different tree SHA1s with the
exact same content... not correct!

> This is true for all levels of the git archive. It's true for blob
> content, it's true for filenames in trees, and it is true for commits. The
> commit message is actually somewhat easier (because we have nothing to
> "compare" it to afterwards in the checked-out tree), so the commit message
> is the _one_ thing we can kind of play games with, but even there, once
> it's done, it's done, and it's just a stream of bytes.

Commit encoding is a problem. Clearly the "header parts" (tree,
parent) are US-ASCII but the author and committer lines can be
anything. So can the body. And we have no way of knowing what
encoding was used years later; we can only guess and display it wrong.

We really should either normalize all commit messages to a single
encoding (again, UTF-8) or embed the encoding as part of the
headers somehow (e.g. look at how XML embeds the document encoding
in the start of the document).

--
* Re: Moving a directory into another fails
  2006-12-04 20:26                 ` Linus Torvalds
  2006-12-04 20:51                   ` Linus Torvalds
  2006-12-04 20:54                   ` Shawn Pearce
@ 2006-12-04 20:56                   ` Jakub Narebski
  2006-12-04 21:05                   ` Johannes Schindelin
  3 siblings, 0 replies; 24+ messages in thread
From: Jakub Narebski @ 2006-12-04 20:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, git

On Monday, 4 December 2006 21:26, Linus Torvalds wrote:

>>>> On Mon, 4 Dec 2006, Jakub Narebski wrote:
>>>>
>>>>> [...] git should acquire core.filesystemEncoding configuration variable
>>>>> which would encode from filesystem encoding used in working directory
>>>>> and perhaps index to UTF-8 encoding used in repository (in tree objects)
>>>>> and perhaps index.

> You guys are ignoring the _real_ problem.
>
> It has nothing at all to do with dependencies on external packages. The
> REAL problem is that if you do locale-dependent trees and other git
> objects, git will STOP WORKING.
>
> A filename in a tree object _has_ to be seen as a pure 8-bit character
> stream. They _have_ to be compared with "memcmp()", and they have to sort
> the same way and mean EXACTLY the same thing for everybody.

What I propose is having filenames in tree objects UTF-8 encoded. I don't
know if git relies heavily on the filename encoding on the filesystem (in
the working area) being the same as in the index and the same as in a
tree object.

Although I'm not sure what the problem is. You check out a non-US-ASCII
filename from git; the file can have strange characters in its name, but
the name should encode to the filename as it is in git. The problem might
be characters in a filename that the filesystem forbids, perhaps.

Although Wolfgang Fischer wrote (to me and Johannes Schindelin) that HFS+
uses UTF8-NFC (Normalization-Form-Composed) when creating a file, while
readdir returns the encoding used by HFS+, which is UTF8-NFD
(Normalization-Form-Decomposed). [Expletive censored]

> If a filesystem cannot represent that name AS THAT BYTE SEQUENCE then the
> filesystem is broken. No ifs, buts, maybes about it. I'm sorry, but that's
> how it is.

We have some configuration variables to work around broken filesystems,
like core.ignoreStat, so why not core.filesystemEncoding?

-- 
Jakub Narebski
* Re: Re: Moving a directory into another fails
  2006-12-04 20:26                 ` Linus Torvalds
                                     ` (2 preceding siblings ...)
  2006-12-04 20:56                   ` Jakub Narebski
@ 2006-12-04 21:05                   ` Johannes Schindelin
  2006-12-04 21:23                     ` Linus Torvalds
  3 siblings, 1 reply; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-04 21:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Hi,

On Mon, 4 Dec 2006, Linus Torvalds wrote:

> On Mon, 4 Dec 2006, Johannes Schindelin wrote:
> >
> > On Mon, 4 Dec 2006, Jakub Narebski wrote:
> >
> > > Johannes Schindelin wrote:
> > >
> > > > On Mon, 4 Dec 2006, Jakub Narebski wrote:
> > > >
> > > >> [...] git should acquire core.filesystemEncoding configuration variable
> > > >> which would encode from filesystem encoding used in working directory
> > > >> and perhaps index to UTF-8 encoding used in repository (in tree objects)
> > > >> and perhaps index.
> > > >
> > > > So, you want to pull in all thinkable encodings? Of course, you could rely
> > > > on libiconv, adding yet another dependency to git. (Yes, I know, mailinfo
> > > > uses it already. But I never use mailinfo, so I do not need libiconv.)
> > >
> > > A conditional dependency. If you don't have libiconv, this feature wouldn't
> > > be available.
> >
> > You are speaking as somebody compiling git from source. We are a minority.
>
> You guys are ignoring the _real_ problem.
>
> It has nothing at all to do with dependencies on external packages. The
> REAL problem is that if you do locale-dependent trees and other git
> objects, git will STOP WORKING.

The issue was _not_ locale-dependent trees, but file systems which
_change_ the encoding.

And even then, Jakub's proposed reencoding could work, because it is an
_encoding_ after all, i.e. bijective (a reversible mapping, for you
non-Math guys). Not at all comparable to case insensitivity, which _loses_
information.

But for reasons described in another mail, there are more fundamental
problems with encodings, especially with MacOSX which (braindeadly)
encodes _differently_ when writing and reading.

So, we reach the same conclusion, but for different reasons.

Ciao,
* Re: Re: Moving a directory into another fails
  2006-12-04 21:05                   ` Johannes Schindelin
@ 2006-12-04 21:23                     ` Linus Torvalds
  2006-12-05  7:34                       ` Johannes Schindelin
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2006-12-04 21:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, git

On Mon, 4 Dec 2006, Johannes Schindelin wrote:
>
> The issue was _not_ locale-dependent trees, but file systems which
> _change_ the encoding.

Correct. However, it doesn't really change the issue: some byte streams
may simply not work in certain encodings.

You could, of course, basically do some kind of "escape high characters"
on the filename if it has characters in it that you suspect might cause
problems, but you'd better make 100% sure that it really is 100%
reversible (and you need to do all the real operations on the
_native_git_ version of the filename).

So we _could_ use a flag that says "escape all filenames", but it would
not be a _locale_ setting, it would really be a per-repository setting,
and it wouldn't be "iconv", it would be something similar to what we do
for "git diff" when we escape filenames with strange characters in them.

We could do it by changing every "open()/creat()" and "[l]stat()" on the
working tree with something that first escapes the filename.

Then, people with broken filesystems could set

	[core]
		escapefilenames = true

and instead of seeing 8-bit filenames, they'd see filenames with 7 bits
and escapes. They could work with such a repo, for sure. It would be ugly
as hell, though.
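To make the "escape high characters, 100% reversibly" idea above concrete, here is a rough shell sketch. It is not git's implementation, and `core.escapefilenames` was only floated in this thread. The function names `escape_name` and `unescape_name` are hypothetical. The scheme is an assumption in the spirit of the octal quoting visible in the error messages earlier: every byte at or above 0x80, and the backslash itself, becomes `\ooo`.

```shell
#!/bin/sh
# Hypothetical escaping scheme: bytes >= 0x80 and the backslash are written
# as a backslash plus three octal digits; all other bytes pass through.
# Escaping the backslash itself is what keeps the mapping reversible.
escape_name() {
    printf '%s' "$1" | LC_ALL=C od -An -v -t o1 | tr -s ' ' '\n' |
    while read -r oct; do
        [ -n "$oct" ] || continue            # skip blank lines from od padding
        if [ "$((0$oct))" -ge 128 ] || [ "$oct" = 134 ]; then
            printf '\\%s' "$oct"             # emit the literal text \ooo
        else
            printf '%b' "\\0$oct"            # emit the original 7-bit byte
        fi
    done
}

# Inverse mapping: rewrite \ooo as \0ooo, the octal form that printf's %b
# understands, then let printf produce the raw bytes again.
unescape_name() {
    printf '%b' "$(printf '%s' "$1" | sed 's/\\\([0-7][0-7][0-7]\)/\\0\1/g')"
}

raw=$(printf 'M\303\244rchen')
escaped=$(escape_name "$raw")
printf '%s\n' "$escaped"
```

A working tree using such a scheme would contain only 7-bit names, which every filesystem discussed in the thread hands back unchanged, while the tree objects keep the original bytes. Names containing newlines or NUL bytes are not handled by this sketch.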
* Re: Re: Moving a directory into another fails
  2006-12-04 21:23                     ` Linus Torvalds
@ 2006-12-05  7:34                       ` Johannes Schindelin
  2006-12-05  9:36                         ` Jakub Narebski
  2006-12-05 17:11                         ` Linus Torvalds
  0 siblings, 2 replies; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-05 7:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Hi,

On Mon, 4 Dec 2006, Linus Torvalds wrote:

> 	[core]
> 		escapefilenames = true

I think this goes too far. The problem _only_ showed up with a made-up
test case for gitweb. Let's bite the apple when we _have_ to (which I
doubt will happen, because for the most part, developers understand that
spaces and umlauts have _no_ place in filenames, basically since UNIX was
invented by stupid US Americans who did not know anything about nice
filenames, let alone other languages than English and C).

Ciao,
Dscho
* Re: Moving a directory into another fails
  2006-12-05  7:34                       ` Johannes Schindelin
@ 2006-12-05  9:36                         ` Jakub Narebski
  2006-12-05 14:11                           ` filesystem encodings and gitweb tests, was " Johannes Schindelin
  2006-12-05 17:11                         ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-05 9:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, git

Johannes Schindelin wrote:

> On Mon, 4 Dec 2006, Linus Torvalds wrote:
>
>> 	[core]
>> 		escapefilenames = true
>
> I think this goes too far. The problem _only_ showed up with a made-up
> test case for gitweb. Let's bite the apple when we _have_ to (which I
> doubt will happen, because for the most part, developers understand that
> spaces and umlauts have _no_ place in filenames, basically since UNIX was
> invented by stupid US Americans who did not know anything about nice
> filenames, let alone other languages than English and C).

No, the problem showed up with stupid HFS+, which uses one encoding
when creating a file and a different one for readdir.

Perhaps we should remove the gitweb/test directory, and move gitweb
testing to its proper place, the t/ directory.

By the way, would it be correct to use external tools (if they exist),
namely HTMLtidy, in the to-be-written gitweb output test?

-- 
Jakub Narebski
* filesystem encodings and gitweb tests, was Re: Moving a directory into another fails
  2006-12-05 9:36 ` Jakub Narebski
@ 2006-12-05 14:11 ` Johannes Schindelin
  2006-12-05 14:29 ` Jakub Narebski
  0 siblings, 1 reply; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-05 14:11 UTC
To: Jakub Narebski; +Cc: Linus Torvalds, git

Hi,

On Tue, 5 Dec 2006, Jakub Narebski wrote:

> Johannes Schindelin wrote:
>
> > On Mon, 4 Dec 2006, Linus Torvalds wrote:
> >
> >> [core]
> >> 	escapefilenames = true
> >
> > I think this goes too far. The problem _only_ showed up with a made-up
> > test case for gitweb. Let's bite the apple when we _have_ to (which I
> > doubt will happen, because for the most part, developers understand that
> > spaces and umlauts have _no_ place in filenames, basically since UNIX was
> > invented by stupid US Americans who did not know anything about nice
> > filenames, let alone other languages than English and C).
>
> No, the problem showed with stupid HFS+ which uses different encoding
> for creating file, and different for readdir.

This is just one of the problems. I described another problem in this
thread, namely a repo on a usb stick being accessed from different hosts.

> Perhaps we should remove gitweb/test directory, and move testing gitweb
> to proper place, t/ directory.

If you do that, please make sure that these tests can be disabled (a la
svn tests), so that people not being interested in gitweb, or lacking the
programs to test it, do not have to suffer.

> By the way, would it be correct to use external tools (if they exist),
> namely HTMLtidy in gitweb output test to-be-written?

(Yes, they exist. HTMLtidy for example ;-)

IMHO if such a tool is common enough, you should use it. If anybody steps
forward providing automated HTTP tests, I will not complain, and certainly
not about testing with things like HTMLtidy.

Ciao,
Dscho
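[Editorial illustration, not part of the original message.] The request
that the gitweb tests be easy to disable, and that they not fail when
the external checker is missing, could be sketched as a small guard like
the one below. The environment variable name NO_GITWEB_TESTS is
hypothetical (chosen only for illustration), and the sketch assumes the
HTML Tidy executable is installed under the name "tidy".

```python
import shutil


def gitweb_tests_enabled(env, find_tool=shutil.which):
    """Decide whether the optional gitweb output tests should run.

    env       -- mapping of environment variables (e.g. os.environ)
    find_tool -- lookup function returning a path or None; injectable
                 so the decision can be exercised without the tool
    """
    # Explicit opt-out for people who are not interested in gitweb.
    if env.get("NO_GITWEB_TESTS"):       # hypothetical variable name
        return False
    # Skip quietly when the external HTML checker is not installed.
    return find_tool("tidy") is not None
```

A test driver would call gitweb_tests_enabled(os.environ) once and skip
the whole group, rather than fail, when it returns False.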
* Re: filesystem encodings and gitweb tests, was Re: Moving a directory into another fails
  2006-12-05 14:11 ` filesystem encodings and gitweb tests, was " Johannes Schindelin
@ 2006-12-05 14:29 ` Jakub Narebski
  2006-12-05 14:40 ` Johannes Schindelin
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Narebski @ 2006-12-05 14:29 UTC
To: Johannes Schindelin; +Cc: Linus Torvalds, git

Johannes Schindelin wrote:

> On Tue, 5 Dec 2006, Jakub Narebski wrote:
>
>> No, the problem showed with stupid HFS+ which uses different encoding
>> for creating file, and different for readdir.
>
> This is just one of the problems. I described another problem in this
> thread, namely a repo on a usb stick being accessed from different hosts.

That is not much a problem. Yes, the filenames on different hosts would
_look_ different, but shouldn't be detected as new file.

--
Jakub Narebski
* Re: filesystem encodings and gitweb tests, was Re: Moving a directory into another fails
  2006-12-05 14:29 ` Jakub Narebski
@ 2006-12-05 14:40 ` Johannes Schindelin
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Schindelin @ 2006-12-05 14:40 UTC
To: Jakub Narebski; +Cc: Linus Torvalds, git

Hi,

On Tue, 5 Dec 2006, Jakub Narebski wrote:

> Johannes Schindelin wrote:
>
> > On Tue, 5 Dec 2006, Jakub Narebski wrote:
> >
> >> No, the problem showed with stupid HFS+ which uses different encoding
> >> for creating file, and different for readdir.
> >
> > This is just one of the problems. I described another problem in this
> > thread, namely a repo on a usb stick being accessed from different hosts.
>
> That is not much a problem. Yes, the filenames on different hosts would
> _look_ different, but shouldn't be detected as new file.

Sorry, I should have been clearer. I meant a repo _and_ a working
directory going along with it, on a USB stick. They do look different on
different hosts, and git-status looks different as a consequence ;-)

Ciao,
Dscho
* Re: Re: Moving a directory into another fails
  2006-12-05 7:34 ` Johannes Schindelin
  2006-12-05 9:36 ` Jakub Narebski
@ 2006-12-05 17:11 ` Linus Torvalds
  1 sibling, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2006-12-05 17:11 UTC
To: Johannes Schindelin; +Cc: Jakub Narebski, git

On Tue, 5 Dec 2006, Johannes Schindelin wrote:
>
> On Mon, 4 Dec 2006, Linus Torvalds wrote:
>
> > [core]
> > 	escapefilenames = true
>
> I think this goes too far.

Sure, I agree that in _practice_ this isn't actually a problem, because
people have long since learnt to avoid strange filenames in SCMs, simply
because you can't get it right with insane filesystems.

That said, it might be a good idea to abstract out the create/read phase
for filenames in the working tree regardless, since that also tends to be
an area where other issues can come up (whoops - '/' vs '\' as the
directory separator etc).
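[Editorial illustration, not part of the original message.] One way to
picture "abstracting out the create/read phase" is a pair of functions
that every working-tree access goes through. This is a sketch, not how
git is implemented: the function names are hypothetical, and it assumes
repository paths are '/'-separated and stored in NFC.

```python
import unicodedata


def to_worktree_path(repo_path: str, sep: str = "/") -> str:
    """Create phase: map a '/'-separated repository path to a platform path."""
    return repo_path.replace("/", sep)


def from_worktree_path(fs_path: str, sep: str = "/") -> str:
    """Read phase: map a path reported by the filesystem back to repo form.

    The separator is converted back to '/', and the name is normalized to
    NFC (an assumption of this sketch) so that a decomposing filesystem
    does not make a tracked file look like a new one.
    """
    return unicodedata.normalize("NFC", fs_path.replace(sep, "/"))


tracked = "blah_test/testing/test1.txt"
on_disk = to_worktree_path(tracked, sep="\\")
print(on_disk)
print(from_worktree_path(on_disk, sep="\\") == tracked)
```

Keeping both conversions in one place means a separator or encoding
quirk is handled once instead of at every call site that touches the
working tree.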