From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jeff King <peff@peff.net>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Esko Luontola <esko.luontola@gmail.com>,
git@vger.kernel.org
Subject: Re: Cross-Platform Version Control
Date: Wed, 13 May 2009 09:26:19 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0905130915540.3343@localhost.localdomain>
In-Reply-To: <20090512161638.GB29566@coredump.intra.peff.net>
On Tue, 12 May 2009, Jeff King wrote:
>
> Or they use a single encoding like utf8 so that there are no surprises.
> You can still run into normalization problems with filenames on some
> filesystems, though. Linus's name_hash code sets up the framework to
> handle "these two names are actually equivalent", but right now I think
> there is just code for handling case-sensitivity, not utf8 normalization
> (but I just skimmed the code, so I might be wrong).
utf-8 normalization was one goal, and shouldn't be _that_ hard to do. But
quite frankly, the index is only part of it, and probably not the worst
part.
The real pain of filename handling is all the "read tree recursively with
readdir()" issues. Along with just an absolute sh*t-load of issues about
what to do when people ended up using different versions of the "same"
name in different branches.
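
To make the "same" name problem concrete, here is a minimal sketch that groups names which differ only in case or in Unicode normalization form. The helper names `equivalence_key` and `find_collisions` are hypothetical, and the choice of NFC plus case folding as the equivalence rule is an assumption made for illustration, not what git's name_hash code does.

```python
import unicodedata
from collections import defaultdict


def equivalence_key(name: str) -> str:
    """Hypothetical key: two names with the same key count as the "same" name."""
    # Normalize to NFC, case-fold, then normalize again, because case folding
    # can produce a string that is no longer in normalized form.
    folded = unicodedata.normalize("NFC", name).casefold()
    return unicodedata.normalize("NFC", folded)


def find_collisions(names):
    """Return groups of distinct names that share an equivalence key."""
    groups = defaultdict(set)
    for name in names:
        groups[equivalence_key(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Names as they might appear across two branches (made-up examples).
    names = ["Makefile", "makefile", "caf\u00e9.txt", "cafe\u0301.txt", "README"]
    for group in find_collisions(names):
        print([n.encode("unicode_escape").decode() for n in group])
```

A check like this only detects the collision; deciding which spelling wins when two branches disagree is the part that stays painful.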
There's also the issue that "cross-platform" really can be a pretty damn
big pain. What do you do for platforms that simply are pure shit? I
realize that OS X people have a hard time accepting it, but OS X
filesystems are generally total and utter crap - even more so than
Windows.
Yes, yes, you can tell OS X that case matters, but that's not the normal
case - and what do you do with projects that simply _do_ care about case?
The kernel is one such project.
Sure, you can "encode" the filenames on such broken filesystems in a way
that they'd be different - but that won't really help the project, since
makefiles etc won't work anyway.
So one reason I didn't bother with utf-8 is that the much more fundamental
issues are simply in plain old 7-bit US-ASCII.
That said, if the only issue is that you want to encode regular utf-8 in a
coherent way (and ignore the case issues), then we could probably do that
part fairly easily with a "convert_to_internal()" and
"convert_to_filename()" thing that acts very much like the CRLF conversion
(except on filenames, not data).
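
A rough sketch of what such a conversion pair could look like, written in Python rather than git's C for brevity. Only the two function names come from the text above; the bodies are assumptions: NFC is picked as the internal form, and plain NFD stands in for whatever form a name-mangling filesystem actually stores (real filesystems may use their own variant).

```python
import unicodedata


def convert_to_internal(fs_name: str) -> str:
    """Map a name as read from the filesystem to the assumed internal form (NFC)."""
    return unicodedata.normalize("NFC", fs_name)


def convert_to_filename(internal_name: str, fs_form: str = "NFD") -> str:
    """Map an internal name to the form the filesystem is assumed to store.

    fs_form is a hypothetical per-platform setting; "NFD" approximates a
    filesystem that decomposes names, "NFC" one that leaves them composed.
    """
    return unicodedata.normalize(fs_form, internal_name)


if __name__ == "__main__":
    on_disk = "cafe\u0301.txt"            # decomposed: 'e' + combining acute
    internal = convert_to_internal(on_disk)
    print(internal == "caf\u00e9.txt")    # composed form used internally
    print(convert_to_filename(internal) == on_disk)
```

As with CRLF conversion, the point of the pair is that everything past the conversion boundary sees one canonical form. Note that this handles normalization only; it does nothing for the case issues.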
And yes, it's probably worth doing, since we'd need that for fuller case
support anyway.
It's just a fair amount of churn - not fundamentally _hard_, but not
trivial either. And it needs a _lot_ of care, and a fair amount of
testing that is probably hard to do on sane filesystems (i.e. the case where
the filesystem actually _changes_ the name is going to be hard to test on
anything sane).
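
One way to exercise that case without such a filesystem is to fake it in memory. The sketch below is purely illustrative: `NormalizingFakeDir`, `untracked` and `untracked_normalized` are hypothetical names, and NFD is assumed as the form the fake filesystem rewrites names to.

```python
import unicodedata


class NormalizingFakeDir:
    """In-memory stand-in for a directory whose filesystem rewrites names to NFD."""

    def __init__(self):
        self._entries = {}

    def create(self, name: str, data: bytes = b"") -> None:
        # The "filesystem" silently stores a different spelling than requested.
        self._entries[unicodedata.normalize("NFD", name)] = data

    def readdir(self):
        return sorted(self._entries)


def untracked(index_names, directory):
    """Naive exact-match comparison of index names against readdir() output."""
    tracked = set(index_names)
    return [n for n in directory.readdir() if n not in tracked]


def untracked_normalized(index_names, directory):
    """Same comparison, but with both sides brought to NFC first."""
    tracked = {unicodedata.normalize("NFC", n) for n in index_names}
    return [n for n in directory.readdir()
            if unicodedata.normalize("NFC", n) not in tracked]


if __name__ == "__main__":
    d = NormalizingFakeDir()
    d.create("caf\u00e9.txt")                        # created with the composed name
    print(untracked(["caf\u00e9.txt"], d))           # naive check reports a stray file
    print(untracked_normalized(["caf\u00e9.txt"], d))
```

A fake like this only covers the logic above the filesystem; it cannot stand in for testing against the real thing.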
Linus