From: Jakub Narebski <jnareb@gmail.com>
To: Martin Langhoff <martin.langhoff@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Cross-Platform Version Control (was: Eric Sink's blog - notes on git, dscms and a "whole product" approach)
Date: Tue, 28 Apr 2009 04:24:31 -0700 (PDT)
Message-ID: <m3r5zdnhqu.fsf@localhost.localdomain>
In-Reply-To: <46a038f90904270155i6c802fceoffc73eb5ab57130e@mail.gmail.com>
Martin Langhoff <martin.langhoff@gmail.com> writes:
> Eric Sink has been working on the (commercial, proprietary) centralised
> SCM Vault for a while. He's written recently about his explorations
> around the new crop of DSCMs, and I think it's quite interesting. A
> quick search of the list archives makes me think it wasn't discussed
> before.
>
> The guy is knowledgeable, and writes quite witty posts -- naturally,
> there's plenty to disagree on, but I'd like to encourage readers not
> to nitpick or focus on where Eric is wrong. It is interesting to read
> where he thinks git and other DSCMs are missing the mark.
>
> Maybe he's right, maybe he's wrong, but damn he's interesting :-)
>
> So here's the blog - http://www.ericsink.com/
"Here's a blog"... and therefore my dilemma. Should I post my reply
as a comment on this blog, or should I reply here on the git mailing list?
> These are the best entry points
Because those two entries are quite different, I'll reply to each separately:
1. "Ten Quirky Issues with Cross-Platform Version Control"
> http://www.ericsink.com/entries/quirky.html
which is a generic comment about (mainly) using version control
in a heterogeneous environment, where different machines have different
filesystem limitations. I'll concentrate on that issue here.
2. "Mercurial, Subversion, and Wesley Snipes"
> http://www.ericsink.com/entries/hg_denzel.html
where, paraphrasing, Eric Sink says that he doesn't write about
Mercurial and Subversion because they are perfect. Or at least not
as controversial (and controversial means interesting).
>
> To be frank, I think he's wrong in some details (as he's admittedly
> only spent limited time with it) but right on the larger-picture
> (large userbases want it integrated and foolproof, bugtracking needs
> to go distributed alongside the code, git is as powerful^Wdangerous as
> C).
Neither of the blog posts mentioned above touches on those issues, BTW...
----------------------------------------------------------------------
Ad 1. "Ten Quirky Issues with Cross-Platform Version Control"
Actually those are two issues: trouble caused by the differing limitations
of different filesystems, and the differing treatment of line endings in
text files on different platforms.
Line endings (issue 8.) are, in theory and in practice (at least for
Git), a non-issue.
In theory you should use the project's convention for the end-of-line
character in text files, and use a smart editor that can deal (or can be
configured to deal) with this issue correctly.
In practice this is a matter of correctly setting up core.autocrlf
(and, in the more complicated cases, which for git are very, very rare,
configuring which files are text and which are not).
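
As a rough illustration of what such a conversion amounts to, here is a
conceptual sketch in Python. It is not Git's implementation, and the function
names are hypothetical; it only mirrors the general idea behind
core.autocrlf=true for files treated as text: CRLF becomes LF on the way into
the repository, and LF becomes CRLF on the way out to the working tree.

```python
def to_repository(data: bytes) -> bytes:
    """Normalize CRLF line endings to LF, as when adding a text file."""
    return data.replace(b"\r\n", b"\n")


def to_working_tree(data: bytes, eol: bytes = b"\r\n") -> bytes:
    """Convert LF line endings to the platform's convention on checkout."""
    # Normalize first so that input which already uses CRLF is not doubled up.
    return to_repository(data).replace(b"\n", eol)


stored = to_repository(b"one\r\ntwo\r\n")
checked_out = to_working_tree(stored)
```

The round trip is lossless only for files that consistently use one
convention, which is why deciding which files count as text matters.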
There are a few classes of trouble with filesystems (with filenames).
1. Different limitations on file names (e.g. pathname length),
different special characters, different special filenames (if any).
These are issue 2. (the special basename PRN on MS Windows),
issue 3. (trailing dot, trailing whitespace), issue 4. (pathname
and filename length limits), issue 6. (special characters, in this
case the colon being the path element delimiter on MacOS, but it is
also about special characters like colon, asterisk and question mark
on MS Windows) and also issue 7. (names that begin with a dash)
in Eric Sink's article.
The answer is a filename convention for the project. Simply DON'T
use filenames which can cause problems. There is no simple way to
solve this problem in the version control system, although I think that
if you really, really, really need it you should be able to cobble
something together using low-level git tools, so that a file's name in
the working directory differs from the one used in the repository
(and index).
See also David A. Wheeler's essay "Fixing Unix/Linux/POSIX Filenames:
Control Characters (such as Newline), Leading Dashes, and Other Problems"
http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
DON'T DO THAT.
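
A project convention like this can be checked mechanically. The following is a
minimal sketch in Python, assuming paths use '/' as the separator; the function
name, the length limits and the exact character set are illustrative
assumptions, not an exhaustive list of what every platform rejects.

```python
import re

# Device names reserved on MS Windows, with or without an extension.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}
# Characters that cannot appear in a filename on MS Windows.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"\\|?*]')
MAX_COMPONENT = 255  # assumed limit for a single path component
MAX_PATH = 260       # assumed limit for a whole path


def portability_problems(path: str) -> list:
    """Return the reasons why `path` may not be portable (empty if none)."""
    problems = []
    if len(path) >= MAX_PATH:
        problems.append("path too long")
    for part in path.split("/"):
        if part.split(".")[0].upper() in WINDOWS_RESERVED:
            problems.append(f"reserved name: {part}")
        if part.endswith((".", " ")):
            problems.append(f"trailing dot or space: {part}")
        if WINDOWS_FORBIDDEN.search(part):
            problems.append(f"special character: {part}")
        if part.startswith("-"):
            problems.append(f"leading dash: {part}")
        if len(part) > MAX_COMPONENT:
            problems.append(f"component too long: {part}")
    return problems
```

Such a check could be run over the list of tracked paths before a commit is
accepted, so that a problematic name never enters the repository.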
2. "Case-insensitive" but "case-preserving" filesystems; the case
where some different filenames are equivalent (like 'README' and
'readme' on a case-insensitive filesystem), but are returned as you
created them (so if you created 'README', you would get 'README' in
the directory listing, but the filesystem would report that 'readme'
exists too). This is issue 1. ('README' and 'readme' in the same
directory) in Eric Sink's article.
The answer is the same as for the previous issue: don't. Simply DO NOT
create files with filenames which differ only in case (like the
unfortunate ct_conntrack.h and cn_CONNTRACK.h or similar in the Linux kernel).
But I think that even when such an unfortunate incident (two
filenames differing only in case) occurs, you can deal with it in
Git by using lower-level tools (and editing only one of the two
files at a time). You would get spurious info about modified files
in git-status, though... perhaps that could be improved using the
infrastructure created (IIRC) by Linus for dealing with 'insane'
filesystems.
DON'T DO THAT, SOLVABLE.
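
Detecting names that differ only in case is straightforward. A minimal sketch
in Python follows; the function name is hypothetical, and str.casefold() is
only an approximation of the folding rules a real case-insensitive filesystem
applies, which vary from one filesystem to another.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that would clash on a case-insensitive filesystem."""
    groups = defaultdict(list)
    for path in paths:
        # Paths with the same case-folded form land in the same group.
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Fed with the output of `git ls-files`, this would list every set of tracked
files that cannot coexist in one working directory on such a filesystem.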
3. Non-"case-preserving" filesystems, where the filename as a sequence
of bytes differs between what you created and what you get back from
the filesystem. An example here is the MacOS X filesystem, which accepts
filenames in the NFC (composed) normalization form of Unicode, but
stores them internally and returns them in the NFD (decomposed) form.
This is issue 9. (Español being "Espa\u00f1ol" in NFC, but
"Espan\u0303ol" in NFD).
In this case 'don't do this' might not be an acceptable answer.
Perhaps you need non-ASCII characters in filenames. You cannot always
choose a filesystem, or specify a mount option, that makes this a
non-problem.
I remember that this issue was discussed extensively on the git mailing
list, but I don't remember what the conclusion was (besides agreeing
that a filesystem that is not "*-preserving" is not a sane filesystem ;).
In particular I do not remember whether Git can deal with this issue
sanely (I remember Linus adding infrastructure for that, but did it
solve this problem...).
PROBABLY SOLVED.
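
The difference between the two forms is easy to demonstrate with Python's
standard unicodedata module. This is only a sketch of the underlying problem
and of one possible comparison rule (normalize both names before comparing);
the helper name same_name is hypothetical, and nothing here claims to be how
Git handles it.

```python
import unicodedata

nfc = "Espa\u00f1ol"                     # precomposed U+00F1
nfd = unicodedata.normalize("NFD", nfc)  # 'n' followed by combining U+0303


def same_name(a: str, b: str) -> bool:
    """Compare two filenames after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings render identically but differ as byte sequences, so a tool
that compares raw bytes sees the name it wrote and the name it reads back as
two different files.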
4. Filesystems which cannot store all SCM-sane metainfo, for example
filesystems without support for symbolic links, or without support
for the executable permission (executable bit). This is an extension
of issue 10. (which is limited to symbolic links) in Eric Sink's
article.
In Git you have core.fileMode to ignore executable bit differences
(you would need to use SCM tools and not filesystem tools to
manipulate it), and core.symlinks to be able to check out symlinks as
plain text files (again using SCM tools to manipulate them).
SOLVED.
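
Whether those two settings are needed depends on what the filesystem under the
working directory can actually store. The following Python sketch shows one way
to probe for that; the function name and the returned keys are hypothetical,
and this is an illustration of the idea rather than Git's own detection code.

```python
import os
import stat
import tempfile


def probe_filesystem(directory: str) -> dict:
    """Check whether `directory` keeps the executable bit and supports symlinks."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "probe")
        with open(target, "w") as handle:
            handle.write("probe\n")

        # Flip the owner-executable bit and see whether the change sticks.
        before = os.stat(target).st_mode
        os.chmod(target, before ^ stat.S_IXUSR)
        after = os.stat(target).st_mode
        keeps_exec_bit = bool((before ^ after) & stat.S_IXUSR)

        # Try to create a symbolic link; some platforms or filesystems refuse.
        link = os.path.join(tmp, "link")
        try:
            os.symlink(target, link)
            has_symlinks = os.path.islink(link)
        except (OSError, NotImplementedError):
            has_symlinks = False

    return {"filemode": keeps_exec_bit, "symlinks": has_symlinks}
```

A result of False for either key suggests that the corresponding setting
(core.fileMode or core.symlinks) should be turned off for that working
directory.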
There is also the mistaken implicit assumption that version control
systems must (and should) preserve all metadata.
5. The issue of extra metadata that is not SCM-sane, and which
different filesystems can or cannot store. Examples include full
Unix permissions, Unix ownership (and the group a file belongs to),
other permission-related metadata such as ACLs, and extra resources
tied to a file, such as EAs (extended attributes) on some Linux
filesystems or the (in)famous resource fork on MacOS. This is issue 5.
(resource fork on MacOS vs. xattrs on Linux) in Eric Sink's article.
This is not an issue for an SCM, a _source_ code management system,
to solve. Preserving extra metadata indiscriminately (e.g. full
permissions and ownership) can cause problems. Therefore
SCMs preserve only a limited, SCM-sane subset of metadata. If you
need to preserve extra metadata, you can use hooks (in good SCMs)
for that, like e.g. etckeeper, which uses metastore (in Git).
NOT A PROBLEM.
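
To make the hook approach concrete, here is a small Python sketch of the kind
of helper a pair of hooks could call: one function records permission bits in a
manifest that is tracked alongside the files, the other reapplies them. The
function names and the JSON manifest are assumptions made for illustration;
this is not how metastore stores its data, and ownership, ACLs and extended
attributes are left out.

```python
import json
import os
import stat


def save_modes(root: str, manifest: str) -> None:
    """Record the permission bits of every regular file under `root`."""
    modes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would follow the link, so skip symlinks
            rel = os.path.relpath(path, root)
            modes[rel] = stat.S_IMODE(os.lstat(path).st_mode)
    with open(manifest, "w") as handle:
        json.dump(modes, handle, indent=1, sort_keys=True)


def restore_modes(root: str, manifest: str) -> None:
    """Reapply the permission bits recorded by save_modes()."""
    with open(manifest) as handle:
        modes = json.load(handle)
    for rel, mode in modes.items():
        os.chmod(os.path.join(root, rel), mode)
```

One plausible wiring is to call save_modes() from a pre-commit hook and
restore_modes() from a post-checkout hook, so the SCM itself still only
versions an ordinary text file.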
--
Jakub Narebski
Poland
ShadeHawk on #git