Git at Better SCM Initiative comparison of VCS (long)


From: Jakub Narebski @ 2008-09-13 17:06 UTC
  To: git

I have tried a few times to add information about Git to the comparison
table of SCMs at 'Better SCM Initiative' (http://better-scm.berlios.de)

  http://thread.gmane.org/gmane.comp.version-control.git/66445
  http://thread.gmane.org/gmane.comp.version-control.git/67708

but somehow I never brought it to a conclusion, namely getting Git added
to the comparison table.  (Sidenote: data from 'Better SCM : Comparison'
is also used for the versioncontrolblog "Version control systems
comparison" at http://versioncontrolblog.com/comparison).

I have thought about trying yet another time... but Git was already
added; see http://better-scm.berlios.de/news/changes-2008-08-07/
Now I have checked information about Git and think that this table
needs a few corrections and, in some places, extra explanation.

Let us come up here with a version we can be happy with, which I could
then send as a correction to the Better SCM Initiative comparison
(http://better-scm.berlios.de/contribute/).

Below there are excerpts from source of comparison table (from SVN)

  http://opensvn.csie.org/betterscm/better-scm-site/trunk/src/comparison/scm-comparison.xml

marked as quoted text (with 'scm>'), optionally un-indented and
re-wrapped for better readability. My comments follow as if they
were replies to an email.

---
scm> <?xml version='1.0' encoding='utf-8'?>
scm> <?xml-stylesheet type="text/xml" href="compare-ml.xsl"?>
scm> <!DOCTYPE comparison SYSTEM "comparison.dtd">
scm> <!--
scm> TODO:
scm> 
scm> * Add intelligent merging of renamed paths.

The comparison has a new criterion: "Intelligent Merging after Moves
or Renames" since 2008-08-07, so the first item in this TODO list
should have been removed, I think.

scm> * Add IDE integration.
scm> * Integration with build/testing management.
scm> * Check-In policies.
scm> * Add Speed (?)
scm> -->

The problem with adding a new criterion is, of course, that it has to
be filled in for _all_ (currently 27) version control systems (SCMs)
covered.

scm> <comparison>
scm>     <meta>
scm>         <implementations>

scm>             <impl id="git">
scm>                 <name>Git</name>
scm>                 <url>http://git.or.cz/</url>
scm>             </impl>

Hmmm... what to do about the fact that currently Git has _two_ forks of
a homepage: http://git.or.cz (aka http://git-scm.org) by Petr 'Pasky'
Baudis and new http://git-scm.com by Scott Chacon, I do wonder...

But those are just aimless musings... the above is O.K.

scm>         </implementations>
scm>         <timestamp>
scm>             $Id: scm-comparison.xml 322 2008-08-09 05:47:26Z shlomif $
scm>         </timestamp>
scm>     </meta>
scm>     <contents>
scm> <section id="main">
scm>     <title>Version Control System Comparison</title>
scm>     <expl>
scm>         This is a comparison of version-control systems. It is split
scm>         into several categories and sub-categories under which the 
scm>         systems are checked.
scm>     </expl>
scm>     <section id="repos_operations">
scm>         <title>Repository Operations</title>
scm>         <section id="atomic_commits">
scm>             <title>Atomic Commits</title>
scm>             <expl>
scm>                 Support for atomic commits means that if an
scm>                 operation on the repository is interrupted
scm>                 in the middle, the repository will not be
scm>                 left in an inconsistent state. Are the
scm>                 check-in operations atomic, or can
scm>                 interrupting an operation leave the
scm>                 repository in an intermediate state?
scm>             </expl>

Here I think the explanation of the criterion (feature) is clear enough.
I might have added that "interruption" includes killing the process
during, for example, a commit, running out of disk space in the middle
of a commit, or a network failure during a network operation (fetch or
push, or equivalent).

scm>             <compare>
scm>                 <s id="git">Yes. Commits are atomic.</s>
scm>             </compare>

O.K.

scm>         </section>
scm>         <section id="move">
scm>             <title>Files and Directories Moves or Renames</title>
scm>             <expl>
scm>                 Does the system support moving a file or directory to
scm>                 a different location while still retaining the history
scm>                 of the file? <b>Note:</b> also see the next section
scm>                 about intelligent merging of renamed paths.
scm>             </expl>

In my opinion this criterion is next to worthless without a more
in-depth clarification of what it means to "support" moves or renames.
As entries for different systems are written by different people, if it
is not clear how to check whether the feature is supported, one person
might write 'no' for some system A, and another might write 'yes' for
some system B, even if the support is better in system A than in
system B (and would be considered enough, i.e. a 'yes' answer, by the
creator of this criterion).

For me the support for renames/moves and copying (see next section)
means that:

 0.) When examining or going to some point in the history (some old
     revision/version of a project) the state you get is _exactly_
     the same as it was at that time, exactly the same as it was
     recorded (committed) then.

     For example tricks with moving *,v files in the CVS repository
     break this assertion.

 1.) When examining the history of a project as a whole, the version
     control system tells you that a file was renamed (moved). I would
     consider having a rename represented there as copy + delete to be
     only partial support of this feature.

     Note that while a tool might correctly notify about file renames
     (I would count heuristics which give the correct answer in 99%
     or so of "real life" cases as 'correctly notifying'), it might
     notice a full directory rename only as renames of individual files.
     I guess that at least for some systems this issue was not taken
     into account...

 2.) When examining history of an individual file (or perhaps even of
     an individual directory), either in the form of list of revisions
     which touch given file in the form of "$scm log <file>" output or
     some graphical history viewer output, or in the form of annotations
     of file contents (so called here 'per-line history') in the form
     of "$scm blame <file>" / "$scm annotate <file>", we would want for
     SCM to follow history of contents across file renames (and other
     code movements if possible; but that is outside of scope of this
     criterion).

     Side note: history of two files can be more than sum (union) of
     histories of individual files.

From the comments I have heard it looks like at least for some version
control systems contributors used the meaning '0', while most users
(readers) would think of '1+2', ideally without forgetting about '0'.

Here (and in other places) it would be nice to have an actual *TEST*
which can be used to determine if a given version control system
"supports" the 'Files and Directories Moves or Renames'
criterion/feature. Attention! Because Git does similarity based rename
detection (contents + pathname based similarity score), it is better to
use a somewhat larger test vector, otherwise Git and other systems using
rename detection would be at a disadvantage. An example of such tests
would be the t/t*rename* tests from git; we could also use 'Lorem ipsum'
or 'Dominus regit me' test vectors.

So for example 1.) could be tested as:
 $ scm add A
 $ ...
 $ scm mv A B
 $ ...
 $ scm log [options]   # <- has info about A => B rename

while 2.) could be tested as:
 $ scm add A
 $ ...
 $ scm mv A B
 $ ...
 $ scm log [options] B  # <- goes to initial revision of A
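
For Git, the two tests above could be tried roughly as sketched below.
This is only an illustration, not an agreed test suite: the throwaway
repository, the placeholder identity, the file names and the 200-line
test vector are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: tests 1.) and 2.) tried against Git in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

# A larger test vector, so that similarity based detection has material.
awk 'BEGIN { for (i = 1; i <= 200; i++) print "line " i }' > A
git add A
git commit -q -m "add A"

git mv A B
echo "one more line" >> B                 # rename plus a small edit
git commit -q -a -m "rename A to B"

# 1.) the whole-project log has info about the A => B rename
rename_info=$(git log -1 -M --summary --format=%s)

# 2.) the log of B goes back to the initial revision of A ...
followed=$(git log --follow --format=%s -- B)
# ... while plain path limiting stops at the rename
plain=$(git log --format=%s -- B)

printf '%s\n' "$rename_info" "--" "$followed" "--" "$plain"
```

The contrast between the last two commands is the point of test 2.):
only the '--follow' form is expected to list the commit that added A.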

By the way, there is an even simpler operation than renaming that an
SCM can screw up (file-history based SCMs are especially susceptible).
Try to delete a file, and then later create a _different_ file
(with a separate history) under the same filename.

scm>             <compare>

scm>                 <s id="git">
scm>                     Renames are supported for most practical
scm>                     purposes.  Git even detects renames when a file has been
scm>                     changed afterward the rename.  However, due to a peculiar
scm>                     repository structure, renames are not recorded
scm>                     explicitly, and Git has to deduce them (which works well
scm>                     in practice).
scm>                 </s>

First, a correction to above statement.  It is not due to "a peculiar
repository structure", but due to "a design decision" (perhaps with
link to some explanation why it was implemented this way; I planned
to make a wiki page about 'rename tracking' vs. 'rename detection'
with references to various mailing list messages etc., but to this
day it was not created).

Second, we can think about how the above statement could be improved.

For example Git fulfills '0' even without rename detection, due to
the fact that it is a whole-tree, snapshot-based VCS.  From descriptions
for other version control systems (see "Version Control System
Comparison" subpage of "Better SCM Initiative : Comparison" at
http://better-scm.berlios.de/comparison/comparison.html) it looks like
at least some contributors thought that having '0' supported is enough
to say 'Yes' to this question.

Git uses rename detection, not rename tracking (usually file-id/inode
based), to be able to notify about renames in the diff / whatchanged /
diffstat or summary output.  So I would say that in practice (with
some unfortunate exceptions) Git fulfills '1', i.e. it shows renames
in the whole-project log well.

When talking about rename detection for a single file history, the
situation gets difficult.  On the one hand "git log --follow <file>"
is a bit of a hack and works only for simple histories, failing for
example on a subtree merge; another example would be the
'gitweb/gitweb.perl' file in the git repository, which '--follow' doesn't
follow to the initial 'gitweb.cgi' file from what was once the gitweb
repository.  One then has to use "git log -- <old name> <new name>";
this is caused by the fact that git has always concentrated more on full
repository history, and by how path limiting works.  On the other hand
Git has, as far as I know, a _unique_ blame tool which is able to follow
code movement; this covers more than only following contents across a
wholesale file rename.  This feature is IMHO best examined using
"git gui blame <file>" or other graphical blame/annotate viewers (QGit
has one, for example).
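
The code-movement side of blame can also be seen on the command line.
The sketch below moves a function from one file into a new one and
compares plain "git blame" with "git blame -C -C"; the file names, the
C snippet and the placeholder identity are made up for the example, and
the moved block is deliberately long enough for blame's copy detection
to consider it.

```shell
#!/bin/sh
# Sketch only: blame following a function moved from lib.c to util.c.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

cat > lib.c <<'EOF'
int clamp_to_range(int value, int lowest, int highest)
{
    if (value < lowest)
        return lowest;
    if (value > highest)
        return highest;
    return value;
}

int is_in_range(int value, int lowest, int highest)
{
    return lowest <= value && value <= highest;
}
EOF
git add lib.c
git commit -q -m "add lib.c"

# Move clamp_to_range() (lines 1-8) into a new file, in a single commit.
sed -n '1,8p' lib.c > util.c
sed '1,9d' lib.c > lib.c.new && mv lib.c.new lib.c
git add lib.c util.c
git commit -q -m "move clamp_to_range to util.c"

# Without -C every line of util.c is blamed on the commit that moved it.
plain=$(git blame --line-porcelain util.c | grep '^summary ' | sort -u)
# With -C -C blame also looks for the lines in other files.
moved=$(git blame -C -C --line-porcelain util.c | grep '^summary ' | sort -u)

printf '%s\n' "$plain" "--" "$moved"
```

With copy detection enabled, the lines of util.c should be attributed
to the commit that originally added them to lib.c.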

To be honest, git currently does not have _directory_ rename detection
(which for example leads to some quirks in dealing with renames during
a merge, to be more exact in dealing with new files in a directory which
got renamed by the other side); it currently supports directory renames
by detecting renames of the files they contain (path similarity is part
of the rename-detection similarity score).  But this is not an
insurmountable obstacle, and does not require changing the design and
tracking renames.

...Now only put the above in a few short sentences to be used in
"Better SCM Initiative" comparison table...

scm>            </compare>
scm>         </section>
scm>         <section id="intelligent_renames">
scm>             <title>Intelligent Merging after Moves or Renames</title>
scm>             <expl>
scm>                 If the system keeps tracks of renames, does it support
scm>                 intelligent merging of the files in the history after
scm>                 the rename? (For example, changing a file in a renamed
scm>                 directory, and trying to merge it).
scm>             </expl>

Here also the criterion is not completely clear.  The example helps
a little, but it should perhaps be expanded a bit.  I also don't know
why the example is unnecessarily complicated by renaming a directory;
perhaps this version is shorter to describe.

For me "Intelligent Merging after Moves or Renames" consists of the
following items: merging renames, applying a change to the correct file,
dealing with renamed directories, and new merge conflict types related
to renames and similar things.

Let me explain each concept with a little test case checking if a given
SCM supports the respective feature:
 * merging renames: if one side renamed file you should get rename on
   merge; renaming a file and then merging that rename.
     [on branch b]$ scm mv foo bar
     [on branch a]$ scm commit ... # to not have fast-forward case
     [on branch a]$ scm merge b
     expected result> you have file 'bar', and do not have file 'foo'

 * applying change to correct file: if our side renamed a file (or, as
   in the above example, renamed the directory it is in, which indirectly
   changes the full pathname of the file), and possibly changed it, and
   the other side changed the file, we would want the merge to bring the
   changes to the file after the rename.
     [on branch a]$ scm mv foo bar
     [on branch a]$ edit bar && scm commit # optionally
     [on branch b]$ edit foo
     [on branch b]$ scm commit -m 'FOO'
     [on branch a]$ scm merge b
     expected result> you have changes made on branch 'b' to file 'foo'
                      (commit 'FOO') in file 'bar'
   Note that like in example in previous item all operations take place
   _after_ branching point (after creation of branch b off branch a).

   This is I guess what most people think when talking about
   rename-aware (intelligent) merging.

 * renamed directories bring another complication (described for example
   on Mark Shuttleworth blog in articles about DVCS, promoting Bazaar-NG),
   namely how to deal with merging changes where other side creates
   _new files_ in renamed directory.
     [on branch a]$ scm mv subdir-foo/ subdir-bar/
     [on branch b]$ scm add subdir-foo/baz
     [on branch a]$ scm merge b
     expected result> New file subdir-bar/baz
   There is a bit of controversy about this feature, as for example in
   some programming languages (e.g. Java) or with some project build
   tools it is not possible to simply move a file (or create a new file
   in a different directory) without changing file contents.  Some say
   that it is better to fail than to produce a clean but wrong merge.
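
As an illustration, the second test case ("applying change to correct
file") could be run against Git roughly as follows.  This is a sketch
under assumptions: the branch names 'a' and 'b' follow the examples
above, while the 100-line test vector, the placeholder identity and the
commit messages are invented for the example.

```shell
#!/bin/sh
# Sketch only: one side renames foo to bar, the other side edits foo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > foo
git add foo
git commit -q -m "add foo"
git checkout -q -b a                      # branching point
git branch b

# [on branch a] rename the file
git mv foo bar
git commit -q -m "rename foo to bar"

# [on branch b] change the file under its old name
git checkout -q b
echo "change made on branch b" >> foo
git commit -q -a -m "FOO"

# [on branch a] merge the other side
git checkout -q a
git merge -q -m "merge b" b

ls
```

The expected result is a clean merge where only 'bar' exists and it
contains the line added by commit 'FOO'.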

scm>             <compare>
scm>                 <s id="accurev">
scm>                     Unknown. FILL IN.
scm>                 </s>

As you can see it is new criterion :-)

scm>                 <s id="git">
scm>                     No. As detailed in the <a
scm>                         href="http://git.or.cz/gitwiki/GitFaq#rename-tracking">Git
scm>                         FAQ</a>:
scm>                     "Git has a rename command git mv, but that is just a
scm>                     convenience. The effect is indistinguishable from removing
scm>                     the file and adding another with different name and the
scm>                     same content."
scm>                 </s>

This is of course NOT TRUE.  If the author had bothered to check (which
would be easier if there were a simple shell script, or a simple Perl
script, testing the 'intelligent_renames' criterion) he/she would have
noticed that git does apply the change to the renamed file, both if the
file itself is renamed and if the directory it is in gets renamed.

If I understand correctly, dealing with file renames and moving files
around (one could say: refactoring the directory hierarchy/structure)
was the main reason (or one of the main reasons) for adding rename
detection to Git.  In practice it works quite well (which for the test
means testing with contents large enough for similarity based rename
detection to work).

What Git _currently_ doesn't support (at least for now, lacking
detection of renames of directories as a whole) is adding new files to
the renamed directory, as described a bit above.

scm>         <section id="copy">
scm>             <title>File and Directories Copies</title>
scm>             <expl>
scm>                 Does the version control system support copying
scm>                 files or directories to a different location at the
scm>                 repository level, while retaining the history?
scm>             </expl>

The same complaint as with the "File and Directory Moves or Renames".
What does "support copying" mean for SCM in question, in this context?

scm>             <compare>
scm>                 <s id="git">No.  Copies are not supported.</s>
scm>            </compare>

To a large extent NOT TRUE.  Copies _ARE_ supported in Git using the
same mechanism of similarity based detection as for renames.

There are however some caveats and limitations compared to rename
detection.  

First, you have to enable copy detection.  While it is not uncommon
to have rename detection turned on (I'm not sure if it is not on by
default, for example for git-show; nevertheless you can turn it on for
diffs using the diff.renames configuration variable, and for example the
gitweb web interface detects renames by default), it is much less common
to have copy detection turned on by default, as it is a more expensive
operation.

Second, for performance reasons Git finds copies only if the original
file of the copy was modified in the same changeset.  You can search
for copies in all files, but that is a much more expensive operation.

On the other hand git-blame can be asked to deal with code copying,
even across files; as far as I know Git is the _only_ SCM which has
file line provenance annotation tool which supports this.
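
The second caveat can be shown with a small experiment.  In the sketch
below a file is copied without touching the original; the file names
and the placeholder identity are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: copy detection with and without --find-copies-harder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > orig.txt
git add orig.txt
git commit -q -m "add orig.txt"

cp orig.txt copy.txt                      # the original stays unmodified
git add copy.txt
git commit -q -m "copy orig.txt to copy.txt"

# -C alone considers only files modified in the same changeset as
# possible sources, so the copy shows up as a plain addition.
cheap=$(git diff -C --name-status HEAD~1 HEAD)

# --find-copies-harder also inspects unmodified files (more expensive).
harder=$(git diff -C --find-copies-harder --name-status HEAD~1 HEAD)

printf '%s\n' "$cheap" "--" "$harder"
```

The first form should report copy.txt as added ('A'), the second as
a copy of orig.txt ('C100').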

scm>         <section id="repos_clone">
scm>             <title>Remote Repository Replication</title>
scm>             <expl>
scm>                 Does the system support cloning a remote repository to get
scm>                 a functionally equivalent copy in the local system? That 
scm>                 should be done without any special access to the remote 
scm>                 server except for normal repository access.
scm>             </expl>

This means either that the SCM in question is distributed, or that there
exists some replication / mirroring tool (for centralized SCMs).

scm>             <compare>
scm>                 <s id="bazaar">Yes.</s>
scm>                 <s id="darcs">Yes.</s>
scm>                 <s id="mercurial">Yes.</s>
scm>                 <s id="monotone">Yes.</s>

scm>                 <s id="git">Yes.  This is very intrinsic feature of Git.</s>

In fact this is 'very intrinsic feature' of each distributed SCM...
well, unless one takes into account difference between single-branch
or workdir-per-branch distributed SCM and multiple-branch-per-repository
distributed SCM.  Then this is a bit more complicated.

In short: I think that simple 'Yes.' answer for Git would be better.

scm>         <section id="push">
scm>             <title>Propagating Changes to Parent Repositories</title>
scm>             <expl>
scm>                 Can the system propagate changes from one repository to 
scm>                 another?
scm>             </expl>

O.K.

scm>             <compare>
scm>                 <s id="mercurial">Yes.</s>
scm>                 <s id="monotone">Yes.</s>
scm>                 <s id="git">Yes.  (The Linux kernel development process uses this extremely often).</s>
scm>             </compare>
scm>         </section>

I'm not sure if this comment is really necessary there.  I would avoid
it, especially since as far as I understand Linux kernel development
uses a patch+email based system as extensively or even more extensively,
at least at the lieutenants level.

scm>         <section id="permissions">
scm>             <title>Repository Permissions</title>
scm>             <expl>
scm>                 Is it possible to define permissions on access to different
scm>                 parts of a remote repository? Or is access open for all? 
scm>             </expl>

Side note: Karl Fogel in his book "Producing Open Source Software.
How to Run a Successful Free Software Project" (http://producingoss.com)
wrote, based on his work on _Subversion_ (which is a centralized SCM),
that there are usually many advantages to using an 'honor system'
instead of repository permissions, i.e. a social solution rather than
a technological one; see "Chapter 3. Technical Infrastructure", section
"Version Control", subsection "Authorization"
  http://producingoss.com/en/vc.html#vc-authz

Distributed version control systems like Git, Mercurial or Bazaar-NG
offer an even wider selection of ways to implement an 'honor system',
and to solve the "Repository Permissions" problem using a social
solution.

[Here it would be nice to have a link to the discussion of the
"Producing OSS" book on the git mailing list, and to the article
discussing it]

scm>             <compare>
scm>                 <s id="bazaar">
scm>                     Basic access control can be implemented through a
scm>                     contributed hook script.  ACL support for the
scm>                     Bazaar server is planned.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     Yes. It is possible to lock down repositories,
scm>                     subdirectories, or files using hooks.
scm>                 </s>
scm>                 <s id="monotone">
scm>                     Yes. It is possible to restrict incoming changes
scm>                     from certain sources to be performed only in certain
scm>                     parts of the repository.
scm>                 </s>
[...]
scm>                 <s id="git">
scm>                     No, but a single server can serve many repositories.
scm>                     Also, UNIX permissions can be used to some extent.</s>
scm>             </compare>
scm>         </section>

Side note: why was the Git entry not word-wrapped like the entries for
most other SCMs, but written as a single long line? I have rewrapped it
for better readability.

First, it is possible to lock down repositories, using permissions
of the underlying protocols (SSH, WebDAV), or using additional tools like
Gitosis, ssh_acl or the example hook contrib/hooks/update-paranoid.  It
is also possible to lock down (limit access to) branches and tags, which
is not mentioned as being in the scope of this criterion, and which I
think is the more important feature.

Second, I think it is possible to restrict incoming changes from certain
sources to subdirectories or files using hooks; but as far as I know
there doesn't exist any such example hook.

And third, it is not as important for distributed SCM to have
fine-grained technical solution when there are many social solutions
to this problem; for example in Git when you do a pull from other
repository it would (usually) show you diffstat of changes, so you
can easily see if there were changes made outside some directory limits.
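
To make the second and third points a bit more concrete, here is
a rough sketch of a path check.  It is NOT an existing hook shipped
with Git: the function name paths_outside, the 'docs' prefix and the
demo repository are hypothetical.  The same check could be run by hand
after a fetch, or called from an update hook, which receives the ref
name, the old revision and the new revision as arguments.

```shell
#!/bin/sh
# Sketch only: list paths changed between two revisions that fall
# outside an allowed directory.  paths_outside is a hypothetical helper.
set -e

# usage: paths_outside PREFIX OLDREV NEWREV
# (PREFIX is used as a plain regex prefix; special characters and the
# all-zero OLDREV of a newly created ref are not handled here.)
paths_outside() {
    git diff --name-only "$2" "$3" | grep -v "^$1/" || true
}

# In an update hook one might then write something like:
#   outside=$(paths_outside docs "$2" "$3")
#   [ -z "$outside" ] || { echo "changes outside docs/" >&2; exit 1; }

# Demo in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

mkdir docs src
echo "intro" > docs/guide.txt
git add docs
git commit -q -m "add docs"
base=$(git rev-parse HEAD)

echo "more" >> docs/guide.txt
git commit -q -a -m "docs only"
docs_only=$(git rev-parse HEAD)

echo "int x;" > src/x.c
git add src
git commit -q -m "touch src"
with_src=$(git rev-parse HEAD)

ok=$(paths_outside docs "$base" "$docs_only")
bad=$(paths_outside docs "$base" "$with_src")
printf 'within limits: [%s]\noutside limits: [%s]\n' "$ok" "$bad"
```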

scm>         <section id="changesets">
scm>             <title>Changesets' Support</title>
scm>             <expl>
scm>                 Does the repository support changesets? Changesets are a way
scm>                 to group a number of modifications that are relevant to each
scm>                 other in one atomic package, that can be cancelled or 
scm>                 propagated as needed.
scm>             </expl>

Here it is not entirely clear what the creator of the "Better SCM
Initiative" comparison table had in mind.  Not all version
control systems are changeset based; some are snapshot based.  I guess
that for a snapshot based SCM the above requirement is equivalent to
"Whole tree commits".

scm>             <compare>
scm>                 <s id="cvs">No. Changes are file-specific.</s>
scm>                 <s id="subversion">Partial support. There are implicit 
scm>                     changeset that are generated on each commit.
scm>                 </s>
scm>                 <s id="bazaar">
scm>                     Yes. Changesets are supported.
scm>                 </s>
scm>                 <s id="darcs">
scm>                     Yes. Changesets are supported.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     Yes. Changesets are supported.
scm>                 </s>
scm>                 <s id="monotone">
scm>                     Yes. Changesets are supported.
scm>                 </s>
scm>                 <s id="git">
scm>                     Yes, Changesets are supported, 
scm>                     and there's some flexibility in creating them.
scm>                 </s>
scm>            </compare>
scm>         </section>

[Again, the Git part was re-wrapped for better readability]

In my opinion, such an _empty_ addition ("there's some flexibility in
creating them") is totally unnecessary; it adds no solid information
(what does "some flexibility" mean?) and should be removed.

If it was about Git being at the heart snapshot based rather than delta
(changeset) based, then it should be reworded to make it clear 
(if deemed to be necessary).

scm>         <section id="annotate">
scm>             <title>Tracking Line-wise File History</title>
scm>             <expl>
scm>                 Does the version control system have an option to track the
scm>                 history of the file line-by-line? I.e., can it show for each line
scm>                 at which revision it was most recently changed, and by whom?
scm>             </expl>

Here it would be nice to have example of such output, but I think
everyone knows what this criterion means in the term of SCM features.

scm>             <compare>
scm>                 <s id="git">Yes. (git blame).</s>
scm>            </compare>

Perhaps we could also add that git-blame supports (if requested)
tracking changes across code movement and code copying (crossing
file boundaries if necessary), and can ignore changes in whitespace.
And there is also "pickaxe" search, which can find deleted contents;
not being able to find those is one of the major limitations of
line-wise file history (line provenance) annotations.
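
A minimal sketch of the pickaxe search finding deleted contents is
given below; the file name, the token and the placeholder identity are
invented for the example.  Blame on the current notes.txt cannot show
the removed line at all, while "git log -S" still finds the commits
that added and removed it.

```shell
#!/bin/sh
# Sketch only: pickaxe (git log -S) finds a line that no longer exists.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

printf 'alpha\nmagic_token_42\nomega\n' > notes.txt
git add notes.txt
git commit -q -m "add notes with token"

printf 'alpha\nomega\n' > notes.txt       # the token line is deleted
git commit -q -a -m "drop token"

echo "unrelated" >> notes.txt
git commit -q -a -m "unrelated edit"

# -S lists commits that change the number of occurrences of the string.
hits=$(git log -S'magic_token_42' --format=%s -- notes.txt)
printf '%s\n' "$hits"
```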

On the other hand, because Git is biased towards whole project history,
and not per file history, git-blame is slow.  To mitigate that there
is an incremental blame mode used to reduce latency in graphical blame
viewers like "git gui blame", contrib/blameview, or the one in QGit.

scm>     <section id="features">
scm>         <title>Features</title>
scm>         <section id="work_on_dir">
scm>             <title>Ability to Work only on One Directory of the Repository</title>
scm>             <expl>
scm>                 Can the version control system checkout only one directory of
scm>                 the repository? Or restrict the check-ins to only one 
scm>                 directory?
scm>             </expl>

This is combination of "restricted check-ins" and so called "partial
checkout", or "sparse checkout", or "narrow checkout".

scm>             <compare>
scm>                 <s id="bazaar">For checkouts: No. For checkins: Yes.</s>
scm>                 <s id="darcs">
scm>                     It is possible to commit only a certain directory. 
scm>                     However, one must check out the entire repository as a
scm>                     whole.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     It is possible to commit changes only in a subset of the
scm>                     tree. There are plans for partial checkouts.
scm>                 </s>
scm>                 <s id="monotone">
scm>                     It is possible to commit changes only in a subset of the
scm>                     tree. However, one must extract the entire tree to work
scm>                     on it.
scm>                 </s>

scm>                 <s id="git">
scm>                     No.  However, commits could be restricted somewhat,
scm>                     see the "Repository Permissions".
scm>                 </s>

I think (depending of course on the "Repository Permissions" part) that
the part about 'work_on_dir' for check-ins should be made clearer.  Note
also that for this criterion, for distributed version control systems,
one should consider the difference between committing changes
(pre-commit hook) and publishing changes (update and post-receive
hooks).

I would also add that "There are plans for partial checkout" (or rather
"sparse" checkouts), where "plans" for this mean "preliminary work".
Although implementing this idea seems stalled a bit.  I guess that when
Git acquires ability to do sparse checkout, it would have it done
correctly (c.f. git submodules and svn:externals).

scm>         <section id="tracking_uncommited_changes">
scm>             <title>Tracking Uncommited Changes</title>
scm>             <expl>
scm>                 Does the software have an ability to track the changes in the
scm>                 working copy that were not yet committed to the repository?
scm>             </expl>

This should also be made clearer.  Does it mean for example the ability
to tell which files have changed, or the ability to diff the working copy
against either the last committed changes or any revision available in
the repository?

scm>             <compare>
scm>                 <s id="cvs">Yes. Using cvs diff</s>

scm>                 <s id="git">
scm>                     Yes.

"Using git diff"?  The problem is with [possible] difference between
"git diff", "git diff HEAD", "git diff --cached".

scm>
scm>                     Also, branches are very lightweight in Git, and
scm>                     could be considered a kind of storage for "uncommitted"
scm>                     code in some workflows.
scm>                 </s>

I'm not sure if it is worth mentioning here _explicit_ staging area
(index) available in Git.
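
The difference between the three "git diff" forms comes precisely from
the staging area, and can be seen in a small experiment.  The sketch
below uses a made-up one-line file and a placeholder identity.

```shell
#!/bin/sh
# Sketch only: what each "git diff" form compares.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity (assumption)
git config user.email "test@example.com"

echo one > f
git add f
git commit -q -m "one"

echo two > f
git add f                                 # "two" is staged in the index
echo three > f                            # "three" is only in the working copy

unstaged=$(git diff)                      # working copy vs index: two -> three
staged=$(git diff --cached)               # index vs HEAD:         one -> two
all=$(git diff HEAD)                      # working copy vs HEAD:  one -> three

printf '%s\n' "$unstaged" "----" "$staged" "----" "$all"
```

So an entry saying just "Using git diff" would describe only the
changes that have not yet been staged.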

BTW. it would be nice if "git gui", the Git GUI distributed with Git,
had some graphical diff (and diff3) view tool.

scm>         <section id="per_file_commit_messages">
scm>             <title>Per-File Commit Messages</title>
scm>             <expl>
scm>                 Does the system have a way to assign a per-file commit message
scm>                 to the changeset, as well as a per-changeset message?
scm>             </expl>
scm>             <compare>
scm>                 <s id="git">No. Commit messages are per changeset.</s>
scm>            </compare>
scm>         </section>
scm>     </section>

O.K.

By the way, does anybody know what happened to the 'commit annotations',
aka 'notes' idea?

scm>     <section id="technical_status">
scm>         <title>Technical Status</title>
scm>         <section id="documentation">
scm>             <title>Documentation</title>
scm>             <expl>
scm>                 How well is the system documented? How easy is it to
scm>                 get started using it?
scm>             </expl>
scm>             <compare>
scm>                 <s id="git">
scm>                     Medium. The short help is too terse and obscure.
scm>                     The man pages are extensive, but tend to be confusing.
scm>                     The are many tutorials.
scm>                 </s>
scm>             </compare>
scm>         </section>

That of course depends on your opinion.  I would say "Good", now that
there is the "Git User's Manual" distributed with Git, and now that the
semi-official "Git Community Book" (http://book.git-scm.com) has been
started.

[Perhaps we could use some survey results to defend that claim.]

scm>         <section id="ease_of_deployment">
scm>             <title>Ease of Deployment</title>
scm>             <expl>
scm>                 How easy is it to deploy the software? What are
scm>                 the dependencies and how can they be satisfied?
scm>             </expl>
scm>             <compare>
scm>                <s id="git">
scm>                    Good.  Binary packages are available
scm>                    for modern platforms.  C compiler and Perl are
scm>                    required. Requires Cygwin on Windows, and has some
scm>                    UNIXisms.
scm>                </s>
scm>            </compare>

On one hand there are still a few important Git commands like
git-am (for patch+email based workflows), git-bisect, git-pull,
git-rebase[1], git-stash and internal parts of git-merge[2] which do
require a POSIX shell and, as is inherent in shell scripting, some core
utilities like grep, sed, cat; for some workflows ssh is also needed.
These requirements are being reduced bit by bit by the
builtinification efforts.
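
One way to check how much of an installation is still scripted is to
look for "#!" lines in the programs under "git --exec-path".  A minimal
sketch, assuming git is on PATH; the function name is made up for
illustration, and the result depends entirely on the installed Git
version:

```python
import subprocess
from pathlib import Path


def scripted_git_commands():
    """Map each script-based git-* program to its interpreter line."""
    exec_path = Path(
        subprocess.run(
            ["git", "--exec-path"],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
    )
    scripts = {}
    for entry in sorted(exec_path.glob("git-*")):
        if not entry.is_file():
            continue
        with entry.open("rb") as fh:
            first_line = fh.readline()
        # Scripts start with an interpreter line; compiled builtins do not.
        if first_line.startswith(b"#!"):
            scripts[entry.name] = first_line.decode("utf-8", "replace").strip()
    return scripts
```

Printing the returned mapping shows which commands still need a shell
(or Perl) on the machine in question.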

On the other hand thanks to msysGit project Git does not require Cygwin
to be installed on MS Windows.

I would also remove "has some UNIXisms" which doesn't bring IMVHO
any information.

[1] This I hope would change thanks to builtin git-sequencer from GSoC
    (or rather post-GSoC work).
[2] This I hope would change thanks to post-GSoC expansion on 
    builtin git-merge

scm>         <section id="command_set">
scm>             <title>Command Set</title>
scm>             <expl>
scm>                 What is the command set? How compatible is it with
scm>                 the commands of CVS (the current open-source defacto
scm>                 standard)?
scm>             </expl>

Sidenote: I'm not sure if CVS is still the "de facto standard";
additionally, distributed SCMs enable vastly different workflows, so it
is hard to compare their command sets to that of CVS, and such a
comparison covers only a subset of DSCM commands.

scm>             <compare>
scm>                 <s id="subversion">
scm>                     A CVS-like command set which is easy to get used to
scm>                     for CVS-users.
scm>                 </s>
scm>                 <s id="bitkeeper">
scm>                     A CVS-like command set with some easy-to-get-used-to
scm>                     complications due to its different way of work and 
scm>                     philosophy.
scm>                 </s>
scm>                 <s id="bazaar">
scm>                 <s id="mercurial">
scm>                 <s id="monotone">
scm>                     Tries to follow CVS conventions, but deviates
scm>                     where there is a different design.
scm>                 </s>
scm>                 <s id="perforce">
scm>                     Very extensive but not compatible with CVS.
scm>                 </s>

scm>                 <s id="git">
scm>                     Command set is very feature-rich,
scm>                     and not compatible with CVS.
scm>                 </s>

I wouldn't say that the situation with Git is different from the
situation with Mercurial, Bazaar-NG and Monotone, especially with
respect to the subset of commands which have equivalents in CVS.
Although Git doesn't "try to follow CVS conventions", it does follow
BitKeeper conventions, and so, by transitivity, CVS conventions as
well.  I would agree with the "feature-rich" comment, though ;-)

scm>         <section id="networking">
scm>             <title>Networking Support</title>
scm>             <expl>
scm>                 How good is the networking integration of the system?
scm>                 How compliant is it with existing protocols and infra-structure?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">
scm>                     Excellent. Works natively over HTTP (read-only),
scm>                     FTP and SFTP without having Bazaar installed at
scm>                     the remote end.  Works over HTTP, SSH and a custom
scm>                     protocol when talking to a remote Bazaar
scm>                     server. Supports RSYNC and WebDAV (experimental)
scm>                     through plugins.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     Excellent.  Uses HTTP or ssh.  Remote access also
scm>                     works safely without locks over read-only network
scm>                     filesystems.
scm>                 </s>
scm>                 <s id="git">
scm>                     Excellent.  Can use native Git protocol,
scm>                     but works over rsync, ssh, HTTP and HTTPS also.
scm>                 </s>

It could be written differently, but O.K.

scm>         <section id="portability">
scm>             <title>Portability</title>
scm>             <expl>
scm>                 How portable is the version-control system to various 
scm>                 operating systems, computer architectures, and other
scm>                 types of systems?
scm>             </expl>
scm>             <compare>
scm>               <s id="git">
scm>                  The client works on most UNIXes, but not on native
scm>                  MS-Windows. The Cygwin build seems to be workable, though.
scm>               </s>
scm>             </compare>
scm>         </section>
scm>     </section>

"Most UNIXes" (or is it Unices)?  On what modern UNIX Git doesn't work?

Again, the author of the entries for Git doesn't seem to know about
the msysGit project, which is a native MS Windows implementation
(utilizing MSYS / MinGW).  And what does "Cygwin build _seems_ to be
workable" mean?

The entry for Git also lacks the single-word descriptions, like
"Excellent", "Very good", "Good", "Medium", that most other SCMs have in
this part (and "Windows only" for some).

scm>     <section id="user_interaces">
scm>         <title>User Interfaces</title>
scm>         <section id="web_interface">
scm>             <title>Web Interface</title>
scm>             <expl>
scm>                 Does the system have a WWW-based interface that can be
scm>                 used to browse the tree and the various revisions of the
scm>                 files, perform arbitrary diffs, etc?
scm>             </expl>
scm>             <compare>
scm>                 <s id="git">
scm>                     Yes.  Gitweb is included in distribution.
scm>                 </s>
scm>             </compare>
scm>         </section>

For other SCMs many different web interfaces are listed.
So I would perhaps put a list here, like in
  http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-e5a6762d6aed31c5a2034d52c1733ead46402c31

(There is a slight problem with Gitweb, which has neither a homepage nor
a separate repository; we can use the Gitweb page on the Git Wiki, or
the README from the git.git repository via gitweb ;-).

scm>         <section id="availability_of_guis">
scm>             <title>Availability of Graphical User-Interfaces.</title>
scm>             <expl>
scm>                 What is the availability of graphical user-interfaces for
scm>                 the system? How many GUI clients are present for it?
scm>             </expl>
scm>             <compare>
scm>                 <s id="git">
scm>                     Gitk is included in distribution.  
scm>                     QGit and Git-gui tools are also available.
scm>                 </s>
scm>            </compare>
scm>         </section>
scm>     </section>

git-gui is _also_ included in distribution. So I would say:

                 <s id="git">
                     Gitk and git-gui are included in distribution.  
                     <a href="[1]">Other tools</a> are also available.
                 </s>

[1] http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-cee25e252efc24b245482fe9fa8d24ff5d5af1d6

scm>     <section id="license">
scm>         <title>License</title>
scm>         <expl>
scm>             What are the licensing terms for the software? 
scm>         </expl>
scm>         <compare>
scm>             <s id="git">GNU GPL v2 (open source).</s>
scm>         </compare>

O.K.

-- 
Jakub Narebski
Poland


* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-13 17:06 Git at Better SCM Initiative comparison of VCS (long) Jakub Narebski
@ 2008-09-14 14:43 ` Dmitry Potapov
  2008-09-14 15:09   ` Alexey Mahotkin
  2008-09-14 17:48   ` Jakub Narebski
  2008-10-01 18:45 ` Jakub Narebski
  1 sibling, 2 replies; 9+ messages in thread
From: Dmitry Potapov @ 2008-09-14 14:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Alexey Mahotkin

Hello Jakub,

I have added Alexey Mahotkin in CC, who is allegedly the author of that
information about Git that you can read on the better-scm site.

On Sat, Sep 13, 2008 at 07:06:16PM +0200, Jakub Narebski wrote:
> 
> I have thought about trying yet another time... but Git was already
> added; see http://better-scm.berlios.de/news/changes-2008-08-07/

Interesting, the site still mentions Git as missing in a few places.
For instance, when you click on Git in the list of alternatives, you
get this: http://better-scm.berlios.de/alternatives/git/
and then when you go to the FAQ, you can read this:

| The reason it's not there is that while many people have complained
| about its absense, no one suitable has volunteered to become its
| champion and supplied a good enough patch. If you have a substantial
| amount of git expertise, have good English writing skills, and wish to
| volunteer, then we'll be happy to hear from you. If not - at least don't
| complain about it.
|
| In addition to everything that was said here, it seems that the
| originator and maintainer of the site and comparison is now banned
| from sending messages to vger.kernel.org, which hosts several
| Linux-kernel-related mailing-lists, including the git one. This has
| interfered with some of his Linux-related open-source work, including
| trying to find a "Better SCM" maintainer for git. This is unfortunate,
| but changing this situation, is currently beyond his control.

Source: http://better-scm.berlios.de/faq/#git-missing

I am surprised to hear that Shlomi Fish is banned...

> scm>     <section id="repos_operations">
> scm>         <title>Repository Operations</title>
> scm>         <section id="atomic_commits">
> scm>             <title>Atomic Commits</title>
> scm>             <expl>
> scm>                 Support for atomic commits means that if an
> scm>                 operation on the repository is interrupted
> scm>                 in the middle, the repository will not be
> scm>                 left in an inconsistent state. Are the
> scm>                 check-in operations atomic, or can
> scm>                 interrupting an operation leave the
> scm>                 repository in an intermediate state?
> scm>             </expl>
> 
> Here I think the explanation of a criterion (feature) is clear enough.
> I might have added that "interruption" include killing of a process
> during for example commit, lack of disk space for a full commit, or
> a network fail during network operation (fetch or push, or equivalent).

My initial reaction was to say that killing a process with -9 is not
what you expect to see in practice, but a second later, I realized how
wrong I was. Lack of memory may cause the process to be killed with
-9, and this has been observed in practice (at least in the case of a
Mercurial repo): http://norman.walsh.name/2007/08/09/mercurial

Another thing that is not clear in the above criterion is what exactly
"inconsistent state" (or "intermediate state") means. For instance, if
Git gets killed during commit, you may have to remove .git/index.lock
manually. AFAIK, Mercurial leaves the 'journal' file and you have to
run "hg recover". Does it mean that the commit is not atomic?
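
The Git side of this can be reproduced without actually killing
anything.  Below is a minimal sketch, assuming git is on PATH: it fakes
a leftover .git/index.lock in a throwaway repository and checks that
"git add" refuses to run, that "git fsck" still passes, and that
removing the lock is enough to continue.  The helper `git` and the
function name are made up for illustration; GIT_CONFIG_GLOBAL only has
an effect on reasonably recent Git versions.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity; ignore the user's own Git configuration.
GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
    GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    GIT_CONFIG_NOSYSTEM="1", GIT_CONFIG_GLOBAL=os.devnull,
)


def git(repo, *args, check=True):
    """Run one git command inside repo and return the CompletedProcess."""
    return subprocess.run(
        ["git", *args], cwd=repo, env=GIT_ENV,
        check=check, capture_output=True, text=True,
    )


def stale_lock_demo():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        (repo / "a.txt").write_text("first\n")
        git(repo, "add", "a.txt")
        git(repo, "commit", "-q", "-m", "initial")

        # Simulate a git process that died while holding the index lock.
        lock = repo / ".git" / "index.lock"
        lock.touch()
        (repo / "a.txt").write_text("second version\n")

        blocked = git(repo, "add", "a.txt", check=False).returncode != 0
        history_intact = git(repo, "fsck", check=False).returncode == 0

        # Manual recovery: remove the stale lock and retry.
        lock.unlink()
        recovered = git(repo, "add", "a.txt", check=False).returncode == 0
        return blocked, history_intact, recovered
```

If all three values come back true, the leftover lock only blocked new
index updates; the committed history itself was never left half-written.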

Another thing here is that "git commit" is local, so I am not sure
if this question includes network operations...

> scm>         <section id="move">
> scm>             <title>Files and Directories Moves or Renames</title>
> scm>             <expl>
> scm>                 Does the system support moving a file or directory to
> scm>                 a different location while still retaining the history
> scm>                 of the file? <b>Note:</b> also see the next section
> scm>                 about intelligent merging of renamed paths.
> scm>             </expl>
> 
> In my opinion this criterion is next to worthess without more in depth
> clarification of what does it mean to "support" moves or renames; as
> entries for different systems are written by different people, if it
> is not clear how to check if some feature is supported, some might
> write 'no' for some system A, and some other person can write 'yes'
> for other system B, even if the support is better in system A than in
> system B (and would be considered enough, i.e. 'yes' answer, by the
> creator of this criterion).
> 
> For me the support for renames/moves and copying (see next section)
> means that:
> 
>  0.) When examining or going to some point in the history (some old
>      revision/version of a project) the state you get is _exactly_
>      the same as it was at that time, exactly the same as it was
>      recorded (comitted) then.
> 
>      For example tricks with moving *,v files in the CVS repository
>      break this assertion.

IMHO, the above assertion is assumed when we talk about renaming, as
a system that is not capable of that would not qualify as an SCM.
Yet, there are still plenty of ways to interpret the above criterion.
Even in CVS, the history of the file does not disappear when you move
a file. You can just write that this file was renamed from old-name,
so anyone can get the old history without any problem. Of course, it
will require an additional step taken manually. But if the requirement
is to see the whole log history with one $scm log command, you can just
copy the old log into the log of the newly added file. Of course, you
cannot run $scm annotate on that file and see who changed what line,
but there is no such requirement above.

So, I agree, it should be better defined.

> 
>  1.) When examining history of a project as a whole version control
>      system tells you that file was renamed (moved). I would consider
>      having there renaming represented as copy + delete to be only
>      a partial support of this feature.

If file moving is interpreted in the sense of preserving the old history
then copy + delete fully satisfies that criterion.

However, if you define support of file movement as the ability to see
that some file was renamed when you look at the history of the whole
project, then certainly the copy + delete representation would not
satisfy it.

So, perhaps, it should be two separate points:
- ability to preserve history across a rename (with a detailed
  clarification of what it means)
- ability to show renames in the project history

> 
> scm>                 <s id="git">
> scm>                     Renames are supported for most practical
> scm>                     purposes.  Git even detects renames when a file has been
> scm>                     changed afterward the rename.  However, due to a peculiar
> scm>                     repository structure, renames are not recorded
> scm>                     explicitly, and Git has to deduce them (which works well
> scm>                     in practice).
> scm>                 </s>
> 
> First, a correction to above statement.  It is not due to "a peculiar
> repository structure", but due to "a design decision" (perhaps with
> link to some explanation why it was implemented this way; I planned
> to make a wiki page about 'rename tracking' vs. 'rename detection'
> with references to various mailing list messages etc., but to this
> day it was not created).

Agreed.

> 
> 
> Second, we can think about how the above statement could be improved.
> 

<long and detail explanation of how git works>

> 
> ...Now only put the above in a few short sentences to be used in
> "Better SCM Initiative" comparison table...

Git tracks content rather than file-ids, and therefore it uses heuristics
for rename detection.  This approach has the advantage of being able to
preserve history for lines of code moved between files, which usually
happens much more often than file renaming.
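
The heuristic is easy to observe.  Here is a minimal sketch, assuming
git is on PATH: a file is renamed with plain filesystem operations (no
"git mv"), edited slightly, and committed; "git diff -M --name-status"
is then asked which renames it can deduce.  The helper `git`, the
function name and the file names are made up for illustration.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity; ignore the user's own Git configuration.
GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
    GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    GIT_CONFIG_NOSYSTEM="1", GIT_CONFIG_GLOBAL=os.devnull,
)


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, env=GIT_ENV,
        check=True, capture_output=True, text=True,
    ).stdout


def detected_renames():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        lines = [f"line {i}\n" for i in range(1, 21)]
        (repo / "old.txt").write_text("".join(lines))
        git(repo, "add", "old.txt")
        git(repo, "commit", "-q", "-m", "add old.txt")

        # Rename outside of Git and change one line in the same commit.
        (repo / "old.txt").rename(repo / "new.txt")
        lines[9] = "line ten, edited\n"
        (repo / "new.txt").write_text("".join(lines))
        git(repo, "add", "-A")
        git(repo, "commit", "-q", "-m", "rename and edit")

        # -M turns on rename detection; renames are reported as R<score>.
        out = git(repo, "diff", "-M", "--name-status", "HEAD~1", "HEAD")
        renames = []
        for line in out.splitlines():
            status, *paths = line.split("\t")
            if status.startswith("R"):
                renames.append(tuple(paths))
        return renames
```

Nothing about the rename is stored in the commit; the pair
(old.txt, new.txt) is deduced from content similarity when the diff is
computed.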

> scm>                 <s id="git">
> scm>                     No. As detailed in the <a
> scm>                         href="http://git.or.cz/gitwiki/GitFaq#rename-tracking">Git
> scm>                         FAQ</a>:
> scm>                     "Git has a rename command git mv, but that is just a
> scm>                     convenience. The effect is indistinguishable from removing
> scm>                     the file and adding another with different name and the
> scm>                     same content."
> scm>                 </s>
> 
> This is of course NOT TRUE.  If the author bother checking (which
> would be helped if there was available simple shell script, or simple
> Perl script, testing 'intelligent_renames' criterion) he/she would
> notice that git does apply change to renamed file, both if file
> itself is renamed, and if directory it is in gets renamed.

Sure. But it just demonstrates that this line of reasoning, which was
clearly based on an unstated assumption about how file-id tracking
performs a merge in this situation, leads to the wrong conclusion for
Git: Git is a content tracking system, so it does this differently.

Perhaps it would make sense to extend GitFaq to better cover that
point, because people with a background in other SCMs could easily
conclude, after reading about git-mv, that Git cannot do "intelligent
merge".
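
A test for this criterion could look roughly like the sketch below,
assuming git is on PATH.  One branch renames a file, another branch
edits it under the old name, and the merge is expected to apply the edit
to the renamed file.  The helper `git`, the function name, and the
branch and file names are made up for illustration; this is not a
script from the Better SCM site.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity; ignore the user's own Git configuration.
GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
    GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    GIT_CONFIG_NOSYSTEM="1", GIT_CONFIG_GLOBAL=os.devnull,
)


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, env=GIT_ENV,
        check=True, capture_output=True, text=True,
    ).stdout


def merge_follows_rename():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        text = "".join(f"line {i}\n" for i in range(1, 21))
        (repo / "old.txt").write_text(text)
        git(repo, "add", "old.txt")
        git(repo, "commit", "-q", "-m", "add old.txt")
        base = git(repo, "rev-parse", "HEAD").strip()

        # Branch "rename": move the file, content unchanged.
        git(repo, "checkout", "-q", "-b", "rename")
        git(repo, "mv", "old.txt", "new.txt")
        git(repo, "commit", "-q", "-m", "rename old.txt to new.txt")

        # Branch "edit": start from the base and change the old path.
        git(repo, "checkout", "-q", "-b", "edit", base)
        edited = text.replace("line 10\n", "line ten (edited)\n")
        (repo / "old.txt").write_text(edited)
        git(repo, "commit", "-q", "-a", "-m", "edit line 10")

        # Merge the edit into the branch that renamed the file.
        git(repo, "checkout", "-q", "rename")
        git(repo, "merge", "-q", "-m", "merge edit", "edit")

        old_gone = not (repo / "old.txt").exists()
        edit_applied = "line ten (edited)" in (repo / "new.txt").read_text()
        return old_gone and edit_applied
```

A true result means the change made under the old name ended up in
new.txt after the merge.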

> scm>         <section id="changesets">
> scm>             <title>Changesets' Support</title>
> scm>             <expl>
> scm>                 Does the repository support changesets? Changesets are a way
> scm>                 to group a number of modifications that are relevant to each
> scm>                 other in one atomic package, that can be cancelled or 
> scm>                 propagated as needed.
> scm>             </expl>
> 
> Here it is not entirely clean what creator of "Better SCM Initiative"
> comparison table had on mind, what he meant by this.  Not all version
> control systems are changeset based; some are snapshot based.  I guess
> that for snapshot based SCM the above requirement is equivalent to
> "Whole tree commits".

Yes, it is irrelevant whether the system is changeset or snapshot based.
The question is whether modifications to more than one file can be
committed (and propagated) atomically. I also suppose that those changes
should be shown in history as a single change (not as many changes to
different files that took place at the same time with the same commit
comment).

However, the whole tree commit is a stricter requirement than
just being able to commit a group of changes atomically. For example,
"svn ci" creates a changeset and atomically stores all its modifications
on the server. Yet, it is not a whole tree commit, because the resulting
tree may differ from the tree that you are committing (files that are not
modified by the changeset may differ).
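
For comparison, here is a minimal sketch of what a Git commit records,
assuming git is on PATH: a commit that touches one file still names a
snapshot of the whole tree, while the changeset view is derived by
comparing it with its parent.  The helper `git`, the function name and
the file names are made up for illustration.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity; ignore the user's own Git configuration.
GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
    GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    GIT_CONFIG_NOSYSTEM="1", GIT_CONFIG_GLOBAL=os.devnull,
)


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, env=GIT_ENV,
        check=True, capture_output=True, text=True,
    ).stdout


def snapshot_vs_changeset():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        (repo / "a.txt").write_text("a\n")
        (repo / "b.txt").write_text("b\n")
        git(repo, "add", "a.txt", "b.txt")
        git(repo, "commit", "-q", "-m", "add a.txt and b.txt")

        # Second commit modifies only b.txt.
        (repo / "b.txt").write_text("b, changed\n")
        git(repo, "commit", "-q", "-a", "-m", "change b.txt")

        # Everything reachable from the commit's tree: the full snapshot.
        snapshot = git(repo, "ls-tree", "-r", "--name-only", "HEAD").split()
        # Paths that differ from the parent commit: the changeset view.
        changed = git(
            repo, "diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD"
        ).split()
        return snapshot, changed
```

The snapshot lists both files although only b.txt was modified, so the
committed tree is exactly the tree that was staged.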

> scm>                 <s id="git">
> scm>                     Yes, Changesets are supported, 
> scm>                     and there's some flexibility in creating them.
> scm>                 </s>
> scm>            </compare>
> scm>         </section>
> 
> [Again, Git part was re-wrapped for better readibility]
> 
> In my opition, such an _empty_ addition ("there's some flexibility in
> creating them") is totally unnecessary; it adds no solid information
> (what does it mean "some flexibility") and should be removed.

Agreed. I suspect the author meant that Git allows you to stage
and commit individual chunks separately, without committing the whole
file. Yet, as it is worded above, it is useless.
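
If that is what was meant, it could be shown along these lines.  This is
a minimal sketch, assuming git is on PATH and that "git add -p" accepts
its answers from standard input: two separate changes are made to one
file and only the first is staged.  The helper `git`, the function name
and the file name are made up for illustration.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity; ignore the user's own Git configuration.
GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
    GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    GIT_CONFIG_NOSYSTEM="1", GIT_CONFIG_GLOBAL=os.devnull,
)


def git(repo, *args, stdin=None):
    """Run one git command inside repo and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, env=GIT_ENV, input=stdin,
        check=True, capture_output=True, text=True,
    ).stdout


def stage_first_chunk_only():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        lines = [f"line {i}\n" for i in range(1, 31)]
        (repo / "notes.txt").write_text("".join(lines))
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-q", "-m", "add notes.txt")

        # Two changes far enough apart to end up in separate chunks.
        lines[1] = "line 2 (first change)\n"
        lines[27] = "line 28 (second change)\n"
        (repo / "notes.txt").write_text("".join(lines))

        # Answer "y" for the first chunk and "n" for the second.
        git(repo, "add", "-p", stdin="y\nn\n")

        staged = git(repo, "diff", "--cached")
        unstaged = git(repo, "diff")
        only_first_staged = (
            "first change" in staged and "second change" not in staged
        )
        second_left_behind = (
            "second change" in unstaged and "first change" not in unstaged
        )
        return only_first_staged, second_left_behind
```

A following "git commit" would then record only the first change, while
the second stays in the working copy.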

> scm>         <section id="tracking_uncommited_changes">
> scm>             <title>Tracking Uncommited Changes</title>
> scm>             <expl>
> scm>                 Does the software have an ability to track the changes in the
> scm>                 working copy that were not yet committed to the repository?
> scm>             </expl>
> 
> This also should be made more clean.  Does it mean for example ability
> to tell which files have changed, or ability to diff working copy to
> either last comitted changes, or to any revision available in repository?

Also, the ability to diff one or more specified files in the working copy
against some specified revision.
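
For Git, all three readings can be checked with something like the
sketch below, assuming git is on PATH: list changed files, diff a file
against the last commit, and diff it against an older revision.  The
helper `git`, the function name and the file name are made up for
illustration.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity; ignore the user's own Git configuration.
GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
    GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    GIT_CONFIG_NOSYSTEM="1", GIT_CONFIG_GLOBAL=os.devnull,
)


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=repo, env=GIT_ENV,
        check=True, capture_output=True, text=True,
    ).stdout


def uncommitted_changes():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        (repo / "a.txt").write_text("first\n")
        git(repo, "add", "a.txt")
        git(repo, "commit", "-q", "-m", "first")
        (repo / "a.txt").write_text("first\nsecond\n")
        git(repo, "commit", "-q", "-a", "-m", "second")

        # Uncommitted modification in the working copy.
        (repo / "a.txt").write_text("first\nsecond\nthird\n")

        # 1. Which files have changed?
        status = git(repo, "status", "--porcelain").splitlines()
        # 2. Working copy of one file against the last commit.
        vs_head = git(repo, "diff", "HEAD", "--", "a.txt")
        # 3. Working copy of one file against an older revision.
        vs_older = git(repo, "diff", "HEAD~1", "--", "a.txt")
        return status, "+third" in vs_head, "+second" in vs_older
```

The status line " M a.txt" marks a.txt as modified but not yet staged.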

> scm>     <section id="technical_status">
> scm>         <title>Technical Status</title>
> scm>         <section id="documentation">
> scm>             <title>Documentation</title>
> scm>             <expl>
> scm>                 How well is the system documented? How easy is it to
> scm>                 get started using it?
> scm>             </expl>
> scm>             <compare>
> scm>                 <s id="git">
> scm>                     Medium. The short help is too terse and obscure.
> scm>                     The man pages are extensive, but tend to be confusing.
> scm>                     The are many tutorials.
> scm>                 </s>
> scm>             </compare>
> scm>         </section>
> 
> That of course depends on your opinion.  I would say "Good", now that
> there is "Git User's Manual" distributed with Git, and now that there
> started semi-official "Git Community Book" (http://book.git-scm.com).

Interesting that versioncontrolblog, which, if I am not mistaken, is
Alexey's site, states for Git Documentation:

| Good. There is extensive documentation for every command, and many
| tutorials.

http://www.versioncontrolblog.com/comparison/Git/index.html

So, I am not sure where the word "Medium" came from.

Dmitry


* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-14 14:43 ` Dmitry Potapov
@ 2008-09-14 15:09   ` Alexey Mahotkin
  2008-09-14 17:48   ` Jakub Narebski
  1 sibling, 0 replies; 9+ messages in thread
From: Alexey Mahotkin @ 2008-09-14 15:09 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Jakub Narebski, git

Hi,

I wrote the version which is on http://versioncontrolblog.com and
sent it to Mr. Shlomi Fish several months ago.  He has extensively
re-written my text, making it more "neutral", and published it on
better-scm.  I do not agree with some of the changes he made, but I
did not insist.  :)

Occasionally I update my text with the current version at better-scm,
but this has not happened for some time, and it still contains my
original version.

Is there anything I can do to improve the state of things in any way? :)

On Sun, Sep 14, 2008 at 6:43 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Hello Jakub,
>
> I have added Alexey Mahotkin in CC, who is allegedly the author of that
> information about Git that you can read on the better-scm site.
>
> On Sat, Sep 13, 2008 at 07:06:16PM +0200, Jakub Narebski wrote:
>>
>> I have thought about trying yet another time... but Git was already
>> added; see http://better-scm.berlios.de/news/changes-2008-08-07/
>
> Interesting, the site still mentions Git as missing in a few places.
> For instance, when you click on Git in the list of alternatives, you
> get this: http://better-scm.berlios.de/alternatives/git/
> and then when you got to FAQ, you can read this:
>
> | The reason it's not there is that while many people have complained
> | about its absense, no one suitable has volunteered to become its
> | champion and supplied a good enough patch. If you have a substantial
> | amount of git expertise, have good English writing skills, and wish to
> | volunteer, then we'll be happy to hear from you. If not - at least don't
> | complain about it.
> |
> | In addition to everything that was said here, it seems that the
> | originator and maintainer of the site and comparison is now banned
> | from sending messages to vger.kernel.org, which hosts several
> | Linux-kernel-related mailing-lists, including the git one. This has
> | interfered with some of his Linux-related open-source work, including
> | trying to find a "Better SCM" maintainer for git. This is unfortunate,
> | but changing this situation, is currently beyond his control.
>
> Source: http://better-scm.berlios.de/faq/#git-missing
>
> I am surprised to hear that Shlomi Fish is banned...
>
>> scm>     <section id="repos_operations">
>> scm>         <title>Repository Operations</title>
>> scm>         <section id="atomic_commits">
>> scm>             <title>Atomic Commits</title>
>> scm>             <expl>
>> scm>                 Support for atomic commits means that if an
>> scm>                 operation on the repository is interrupted
>> scm>                 in the middle, the repository will not be
>> scm>                 left in an inconsistent state. Are the
>> scm>                 check-in operations atomic, or can
>> scm>                 interrupting an operation leave the
>> scm>                 repository in an intermediate state?
>> scm>             </expl>
>>
>> Here I think the explanation of a criterion (feature) is clear enough.
>> I might have added that "interruption" include killing of a process
>> during for example commit, lack of disk space for a full commit, or
>> a network fail during network operation (fetch or push, or equivalent).
>
> My initial reaction was to say that killing a process with -9 is not
> what you expect to see in practice, but a second later, I realized how
> wrong I was. Lack of memory may cause that the process gets killed with
> -9, and it has been observed in practice (at least, in case of Mercury
> repo): http://norman.walsh.name/2007/08/09/mercurial
>
> Another thing that is not clear in the above criterion is what exactly
> "inconsistent state" (or "intermediate state") means. For instance, if
> Git gets killed during commit, you may have to remove .git/index.lock
> manually. AFAIK, Mercury leaves the 'journal' file and you have to
> run "hg recovery". Does it mean that the commit is not atomic?
>
> Another thing here is that "git commit" is local, so I am not sure
> if this question includes network operations...
>
>> scm>         <section id="move">
>> scm>             <title>Files and Directories Moves or Renames</title>
>> scm>             <expl>
>> scm>                 Does the system support moving a file or directory to
>> scm>                 a different location while still retaining the history
>> scm>                 of the file? <b>Note:</b> also see the next section
>> scm>                 about intelligent merging of renamed paths.
>> scm>             </expl>
>>
>> In my opinion this criterion is next to worthess without more in depth
>> clarification of what does it mean to "support" moves or renames; as
>> entries for different systems are written by different people, if it
>> is not clear how to check if some feature is supported, some might
>> write 'no' for some system A, and some other person can write 'yes'
>> for other system B, even if the support is better in system A than in
>> system B (and would be considered enough, i.e. 'yes' answer, by the
>> creator of this criterion).
>>
>> For me the support for renames/moves and copying (see next section)
>> means that:
>>
>>  0.) When examining or going to some point in the history (some old
>>      revision/version of a project) the state you get is _exactly_
>>      the same as it was at that time, exactly the same as it was
>>      recorded (comitted) then.
>>
>>      For example tricks with moving *,v files in the CVS repository
>>      break this assertion.
>
> IMHO, the above assertion is assumed when we talk about renaming, as
> the system that is not capable of that will not be qualified as an
> SCM. Yet, there is still plenty way to interpret the above criterion.
> Even in CVS, the history of the file does not disappear when you move
> a file. You can just write, this file move was renamed from old-name,
> so anyone can get old history without any problem. Of course, it will
> require some an additional step taken manually. But if the requirement
> is to see all log history with one $scm log command, you can just copy
> old log into log of a newly added file. Of course, you cannot run $scm
> annotate on that file and see who changed what line, but there is no
> such a requirement above.
>
> So, I agree, it should be better defined.
>
>>
>>  1.) When examining history of a project as a whole version control
>>      system tells you that file was renamed (moved). I would consider
>>      having there renaming represented as copy + delete to be only
>>      a partial support of this feature.
>
> If files moving is interpreted in the sense of preserving the old history
> then copy + delete fully satisfies that criterion.
>
> However, if you defined support of file movement as ability to see that
> some file when you look at the history of the whole project then
> certainly copy + delete representation would not satisfy it.
>
> So, perhaps, it should be two separate points:
> - ability to preserve history of rename (with detail clarification
>  of what it means)
> - ability to show renames in the project history
>
>>
>> scm>                 <s id="git">
>> scm>                     Renames are supported for most practical
>> scm>                     purposes.  Git even detects renames when a file has been
>> scm>                     changed afterward the rename.  However, due to a peculiar
>> scm>                     repository structure, renames are not recorded
>> scm>                     explicitly, and Git has to deduce them (which works well
>> scm>                     in practice).
>> scm>                 </s>
>>
>> First, a correction to above statement.  It is not due to "a peculiar
>> repository structure", but due to "a design decision" (perhaps with
>> link to some explanation why it was implemented this way; I planned
>> to make a wiki page about 'rename tracking' vs. 'rename detection'
>> with references to various mailing list messages etc., but to this
>> day it was not created).
>
> Agreed.
>
>>
>>
>> Second, we can think about how the above statement could be improved.
>>
>
> <long and detail explanation of how git works>
>
>>
>> ...Now only put the above in a few short sentences to be used in
>> "Better SCM Initiative" comparison table...
>
> Git tracks content rather than file-ids, and therefore it uses heuristics
> for rename detection.  This approach has an advantage of being able to
> preserve history for code lines between files, which usually happens much
> more often than file renaming.
>
>> scm>                 <s id="git">
>> scm>                     No. As detailed in the <a
>> scm>                         href="http://git.or.cz/gitwiki/GitFaq#rename-tracking">Git
>> scm>                         FAQ</a>:
>> scm>                     "Git has a rename command git mv, but that is just a
>> scm>                     convenience. The effect is indistinguishable from removing
>> scm>                     the file and adding another with different name and the
>> scm>                     same content."
>> scm>                 </s>
>>
>> This is of course NOT TRUE.  If the author bother checking (which
>> would be helped if there was available simple shell script, or simple
>> Perl script, testing 'intelligent_renames' criterion) he/she would
>> notice that git does apply change to renamed file, both if file
>> itself is renamed, and if directory it is in gets renamed.
>
> Sure. But it just demonstrates that the line of reasoning, which was
> clearly based on unstated assumption of how file-id tracking performs
> merge in this situation leads to the wrong conclusion for Git as it is
> the content tracking system, so Git does that differently.
>
> Perhaps, it would make sense to extend GitFaq to better cover that
> point, because people with other SCM background could easily conclude
> that Git cannot do "intelligent merge" after reading about git-mv.
>
>> scm>         <section id="changesets">
>> scm>             <title>Changesets' Support</title>
>> scm>             <expl>
>> scm>                 Does the repository support changesets? Changesets are a way
>> scm>                 to group a number of modifications that are relevant to each
>> scm>                 other in one atomic package, that can be cancelled or
>> scm>                 propagated as needed.
>> scm>             </expl>
>>
>> Here it is not entirely clean what creator of "Better SCM Initiative"
>> comparison table had on mind, what he meant by this.  Not all version
>> control systems are changeset based; some are snapshot based.  I guess
>> that for snapshot based SCM the above requirement is equivalent to
>> "Whole tree commits".
>
> Yes, it is irrelevant to being changeset or snapshot based. It is
> whether modification to more than one file can be commited (and
> propogated) atomically. I also suppose that those changes should be
> shown in history as a single change (not many changes too different
> files that took place in the same time and the same commit comment).
>
> However, the whole tree commit is a more strict requirement than
> just being able to commit a group of changes atomically. For example,
> "svn ci" creates a changeset and atomically store all its modification
> on the server. Yet, it is not the whole tree commit, because the result
> tree may differ from the tree that you commiting (files that are not
> modified by changeset may differ).
>
>> scm>                 <s id="git">
>> scm>                     Yes, Changesets are supported,
>> scm>                     and there's some flexibility in creating them.
>> scm>                 </s>
>> scm>            </compare>
>> scm>         </section>
>>
>> [Again, Git part was re-wrapped for better readibility]
>>
>> In my opition, such an _empty_ addition ("there's some flexibility in
>> creating them") is totally unnecessary; it adds no solid information
>> (what does it mean "some flexibility") and should be removed.
>
> Agreed. I suspect the author implied by that Git allows to stage
> and commit separately chunk without commiting the whole file.
> Yet, as it is worded above, it is useless.
>
>> scm>         <section id="tracking_uncommited_changes">
>> scm>             <title>Tracking Uncommited Changes</title>
>> scm>             <expl>
>> scm>                 Does the software have an ability to track the changes in the
>> scm>                 working copy that were not yet committed to the repository?
>> scm>             </expl>
>>
>> This also should be made more clean.  Does it mean for example ability
>> to tell which files have changed, or ability to diff working copy to
>> either last comitted changes, or to any revision available in repository?
>
> Also, ability to diff one or more specified files in the working copy to
> some specified revision.
>
>> scm>     <section id="technical_status">
>> scm>         <title>Technical Status</title>
>> scm>         <section id="documentation">
>> scm>             <title>Documentation</title>
>> scm>             <expl>
>> scm>                 How well is the system documented? How easy is it to
>> scm>                 get started using it?
>> scm>             </expl>
>> scm>             <compare>
>> scm>                 <s id="git">
>> scm>                     Medium. The short help is too terse and obscure.
>> scm>                     The man pages are extensive, but tend to be confusing.
>> scm>                     The are many tutorials.
>> scm>                 </s>
>> scm>             </compare>
>> scm>         </section>
>>
>> That of course depends on your opinion.  I would say "Good", now that
>> there is "Git User's Manual" distributed with Git, and now that there
>> started semi-official "Git Community Book" (http://book.git-scm.com).
>
> Interesting that versioncontrolblog, which, if I am not mistaken, is
> Alexey's site, states for Git Documentation:
>
> | Good. There is extensive documentation for every command, and many
> | tutorials.
>
> http://www.versioncontrolblog.com/comparison/Git/index.html
>
> So, I am not sure were the word "Medium" came from.
>
>
> Dmitry
>



-- 
Алексей Махоткин
http://squadette.ru/


* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-14 14:43 ` Dmitry Potapov
  2008-09-14 15:09   ` Alexey Mahotkin
@ 2008-09-14 17:48   ` Jakub Narebski
  2008-09-14 19:48     ` Dmitry Potapov
  1 sibling, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2008-09-14 17:48 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git, Alexey Mahotkin

On Sun, 14 Sep 2008, Dmitry Potapov wrote:
> 
> I have added Alexey Mahotkin in CC, who is allegedly the author of that
> information about Git that you can read on the better-scm site.

Did you forward him my original email, or should I do it?

I'm sorry about that; I should have tried to find the original author
of the Git entry in the Better SCM Initiative comparison table, and
CC'd him.  My excuse is that while the instructions for adding a new
SCM to the table are easy to find and quite detailed, this is not
the case for correcting information in it, including how to
find the original authors.

By the way, perhaps I haven't stressed this as I should have, but the
most important thing I wanted to ask the git list about, besides insight
into my comments and proposed changes, was whether it is possible
with the current hooks infrastructure to accept changes coming from
some account only if all of them are confined to a specified directory.
In short, whether "Repository Permissions" _can_ be implemented on a
per-directory and/or per-file basis (never mind the fact that it is
better to use social means rather than technical restrictions, and that
distributed SCMs are good at the social solution of an 'honor system').

> On Sat, Sep 13, 2008 at 07:06:16PM +0200, Jakub Narebski wrote:
> > 
> > I have thought about trying yet another time... but Git was already
> > added; see http://better-scm.berlios.de/news/changes-2008-08-07/
> 
> Interesting, the site still mentions Git as missing in a few places.
> For instance, when you click on Git in the list of alternatives, you
> get this: http://better-scm.berlios.de/alternatives/git/
> and then when you got to FAQ, you can read this:
> 
> | The reason it's not there is that while many people have complained
> | about its absense, no one suitable has volunteered to become its
> | champion and supplied a good enough patch. [...]

Well, I have also noticed that, and I have planned on mentioning this
discrepancy when sending corrections for Git for SCM comparison table
to Shlomi Fish / Better SCM Initiative.

> | In addition to everything that was said here, it seems that the
> | originator and maintainer of the site and comparison is now banned
> | from sending messages to vger.kernel.org, which hosts several
> | Linux-kernel-related mailing-lists, including the git one. This has
> | interfered with some of his Linux-related open-source work, including
> | trying to find a "Better SCM" maintainer for git. This is unfortunate,
> | but changing this situation, is currently beyond his control.
> 
> Source: http://better-scm.berlios.de/faq/#git-missing
> 
> I am surprised to hear that Shlomi Fish is banned...

Well, the anti-spam filter at vger.kernel.org is rather heavy-handed; it
stopped a few legitimate posts of mine, and even a few patches.
However, the admins are responsive and reasonable, so I wonder why Shlomi
Fish couldn't solve this issue.

I wonder if it is perhaps a case of strange email headers, or
something...

> > scm>     <section id="repos_operations">
> > scm>         <title>Repository Operations</title>
> > scm>         <section id="atomic_commits">
> > scm>             <title>Atomic Commits</title>
> > scm>             <expl>
> > scm>                 Support for atomic commits means that if an
> > scm>                 operation on the repository is interrupted
> > scm>                 in the middle, the repository will not be
> > scm>                 left in an inconsistent state. Are the
> > scm>                 check-in operations atomic, or can
> > scm>                 interrupting an operation leave the
> > scm>                 repository in an intermediate state?
> > scm>             </expl>
> > 
> > Here I think the explanation of a criterion (feature) is clear enough.
> > I might have added that "interruption" include killing of a process
> > during for example commit, lack of disk space for a full commit, or
> > a network fail during network operation (fetch or push, or equivalent).
> 
> My initial reaction was to say that killing a process with -9 is not
> what you expect to see in practice, but a second later, I realized how
> wrong I was. Lack of memory may cause that the process gets killed with
> -9, and it has been observed in practice (at least, in case of Mercurial
> repo): http://norman.walsh.name/2007/08/09/mercurial

Well, kill -9 might not be very common, but interrupting with ^C, for
example, a fetch that seems to be taking too long, or some operation
that you realize shouldn't have been run, is I think more common.

> Another thing that is not clear in the above criterion is what exactly
> "inconsistent state" (or "intermediate state") means. For instance, if
> Git gets killed during commit, you may have to remove .git/index.lock
> manually. AFAIK, Mercurial leaves the 'journal' file and you have to
> run "hg recover". Does it mean that the commit is not atomic?

IMHO "atomic commits" (or rather "atomic operations", see below) is
about a commit being either done in full, or not done at all.  The fact
that an SCM might need some manual steps to recover from failure
shouldn't factor in evaluating this criterion.
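
To make "either done in full, or not done at all" concrete, here is a
minimal Python sketch of the write-to-a-temporary-file-then-rename
idiom.  It only illustrates the general technique and is not Git's
actual code; the function name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes), all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the name in a single step (atomic on POSIX):
        # readers see the old contents or the new ones, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # An interruption leaves a stray temporary file at worst;
        # the target itself is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is killed before os.replace, the old file is still
intact and only a leftover temporary file may need cleaning up, which
matches the point that manual recovery steps do not by themselves make
an operation non-atomic.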

> Another thing here is that "git commit" is local, so I am not sure
> if this question includes network operations...

Well, I think this section would be better titled "Atomic Operations"
or just "Atomicity".  Although I'm not sure whether, for example, in Git
all operations are atomic under all conditions...

But even if we leave it as "Atomic Commits": since for a centralized SCM
a commit includes a network operation, to put centralized and
distributed SCMs on an equal footing this should, in my opinion, mean
both atomic commit and atomic push for a distributed SCM.  (And that
should be stated explicitly in the description...)

> > scm>         <section id="move">
> > scm>             <title>Files and Directories Moves or Renames</title>
> > scm>             <expl>
> > scm>                 Does the system support moving a file or directory to
> > scm>                 a different location while still retaining the history
> > scm>                 of the file? <b>Note:</b> also see the next section
> > scm>                 about intelligent merging of renamed paths.
> > scm>             </expl>
> > 
> > In my opinion this criterion is next to worthess without more in depth
> > clarification of what does it mean to "support" moves or renames; as
> > entries for different systems are written by different people, if it
> > is not clear how to check if some feature is supported, some might
> > write 'no' for some system A, and some other person can write 'yes'
> > for other system B, even if the support is better in system A than in
> > system B (and would be considered enough, i.e. 'yes' answer, by the
> > creator of this criterion).
> > 
> > For me the support for renames/moves and copying (see next section)
> > means that:
> > 
> >  0.) When examining or going to some point in the history (some old
> >      revision/version of a project) the state you get is _exactly_
> >      the same as it was at that time, exactly the same as it was
> >      recorded (comitted) then.
> > 
> >      For example tricks with moving *,v files in the CVS repository
> >      break this assertion.
> 
> IMHO, the above assertion is assumed when we talk about renaming, as
> the system that is not capable of that will not be qualified as an
> SCM. Yet, there is still plenty way to interpret the above criterion.
> Even in CVS, the history of the file does not disappear when you move
> a file. You can just write, this file move was renamed from old-name,
> so anyone can get old history without any problem. Of course, it will
> require some an additional step taken manually. But if the requirement
> is to see all log history with one $scm log command, you can just copy
> old log into log of a newly added file. Of course, you cannot run $scm
> annotate on that file and see who changed what line, but there is no
> such a requirement above.

I wrote down this obvious requirement (that it is "obvious" can be
seen from the fact that it got number '0' -- zero -- and not '1'; but
perhaps that was too subtle) to exclude the CVS "support" for renames by
copying the *,v file to a new name: after this operation, if you check
out an old revision you get an extra file, the one with the new name.
So this 'solution' wouldn't satisfy the requirement.

> So, I agree, it should be better defined.

Nice to be in agreement.

It would be best if there was some automated test for each criterion,
or at least a description of how to check whether an SCM fulfils it;
not necessarily visible in the table by default.

> >  1.) When examining history of a project as a whole version control
> >      system tells you that file was renamed (moved). I would consider
> >      having there renaming represented as copy + delete to be only
> >      a partial support of this feature.
> 
> If files moving is interpreted in the sense of preserving the old history
> then copy + delete fully satisfies that criterion.
> 
> However, if you defined support of file movement as ability to see that
> some file when you look at the history of the whole project then
> certainly copy + delete representation would not satisfy it.

Well, I would consider 'copy + delete' a "good enough" solution; not
perfect, but enough for most cases.

> So, perhaps, it should be two separate points:
> - ability to preserve history of rename (with detail clarification
>   of what it means)
> - ability to show renames in the project history

Those are points '1' and '2' on my list, perhaps stated a bit
differently: showing renames in the full history / the history of the
project as a whole, and following the history of a single file across
renames.

> > scm>                 <s id="git">
> > scm>                     Renames are supported for most practical
> > scm>                     purposes.  Git even detects renames when a file has been
> > scm>                     changed afterward the rename.  However, due to a peculiar
> > scm>                     repository structure, renames are not recorded
> > scm>                     explicitly, and Git has to deduce them (which works well
> > scm>                     in practice).
> > scm>                 </s>
[...]

> > Second, we can think about how the above statement could be improved.
> > 
> 
> <long and detail explanation of how git works>
> 
> > 
> > ...Now only put the above in a few short sentences to be used in
> > "Better SCM Initiative" comparison table...
> 
> Git tracks content rather than file-ids, and therefore it uses heuristics
> for rename detection.  This approach has an advantage of being able to
> preserve history for code lines between files, which usually happens much
> more often than file renaming.

I would rather write

  Renames are supported for most practical purposes[1]. By design Git
  does heuristic <i>rename detection</i> (based on similarity score of
  pathnames and file contents), instead of doing rename tracking (which
  usually is based on some kind of file-ids).  This approach allows for
  more generic content tracking of code movement (which usually happens
  much more often than wholesale file renaming), e.g. in "git blame -C -C".

  Footnotes:
  [1] "git log --follow <i>filename</i>" works only for very simple
      history currently; rename detection gets confused by empty files
      and files consisting mainly of boilerplate (e.g. license text).

But I think this could also be improved.
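
For readers unfamiliar with the idea, rename detection by content
similarity can be sketched in a few lines of Python.  This is only an
illustration under simplifying assumptions: it looks at file contents
only (not pathnames), it uses difflib's ratio as the similarity score,
and the name detect_renames is hypothetical.  It is not Git's actual
algorithm; the 0.5 threshold merely mirrors the 50% default of
"git diff -M".

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose contents are similar.

    `deleted` and `added` map path -> file contents (str).
    Returns a list of (old_path, new_path, score) tuples.
    """
    renames = []
    unmatched = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        # Find the most similar file among the additions not yet paired.
        for new_path, new_text in unmatched.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path, best_score))
            del unmatched[best_path]
    return renames
```

Note that in this sketch two empty files compare as identical, which
hints at why empty files and boilerplate-heavy files can confuse any
similarity-based detection.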

BTW, I wouldn't mention the problem with 'new files in renamed
directories', which is IMHO a separate issue; contrary to what Mark
Shuttleworth wrote in "Renaming is the killer app of distributed version
control" (http://www.markshuttleworth.com/archives/123), that part of
rename support is not that important.  Especially as I doubt that it was
tested / checked for the other SCMs in the table.

> > scm>                 <s id="git">
> > scm>                     No. As detailed in the <a
> > scm>                         href="http://git.or.cz/gitwiki/GitFaq#rename-tracking">Git
> > scm>                         FAQ</a>:
> > scm>                     "Git has a rename command git mv, but that is just a
> > scm>                     convenience. The effect is indistinguishable from removing
> > scm>                     the file and adding another with different name and the
> > scm>                     same content."
> > scm>                 </s>
> > 
> > This is of course NOT TRUE.  If the author bother checking (which
> > would be helped if there was available simple shell script, or simple
> > Perl script, testing 'intelligent_renames' criterion) he/she would
> > notice that git does apply change to renamed file, both if file
> > itself is renamed, and if directory it is in gets renamed.
> 
> Sure. But it just demonstrates that the line of reasoning, which was
> clearly based on unstated assumption of how file-id tracking performs
> merge in this situation leads to the wrong conclusion for Git as it is
> the content tracking system, so Git does that differently.

Well, if you are not sure, test it.  I did; admittedly you have
to take care that your test files have enough content to resemble
real-life examples, for Git's content-similarity-based rename
detection to work.
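
A test along those lines could look like the following Python sketch.
It is an assumption-laden illustration rather than an official test:
it needs a reasonably recent git on PATH, the branch names "trunk" and
"rename" and the helper names are made up for the example, and it covers
only the file rename case, not the renamed directory case.

```python
import os
import shutil
import subprocess
import tempfile


def git(repo, *args):
    """Run `git <args>` inside `repo` with a fixed test identity."""
    env = dict(os.environ,
               GIT_AUTHOR_NAME="Tester",
               GIT_AUTHOR_EMAIL="tester@example.com",
               GIT_COMMITTER_NAME="Tester",
               GIT_COMMITTER_EMAIL="tester@example.com")
    subprocess.run(["git", *args], cwd=repo, env=env, check=True,
                   capture_output=True, text=True)


def write_lines(path, lines):
    with open(path, "w") as f:
        f.write("".join(line + "\n" for line in lines))


def merge_follows_rename():
    """Return True if an edit to old.txt ends up in the renamed new.txt."""
    repo = tempfile.mkdtemp()
    try:
        # Enough content for similarity-based detection to work with.
        lines = ["line %d" % i for i in range(1, 21)]
        git(repo, "init", "-q")
        git(repo, "symbolic-ref", "HEAD", "refs/heads/trunk")
        write_lines(os.path.join(repo, "old.txt"), lines)
        git(repo, "add", "old.txt")
        git(repo, "commit", "-q", "-m", "add old.txt")

        # Side branch: rename only.
        git(repo, "checkout", "-q", "-b", "rename")
        git(repo, "mv", "old.txt", "new.txt")
        git(repo, "commit", "-q", "-m", "rename old.txt to new.txt")

        # Main line: edit the file under its old name.
        git(repo, "checkout", "-q", "trunk")
        lines[9] = "line 10 changed"
        write_lines(os.path.join(repo, "old.txt"), lines)
        git(repo, "commit", "-q", "-a", "-m", "edit line 10")

        git(repo, "merge", "-q", "--no-edit", "rename")

        with open(os.path.join(repo, "new.txt")) as f:
            merged = f.read()
        return ("line 10 changed" in merged
                and not os.path.exists(os.path.join(repo, "old.txt")))
    finally:
        shutil.rmtree(repo, ignore_errors=True)
```

The function returns True when the edit made under the old name shows
up in the file under its new name after the merge.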

Side note: I _think_ that usually if rename detection fails, a 3-way
merge would fail too, where by "fail" I mean a situation where the
textual conflict is hard to resolve, not merely that there is a conflict.

> Perhaps, it would make sense to extend GitFaq to better cover that
> point, because people with other SCM background could easily conclude
> that Git cannot do "intelligent merge" after reading about git-mv.

It is a fact that the section in GitFaq about tracking renames,
http://git.or.cz/gitwiki/GitFaq#rename-tracking, should be cleaned up
(for example, "git annotate" is now just a convenient shortcut for
"git blame -c"; it is no longer an alternate implementation), and perhaps
even rewritten.  I think that the whole reasoning about rename tracking
vs rename detection, perhaps with the famous Linus Torvalds post on the
matter (if I can find it), should be put on a separate page, as it will
I think be quite lengthy, while GitFaq should contain only the bare-bones
consequences of that decision (and a link to the details).

> > scm>         <section id="changesets">
> > scm>             <title>Changesets' Support</title>
> > scm>             <expl>
> > scm>                 Does the repository support changesets? Changesets are a way
> > scm>                 to group a number of modifications that are relevant to each
> > scm>                 other in one atomic package, that can be cancelled or 
> > scm>                 propagated as needed.
> > scm>             </expl>
> > 
> > Here it is not entirely clean what creator of "Better SCM Initiative"
> > comparison table had on mind, what he meant by this.  Not all version
> > control systems are changeset based; some are snapshot based.  I guess
> > that for snapshot based SCM the above requirement is equivalent to
> > "Whole tree commits".
> 
> Yes, it is irrelevant to being changeset or snapshot based.

Errr... what I meant to say is that the description is clearly inspired
by changeset-based SCMs, but the criterion is also important for
snapshot-based ones; it is simply not beyond doubt that it refers to the
visible UI feature, and not to the underlying repository design (engine).

> It is whether modification to more than one file can be commited (and
> propogated) atomically. I also suppose that those changes should be
> shown in history as a single change (not many changes too different
> files that took place in the same time and the same commit comment).
> 
> However, the whole tree commit is a more strict requirement than
> just being able to commit a group of changes atomically. For example,
> "svn ci" creates a changeset and atomically store all its modification
> on the server. Yet, it is not the whole tree commit, because the result
> tree may differ from the tree that you commiting (files that are not
> modified by changeset may differ).

I think the gist of the intent was to have revisions (revision
identifiers) refer to the state of a whole repository (or to changes to
the set of files as a whole: a changeset).

I agree that "whole tree commit" has slightly different semantics than
"supporting changesets".

> > scm>                 <s id="git">
> > scm>                     Yes, Changesets are supported, 
> > scm>                     and there's some flexibility in creating them.
> > scm>                 </s>
> > scm>            </compare>
> > scm>         </section>
> > 
> > [Again, Git part was re-wrapped for better readibility]
> > 
> > In my opition, such an _empty_ addition ("there's some flexibility in
> > creating them") is totally unnecessary; it adds no solid information
> > (what does it mean "some flexibility") and should be removed.
> 
> Agreed. I suspect the author implied by that Git allows to stage
> and commit separately chunk without commiting the whole file.
> Yet, as it is worded above, it is useless.

Hmmm... now that you have stated that possibility I see how this wording
can refer to it.  Nevertheless it is irrelevant to the criterion
discussed; also, Darcs, which has the same extra flexibility (chunk
selection in interactive git-add / git-commit was inspired by the Darcs
feature), doesn't have any extra wording.

> > scm>         <section id="tracking_uncommited_changes">
> > scm>             <title>Tracking Uncommited Changes</title>
> > scm>             <expl>
> > scm>                 Does the software have an ability to track the changes in the
> > scm>                 working copy that were not yet committed to the repository?
> > scm>             </expl>
> > 
> > This also should be made more clean.  Does it mean for example ability
> > to tell which files have changed, or ability to diff working copy to
> > either last comitted changes, or to any revision available in repository?
> 
> Also, ability to diff one or more specified files in the working copy to
> some specified revision.

Right.

I'm not sure now if "Tracking Uncommitted Changes" is a good name for
this feature / criterion, but I don't have a definite idea for a
replacement...

> > scm>     <section id="technical_status">
> > scm>         <title>Technical Status</title>
> > scm>         <section id="documentation">
> > scm>             <title>Documentation</title>
> > scm>             <expl>
> > scm>                 How well is the system documented? How easy is it to
> > scm>                 get started using it?
> > scm>             </expl>
> > scm>             <compare>
> > scm>                 <s id="git">
> > scm>                     Medium. The short help is too terse and obscure.
> > scm>                     The man pages are extensive, but tend to be confusing.
> > scm>                     The are many tutorials.
> > scm>                 </s>
> > scm>             </compare>
> > scm>         </section>
> > 
> > That of course depends on your opinion.  I would say "Good", now that
> > there is "Git User's Manual" distributed with Git, and now that there
> > started semi-official "Git Community Book" (http://book.git-scm.com).
> 
> Interesting that versioncontrolblog, which, if I am not mistaken, is
> Alexey's site, states for Git Documentation:
> 
> | Good. There is extensive documentation for every command, and many
> | tutorials.
> 
> http://www.versioncontrolblog.com/comparison/Git/index.html
> 
> So, I am not sure were the word "Medium" came from.

Backward compatibility^W^W Old impressions die hard, I would guess...
And the meme that git documentation is not user friendly is hard to
fight.

Thank you very much for your comments.
-- 
Jakub Narebski
Poland


* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-14 17:48   ` Jakub Narebski
@ 2008-09-14 19:48     ` Dmitry Potapov
  2008-09-14 21:06       ` Shawn O. Pearce
  2008-09-15  0:37       ` Jakub Narebski
  0 siblings, 2 replies; 9+ messages in thread
From: Dmitry Potapov @ 2008-09-14 19:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Alexey Mahotkin

On Sun, Sep 14, 2008 at 07:48:05PM +0200, Jakub Narebski wrote:
> 
> By the way, perhaps I haven't stressed this like I should, but the
> most important thing I wanted to ask git list about, besides insight
> in my comments and proposal for changes, was if it is possible
> using current hooks infrastructure to restrict changes coming from
> some account in such a way as to allow it only if all changes are
> restricted to specified directory. 

I believe the update hook should be able to do that. You have oldrev and
newrev, so you can run "git diff --name-only oldrev newrev" and see what
files are going to change, and then verify that the user has write
access to these directories or files.

I have not tried it yet, and I don't think we have a ready example of
how to do that, but I believe that the example of the update hook that
restricts user access based on the target branch can be used as a
starting point.
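
An untested sketch of that idea in Python follows.  The update hook is
invoked with the ref name, the old revision and the new revision as
arguments; everything else here (the helper names, the hard-coded
directory list, refusing branch creation and deletion) is an assumption
made for the example, and how the pushing user is identified depends
entirely on the server setup.

```python
import subprocess
import sys


def changed_paths(oldrev, newrev):
    """List the paths that differ between the two revisions."""
    out = subprocess.run(
        ["git", "diff", "--name-only", oldrev, newrev],
        check=True, capture_output=True, text=True).stdout
    return [p for p in out.splitlines() if p]


def outside_allowed(paths, allowed_dirs):
    """Return the paths that are not under any allowed directory."""
    prefixes = [d.rstrip("/") + "/" for d in allowed_dirs]
    return [p for p in paths
            if not any(p.startswith(prefix) for prefix in prefixes)]


def main(argv, allowed_dirs):
    refname, oldrev, newrev = argv[1:4]
    # An all-zero id means the ref is being created or deleted;
    # this sketch simply refuses both.
    if set(oldrev) == {"0"} or set(newrev) == {"0"}:
        print("refusing to create or delete %s" % refname, file=sys.stderr)
        return 1
    bad = outside_allowed(changed_paths(oldrev, newrev), allowed_dirs)
    for path in bad:
        print("change outside allowed directories: %s" % path,
              file=sys.stderr)
    return 1 if bad else 0

# In an actual hook file one would end with something like:
#   sys.exit(main(sys.argv, ["docs"]))
```

Note that diffing oldrev against newrev only compares the two end
points; a push containing one commit that touches a forbidden path and a
later commit that reverts it would pass this check.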

> 
> IMHO "atomic commits" (or rather "atomic operations", see below) is
> about commit being either in full, or not done at all.  The fact
> that SCM might need some manual steps to recover from failure shouldn't
> factor in evaluating this criterion.

Agreed. Your wording makes much more sense than what was used in the
above criterion.

> 
> > Another thing here is that "git commit" is local, so I am not sure
> > if this question includes network operations...
> 
> Well, I think this session would be better titled "Atomic Operations"
> or just "Atomicity".  Although I'm not sure if for example in Git
> all operations are atomic under all conditions...

I believe that all basic git operations are atomic. In fact, you either
get a new revision with a new SHA-1 or you don't. Aborting an operation
may leave some dangling objects, but that is okay, because they will be
garbage collected later. But I am not sure about additional utilities
such as git-svn. Git-svn uses rebase as it dcommits; if interrupted,
it may leave you in some strange state. It is possible to recover, but
it may not be obvious for newbies. Other than that, I think everything
is very resilient to any interruption.
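
The "new SHA-1 or nothing" point rests on objects being named by a hash
of their contents.  As a small Python illustration (for repositories
using SHA-1; the function name git_blob_id is made up), this is how the
id of a blob is derived; commit objects follow the same scheme with
their own type tag.

```python
import hashlib


def git_blob_id(content):
    """Return the SHA-1 object id Git assigns to a blob with `content`."""
    # The hash covers a small header ("blob <size>" and a NUL byte)
    # followed by the raw file contents.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known id:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name is determined by the contents, an interrupted
operation can at worst leave behind complete but unreferenced objects,
which is what the dangling objects above are.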

> 
> But even if we leave it "Atomic Commits", as for centralized SCM
> commit includes network operation, to have centralized and distributed
> SCM on equal footing, for distributed SCM this in my opinion should
> mean both atomic commit, and atomic push.  (And that should be stated
> explicitly in the description...)

Agreed.

> 
> > So, perhaps, it should be two separate points:
> > - ability to preserve history of rename (with detail clarification
> >   of what it means)
> > - ability to show renames in the project history
> 
> That are points '1' and '2' on my list, perhaps stated bit differently:
> showing renames in full history / history of project as whole, and
> following history of a single file across renames.

I did not mean '1' and '2' as priorities, but that they are slightly
different features and both can be described as support for renaming.

> > 
> > Git tracks content rather than file-ids, and therefore it uses heuristics
> > for rename detection.  This approach has an advantage of being able to
> > preserve history for code lines between files, which usually happens much
> > more often than file renaming.
> 
> I would rather write
> 
>   Renames are supported for most practical purposes[1]. By design Git
>   does heuristic <i>rename detection</i> (based on similarity score of
>   pathnames and file contents), instead of doing rename tracking (which
>   usually is based on some kind of file-ids).  This approach allows for
>   more generic content tracking of code movement (which usually happens
>   much often than wholesame file renaming), e.g. in "git blame -C -C".

Sounds good to me. Perhaps, I would drop '(which usually... file-ids)'
to make the sentence a bit shorter.

> > It is whether modification to more than one file can be commited (and
> > propogated) atomically. I also suppose that those changes should be
> > shown in history as a single change (not many changes too different
> > files that took place in the same time and the same commit comment).
> > 
> > However, the whole tree commit is a more strict requirement than
> > just being able to commit a group of changes atomically. For example,
> > "svn ci" creates a changeset and atomically store all its modification
> > on the server. Yet, it is not the whole tree commit, because the result
> > tree may differ from the tree that you commiting (files that are not
> > modified by changeset may differ).
> 
> I think the gist of intent was to have revisions (revision identifiers)
> refer to the state of a whole repository (or to changes to the set of
> files as a whole: a changeset).

Perhaps. Though then SVN should have 'Yes', but it is listed as "partial
support." It seems this feature also implies the ability to create a
changeset and propagate it as separate steps. Yet svk is listed as "Same
as subversion." So I don't really know what this support of changesets
really means.

> 
> Hmmm... now that you stated that possibility I see how this wording can
> refer to it.  Nevertheless it is irrelevant to the criterion discussed;

Agreed.

> 
> > > scm>         <section id="tracking_uncommited_changes">
> > > scm>             <title>Tracking Uncommited Changes</title>
> > > scm>             <expl>
> > > scm>                 Does the software have an ability to track the changes in the
> > > scm>                 working copy that were not yet committed to the repository?
> > > scm>             </expl>
> > > 
> > > This also should be made more clean.  Does it mean for example ability
> > > to tell which files have changed, or ability to diff working copy to
> > > either last comitted changes, or to any revision available in repository?
> > 
> > Also, ability to diff one or more specified files in the working copy to
> > some specified revision.
> 
> Right.
> 
> I'm not sure now if "Tracking Uncommitted Changes" is a good name for
> this feature / criterion, but I don't have definite idea for change...

Actually, I don't like this name either. In particular, the word
"tracking". Perhaps "Showing Uncommitted Changes" would be a better
name. Yet the ability to show a diff between the working copy and some
arbitrary version should be listed as a separate feature.

> > So, I am not sure were the word "Medium" came from.
> 
> Backward compatibility^W^W Old impressions die hard, I would guess...
> And the meme that git documentation is not user friendly is hard to
> fight.

Yes. And, frankly, earlier versions of it were not very user friendly.
When you looked at some porcelain command and it referred you to
plumbing to see what options it takes, that gave a very bad impression.
Now this and similar shortcomings have been corrected, but as you said,
old impressions die hard...

Dmitry


* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-14 19:48     ` Dmitry Potapov
@ 2008-09-14 21:06       ` Shawn O. Pearce
  2008-09-14 21:29         ` Jakub Narebski
  2008-09-15  0:37       ` Jakub Narebski
  1 sibling, 1 reply; 9+ messages in thread
From: Shawn O. Pearce @ 2008-09-14 21:06 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Jakub Narebski, git, Alexey Mahotkin

Dmitry Potapov <dpotapov@gmail.com> wrote:
> On Sun, Sep 14, 2008 at 07:48:05PM +0200, Jakub Narebski wrote:
> > 
> > [...] if it is possible
> > using current hooks infrastructure to restrict changes coming from
> > some account in such a way as to allow it only if all changes are
> > restricted to specified directory. 
> 
> I believe the update hook should be able to do that. You have oldrev and
> newrev, so you can run "git diff --name-only oldrev newrev" and see what
> files are going to change. And then verify that the user has the write
> access to this directories or files.
> 
> I have not tried it yet, and I don't think we have a ready example of
> how to do that, but I believe that the example of the update hook that
> restricts user access based on the target branch can be used as a
> starting point.

contrib/hooks/update-paranoid can do both branch and file path
level restrictions.  I used it at my prior day-job to prevent some
accidental changes from folks who didn't usually need to modify
certain parts of the repository.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-14 21:06       ` Shawn O. Pearce
@ 2008-09-14 21:29         ` Jakub Narebski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Narebski @ 2008-09-14 21:29 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Dmitry Potapov, git, Alexey Mahotkin

On Sun, 14 Sep 2008, Shawn O. Pearce wrote:
> Dmitry Potapov <dpotapov@gmail.com> wrote:
>> On Sun, Sep 14, 2008 at 07:48:05PM +0200, Jakub Narebski wrote:
>>> 
>>> [...] if it is possible
>>> using current hooks infrastructure to restrict changes coming from
>>> some account in such a way as to allow it only if all changes are
>>> restricted to specified directory. 
>> 
>> I believe the update hook should be able to do that. You have oldrev and
>> newrev, so you can run "git diff --name-only oldrev newrev" and see what
>> files are going to change. And then verify that the user has the write
>> access to this directories or files.
>> 
>> I have not tried it yet, and I don't think we have a ready example of
>> how to do that, but I believe that the example of the update hook that
>> restricts user access based on the target branch can be used as a
>> starting point.
> 
> contrib/hooks/update-paranoid can do both branch and file path
> level restrictions.  I used it at my prior day-job to prevent some
> accidental changes from folks who didn't usually need to modify
> certain parts of the repository.

Could you then update the contrib/hooks/update-paranoid documentation?
It talks only about branch level restrictions (create, delete,
fast-forward, forced update for a given ref class).

Thanks in advance
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Git at Better SCM Initiative comparison of VCS (long)
  2008-09-14 19:48     ` Dmitry Potapov
  2008-09-14 21:06       ` Shawn O. Pearce
@ 2008-09-15  0:37       ` Jakub Narebski
  1 sibling, 0 replies; 9+ messages in thread
From: Jakub Narebski @ 2008-09-15  0:37 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git, Alexey Mahotkin

On Sun, 14 Sep 2008, Dmitry Potapov wrote:
> On Sun, Sep 14, 2008 at 07:48:05PM +0200, Jakub Narebski wrote:

> > > Another thing here is that "git commit" is local, so I am not sure
> > > if this question includes network operations...
> > 
> > Well, I think this session would be better titled "Atomic Operations"
> > or just "Atomicity".  Although I'm not sure if for example in Git
> > all operations are atomic under all conditions...
> 
> I believe that all git basic operations are atomic. In fact, you either
> got a new revision with new SHA-1 or don't. Aborting operation may
> leave some dangling objects, but it is okay, because they will be
> garbage collected later. But I am not sure about additional utilities
> such as git-svn. Git-svn uses rebase as it dcommits, being interrupted,
> it may leave you in some strange state. It is possible to recover but
> it may be not obvious for newbies. Other than that, I think everything
> is very resilient to any interruption.

I was thinking here about long-running, multi-part operations
such as git-clone.  Nevertheless, we would never end up in an
inconsistent state.
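
To illustrate the "either you got a new revision or you don't" point,
here is a minimal Python sketch of the write-to-temporary-file-then-rename
pattern.  It is not Git's actual code; the helper name
update_ref_atomically and the one-file-per-ref layout are assumptions
made only for this example.

```python
import os
import tempfile


def update_ref_atomically(ref_path: str, new_sha: str) -> None:
    """Replace the contents of ref_path with new_sha in one step.

    Hypothetical helper, not part of Git: readers see either the old
    value or the new one, never a half-written file.
    """
    directory = os.path.dirname(ref_path) or "."
    # Write the new value to a temporary file in the same directory,
    # so that the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".lock")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_sha + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in a single step on POSIX systems.
        os.replace(tmp_path, ref_path)
    except BaseException:
        # An interrupted update leaves the old ref untouched; only the
        # stray temporary file needs cleaning up.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before the rename, the old value is still in place;
any objects already written are just dangling and can be garbage
collected later, as Dmitry said.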
 
> > > So, perhaps, it should be two separate points:
> > > - ability to preserve history of rename (with detail clarification
> > >   of what it means)
> > > - ability to show renames in the project history
> > 
> > That are points '1' and '2' on my list, perhaps stated bit differently:
> > showing renames in full history / history of project as whole, and
> > following history of a single file across renames.
> 
> I did not mean '1' and '2' as priorities, but that it is slightly
> different features and both can be titled as support of renaming.

I didn't mean '1' and '2' as priorities; they are more or less equal,
although I would say that '1' might be a prerequisite to '2'.  '0' is,
however, the baseline which a tool must satisfy to be said to have
"rename support".
 
> > > 
> > > Git tracks content rather than file-ids, and therefore it uses heuristics
> > > for rename detection.  This approach has an advantage of being able to
> > > preserve history for code lines between files, which usually happens much
> > > more often than file renaming.
> > 
> > I would rather write
> > 
> >   Renames are supported for most practical purposes[1]. By design Git
> >   does heuristic <i>rename detection</i> (based on similarity score of
> >   pathnames and file contents), instead of doing rename tracking (which
> >   usually is based on some kind of file-ids).  This approach allows for
> >   more generic content tracking of code movement (which usually happens
> >   much often than wholesame file renaming), e.g. in "git blame -C -C".
> 
> Sounds good to me. Perhaps, I would drop '(which usually... file-ids)'
> to make the sentence a bit shorter.

O.K.

(But I would wait a bit for final proposal, with sending patch for
scm-comparison.xml to Alexey and Shlomi.)

> > > > scm>         <section id="tracking_uncommited_changes">
> > > > scm>             <title>Tracking Uncommited Changes</title>
> > > > scm>             <expl>
> > > > scm>                 Does the software have an ability to track the changes in the
> > > > scm>                 working copy that were not yet committed to the repository?
> > > > scm>             </expl>
> > > > 
> > > > This also should be made more clean.  Does it mean for example ability
> > > > to tell which files have changed, or ability to diff working copy to
> > > > either last comitted changes, or to any revision available in repository?
> > > 
> > > Also, ability to diff one or more specified files in the working copy to
> > > some specified revision.
> > 
> > Right.
> > 
> > I'm not sure now if "Tracking Uncommitted Changes" is a good name for
> > this feature / criterion, but I don't have definite idea for change...
> 
> Actually, I don't like this name either. In particular, the word
> "tracking". Perhaps, "Showing Uncommitted Changes" would be a better
> name. Yet, ability to show diff between the working copy as some
> arbitrary version should be listed as a separate feature.

I don't have a good name either. It is <something> about Uncommitted Changes.
Dealing with, or support for, or something...
 
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Git at Better SCM Initiative comparison of VCS (long)
  2008-09-13 17:06 Git at Better SCM Initiative comparison of VCS (long) Jakub Narebski
  2008-09-14 14:43 ` Dmitry Potapov
@ 2008-10-01 18:45 ` Jakub Narebski
  1 sibling, 0 replies; 9+ messages in thread
From: Jakub Narebski @ 2008-10-01 18:45 UTC (permalink / raw)
  To: git, Alexey Mahotkin; +Cc: Dmitry Potapov, Shawn O. Pearce

I think that the information about Git at Better SCM Initiative
comparison of various version control systems

  http://better-scm.berlios.de/comparison/comparison.html

used also for versioncontrolblog "Version control systems comparison"

  http://versioncontrolblog.com/comparison

needs a few corrections.

Therefore below you can find proposed changes, as discussed here on
this mailing list (this time CC-ing author of Git entry, Alexey
Mahotkin of versioncontrolblog).  For each section (each compared
feature) I have put description, entry for one or more open source
distributed version control systems (for reference) and current entry
for Git, in email-quote like format (prefixed with "scm> "), then
proposed corrected entry (if it needs correction) and some comments
about it.

Contrary to previous post in this thread I did not comment this time
on the comparison itself, for example not well defined criteria, and
lack of tests or functional/use case description which would help to
check if SCM supports given feature.

I think this could be basis for Alexey to send corrections to
Better SCM comparison table.

Please comment

scm>         <timestamp>
scm>             $Id: scm-comparison.xml 322 2008-08-09 05:47:26Z shlomif $
scm>         </timestamp>

Included just so you know which version this is based on.

scm>     <section id="repos_operations">
scm>         <title>Repository Operations</title>
scm>         <section id="atomic_commits">
scm>             <title>Atomic Commits</title>
scm>             <expl>
scm>                 Support for atomic commits means that if an
scm>                 operation on the repository is interrupted
scm>                 in the middle, the repository will not be
scm>                 left in an inconsistent state. Are the
scm>                 check-in operations atomic, or can
scm>                 interrupting an operation leave the
scm>                 repository in an intermediate state?
scm>             </expl>
scm>             <compare>
scm>                 <s id="cvs">No. CVS commits are not atomic.</s>
scm>                 <s id="bazaar">Yes. Commits are atomic.</s>
scm>                 <s id="mercurial">Yes.</s>
scm>                 <s id="git">Yes. Commits are atomic.</s>
scm>             </compare>
scm>         </section>

No comment here (besides the fact that the table needs some unification, but
that is hardly a matter for the maintainer of the Better SCM comparison Git
entry).

scm>         <section id="move">
scm>             <title>Files and Directories Moves or Renames</title>
scm>             <expl>
scm>                 Does the system support moving a file or directory to
scm>                 a different location while still retaining the history
scm>                 of the file? <b>Note:</b> also see the next section
scm>                 about intelligent merging of renamed paths.
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">Yes. Renames are supported for files and directories.</s>
scm>                 <s id="mercurial">Yes. Renames are supported.</s>
scm>                 <s id="git">
scm>                     Renames are supported for most practical
scm>                     purposes.  Git even detects renames when a file has been
scm>                     changed afterward the rename.  However, due to a peculiar
scm>                     repository structure, renames are not recorded
scm>                     explicitly, and Git has to deduce them (which works well
scm>                     in practice).
scm>                 </s>
scm>            </compare>
scm>         </section>

I would propose to change it to something like the following:

  Renames are supported for most practical purposes[1]. By design Git
  does heuristic <i>rename detection</i> (based on similarity score of
  pathnames and file contents), instead of doing rename tracking.  This
  approach allows for more generic content tracking of code movement
  (which usually happens much more often than wholesale file renaming),
  e.g. in "git blame -C -C".

  Footnotes:
  [1] "git log --follow <i>filename</i>" works only for very simple
      history currently; rename detection can get confused by empty files
      and files consisting mainly of boilerplate (e.g. license text).

Comments:

I am a bit unsure about the need for the footnote.  Perhaps it should simply
be removed.  Also, perhaps instead of "Renames are supported..." it should
be "File renames are supported...", although it looks like this will
improve soon: see the thread on the git mailing list about detecting wholesale
directory renames (with the ability to track directory splitting, just as
git is now able to track content movement across files).
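
To make "rename detection based on similarity score" a bit more concrete,
here is a toy Python sketch.  It is only an illustration under stated
assumptions: it compares files line by line with difflib, which is not
how Git computes its score, and the function names are made up for
this example.  The 0.5 threshold is meant to mirror the 50% default of
the -M option.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; a stand-in for Git's own score."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def detect_renames(deleted: dict, added: dict, threshold: float = 0.5) -> dict:
    """Pair each deleted path with the most similar added path.

    deleted and added map path -> file contents.  Returns a mapping
    old path -> new path for pairs scoring at least `threshold`.
    """
    renames = {}
    unused = dict(added)
    for old_path, old_content in deleted.items():
        best, best_score = None, threshold
        for new_path, new_content in unused.items():
            score = similarity(old_content, new_content)
            if score >= best_score:
                best, best_score = new_path, score
        if best is not None:
            renames[old_path] = best
            del unused[best]  # each added file explains at most one rename
    return renames
```

Nothing is recorded at commit time; the pairing is recomputed from the
two trees whenever it is needed, which is why a file that was changed
after the rename can still be matched as long as it stays similar enough.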

scm>         <section id="intelligent_renames">
scm>             <title>Intelligent Merging after Moves or Renames</title>
scm>             <expl>
scm>                 If the system keeps tracks of renames, does it support
scm>                 intelligent merging of the files in the history after
scm>                 the rename? (For example, changing a file in a renamed
scm>                 directory, and trying to merge it).
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">Yes. Renames are intelligent.</s>
scm>                 <s id="mercurial">
scm>                     No. <a 
scm>                         href="http://hgbook.red-bean.com/hgbookch5.html#x9-1030005.4">the
scm>                         Mercurial book says:</a>
scm>                     "When you use the 'hg rename' command, Mercurial makes a 
scm>                     copy of each source file, then deletes it and marks the
scm>                     file as removed. "
scm>                 </s>
scm>                 <s id="git">
scm>                     No. As detailed in the <a
scm>                         href="http://git.or.cz/gitwiki/GitFaq#rename-tracking">Git
scm>                         FAQ</a>:
scm>                     "Git has a rename command git mv, but that is just a
scm>                     convenience. The effect is indistinguishable from removing
scm>                     the file and adding another with different name and the
scm>                     same content."
scm>                 </s>
scm>            </compare>
scm>         </section>

This entry is incorrect[1]; it has to be changed to something like the
following proposal:

  Yes, Git can deal with renamed files during merging, thanks to rename
  detection.

Comments:

Git does apply a change to a renamed file, both if the file itself is renamed
and if the directory it is in gets renamed (as in the example in the feature
description).  What Git _currently_ doesn't support (at least for now,
lacking detection of directory renames as a whole; this might change
soon) is adding new files to the renamed directory: if one side
renamed a directory and the other side added new files in the old directory,
those new files would show up under the old name, not the new one.
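
The limitation can be shown with a toy model in Python.  This is only
a sketch of the behaviour described above, not of Git's merge code; the
function name and the "modified"/"added" labels are assumptions made
for the example.

```python
def merge_other_side(renames: dict, other_changes: dict) -> dict:
    """Decide where the other side's changes land after a merge.

    renames: old path -> new path, as detected per file on our side.
    other_changes: path -> "modified" or "added" on the other side.
    Returns path -> kind of change in the merge result.
    """
    result = {}
    for path, kind in other_changes.items():
        if kind == "modified" and path in renames:
            # The change follows the file to its new name.
            result[renames[path]] = kind
        else:
            # A brand new file has no rename entry of its own, so it
            # stays where the other side put it.
            result[path] = kind
    return result
```

For example, with renames = {"old/a.c": "new/a.c"}, a modification of
old/a.c lands in new/a.c, while a newly added old/b.c stays at old/b.c.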

There might also be two kinds of problems.  First, if you are merging
an old and/or much diverged branch, rename detection can use a lot of CPU
power even if there are no renames present (this happened once or
twice).  Second, if similarity-based rename detection fails, you would
not get a conflict and would be left with two versions of a file in the
working directory; this might happen for example if you renamed some
'sensitive' binary file (where a small change results in a large change in
representation), or if the change is too large compared to the whole
contents of a file.  For the second case it is expected that advisory
rename tracking, which is requested from time to time, would help.

Hmmm... I wonder if the above problems with rename detection happen
more often than problems with, usually file-id based, rename tracking
used by other SCM.

Footnote:
=========
[1] It looks like we really need http://git.or.cz/gitwiki/FileRenames

BTW. I wonder if Mercurial entry isn't incorrect either...

scm>         <section id="copy">
scm>             <title>File and Directories Copies</title>
scm>             <expl>
scm>                 Does the version control system support copying
scm>                 files or directories to a different location at the
scm>                 repository level, while retaining the history?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">No. Copies are not supported.</s>
scm>                 <s id="mercurial">Yes. Copies are supported</s>
scm>                 <s id="git">No.  Copies are not supported.</s>
scm>            </compare>
scm>         </section>

I would write that:

  Copy detection is supported, but for performance reasons it is not
  enabled by default.

Comments:

Perhaps we could mention here that there are two levels of copy
detection: checking only changed files, and --find-copies-harder. Also
worth noting might be the fact that git-blame has support for
detecting code copying, also across files.
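
The difference between the two levels is only in which files are
considered as possible copy sources.  A toy Python sketch, assuming a
line-based difflib similarity in place of Git's real score, and with
made-up function and parameter names:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; a stand-in for Git's own score."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def find_copies(added, modified, unmodified, harder=False, threshold=0.5):
    """Map each added path to its most likely copy source, if any.

    added, modified and unmodified map path -> contents (for modified
    files, the contents before the change).  By default only files
    modified in the same change are candidate sources; harder=True
    also inspects unmodified files, which is the expensive part.
    """
    candidates = dict(modified)
    if harder:
        candidates.update(unmodified)
    copies = {}
    for new_path, content in added.items():
        scored = [(similarity(src, content), path)
                  for path, src in candidates.items()]
        if scored:
            score, path = max(scored)
            if score >= threshold:
                copies[new_path] = path
    return copies
```

With harder=True every file in the tree becomes a candidate, which is
why it is not the default for large projects.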

scm>         <section id="repos_clone">
scm>             <title>Remote Repository Replication</title>
scm>             <expl>
scm>                 Does the system support cloning a remote repository to get
scm>                 a functionally equivalent copy in the local system? That 
scm>                 should be done without any special access to the remote 
scm>                 server except for normal repository access.
scm>             </expl>
scm>             <compare>
scm>                 <s id="mercurial">Yes.</s>
scm>                 <s id="git">Yes.  This is very intrinsic feature of Git.</s>
scm>             </compare>
scm>         </section>

In fact this is 'very intrinsic feature' of each distributed SCM...
In short: I think that simple 'Yes.' answer for Git would be better.

scm>         <section id="push">
scm>             <title>Propagating Changes to Parent Repositories</title>
scm>             <expl>
scm>                 Can the system propagate changes from one repository to 
scm>                 another?
scm>             </expl>
scm>             <compare>
scm>                 <s id="mercurial">Yes.</s>
scm>                 <s id="git">Yes.  (The Linux kernel development process uses this extremely often).</s>
scm>             </compare>
scm>         </section>

Again I think that a simple "Yes" would be sufficient.  The sentence in
parentheses doesn't bring any new information, so IMHO it should be
removed, especially since the Linux kernel also uses a patch-based workflow
very extensively.

scm>         <section id="permissions">
scm>             <title>Repository Permissions</title>
scm>             <expl>
scm>                 Is it possible to define permissions on access to different
scm>                 parts of a remote repository? Or is access open for all? 
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">
scm>                     Basic access control can be implemented through a
scm>                     contributed hook script.  ACL support for the
scm>                     Bazaar server is planned.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     Yes. It is possible to lock down repositories,
scm>                     subdirectories, or files using hooks.
scm>                 </s>
scm>                 <s id="git">
scm>                     No, but a single server can serve many repositories.
scm>                     Also, UNIX permissions can be used to some extent.
scm>                 </s>

Line-wrapped for better readability.

scm>             </compare>
scm>         </section>

I think the answer should be here:

  Yes. It is possible to lock down repositories, branches,
  subdirectories, or files using hooks (see for example
  <a href="">contrib/hooks/update-paranoid</a> example hook).

Comments:

Shawn, could you _please_ make the documentation of the 'update-paranoid'
contrib hook complete by documenting how to configure it to lock down
directories or files?

As the feature seems to be more about fine-grained access control
(by the way, something that Karl Fogel in his "Producing OSS..." book
argues against, as better solved by social rather than technological means),
I have not mentioned here things like web server permissions for WebDAV
access, UNIX file permissions, or tools like Gitosis or ssh_acl, which
are about access to the repository as a whole.
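
For reference, the idea Dmitry described earlier in this thread (run
"git diff --name-only oldrev newrev" in the update hook and check the
resulting paths) could look roughly like the Python sketch below.  This is
not how update-paranoid works; the function names and the list of
allowed directories are assumptions made for the example, and it does
not handle a newly created ref, where oldrev is all zeros.

```python
import subprocess


def changed_paths(oldrev: str, newrev: str) -> list[str]:
    """List the paths that differ between two revisions."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "-z", oldrev, newrev],
        check=True, capture_output=True, text=True,
    ).stdout
    # -z separates the paths with NUL bytes instead of newlines.
    return [p for p in out.split("\0") if p]


def denied_paths(paths, allowed_prefixes):
    """Return the paths that fall outside every allowed directory."""
    prefixes = [p.rstrip("/") + "/" for p in allowed_prefixes]
    return [p for p in paths
            if not any(p.startswith(pre) for pre in prefixes)]
```

An update hook gets the ref name, oldrev and newrev as its arguments and
rejects the update by exiting with non-zero status, so the hook would
exit with an error whenever denied_paths(changed_paths(oldrev, newrev),
allowed) is non-empty for the pushing account.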

scm>         <section id="changesets">
scm>             <title>Changesets' Support</title>
scm>             <expl>
scm>                 Does the repository support changesets? Changesets are a way
scm>                 to group a number of modifications that are relevant to each
scm>                 other in one atomic package, that can be cancelled or 
scm>                 propagated as needed.
scm>             </expl>
scm>             <compare>
scm>                 <s id="darcs">
scm>                     Yes. Changesets are supported.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     Yes. Changesets are supported.
scm>                 </s>
scm>                 <s id="git">
scm>                     Yes, Changesets are supported,
scm>                     and there's some flexibility in creating them.
scm>                 </s>

Line-wrapped for better readability.

scm>            </compare>
scm>         </section>

In my opinion, such an _empty_ addition ("there's some flexibility in
creating them") is totally unnecessary; it adds no solid information
(what does "some flexibility" mean?) and should be removed.

I think the entry should simply state "Yes. Changesets are supported.";
see for example the entry for Darcs (from which the idea of having
"git add --interactive" was taken).

Comments:

The description of this feature is not entirely clear, but I think
everybody knows what it is meant to mean.

scm>         <section id="annotate">
scm>             <title>Tracking Line-wise File History</title>
scm>             <expl>
scm>                 Does the version control system have an option to track the
scm>                 history of the file line-by-line? I.e., can it show for each line
scm>                 at which revision it was most recently changed, and by whom?
scm>             </expl>
scm>             <compare>
scm>                 <s id="mercurial">Yes. (hg annotate)</s>
scm>                 <s id="git">Yes. (git blame).</s>
scm>            </compare>
scm>         </section>
scm>     </section>

This is O.K., but I wonder if it wouldn't be worth adding information
about unique (as far as I know) git-blame abilities, something like:

  Yes.  (git blame).

  <a href="">git-blame</a> can be asked to detect moving lines in file
  and between files; there exists a GUI for blame ("git gui blame <file>").

Comments:

One would suspect that because Git is oriented towards whole-project
history, and not per-file history, git-blame is slow.  To mitigate
that there is an incremental blame mode, used to reduce latency in graphical
blame viewers like "git gui blame", contrib/blameview, or the one
in QGit.

But anecdotal evidence (meaning in this case a discussion with Pieter
de Bie of the vcscompare blog on the #git IRC channel) shows that git-blame
has similar performance to its equivalents in Mercurial and Bazaar.
(It is hard to compare with "cvs annotate", as for a centralized SCM there
is the matter of network speed; I don't know how it compares for local
access, i.e. repository and file/client on the same filesystem).

scm>     <section id="features">
scm>         <title>Features</title>
scm>         <section id="work_on_dir">
scm>             <title>Ability to Work only on One Directory of the Repository</title>
scm>             <expl>
scm>                 Can the version control system checkout only one directory of
scm>                 the repository? Or restrict the check-ins to only one 
scm>                 directory?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">For checkouts: No. For checkins: Yes.</s>
scm>                 <s id="mercurial">
scm>                     It is possible to commit changes only in a subset of the
scm>                     tree. There are plans for partial checkouts.
scm>                 </s>
scm>                 <s id="git">
scm>                     No.  However, commits could be restricted somewhat,
scm>                     see the "Repository Permissions".
scm>                 </s>

Again, re-wrapped for better readability.

scm>            </compare>
scm>         </section>

I'm not sure if the entry for Git shouldn't read as:

  No. All changes are made repository-wide.

as in for Aegis.  On the other hand it is possible to restrict commits
and remote access to only some subset of the tree using hooks; also
there is work done on implementing partial/sparse checkout for Git by
Nguyễn Thái Ngọc Duy (pclouds).  So perhaps it should read instead,
similarly to what we have for Mercurial:

  It is possible to restrict commit via hooks to changes only in
  a subset of the tree. Implementing partial/sparse checkouts is
  work in progress.

Comments:

Again, as said in "Producing OSS...", it is better to implement such
restrictions using social rather than technical means, at least for
checkins; there are some reasons (disk space, protecting against
accidental changes, <insert your reason here>) for partial checkouts.

scm>         <section id="tracking_uncommited_changes">
scm>             <title>Tracking Uncommited Changes</title>
scm>             <expl>
scm>                 Does the software have an ability to track the changes in the
scm>                 working copy that were not yet committed to the repository?
scm>             </expl>
scm>             <compare>
scm>                 <s id="mercurial">Yes. Using hg diff.</s>
scm>                 <s id="git">Yes.
scm>                     Also, branches are very lightweight in Git, and
scm>                     could be considered a kind of storage for "uncommitted" code
scm>                     in some workflows.
scm>                 </s>

A bit rewrapped, and indented.

scm>             </compare>
scm>         </section>

Perhaps we should add "Using git diff" here; the problem is with
[possible] difference between "git diff", "git diff HEAD", and
"git diff --cached".

I also think that the comment is off topic; it is not closely
related to the described feature, so IMHO it should be dropped. The entry
should then read:

  Yes. Using git diff. 

Comments:

I don't think it is worth mentioning the fact that staging area in Git
(the index) is explicit and visible, and can be directly accessed. In
usual [newbie] workflows "git diff" works just fine...
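
For those not familiar with the index, the difference between the three
forms comes down to which two states are compared.  A toy Python model
(each state is just a mapping of path to contents; the function names
are made up for this illustration, and untracked files are ignored):

```python
def changed(a: dict, b: dict) -> list:
    """Paths whose contents differ between two states."""
    return sorted(p for p in set(a) | set(b) if a.get(p) != b.get(p))


def git_diff(head: dict, index: dict, worktree: dict, mode: str = "") -> list:
    """Model which pair of states each form of the command compares."""
    if mode == "--cached":
        return changed(head, index)       # what would be committed
    if mode == "HEAD":
        return changed(head, worktree)    # all changes since last commit
    return changed(index, worktree)       # changes not yet staged
```

When nothing has been staged, the index equals HEAD, so plain "git diff"
and "git diff HEAD" show the same thing; that is the usual newbie
workflow mentioned above.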

scm>         <section id="per_file_commit_messages">
scm>             <title>Per-File Commit Messages</title>
scm>             <expl>
scm>                 Does the system have a way to assign a per-file commit message
scm>                 to the changeset, as well as a per-changeset message?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bitkeeper">Yes. It is possible to have a per-file
scm>                     commit message</s>
scm>                 <s id="mercurial">
scm>                     No.
scm>                 </s>
scm>                 <s id="git">
scm>                     No. Commit messages are per changeset.
scm>                 </s>

Re-wrapped.

scm>            </compare>
scm>         </section>
scm>     </section>

Perhaps simple "No." as in case of Mercurial entry would be enough here?

Comments:

What is the reason somebody would want per-file commit message?  They
cannot be terribly useful, as BitKeeper which taught Linus about
distributed version control has it, but Git doesn't.

scm>     <section id="technical_status">
scm>         <title>Technical Status</title>
scm>         <section id="documentation">
scm>             <title>Documentation</title>
scm>             <expl>
scm>                 How well is the system documented? How easy is it to
scm>                 get started using it?
scm>             </expl>
scm>             <compare>
scm>                 <s id="darcs">
scm>                     Good. The manual contains a brief tutorial and a solid
scm>                     reference.  Every sub-command can print its usage.
scm>                     Because the command-set is small and the model is
scm>                     simple, many users find it easy to get started.
scm>                 </s>
scm>                 <s id="git">
scm>                     Medium. The short help is too terse and obscure.
scm>                     The man pages are extensive, but tend to be confusing.
scm>                     The are many tutorials.
scm>                 </s>

Re-wrapped and re-indented for better readability.

scm>             </compare>
scm>         </section>

That of course depends on one's opinion.  I would say "Good", now that
there is the "Git User's Manual" distributed with Git, and now that the
semi-official "Git Community Book" (http://book.git-scm.com) has been started.

So I would say:

  Good. There is an extensive <a href="">"Git User's Manual"</a> distributed
  with Git. There is also the recently started semi-official
  <a href="http://book.git-scm.com">"Git Community Book"</a>. There is a
  manpage for each sub-command, and most commands can print short usage.

Comments:

Backward compatibility^W^W Old impressions die hard,... And the meme
that git documentation is not user friendly is difficult to kill.

scm>         <section id="ease_of_deployment">
scm>             <title>Ease of Deployment</title>
scm>             <expl>
scm>                 How easy is it to deploy the software? What are
scm>                 the dependencies and how can they be satisfied?
scm>             </expl>
scm>             <compare>
scm>                 <s id="mercurial">
scm>                     Excellent.  Binary packages are available for all
scm>                     popular platforms.  Building from source requires
scm>                     only Python 2.3 (or later) and a C compiler.
scm>                 </s>
scm>                <s id="git">
scm>                    Good.  Binary packages are available
scm>                    for modern platforms.  C compiler and Perl are
scm>                    required. Requires cygwin on Windows, and has some
scm>                    UNIXisms.
scm>                </s>
scm>            </compare>
scm>         </section>

Thanks to msysGit project it is not required to install Cygwin to have
Git on Windows. Also some commands are still written as shell scripts.
So I would say:

  Good. Binary packages are available for modern platforms. On Windows
  one can use either Cygwin, or native msysGit version. Requires Perl
  and POSIX shell (and assorted shell tools) for some commands.

Comments:

I don't know what to do with "has some UNIXisms"; I think it is not
very relevant for this entry, and it shouldn't be here.

As I use Git only on Linux (and at one time used msysGit on MS Windows XP
and FAT filesystem on USB drive to fetch git.git only), I cannot say much
on ease of deployment on other operating systems, like Free/Open/NetBSD,
MacOS X, different Unices, and MS Windows.  Perhaps it should be even
"Very good" here?

scm>         <section id="command_set">
scm>             <title>Command Set</title>
scm>             <expl>
scm>                 What is the command set? How compatible is it with
scm>                 the commands of CVS (the current open-source defacto
scm>                 standard)?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bitkeeper">
scm>                     A CVS-like command set with some easy-to-get-used-to
scm>                     complications due to its different way of work and 
scm>                     philosophy.
scm>                 </s>
scm>                 <s id="bazaar">
scm>                     Tries to follow CVS conventions, but deviates
scm>                     where there is a different design.
scm>                 </s>

The same for Mercurial.

scm>                 <s id="git">
scm>                     Command set is very feature-rich, and not compatible
scm>                     with CVS.
scm>                 </s>
scm>            </compare>
scm>         </section>

I don't think the situation is that different than with Mercurial, so
perhaps it should simply read:

  Tries to follow CVS conventions, but deviates where there is
  a different design.

Although Git doesn't "try to follow CVS conventions", it does follow
BitKeeper conventions, and so transitively CVS conventions as well.  I
would agree with the "feature-rich" comment, though ;-)

So perhaps something like

  Command set is very feature-rich; compatibility with CVS conventions
  is limited by differences in design.

scm>         <section id="networking">
scm>             <title>Networking Support</title>
scm>             <expl>
scm>                 How good is the networking integration of the system?
scm>                 How compliant is it with existing protocols and infra-structure?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">
scm>                     Excellent. Works natively over HTTP (read-only),
scm>                     FTP and SFTP without having Bazaar installed at
scm>                     the remote end.  Works over HTTP, SSH and a custom
scm>                     protocol when talking to a remote Bazaar
scm>                     server. Supports RSYNC and WebDAV (experimental)
scm>                     through plugins.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     Excellent.  Uses HTTP or ssh.  Remote access also
scm>                     works safely without locks over read-only network
scm>                     filesystems.
scm>                 </s>
scm>                 <s id="git">
scm>                     Excellent.  Can use native Git protocol, but works
scm>                     over rsync, ssh, HTTP and HTTPS also.
scm>                 </s>

Line-wrapped for better readability.

scm>             </compare>
scm>         </section>

It could be written differently, but it is basically O.K. Perhaps we
should state explicitly which protocols are read-only and which allow
pushing (publishing) changes to the server, and which protocols require
Git to be installed on the server and which can do without; perhaps
git-bundle for off-line transport could also be mentioned here.

scm>         <section id="portability">
scm>             <title>Portability</title>
scm>             <expl>
scm>                 How portable is the version-control system to various 
scm>                 operating systems, computer architectures, and other
scm>                 types of systems?
scm>             </expl>
scm>             <compare>
scm>                 <s id="mercurial">
scm>                     Excellent. Runs on all platforms supported by
scm>                     Python.  Repositories are portable across CPU
scm>                     architectures and endian conventions.
scm>                 </s>
scm>                 <s id="perforce">
scm>                     Excellent. Runs on UNIX, Mac OS, BeOS and Windows.
scm>                 </s>
scm>                 <s id="git">
scm>                     The client works on most UNIXes, but not on native
scm>                     MS-Windows. The Cygwin build seems to be workable, though.
scm>                 </s>

Re-wrapped for better readability.

scm>             </compare>
scm>         </section>
scm>     </section>

There is a native MS Windows implementation named msysGit. We can also
mention that Git runs on Linux, Free/Open/NetBSD, various Unices, and
MacOS X.

The entry for Git also lacks the single-word descriptions, like "Excellent",
"Very good", "Good", "Medium", that most other SCMs have in this part
(and "Windows only" for some).

So I would say something like the following:

  Very good. Works on Linux, FreeBSD, MacOS X and various Unices. Works
  on MS Windows either via Cygwin, or natively via msysGit.

Perhaps I would also add the following cautionary note:

  There are some issues with case-insensitive (e.g. FAT) or filename-mangling
  (MacOS HFS+) filesystems, but they can be worked around.

Comments:

I'm not sure if it should be "Very good" or "Excellent" here. I think
that this section is about the number of systems the SCM was ported to,
not how hard it is to make it work (port Git) on some additional OS. I'm
not sure how complete msysGit is (git-svn and other Perl scripts), so I
put "Very good"; but it is improving, both thanks to builtinification
and to the efforts of msysGit maintainers and developers.
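
To illustrate the case-insensitivity issue from the cautionary note
above, here is a minimal sketch (not part of Git; the function name
case_collisions and the sample file list are made up for illustration)
that finds paths differing only by letter case. Such paths cannot
coexist in one directory on a case-insensitive filesystem. It assumes
that str.casefold() approximates the filesystem's case folding, which is
not exact for every filesystem, and it does not cover the filename
mangling (Unicode normalization) done by HFS+.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case.

    Assumption: str.casefold() stands in for the filesystem's own
    case-folding rules, which may differ in corner cases.
    """
    groups = defaultdict(list)
    for path in paths:
        # Paths with the same folded form would map to one file
        # on a case-insensitive filesystem.
        groups[path.casefold()].append(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical list of tracked files.
    tracked = ["Makefile", "makefile", "README", "src/main.c"]
    for group in case_collisions(tracked):
        print("would collide:", ", ".join(group))
```

A check like this could be run over the list of tracked files before
checking out a repository on a FAT or similar filesystem.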

scm>     <section id="user_interaces">
scm>         <title>User Interfaces</title>
scm>         <section id="web_interface">
scm>             <title>Web Interface</title>
scm>             <expl>
scm>                 Does the system have a WWW-based interface that can be
scm>                 used to browse the tree and the various revisions of the
scm>                 files, perform arbitrary diffs, etc?
scm>             </expl>
scm>             <compare>
scm>                 <s id="cvs">Yes. 
scm>                     <a href="http://www.freebsd.org/projects/cvsweb.html">CVSweb</a>,
scm>                     <a href="http://www.viewvc.org/">ViewVC</a>,
scm>                     <a href="http://www.horde.org/chora/">Chora</a>,
scm>                     and <a href="http://wwcvs.republika.pl/">wwCVS</a>.
scm>                 </s>
scm>                 <s id="bazaar">
scm>                     Yes, several:
scm>                     <a href="http://www.lag.net/loggerhead/">Loggerhead</a>,
scm>                     <a href="http://goffredo-baroncelli.homelinux.net/bazaar/">Webserve</a>,
scm>                     <a href="http://mccormick.cx/dev/bzrweb/index.py/log/bzrweb/head">Bzrweb</a>,
scm>                     and
scm>                     <a href="http://bazaar-vcs.org/TracBzr">Trac</a>.
scm>                 </s>
scm>                 <s id="mercurial">Yes.  The web interface is a bundled component.</s>
scm>                 <s id="git">
scm>                     Yes.  Gitweb is included in distribution.
scm>                 </s>
scm>             </compare>
scm>         </section>

For other SCMs, many different web interfaces are listed.
So I would perhaps put a list here:

  Yes.  Gitweb (in Perl) is included in distribution, but there are
  many other web interfaces:
  <a href="http://hjemli.net/git/cgit/">cgit</a> (in C),
  <a href="http://code.google.com/p/git-php/">git-php</a> and
  <a href="http://people.proekspert.ee/peeter/blog/index.php?category=5">its fork</a> (in PHP),
  <a href="http://viewgit.sourceforge.net/">ViewGit</a> (in PHP),
  <a href="http://www.flameeyes.eu/projects#gitarella">Gitarella</a> (in Ruby)

Perhaps we should leave out the programming language used, as no such
information is provided for the other SCMs listed in the comparison.

Comments:

I have not included Wit by Daniel Chokola (in Ruby/eRuby), as the site
seems to be down. I also didn't add GitStat, which is not a web interface
but instead offers web-based statistics about a repository, nor hosting
solutions such as Gitorious (in Ruby), InDefero (in PHP) and GitHub
(not open, in Ruby).

I'm also not sure which of the git-php implementations to list; perhaps
simply the first one, the one that has a kind of homepage on Google Code.

scm>         <section id="availability_of_guis">
scm>             <title>Availability of Graphical User-Interfaces.</title>
scm>             <expl>
scm>                 What is the availability of graphical user-interfaces for
scm>                 the system? How many GUI clients are present for it?
scm>             </expl>
scm>             <compare>
scm>                 <s id="bazaar">
scm>                     There are several graphical frontends in
scm>                     development,
scm>                     see <a href="http://bazaar-vcs.org/BzrPlugins">the Bazaar Plugins page</a>
scm>                     and <a href="http://bazaar-vcs.org/3rdPartyTools">the Third-party Tools page</a>.
scm>                     Notable
scm>                     are <a href="http://bazaar-vcs.org/QBzr">QBzr
scm>                     (Qt)</a> and 
scm>                     <a href="http://bazaar-vcs.org/bzr-gtk">bzr-gtk (GTK+)</a>, which
scm>                     can be considered beta quality.  Work is also
scm>                     being done on integrating Bazaar with Windows
scm>                     Explorer, Eclipse, Nautilus, and Meld.
scm>                 </s>
scm>                 <s id="mercurial">
scm>                     History viewing available with hgit extension;
scm>                     check-in extension (hgct) makes committing easier.
scm>                     Some third-party IDEs and GUI tools (e.g. eric3,
scm>                     meld) have integrated Mercurial support.
scm>                 </s>
scm>                 <s id="git">
scm>                     Gitk is included in distribution.  
scm>                     Qgit and Git-gui tools are also available.
scm>                 </s>

Re-wrapped.

scm>            </compare>
scm>         </section>
scm>     </section>

There is a question whether to list all (or at least the more popular)
GUI tools, or to list only the built-in ones and perhaps one or two more,
and direct readers to the Git Wiki for details, like below:

  Gitk history viewer and Git-gui commit tool are included in distribution.
  There are also other tools available, like
  <a href="http://digilander.libero.it/mcostalba/">QGit</a> (Qt) and
  <a href="http://github.com/Caged/gitnub/wikis">GitNub</a> (MacOS);
  see <a href="http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#gui">Interfaces,
  Frontends And Tools</a> page on Git Wiki for a list.

I have listed the two most popular GUIs according to the currently open survey
  http://www.survs.com/shareResults?survey=M3PIVU72&rndm=OKJQ45LAG8

Comments:

Should we mention integration (and the beginnings of integration) with
IDEs, editors and other tools, in the form of EGit (Eclipse Git plugin),
similar work for NetBeans and IntelliJ/JetBrains, the Git TextMate bundle,
Emacs modes (git.el, vc-git.el, DVC, magit), and support from other
tools like PIDA and Meld?  It is mentioned for other SCMs, but it is
not GUI in the exact meaning of the word.

scm>     <section id="license">
scm>         <title>License</title>
scm>         <expl>
scm>             What are the licensing terms for the software? 
scm>         </expl>
scm>         <compare>
scm>             <s id="mercurial">GNU GPL (open source)</s>
scm>             <s id="git">GNU GPL v2 (open source).</s>
scm>         </compare>
scm>     </section>
scm> </section>

No comments here.

-- 
Jakub Narebski
Poland

end of thread, other threads:[~2008-10-01 18:46 UTC | newest]

Thread overview: 9+ messages
2008-09-13 17:06 Git at Better SCM Initiative comparison of VCS (long) Jakub Narebski
2008-09-14 14:43 ` Dmitry Potapov
2008-09-14 15:09   ` Alexey Mahotkin
2008-09-14 17:48   ` Jakub Narebski
2008-09-14 19:48     ` Dmitry Potapov
2008-09-14 21:06       ` Shawn O. Pearce
2008-09-14 21:29         ` Jakub Narebski
2008-09-15  0:37       ` Jakub Narebski
2008-10-01 18:45 ` Jakub Narebski