Git development
 help / color / mirror / Atom feed
* [PATCH 17/19] More fixes to the git-remote-cvs installation procedure
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

- Makefile: Make sure git-remote-cvs is rebuilt when 'prefix' changes
  (by adding a dependency on GIT-CFLAGS). This prevents a regular 'make'
  followed by a 'make prefix=/somewhere/else install' from installing a
  non-working git-remote-cvs.

- Makefile: When mangling git-remote-cvs to add the git_remote_cvs install
  location to the Python search path, _replace_ the initial 'current dir'
  entry in sys.path (instead of merely prepending the install location).
  Hence, if the git_remote_cvs package is not installed at the correct
  location (and also not present in any of Python's default package dirs),
  then git-remote-cvs will fail loudly instead of silently falling back on
  the git_remote_cvs subdir in git.git.

- Allow for the git_remote_cvs install path to be overridden by the
  $GITPYTHONLIB environment variable.

- t/test-lib.sh: Set $GITPYTHONLIB (unless $GIT_TEST_INSTALLED is enabled)
  so that we test the currently built version of git_remote_cvs  (the one
  that is built in git_remote_cvs/build/lib) instead of a previously
  installed version.

- Another minor check and a line length fix.

Found-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

	Unchanged.

 Makefile      |   11 +++++++++--
 t/test-lib.sh |    9 +++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 9fb2747..6d37399 100644
--- a/Makefile
+++ b/Makefile
@@ -1523,13 +1523,20 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : unimplemented.sh
 endif # NO_PERL
 
 ifndef NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): GIT-CFLAGS
 $(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
 	$(QUIET_GEN)$(RM) $@ $@+ && \
-	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' instlibdir` && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s \
+		--no-print-directory prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' \
+		instlibdir` && \
 	sed -e '1{' \
 	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
 	    -e '}' \
-	    -e 's|^import sys.*|&; sys.path.insert(0, "@@INSTLIBDIR@@")|' \
+	    -e 's|^import sys.*|&; \\\
+	           import os; \\\
+	           sys.path[0] = os.environ.has_key("GITPYTHONLIB") and \\\
+	                         os.environ["GITPYTHONLIB"] or \\\
+	                         "@@INSTLIBDIR@@"|' \
 	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
 	    $@.py >$@+ && \
 	chmod +x $@+ && \
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0b991db..77ad23e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -638,6 +638,15 @@ test -d ../templates/blt || {
 	error "You haven't built things yet, have you?"
 }
 
+if test -z "$GIT_TEST_INSTALLED"
+then
+	GITPYTHONLIB="$(pwd)/../git_remote_cvs/build/lib"
+	export GITPYTHONLIB
+	test -d ../git_remote_cvs/build || {
+		error "You haven't built git_remote_cvs yet, have you?"
+	}
+fi
+
 if ! test -x ../test-chmtime; then
 	echo >&2 'You need to build test-chmtime:'
 	echo >&2 'Run "make test-chmtime" in the source (toplevel) directory'
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 16/19] Fix the Makefile-generated path to the git_remote_cvs package in git-remote-cvs
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

Junio discovered that the installed git-remote-cvs executable is unable
to find the installed git_remote_cvs Python package. This is due to the
'instlibdir' target of git_remote_cvs/Makefile being invoked with
insufficient arguments (prefix and DESTDIR). This patch fixes that.

Signed-off-by: Johan Herland <johan@herland.net>
---

	Unchanged.

 Makefile                |    2 +-
 git_remote_cvs/Makefile |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index a63f44d..9fb2747 100644
--- a/Makefile
+++ b/Makefile
@@ -1525,7 +1525,7 @@ endif # NO_PERL
 ifndef NO_PYTHON
 $(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
 	$(QUIET_GEN)$(RM) $@ $@+ && \
-	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory instlibdir` && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' instlibdir` && \
 	sed -e '1{' \
 	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
 	    -e '}' \
diff --git a/git_remote_cvs/Makefile b/git_remote_cvs/Makefile
index 061c247..0d9eb31 100644
--- a/git_remote_cvs/Makefile
+++ b/git_remote_cvs/Makefile
@@ -28,7 +28,7 @@ install: $(pysetupfile)
 	$(PYTHON_PATH) $(pysetupfile) install --prefix $(DESTDIR_SQ)$(prefix)
 
 instlibdir: $(pysetupfile)
-	@echo "$(prefix)/$(PYLIBDIR)"
+	@echo "$(DESTDIR_SQ)$(prefix)/$(PYLIBDIR)"
 
 clean:
 	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) clean -a
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 14/19] git-remote-cvs: Remote helper program for CVS repositories
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland, Daniel Barkalow, David Aguilar
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

Implements the import of objects from a local or remote CVS repository.

This helper program uses the "git_remote_cvs" Python package introduced
earlier, and provides a working draft implementation of the remote helper
API, as described in Documentation/git-remote-helpers.txt. Further details
about this specific helper are described in the new
Documentation/git-remote-cvs.txt.

This patch has been improved by the following contributions:
- Daniel Barkalow: Updates reflecting changes in remote helper API
- David Aguilar: Lots of Python coding style fixes

Cc: Daniel Barkalow <barkalow@iabervon.org>
Cc: David Aguilar <davvid@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

	Unchanged.

 Documentation/git-remote-cvs.txt |   85 +++++
 Makefile                         |   24 ++
 git-remote-cvs.py                |  683 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 792 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-remote-cvs.txt
 create mode 100755 git-remote-cvs.py

diff --git a/Documentation/git-remote-cvs.txt b/Documentation/git-remote-cvs.txt
new file mode 100644
index 0000000..309ee7f
--- /dev/null
+++ b/Documentation/git-remote-cvs.txt
@@ -0,0 +1,85 @@
+git-remote-cvs(1)
+=================
+
+NAME
+----
+git-remote-cvs - Helper program for interoperation with CVS repositories
+
+SYNOPSIS
+--------
+'git remote-cvs' <remote>
+
+DESCRIPTION
+-----------
+
+Please see the linkgit:git-remote-helpers[1] documentation for general
+information about remote helper programs.
+
+CONFIGURATION
+-------------
+
+remote.*.cvsRoot::
+	The URL of the CVS repository (as found in a `CVSROOT` variable, or
+	in a `CVS/Root` file).
+	Example: "`:pserver:user@server/var/cvs/cvsroot`".
+
+remote.*.cvsModule::
+	The path of the CVS module (as found in a `CVS/Repository` file)
+	within the CVS repository specified in `remote.*.cvsRoot`.
+	Example: "`foo/bar`"
+
+remote.*.cachedSymbolsOnly::
+	When 'true', a cache of CVS symbols is used instead of querying the
+	CVS server for all existing symbols (potentially expensive). In this
+	mode, git-remote-cvs will not discover new CVS symbols unless you add
+	them explicitly with the "`addsymbol <symbol>`" command (on
+	git-remote-cvs's stdin), or request an explicit symbol cache update
+	from the CVS server with the "`syncsymbols`" command (on
+	git-remote-cvs's stdin). When 'false' (the default), the CVS server
+	will be queried whenever a list of CVS symbols is required.
+
+remote.*.usernameMap::
+	The path (absolute, or relative to the repository (NOT the worktree))
+	to the file that contains the mapping from CVS usernames to the
+	corresponding full names and email addresses, as used by Git in the
+	Author and Committer fields of commit objects. When this config
+	variable is set, CVS usernames will be resolved against this file.
+	If no match is found in the file, or if this config variable is unset,
+	or if the variable points to a non-existing file, the original CVS
+	username will be used as the Author/Committer name, and the
+	corresponding email address will be set to "`<username>@example.com`".
++
+The format of the usernameMap file is one entry per line, where each line is
+of the form "`username: Full Name <email@address>`".
+Example: `johndoe: John Doe <johndoe@example.com>`
+Blank lines and lines starting with '#' are ignored.
+
+COMMANDS
+--------
+
+In addition to the commands that constitute the git-remote-helpers API, the
+following extra commands are supported for managing the local symbol cache when
+the `remote.*.cachedSymbolsOnly` config variable is true. The following
+commands can be given on the standard input of git-remote-cvs:
+
+'addsymbol'::
+	Takes one CVS symbol name as argument. The given CVS symbol is
+	fetched from the CVS server and stored into the local CVS symbol
+	cache. If `remote.*.cachedSymbolsOnly` is enabled, this can be used
+	to introduce a new CVS symbol to the CVS helper application.
+
+'syncsymbols'::
+	All CVS symbols that are available from the given remote are
+	fetched from the CVS server and stored into the local CVS symbol
+	cache. This is equivalent to disabling `remote.{asterisk}.cachedSymbolsOnly`,
+	running the "list" command, and then finally re-enabling the
+	`remote.*.cachedSymbolsOnly` config variable. I.e. this command can
+	be used to manually synchronize the CVS symbols available to the
+	CVS helper application.
+
+'verify'::
+	Takes one CVS symbol name as argument. Verifies that the CVS symbol
+	has been successfully imported be checking out the CVS symbol from
+	the CVS server, and comparing the CVS working tree against the Git
+	tree object identified by `refs/cvs/<remote>/<symbol>`. This can be
+	used to verify the correctness of a preceding 'import' command.
diff --git a/Makefile b/Makefile
index 944f4d0..a63f44d 100644
--- a/Makefile
+++ b/Makefile
@@ -346,6 +346,8 @@ SCRIPT_PERL += git-relink.perl
 SCRIPT_PERL += git-send-email.perl
 SCRIPT_PERL += git-svn.perl
 
+SCRIPT_PYTHON += git-remote-cvs.py
+
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
 	  $(patsubst %.py,%,$(SCRIPT_PYTHON)) \
@@ -1520,6 +1522,28 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : unimplemented.sh
 	mv $@+ $@
 endif # NO_PERL
 
+ifndef NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory instlibdir` && \
+	sed -e '1{' \
+	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
+	    -e '}' \
+	    -e 's|^import sys.*|&; sys.path.insert(0, "@@INSTLIBDIR@@")|' \
+	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
+	    $@.py >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+else # NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): % : unimplemented.sh
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
+	    -e 's|@@REASON@@|NO_PYTHON=$(NO_PYTHON)|g' \
+	    unimplemented.sh >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+endif # NO_PYTHON
+
 configure: configure.ac
 	$(QUIET_GEN)$(RM) $@ $<+ && \
 	sed -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
diff --git a/git-remote-cvs.py b/git-remote-cvs.py
new file mode 100755
index 0000000..94b61e7
--- /dev/null
+++ b/git-remote-cvs.py
@@ -0,0 +1,683 @@
+#!/usr/bin/env python
+
+"""Usage: git-remote-cvs <remote> [<url>]
+
+Git remote helper command for interacting with CVS repositories
+
+See git-remote-helpers documentation for details on external interface,
+usage, etc.  See git-remote-cvs documentation for specific
+configuration details of this remote helper.
+
+"""
+
+# PRINCIPLES:
+# -----------
+# - Importing same symbol twice (with no CVS changes in between) should
+#   yield the exact same Git state (and the second import should have
+#   no commits).
+# - Importing several CVS symbols pointing to the same state should
+#   yield corresponding refs pointing to the _same_ commit in Git.
+# - Importing a CVS symbol which has received only "regular commits"
+#   since last import should yield a fast-forward straight line of
+#   commits.
+
+# TODO / KNOWN PROBLEMS:
+# ----------------------
+# - Remove cachedSymbolsOnly config variable for now?
+# - Author map handling; map CVS usernames to Git full name + email
+# - Handle files that have been created AND deleted since last import
+# - How to handle CVS tags vs. branches.  Turn CVS tags into Git tags?
+# - Better CVS branch handling: When a branch as a super/subset of
+#   files/revs compared to another branch, find a way to base one branch
+#   on the other instead of creating parallel lines of development with
+#   roughly the same commits.
+# - Profiling, optimizations...
+
+import sys
+import os
+
+from git_remote_cvs.util import (debug, error, die, ProgressIndicator,
+                                 run_command)
+from git_remote_cvs.cvs import CVSState, CVSWorkDir, fetch_revs
+from git_remote_cvs.git import (get_git_dir, parse_git_config, git_config_bool,
+                                valid_git_ref, GitObjectFetcher, GitRefMap,
+                                GitFICommit, GitFastImport, GitNotes)
+from git_remote_cvs.cvs_symbol_cache import CVSSymbolCache
+from git_remote_cvs.commit_states import CommitStates
+from git_remote_cvs.cvs_revision_map import CVSRevisionMap, CVSStateMap
+from git_remote_cvs.changeset import build_changesets_from_revs
+
+
+class Config(object):
+
+    """Encapsulation of configuration variables."""
+
+    # Author name/email tuple for commits created by this tool
+    Author = ("git remote-cvs", "invalid@example.com")
+    # Git remote name
+    Remote = None
+    # CVS symbols are imported into this refs namespace/directory
+    RefSpace = None
+    # Git notes ref, the refname pointing to our git notes
+    NotesRef = None
+    # CVS repository identifier, a 2-tuple (cvs_root, cvs_module),
+    # where cvs_root is the CVS server/repository URL (as found in
+    # $CVSROOT, or in a CVS/Root file), and cvs_module is the path
+    # to a CVS module relative to the CVS repository (as found in a
+    # CVS/Repository file)
+    CVSRepo = (None, None)
+    # Path to the git-remote-cvs cache/work directory
+    # (normally "info/cvs/$remote" within $GIT_DIR)
+    WorkDir = None
+    # If False, the list of CVS symbols will always be retrieved from
+    # the CVS server using 'cvs rlog'.  If True, only the cached
+    # symbols within the "symbols" subdirectory of WorkDir are
+    # consulted.
+    CachedSymbolsOnly = False
+
+    @classmethod
+    def init (cls, remote):
+        """Fetch and setup configuration for the given remote."""
+        git_config = parse_git_config()
+        assert git_config["remote.%s.vcs" % (remote)] == "cvs"
+
+        cls.Author = (git_config["user.name"], git_config["user.email"])
+        cls.Remote = remote
+        cls.RefSpace = "refs/cvs/%s/" % (remote)
+        cls.NotesRef = "refs/notes/cvs/%s" % (remote)
+        cls.CVSRepo = (git_config["remote.%s.cvsroot" % (remote)],
+                       git_config["remote.%s.cvsmodule" % (remote)])
+        cls.WorkDir = os.path.join(get_git_dir(), "info/cvs", remote)
+        cls.CachedSymbolsOnly = git_config_bool(git_config.get(
+            "remote.%s.cachedsymbolsonly" % (remote), "false"))
+
+
+def work_path (*args):
+    """Return the given path appended to git-remote-cvs's cache/work dir."""
+    return os.path.join(Config.WorkDir, *args)
+
+
+def cvs_to_refname (cvsname):
+    """Return the git ref name for the given CVS symbolic name."""
+    if cvsname.startswith(Config.RefSpace):  # Already converted
+        return cvsname
+    return Config.RefSpace + cvsname
+
+
+def ref_to_cvsname (refname):
+    """Return the CVS symbolic name for the given git ref name."""
+    if refname.startswith(Config.RefSpace):
+        return refname[len(Config.RefSpace):]
+    return refname
+
+
+def valid_cvs_symbol (symbol):
+    """Return True iff the given CVS symbol can be imported into Git."""
+    return valid_git_ref(cvs_to_refname(symbol))
+
+
+def die_usage (msg, *args):
+    """Abort program with a helpful usage message."""
+    # Use this file's docstring as a usage string
+    print >> sys.stderr, __doc__
+    die(msg, *args)
+
+
+def import_cvs_revs (symbol, prev_state, cur_state, progress):
+    """Import the CVS revisions needed to satisfy the given CVS symbol.
+
+    This method will determine the CVS revisions involved in moving
+    from the given prev_state to the given cur_state.  This includes
+    looking at revision metadata in CVS, and importing needed blobs
+    from CVS.
+
+    The revision metadata is returned as a 2-level dict of CVSRev
+    objects: mapping path -> revnum -> CVSRev object.
+
+    """
+    # Calculate the revisions involved in moving from prev_state to
+    # cur_state, and fetch CVSRev objects for these revisions.
+    progress.pushprefix("Importing CVS revisions: ")
+    paths = set(prev_state.paths()).union(cur_state.paths())
+    num_fetched_revs = 0  # Number of CVSRev objects involved
+    num_imported_blobs = 0  # Number of blobs actually imported
+    cvs_revs = {}  # path -> revnum -> CVSRev
+    for i, path in enumerate(sorted(paths)):
+        progress.pushprefix("(%i/%i) %s: " % (i + 1, len(paths), path))
+        progress("")
+        prev_rev = prev_state.get(path)
+        cur_rev = cur_state.get(path)
+        if prev_rev and cur_rev and prev_rev == cur_rev:
+            # No changes since last import
+            progress.popprefix()
+            continue
+
+        # Fetch CVSRev objects for range [path:prev_rev, path:symbol]
+        path_revs = fetch_revs(path, prev_rev, cur_rev, symbol, Config.CVSRepo)
+        if not path_revs:
+            # Failed to find revs between prev_rev and symbol
+            if cur_rev:
+                assert not cur_rev.follows(prev_rev)
+                # The CVS symbol has been moved/reset since the
+                # last import in such a way that we cannot
+                # deduce the history between the last import
+                # and the current import.
+                # FIXME: Can we can work around this?
+                die("CVS symbol %s has been moved/reset from %s:%s to %s:%s "
+                    "since the last import.  This is not supported",
+                    symbol, path, prev_rev, path, cur_rev)
+            else:
+                # CVS symbol has been removed from this path.
+                # We cannot conclusively determine the history
+                # of this path following prev_rev.
+                # FIXME: Can we can work around this?
+                die("CVS symbol %s has been removed from %s since the last "
+                    "import.  This is not supported",
+                    symbol, path)
+
+        # OK.  We've got the revs in range [prev_rev, symbol]
+        # Verify/determine cur_rev
+        real_cur_rev = max(path_revs.keys())
+        if cur_rev:
+            assert cur_rev == real_cur_rev
+        else: cur_rev = real_cur_rev
+        # No need to re-import prev_rev if already imported
+        if prev_rev:
+            assert cur_rev.follows(prev_rev)
+            assert prev_rev in path_revs
+            del path_revs[prev_rev]
+        assert path_revs  # There should be more revs than just prev_rev
+
+        # Sanity checks:
+        # All revs from prev_rev to cur_rev are about to be imported
+        check_rev = cur_rev
+        while check_rev and check_rev != prev_rev:
+            assert check_rev in path_revs
+            check_rev = check_rev.parent()
+        # All previous revs have already been imported
+        check_rev = prev_rev
+        while check_rev:
+            assert Globals.CVSRevisionMap.has_rev(path, check_rev)
+            check_rev = check_rev.parent()
+
+        # Import CVS revisions as Git blobs
+        j = 0
+        for num, rev in sorted(path_revs.iteritems(), reverse = True):
+            j += 1
+            progress("(%i/%i) %s" % (j, len(path_revs), num))
+            assert num == rev.num
+            # Skip if already imported
+            if Globals.CVSRevisionMap.has_rev(rev.path, rev.num):
+                continue
+            # ...or if rev is a deletion
+            elif rev.deleted:
+                continue
+            # Import blob for reals
+            data = Globals.CVSWorkDir.get_revision_data(rev.path, rev.num)
+            Globals.GitFastImport.comment("Importing CVS revision %s:%s" %
+                                          (rev.path, rev.num))
+            mark = Globals.GitFastImport.blob(data)
+            Globals.CVSRevisionMap.add_blob(rev.path, rev.num, mark)
+            num_imported_blobs += 1
+
+        # Add path_revs to the overall structure of revs to be imported
+        assert path not in cvs_revs
+        cvs_revs[path] = path_revs
+        num_fetched_revs += len(path_revs)
+        progress.popprefix()
+    progress.popprefix()
+    progress("Imported %i blobs (reused %i existing blobs)" %
+             (num_imported_blobs, num_fetched_revs - num_imported_blobs), True)
+
+    return cvs_revs
+
+
+def advance_state (state, changeset):
+    """Advance the given state by applying the given changeset."""
+    # Verify that the given changeset "fits" on top of the given state
+    for rev in changeset:
+        prev_num = rev.num.parent()
+        state_num = state.get(rev.path)
+        if prev_num is None and state_num is None:
+            # rev is the first revision of this path being added
+            state.add(rev.path, rev.num)
+        elif prev_num and state_num and prev_num == state_num:
+            if rev.deleted:  # rev deletes path from state
+                state.remove(rev.path, prev_num)
+            else:  # rev follows state's revision of this path
+                state.replace(rev.path, rev.num)
+        else:
+            error("Cannot apply changeset with %s:%s on top of CVS state "
+                  "with %s:%s.",
+                  rev.path, changeset[rev.path].num,
+                  rev.path, state.get(rev.path))
+            error("    changeset: %s", changeset)
+            error("    CVS state: \n---\n%s---", state)
+            die("Failed to apply changeset.  Aborting.")
+
+
+def revert_state (state, changeset):
+    """Revert the given state to before the given changeset is applied.
+
+    This is the reverse of the above advance_state() function.
+
+    """
+    for rev in changeset:
+        prev_num = rev.num.parent()
+        state_num = state.get(rev.path)
+        if state_num is None:  # Revert deletion of file
+            assert rev.deleted
+            state.add(rev.path, prev_num)
+        else:
+            assert state_num == rev.num
+            if prev_num is None:  # Revert addition of file
+                state.remove(rev.path, rev.num)
+            else:  # Regular revert to previous version
+                state.replace(rev.path, prev_num)
+
+
+def import_changesets (ref, changesets, from_state, to_state, progress):
+    """Apply the given list of Changeset objects to the given ref.
+
+    Also verify that the changesets bring us from the given from_state
+    to the given to_state.
+
+    """
+    state = from_state
+    for i, c in enumerate(changesets):
+        advance_state(state, c)
+        progress("(%i/%i) Committing %s" % (i + 1, len(changesets), c))
+        # Make a git commit from changeset c
+        commitdata = GitFICommit(
+            c.author, c.author + "@example.com",  # TODO: author_map handling
+            c.date.ts, c.date.tz_str(),
+            "".join(["%s\n" % (line) for line in c.message]))
+
+        for rev in c:
+            p, n = rev.path, rev.num
+            if rev.deleted:
+                commitdata.delete(p)
+                continue
+            blobname = Globals.CVSRevisionMap.get_blob(p, n)
+            mode = Globals.CVSRevisionMap.get_mode(p)
+            if mode is None:  # Must retrieve mode from CVS checkout
+                debug("Retrieving mode info for '%s'" % (p))
+                Globals.CVSWorkDir.update(n, [p])
+                mode = Globals.CVSWorkDir.get_modeinfo([p])[p]
+                Globals.CVSRevisionMap.add_path(p, mode)
+            commitdata.modify(mode, blobname, p)
+
+        commitname = Globals.GitFastImport.commit(ref, commitdata)
+        Globals.CommitStates.add(commitname, state, Globals.GitFastImport)
+        for path, revnum in state:
+            Globals.CVSRevisionMap.add_commit(path, revnum, commitname)
+        assert commitname in Globals.CVSStateMap.get_commits(state)
+    assert state == to_state
+    return len(changesets)
+
+
+def import_cvs_symbol (cvs_symbol, progress):
+    """Import the given CVS symbol from CVS to Git.
+
+    Return False if nothing was imported, True otherwise.
+
+    """
+    git_ref = cvs_to_refname(cvs_symbol)
+    progress.pushprefix("%s: " % (cvs_symbol))
+
+    # Verify that we are asked to import valid git ref names
+    if not valid_git_ref(git_ref):
+        progress("Invalid git ref '%s'.  Skipping." % (git_ref), True)
+        progress.popprefix()
+        return False
+
+    # Retrieve previously imported CVS state
+    progress("Loading previously imported state of %s..." % (git_ref))
+    prev_commit = Globals.GitRefMap.get(git_ref)
+    prev_state = Globals.CommitStates.get(prev_commit, CVSState())
+
+    # Retrieve current CVS state of symbol
+    # Also: At some point we will need mode information for all CVS
+    # paths (stored in CVSRevisionMap).  This information can be added
+    # for each path on demand (using CVSWorkDir.get_modeinfo()), but
+    # doing so may be an expensive process.  It is much cheaper to load
+    # mode information for as many paths as possible in a _single_
+    # operation.  We do this below, by calling
+    # CVSRevisionMap.sync_modeinfo_from_cvs() in appropriate places.
+    if Config.CachedSymbolsOnly:
+        progress("Synchronizing local CVS symbol cache for symbol...")
+        # The symbol cache is likely not up-to-date.  Synchronize the
+        # given CVS symbol explicitly, to make sure we get the version
+        # current with the CVS server.
+        Globals.CVSSymbolCache.sync_symbol(cvs_symbol, Globals.CVSWorkDir,
+                                           progress)
+        # The above method updates the CVS workdir to the current CVS
+        # version.  Hence, now is a convenient time to preload mode
+        # info from the currently checked-out CVS files.  There may be
+        # more files for which we'll need mode information, but we'll
+        # deal with those when needed.
+        progress("Updating path mode info from current CVS checkout.")
+        Globals.CVSRevisionMap.sync_modeinfo_from_cvs(Globals.CVSWorkDir)
+    elif not Globals.CVSRevisionMap:  # No info for any paths, yet
+        # Pure optimization: We didn't get to preload all the mode info
+        # above.  Normally, the only alternative is load mode info for
+        # each path on-demand.  However, if our CVSRevisionMap is
+        # currently empty, that's probably going to be very expensive.
+        # Therefore, in this case, do an explicit CVS update here, and
+        # preload mode info for all paths.
+        progress("Updating CVS checkout to sync path mode info.")
+        Globals.CVSWorkDir.update(cvs_symbol)
+        Globals.CVSRevisionMap.sync_modeinfo_from_cvs(Globals.CVSWorkDir)
+
+    progress("Loading current CVS state...")
+    try:
+        cur_state = Globals.CVSSymbolCache[cvs_symbol]
+    except KeyError:
+        progress("Couldn't find symbol '%s'.  Skipping." % (cvs_symbol), True)
+        progress.popprefix()
+        return False
+
+    # Optimization: Check if the previous import of this symbol is
+    # still up-to-date.  If so, there's nothing more to be done.
+    progress("Checking if we're already up-to-date...")
+    if cur_state == prev_state:
+        progress("Already up-to-date.  Skipping.", True)
+        progress.popprefix()
+        return False
+
+    progress("Fetching CVS revisions...")
+    cvs_revs = import_cvs_revs(cvs_symbol, prev_state, cur_state, progress)
+    progress("Organizing revisions into chronological list of changesets...")
+    changesets = build_changesets_from_revs(cvs_revs)
+
+    # When importing a new branch, try to optimize branch start point,
+    # instead of importing entire branch from scratch
+    if prev_commit is None:
+        progress("Finding startpoint for new symbol...")
+        i = len(changesets)
+        state = cur_state.copy()
+        for c in reversed(changesets):
+            commit = Globals.CVSStateMap.get_exact_commit(state,
+                                                          Globals.CommitStates)
+            if commit is not None:
+                # We have found a commit that exactly matches the state
+                # after commit #i (changesets[i - 1])
+                Globals.GitFastImport.reset(git_ref, commit)
+                changesets = changesets[i:]
+                break
+            revert_state(state, c)
+            i -= 1
+    num_changesets = len(changesets)
+    num_applied = 0
+
+    # Apply changesets, bringing git_ref from prev_state to cur_state
+    if num_changesets:
+        progress("Importing changesets...")
+        num_applied = import_changesets(git_ref, changesets, prev_state,
+                                        cur_state, progress)
+    progress("Imported %i changesets (reused %i existing changesets)" %
+             (num_applied, num_changesets - num_applied), True)
+    progress.popprefix()
+    return True
+
+
+def do_import (*args):
+    """Do the 'import' command; import refs from a remote."""
+    if not args:
+        die_usage("'import' takes at least one parameter: ref...")
+
+    progress = ProgressIndicator("    ", sys.stderr)
+    cvs_symbols = map(ref_to_cvsname, args)
+    empty_import = True
+    for symbol in cvs_symbols:
+        if import_cvs_symbol(symbol, progress):
+            empty_import = False
+    if empty_import:
+        progress.finish("Everything up-to-date", True)
+        return 0
+    progress.finish("Finished importing %i CVS symbols to Git" %
+                    (len(cvs_symbols)), True)
+    return 0
+
+
+def do_list (*args):
+    """Do the 'list' command; list refs available from a CVS remote."""
+    if args:
+        die_usage("'list' takes no parameters")
+
+    progress = ProgressIndicator("    ", sys.stderr)
+    if Config.CachedSymbolsOnly:
+        progress("Listing symbols in local symbol cache...", True)
+        for symbol in sorted(Globals.CVSSymbolCache):
+            print cvs_to_refname(symbol)
+        progress.finish()
+        print  # Terminate output with a blank line
+        return 0
+
+    # Synchronize local symbol cache with CVS server
+    progress("Synchronizing local symbol cache with CVS server...")
+    Globals.CVSSymbolCache.sync_all_symbols(Config.CVSRepo, progress,
+                                            valid_cvs_symbol)
+    # Load current states of Git refs
+    progress("Loading current state of Git refs...")
+    changed, unchanged = 0, 0
+    for cvs_symbol, cvs_state in sorted(Globals.CVSSymbolCache.items()):
+        git_ref = cvs_to_refname(cvs_symbol)
+        progress("\tChecking if Git ref is up-to-date: %s" % (git_ref))
+        git_commit = Globals.GitRefMap.get(git_ref)
+        git_state = Globals.CommitStates.get(git_commit)
+        attrs = ""
+        if git_state and git_state == cvs_state:
+            attrs = " unchanged"
+            unchanged += 1
+        else:
+            git_commit = "?"
+            changed += 1
+        print "%s %s%s" % (git_commit, git_ref, attrs)
+    progress.finish("Found %i CVS symbols (%i changed, %i unchanged)" %
+                    (changed + unchanged, changed, unchanged))
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_capabilities (*args):
+    """Do the 'capabilities' command; report supported features."""
+    if args:
+        die_usage("'capabilities' takes no parameters")
+    print "import"
+    print "marks %s" % (work_path("marks"))
+#   print "export"
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_addsymbol (*args):
+    """Do the 'addsymbol' command; add given CVS symbol to local cache."""
+    if len(args) != 1:
+        die_usage("'addsymbol' takes one parameter: symbol")
+    symbol = args[0]
+    progress = ProgressIndicator("    ", sys.stderr)
+    if valid_cvs_symbol(symbol):
+        Globals.CVSSymbolCache.sync_symbol(symbol, Globals.CVSWorkDir,
+                                           progress)
+        progress.finish("Added '%s' to CVS symbol cache" % (symbol), True)
+    else:
+        error("Skipping CVS symbol '%s'; it is not a valid git ref", symbol)
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_syncsymbols (*args):
+    """Do the 'syncsymbols' command; sync all symbols with CVS server."""
+    if args:
+        die_usage("'syncsymbols' takes no parameters")
+    progress = ProgressIndicator("    ", sys.stderr)
+    Globals.CVSSymbolCache.sync_all_symbols(Config.CVSRepo, progress,
+                                            valid_cvs_symbol)
+    progress.finish()
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_verify (*args):
+    """Do the 'verify' command; Compare CVS checkout and Git tree."""
+    if len(args) != 1:
+        die_usage("'verify' takes one parameter: symbol")
+    symbol = args[0]
+    gitref = cvs_to_refname(symbol)
+    assert valid_git_ref(gitref)
+
+    progress = ProgressIndicator("    ", sys.stderr)
+    progress("Checking out '%s' from CVS..." % (symbol))
+    Globals.CVSWorkDir.update(symbol)
+
+    add_env = {"GIT_INDEX_FILE": os.path.abspath(work_path("temp_index"))}
+    progress("Creating Git index from tree object @ '%s'..." % (gitref))
+    cmd = ("git", "read-tree", gitref)
+    assert run_command(cmd, add_env = add_env)[0] == 0
+
+    progress("Comparing CVS checkout to Git index...", True)
+    cmd = ("git", "--work-tree=%s" % (os.path.abspath(work_path("cvs"))),
+           "ls-files", "--exclude=CVS", "--deleted", "--modified", "--others",
+           "-t")
+    exit_code, output, errors = run_command(cmd, add_env = add_env)
+    assert exit_code == 0 and not errors
+
+    if output:
+        progress.finish("Failed verification of '%s'" % (symbol), True)
+        error("The '%s' command returned:\n---\n%s---", " ".join(cmd), output)
+    else:
+        progress.finish("Successfully verified '%s'" % (symbol), True)
+    print  # Terminate with a blank line
+    return exit_code
+
+
+def not_implemented (*args):
+    """Abort, while informing user that this command is not yet implemented."""
+    die_usage("Command not implemented")
+
+
+COMMANDS = {
+    "capabilities": do_capabilities,
+    "list": do_list,
+    # Special handling of 'import' in main()
+    # "import": do_import,
+    "export": not_implemented,
+    # Custom commands
+    "addsymbol": do_addsymbol,
+    "syncsymbols": do_syncsymbols,
+    "verify": do_verify,
+}
+
+
+class Globals(object):
+
+    """Global variables are placed here at the start of main()."""
+
+    pass
+
+
+def main (*args):
+    """Main program logic; execution starts here."""
+    debug("Invoked '%s'", " ".join(args))
+
+    # Initialization of subsystems
+    assert len(args) >= 2
+    # Read config for the given remote
+    Config.init(args[1])
+    # Local CVS symbol cache (CVS symbol -> CVS state mapping)
+    Globals.CVSSymbolCache = CVSSymbolCache(work_path("symbols"))
+    # Local CVS checkout
+    Globals.CVSWorkDir = CVSWorkDir(work_path("cvs"), Config.CVSRepo)
+    # Interface to 'git cat-file --batch'
+    Globals.GitObjectFetcher = GitObjectFetcher()
+    # Interface to Git object notes
+    Globals.GitNotes = GitNotes(Config.NotesRef, Globals.GitObjectFetcher)
+    # Mapping from Git commit objects to CVS states
+    Globals.CommitStates = CommitStates(Globals.GitNotes)
+    # Mapping from Git ref names to Git object names
+    Globals.GitRefMap = GitRefMap(Globals.GitObjectFetcher)
+    # Mapping from CVS revision to Git blob and commit objects
+    Globals.CVSRevisionMap = CVSRevisionMap(cvs_to_refname("_metadata"),
+                                            Globals.GitObjectFetcher)
+    last_mark = 0
+    if Globals.CVSRevisionMap.has_unresolved_marks():
+        # Update with marks from last import
+        last_mark = Globals.CVSRevisionMap.load_marks_file(work_path("marks"))
+    else:
+        # Truncate marks file.  We cannot automatically do this after
+        # .load_marks_file() above, since we cannot yet guarantee that
+        # we will be able to save the revision map persistently.  (That
+        # can only happen if we are given one or more import commands
+        # below.) We can only truncate this file when we know there are
+        # no unresolved marks in the revision map.
+        open(work_path("marks"), "w").close()
+    # Mapping from CVS states to commit objects that contain said state
+    Globals.CVSStateMap = CVSStateMap(Globals.CVSRevisionMap)
+
+    # Main program loop
+    import_refs = []  # Accumulate import commands here
+    # Cannot use "for line in sys.stdin" for buffering (?) reasons
+    line = sys.stdin.readline()
+    while (line):
+        cmdline = line.strip().split()
+        if not cmdline:
+            break  # Blank line means we're about to quit
+        debug("Got command '%s'", " ".join(cmdline))
+        cmd = cmdline.pop(0)
+        if cmd == "import":
+            import_refs.extend(cmdline)
+        else:
+            if cmd not in COMMANDS:
+                die_usage("Unknown command '%s'", cmd)
+            if COMMANDS[cmd](*cmdline):
+                die("Command '%s' failed", line.strip())
+            sys.stdout.flush()
+        line = sys.stdin.readline()
+
+    # Trigger import processing after last import command
+    ret = 0
+    if import_refs:
+        # Init producer of output in the git-fast-import format
+        Globals.GitFastImport = GitFastImport(
+            sys.stdout, Globals.GitObjectFetcher, last_mark)
+
+        # Perform import of given refs
+        ret = do_import(*import_refs)
+
+        # Notes on persistent storage of subsystems' data structures:
+        #
+        # Because the "import" command has been called, we here _know_
+        # that there is a fast-import process running in parallel.
+        # (This is NOT the case when there are no "import" commands).
+        # We can therefore now (and only now) safely commit the extra
+        # information that we store in the Git repo.
+        # In other words, the data structures that we commit to
+        # persistent storage with the following calls will NOT be
+        # committed if there are no "import" commands.  The data
+        # structures must handle this in one of two ways:
+        # - In the no-"import" scenario, there is simply nothing to
+        #   commit, so it can safely be skipped.
+        # - Any information that should have been committed in the
+        #   no-"import" scenario can be reconstructed repeatedly in
+        #   subsequent executions of this program, until the next
+        #   invocation of an "import" command provides an opportunity
+        #   to commit the data structure to persistent storage.
+
+        # Write out commit notes (mapping git commits to CVSStates)
+        # The following call would be a no-op in the no-"import" case
+        Globals.GitNotes.commit_notes(Globals.GitFastImport, Config.Author,
+            'Annotate commits imported by "git remote-cvs"\n')
+
+        # Save CVS revision metadata
+        # This data structure can handle the no-"import" case as long
+        # as the marks file from the last fast-import run is still
+        # present upon the next execution of this program.
+        Globals.CVSRevisionMap.commit_map(Globals.GitFastImport, Config.Author,
+            'Updated metadata used by "git remote-cvs"\n')
+
+    return ret
+
+
+if __name__ == '__main__':
+    sys.exit(main(*sys.argv))
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 15/19] Add simple selftests of git-remote-cvs functionality
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

Add two new selftests:

- t9800-remote-cvs-basic: Test the git-remote-cvs implementation of the
  remote helper API, by verifying the expected output of the git-remote-cvs
  program when invoking remote helper API commands while doing some simple
  CVS operations.

- t9801-remote-cvs-fetch: A more high-level test of the fetch/import-side
  of the git-remote-cvs implementation, by verifying the expected repository
  state after doing "git fetch" from a CVS remote where a variety of CVS
  operations are being performed.

Signed-off-by: Johan Herland <johan@herland.net>
---

	Unchanged.

 t/t9800-remote-cvs-basic.sh |  524 +++++++++++++++++++++++++++++++++++++++++++
 t/t9801-remote-cvs-fetch.sh |  291 ++++++++++++++++++++++++
 2 files changed, 815 insertions(+), 0 deletions(-)
 create mode 100755 t/t9800-remote-cvs-basic.sh
 create mode 100755 t/t9801-remote-cvs-fetch.sh

diff --git a/t/t9800-remote-cvs-basic.sh b/t/t9800-remote-cvs-basic.sh
new file mode 100755
index 0000000..6ddec17
--- /dev/null
+++ b/t/t9800-remote-cvs-basic.sh
@@ -0,0 +1,524 @@
+#!/bin/sh
+
+test_description='git remote-cvs basic tests'
+. ./test-lib.sh
+
+if ! test_have_prereq PYTHON; then
+	say 'skipping CVS foreign-vcs helper tests, python not available'
+	test_done
+fi
+
+CVS_EXEC=cvs
+CVS_OPTS="-f -q"
+CVS="$CVS_EXEC $CVS_OPTS"
+
+CVSROOT=$(pwd)/cvsroot
+export CVSROOT
+unset CVS_SERVER
+
+CVSMODULE=cvsmodule
+GITREMOTE=cvsremote
+
+if ! type $CVS_EXEC >/dev/null 2>&1
+then
+	say 'skipping remote-cvs tests, $CVS_EXEC not found'
+	test_done
+fi
+
+test_expect_success 'setup cvsroot' '$CVS init'
+
+test_expect_success '#1: setup a cvs module' '
+
+	mkdir "$CVSROOT/$CVSMODULE" &&
+	$CVS co -d module-cvs $CVSMODULE &&
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortuna
+velut luna
+statu variabilis,
+
+semper crescis
+aut decrescis;
+vita detestabilis
+
+nunc obdurat
+et tunc curat
+ludo mentis aciem,
+
+egestatem,
+potestatem
+dissolvit ut glaciem.
+EOF
+		$CVS add o_fortuna &&
+		cat <<EOF >message &&
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+EOF
+		$CVS commit -f -F message o_fortuna
+	)
+'
+
+test_expect_success 'set up CVS repo as a foreign remote' '
+
+	git config "user.name" "Test User"
+	git config "user.email" "test@example.com"
+	git config "remote.$GITREMOTE.vcs" cvs
+	git config "remote.$GITREMOTE.cvsRoot" "$CVSROOT"
+	git config "remote.$GITREMOTE.cvsModule" "$CVSMODULE"
+	git config "remote.$GITREMOTE.fetch" \
+		"+refs/cvs/$GITREMOTE/*:refs/remotes/$GITREMOTE/*"
+
+'
+
+test_expect_success '#1: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#1: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+? refs/cvs/$GITREMOTE/HEAD
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#1: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+# Importing CVS revision o_fortuna:1.1
+blob
+mark :1
+data 180
+O Fortuna
+velut luna
+statu variabilis,
+
+semper crescis
+aut decrescis;
+vita detestabilis
+
+nunc obdurat
+et tunc curat
+ludo mentis aciem,
+
+egestatem,
+potestatem
+dissolvit ut glaciem.
+
+commit refs/cvs/$GITREMOTE/HEAD
+mark :2
+data 82
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+
+M 644 :1 o_fortuna
+
+# Importing note for object 2
+blob
+mark :3
+data 14
+o_fortuna:1.1
+
+commit refs/notes/cvs/$GITREMOTE
+mark :4
+data 46
+Annotate commits imported by "git remote-cvs"
+
+N :3 :2
+
+blob
+mark :5
+data 32
+1 o_fortuna:1.1
+2 o_fortuna:1.1
+
+blob
+mark :6
+data 16
+blob 1
+commit 2
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :7
+data 42
+Updated metadata used by "git remote-cvs"
+
+M 644 :5 CVS/marks
+M 644 :6 o_fortuna/1.1
+
+EOF
+	grep -v "^committer " actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#1: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#1: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_expect_success '#2: update cvs module' '
+
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortune,
+like the moon
+you are changeable,
+
+ever waxing
+and waning;
+hateful life
+
+first oppresses
+and then soothes
+as fancy takes it;
+
+poverty
+and power
+it melts them like ice.
+EOF
+		cat <<EOF >message &&
+translate to English
+
+My Latin is terrible.
+EOF
+		$CVS commit -f -F message o_fortuna
+	)
+'
+
+test_expect_success '#2: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#2: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+? refs/cvs/$GITREMOTE/HEAD
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#2: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+# Importing CVS revision o_fortuna:1.2
+blob
+mark :8
+data 179
+O Fortune,
+like the moon
+you are changeable,
+
+ever waxing
+and waning;
+hateful life
+
+first oppresses
+and then soothes
+as fancy takes it;
+
+poverty
+and power
+it melts them like ice.
+
+commit refs/cvs/$GITREMOTE/HEAD
+mark :9
+data 44
+translate to English
+
+My Latin is terrible.
+
+from refs/cvs/$GITREMOTE/HEAD^0
+M 644 :8 o_fortuna
+
+# Importing note for object 9
+blob
+mark :10
+data 14
+o_fortuna:1.2
+
+commit refs/notes/cvs/$GITREMOTE
+mark :11
+data 46
+Annotate commits imported by "git remote-cvs"
+
+from refs/notes/cvs/$GITREMOTE^0
+N :10 :9
+
+blob
+mark :12
+data 32
+8 o_fortuna:1.2
+9 o_fortuna:1.2
+
+blob
+mark :13
+data 94
+
+blob
+mark :14
+data 16
+blob 8
+commit 9
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :15
+data 42
+Updated metadata used by "git remote-cvs"
+
+from refs/cvs/$GITREMOTE/_metadata^0
+M 644 :12 CVS/marks
+M 644 :13 o_fortuna/1.1
+M 644 :14 o_fortuna/1.2
+
+EOF
+	grep -v -e "^committer " -e "\b[0-9a-f]\{40\}\b" actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#2: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--import-marks=".git/info/cvs/$GITREMOTE/marks" \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#2: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_expect_success '#3: update cvs module' '
+
+	(
+		cd module-cvs &&
+		echo 1 >tick &&
+		$CVS add tick &&
+		$CVS commit -f -m 1 tick
+	)
+
+'
+
+test_expect_success '#3: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#3: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+? refs/cvs/$GITREMOTE/HEAD
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#3: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+# Importing CVS revision tick:1.1
+blob
+mark :16
+data 2
+1
+
+commit refs/cvs/$GITREMOTE/HEAD
+mark :17
+data 2
+1
+
+from refs/cvs/$GITREMOTE/HEAD^0
+M 644 :16 tick
+
+# Importing note for object 17
+blob
+mark :18
+data 23
+o_fortuna:1.2
+tick:1.1
+
+commit refs/notes/cvs/$GITREMOTE
+mark :19
+data 46
+Annotate commits imported by "git remote-cvs"
+
+from refs/notes/cvs/$GITREMOTE^0
+N :18 :17
+
+blob
+mark :20
+data 41
+16 tick:1.1
+17 tick:1.1
+17 o_fortuna:1.2
+
+blob
+mark :21
+data 104
+commit 17
+
+blob
+mark :22
+data 18
+blob 16
+commit 17
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :23
+data 42
+Updated metadata used by "git remote-cvs"
+
+from refs/cvs/$GITREMOTE/_metadata^0
+M 644 :20 CVS/marks
+M 644 :21 o_fortuna/1.2
+M 644 :22 tick/1.1
+
+EOF
+	grep -v -e "^committer " -e "\b[0-9a-f]\{40\}\b" actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#3: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--import-marks=".git/info/cvs/$GITREMOTE/marks" \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#3: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_expect_success '#4: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+commit=$(git rev-parse "refs/cvs/$GITREMOTE/HEAD")
+
+test_expect_success '#4: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+$commit refs/cvs/$GITREMOTE/HEAD unchanged
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#4: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+blob
+mark :24
+data 0
+
+blob
+mark :25
+data 142
+
+blob
+mark :26
+data 94
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :27
+data 42
+Updated metadata used by "git remote-cvs"
+
+from refs/cvs/$GITREMOTE/_metadata^0
+M 644 :24 CVS/marks
+M 644 :25 o_fortuna/1.2
+M 644 :26 tick/1.1
+
+EOF
+	grep -v -e "^committer " -e "\b[0-9a-f]\{40\}\b" actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#4: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--import-marks=".git/info/cvs/$GITREMOTE/marks" \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#4: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_done
diff --git a/t/t9801-remote-cvs-fetch.sh b/t/t9801-remote-cvs-fetch.sh
new file mode 100755
index 0000000..93d44a7
--- /dev/null
+++ b/t/t9801-remote-cvs-fetch.sh
@@ -0,0 +1,291 @@
+#!/bin/sh
+
+test_description='git remote-cvs basic tests'
+. ./test-lib.sh
+
+if ! test_have_prereq PYTHON; then
+	say 'skipping CVS foreign-vcs helper tests, python not available'
+	test_done
+fi
+
+CVS_EXEC=cvs
+CVS_OPTS="-f -q"
+CVS="$CVS_EXEC $CVS_OPTS"
+
+CVSROOT=$(pwd)/cvsroot
+export CVSROOT
+unset CVS_SERVER
+
+CVSMODULE=cvsmodule
+GITREMOTE=cvsremote
+
+if ! type $CVS_EXEC >/dev/null 2>&1
+then
+	say 'skipping remote-cvs tests, $CVS_EXEC not found'
+	test_done
+fi
+
+verify () {
+	git log --reverse --format="--- %T%n%s%n%n%b" "$GITREMOTE/$1" >actual &&
+	test_cmp "expect.$1" actual &&
+	echo "verify $1" | git remote-cvs "$GITREMOTE"
+}
+
+test_expect_success 'setup CVS repo' '$CVS init'
+
+test_expect_success 'create CVS module with initial commit' '
+
+	mkdir "$CVSROOT/$CVSMODULE" &&
+	$CVS co -d module-cvs $CVSMODULE &&
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortuna
+velut luna
+statu variabilis,
+
+semper crescis
+aut decrescis;
+vita detestabilis
+
+nunc obdurat
+et tunc curat
+ludo mentis aciem,
+
+egestatem,
+potestatem
+dissolvit ut glaciem.
+EOF
+		$CVS add o_fortuna &&
+		cat <<EOF >message &&
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+EOF
+		$CVS commit -f -F message o_fortuna
+	)
+'
+
+test_expect_success 'set up CVS repo/module as a foreign remote' '
+
+	git config "user.name" "Test User"
+	git config "user.email" "test@example.com"
+	git config "remote.$GITREMOTE.vcs" cvs
+	git config "remote.$GITREMOTE.cvsRoot" "$CVSROOT"
+	git config "remote.$GITREMOTE.cvsModule" "$CVSMODULE"
+	git config "remote.$GITREMOTE.fetch" \
+		"+refs/cvs/$GITREMOTE/*:refs/remotes/$GITREMOTE/*"
+
+'
+
+test_expect_success 'initial fetch from CVS remote' '
+
+	cat <<EOF >expect.HEAD &&
+--- 0e06d780dedab23e683c686fb041daa9a84c936c
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit' '
+
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortune,
+like the moon
+you are changeable,
+
+ever waxing
+and waning;
+hateful life
+
+first oppresses
+and then soothes
+as fancy takes it;
+
+poverty
+and power
+it melts them like ice.
+EOF
+		cat <<EOF >message &&
+translate to English
+
+My Latin is terrible.
+EOF
+		$CVS commit -f -F message o_fortuna
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- daa87269a5e00388135ad9542dc16ab6754466e5
+translate to English
+
+My Latin is terrible.
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit with new file' '
+
+	(
+		cd module-cvs &&
+		echo 1 >tick &&
+		$CVS add tick &&
+		$CVS commit -f -m 1 tick
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- 486935b4fccecea9b64cbed3a797ebbcbe2b7461
+1
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'fetch without CVS changes' '
+
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'add 2 CVS commits' '
+
+	(
+		cd module-cvs &&
+		echo 2 >tick &&
+		$CVS commit -f -m 2 tick &&
+		echo 3 >tick &&
+		$CVS commit -f -m 3 tick
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- 83437ab3e57bf0a42915de5310e3419792b5a36f
+2
+
+
+--- 60fc50406a82dc6bd32dc6e5f7bd23e4c3cdf7ef
+3
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit with removed file' '
+
+	(
+		cd module-cvs &&
+		$CVS remove -f tick &&
+		$CVS commit -f -m "remove file" tick
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- daa87269a5e00388135ad9542dc16ab6754466e5
+remove file
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit with several new files' '
+
+	(
+		cd module-cvs &&
+		echo spam >spam &&
+		echo sausage >sausage &&
+		echo eggs >eggs &&
+		$CVS add spam sausage eggs &&
+		$CVS commit -f -m "spam, sausage, and eggs" spam sausage eggs
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- 3190dfce44a6d5e9916b4870dbf8f37d1ca4ddaf
+spam, sausage, and eggs
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'new CVS branch' '
+
+	(
+		cd module-cvs &&
+		$CVS tag -b foo
+	) &&
+	cp expect.HEAD expect.foo &&
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo
+
+'
+
+test_expect_success 'CVS commit on branch' '
+
+	(
+		cd module-cvs &&
+		$CVS up -r foo &&
+		echo "spam spam spam" >spam &&
+		$CVS commit -f -m "commit on branch foo" spam
+	) &&
+	cat <<EOF >>expect.foo &&
+--- 1aba123e5c83898ce3a8b976cc6064d60246aef4
+commit on branch foo
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo
+
+'
+
+test_expect_success 'create CVS tag' '
+
+	(
+		cd module-cvs &&
+		$CVS tag bar
+	) &&
+	cp expect.foo expect.bar &&
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo &&
+	verify bar
+
+'
+
+test_expect_success 'another CVS commit on branch' '
+
+	(
+		cd module-cvs &&
+		echo "spam spam spam spam spam spam" >> spam &&
+		$CVS commit -f -m "another commit on branch foo" spam
+	) &&
+	cat <<EOF >>expect.foo &&
+--- 15a2635e76e8e5a5a8746021643de317452f2340
+another commit on branch foo
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo &&
+	verify bar
+
+'
+
+test_done
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 13/19] 2/2: Add Python support library for CVS remote helper
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland, David Aguilar
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

This patch introduces the rest of a Python package called "git_remote_cvs"
containing the building blocks of the CVS remote helper. The CVS remote
helper itself is NOT part of this patch.

The patch includes the necessary Makefile additions to build and install
the git_remote_cvs Python package along with the rest of Git.

This patch has been improved by the following contributions:
- David Aguilar: Lots of Python coding style fixes
- David Aguilar: DESTDIR support in Makefile

Cc: David Aguilar <davvid@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

	Unchanged.

 Makefile                           |    9 +
 git_remote_cvs/.gitignore          |    2 +
 git_remote_cvs/Makefile            |   35 ++
 git_remote_cvs/__init__.py         |   27 ++
 git_remote_cvs/commit_states.py    |   62 ++++
 git_remote_cvs/cvs_revision_map.py |  418 ++++++++++++++++++++++
 git_remote_cvs/git.py              |  680 ++++++++++++++++++++++++++++++++++++
 git_remote_cvs/setup.py            |   17 +
 8 files changed, 1250 insertions(+), 0 deletions(-)
 create mode 100644 git_remote_cvs/.gitignore
 create mode 100644 git_remote_cvs/Makefile
 create mode 100644 git_remote_cvs/__init__.py
 create mode 100644 git_remote_cvs/commit_states.py
 create mode 100644 git_remote_cvs/cvs_revision_map.py
 create mode 100644 git_remote_cvs/git.py
 create mode 100644 git_remote_cvs/setup.py

diff --git a/Makefile b/Makefile
index b27a7d6..944f4d0 100644
--- a/Makefile
+++ b/Makefile
@@ -1399,6 +1399,9 @@ endif
 ifndef NO_PERL
 	$(QUIET_SUBDIR0)perl $(QUIET_SUBDIR1) PERL_PATH='$(PERL_PATH_SQ)' prefix='$(prefix_SQ)' all
 endif
+ifndef NO_PYTHON
+	$(QUIET_SUBDIR0)git_remote_cvs $(QUIET_SUBDIR1) PYTHON_PATH='$(PYTHON_PATH_SQ)' prefix='$(prefix_SQ)' all
+endif
 	$(QUIET_SUBDIR0)templates $(QUIET_SUBDIR1)
 
 please_set_SHELL_PATH_to_a_more_modern_shell:
@@ -1741,6 +1744,9 @@ install: all
 ifndef NO_PERL
 	$(MAKE) -C perl prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' install
 endif
+ifndef NO_PYTHON
+	$(MAKE) -C git_remote_cvs prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' install
+endif
 ifndef NO_TCLTK
 	$(MAKE) -C gitk-git install
 	$(MAKE) -C git-gui gitexecdir='$(gitexec_instdir_SQ)' install
@@ -1855,6 +1861,9 @@ ifndef NO_PERL
 	$(RM) gitweb/gitweb.cgi
 	$(MAKE) -C perl clean
 endif
+ifndef NO_PYTHON
+	$(MAKE) -C git_remote_cvs clean
+endif
 	$(MAKE) -C templates/ clean
 	$(MAKE) -C t/ clean
 ifndef NO_TCLTK
diff --git a/git_remote_cvs/.gitignore b/git_remote_cvs/.gitignore
new file mode 100644
index 0000000..2247d5f
--- /dev/null
+++ b/git_remote_cvs/.gitignore
@@ -0,0 +1,2 @@
+/build
+/dist
diff --git a/git_remote_cvs/Makefile b/git_remote_cvs/Makefile
new file mode 100644
index 0000000..061c247
--- /dev/null
+++ b/git_remote_cvs/Makefile
@@ -0,0 +1,35 @@
+#
+# Makefile for the git_remote_cvs python support modules
+#
+pysetupfile:=setup.py
+
+# Shell quote (do not use $(call) to accommodate ancient setups);
+DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
+
+ifndef PYTHON_PATH
+	PYTHON_PATH = /usr/bin/python
+endif
+ifndef prefix
+	prefix = $(HOME)
+endif
+ifndef V
+	QUIET = @
+	QUIETSETUP = --quiet
+endif
+
+PYLIBDIR=$(shell $(PYTHON_PATH) -c \
+	 "import sys; \
+	 print 'lib/python%i.%i/site-packages' % sys.version_info[:2]")
+
+all: $(pysetupfile)
+	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) build
+
+install: $(pysetupfile)
+	$(PYTHON_PATH) $(pysetupfile) install --prefix $(DESTDIR_SQ)$(prefix)
+
+instlibdir: $(pysetupfile)
+	@echo "$(prefix)/$(PYLIBDIR)"
+
+clean:
+	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) clean -a
+	$(RM) *.pyo *.pyc
diff --git a/git_remote_cvs/__init__.py b/git_remote_cvs/__init__.py
new file mode 100644
index 0000000..4956d2d
--- /dev/null
+++ b/git_remote_cvs/__init__.py
@@ -0,0 +1,27 @@
+#!/usr/bin/env python
+
+"""Support library package for git-remote-cvs.
+
+git-remote-cvs is a Git remote helper command that interfaces with a
+CVS repository to provide automatic import of CVS history into a Git
+repository.
+
+This package provides the support library needed by git-remote-cvs.
+The following modules are included:
+
+- cvs - Interaction with CVS repositories
+
+- cvs_symbol_cache - Local CVS symbol cache
+
+- changeset - Collect individual CVS revisions into commits
+
+- git - Interaction with Git repositories
+
+- commit_states - Map Git commits to CVS states
+
+- cvs_revision_map - Map CVS revisions to various metainformation
+
+- util - General utility functionality use by the other modules in
+         this package, and also used directly by git-remote-cvs.
+
+"""
diff --git a/git_remote_cvs/commit_states.py b/git_remote_cvs/commit_states.py
new file mode 100644
index 0000000..2e0af6c
--- /dev/null
+++ b/git_remote_cvs/commit_states.py
@@ -0,0 +1,62 @@
+#!/usr/bin/env python
+
+"""Code for relating Git commits to corresponding CVSState objects."""
+
+from git_remote_cvs.util import debug, error, die
+from git_remote_cvs.cvs import CVSState
+
+
+class CommitStates(object):
+
+    """Provide a mapping from Git commits to CVSState objects.
+
+    Behaves like a dictionary of Git commit names -> CVSState mappings.
+
+    Every Git commit converted from CVS has a corresponding CVSState,
+    which describes exactly which CVS revisions are present in a
+    checkout of that commit.
+
+    This class provides the interface to map from a Git commit to its
+    corresponding CVSState.  The mapping uses GitNotes as a persistent
+    storage backend.
+
+    """
+
+    def __init__ (self, notes):
+        """Create a new Git commit -> CVSState map."""
+        self.notes = notes
+        self._cache = {}  # commitname -> CVSState
+
+    def _load (self, commit):
+        """Retrieve the CVSState associated with the given Git commit."""
+        if commit is None:
+            return None
+        if commit in self._cache:
+            return self._cache[commit]
+        notedata = self.notes.get(commit)
+        if notedata is None:  # Given commit has no associated note
+            return None
+        state = CVSState()
+        state.load_data(notedata)
+        self._cache[commit] = state
+        return state
+
+    def add (self, commit, state, gfi):
+        """Add the given Git commit -> CVSState mapping."""
+        assert commit not in self._cache
+        self._cache[commit] = state
+        self.notes.import_note(commit, str(state), gfi)
+
+    def __getitem__ (self, commit):
+        """Return the CVSState associated with the given commit."""
+        state = self._load(commit)
+        if state is None:
+            raise KeyError("Unknown commit '%s'" % (commit))
+        return state
+
+    def get (self, commit, default = None):
+        """Return the CVSState associated with the given commit."""
+        state = self._load(commit)
+        if state is None:
+            return default
+        return state
diff --git a/git_remote_cvs/cvs_revision_map.py b/git_remote_cvs/cvs_revision_map.py
new file mode 100644
index 0000000..0e65ba6
--- /dev/null
+++ b/git_remote_cvs/cvs_revision_map.py
@@ -0,0 +1,418 @@
+#!/usr/bin/env python
+
+"""Functionality for mapping CVS revisions to associated metainfo.
+
+This modules provides the following main classes:
+
+CVSRevisionMap - provides a mapping from CVS revisions to the
+                 the following associated metainfo:
+                 - Mode/permission of the associated CVS path
+                 - The Git blob holding the revision data
+                 - The Git commits that correspond to the CVS
+                   states in which this CVS revision is present
+
+CVSStateMap - provides a mapping from CVS states to corresponding
+              Git commits (i.e. Git commits, whose tree state is
+              identical to a given CVS state)
+"""
+
+import os
+
+from git_remote_cvs.util import debug, error, die, file_reader_method
+from git_remote_cvs.cvs import CVSNum, CVSDate
+from git_remote_cvs.git import GitFICommit
+
+
+class _CVSPathInfo(object):
+
+    """Information on a single CVS path."""
+
+    __slots__ = ('revs', 'mode')
+
+    def __init__ (self, mode = None):
+        self.revs = {}
+        self.mode = mode
+
+
+class _CVSRevInfo(object):
+
+    """Information on a single CVS revision."""
+
+    __slots__ = ('blob', 'commits')
+
+    def __init__ (self, blob):
+        self.blob = blob
+        self.commits = []
+
+
+class CVSRevisionMap(object):
+
+    """Encapsulate the mapping of CVS revisions to associated metainfo.
+
+    This container maps CVS revisions (a combination of a CVS path and
+    a CVS revision number) into Git blob names, Git commit names, and
+    CVS path information.  Git object (blob/commit) names are either
+    40-char SHA1 strings, or "git fast-import" mark numbers if the form
+    ":<num>".
+
+    The data structure is organized as follows:
+    - A CVSRevisionMap instance knows about a set of CVS paths.
+      For each CVS path, the following is known:
+      - The mode (permission bits) of that CVS path (644 or 755)
+      - The CVS revision numbers that exist for that CVS path.
+        For each revision number, the following is known:
+        - Exactly 1 blob name; the Git blob containing the contents of
+          the revision (the contents of the CVS path at that CVS
+          revision).
+        - One or more commit names; the Git commits which encapsulate a
+          CVS state in which this CVS revision
+
+    To store this data structure persistently, this class uses a Git
+    ref that points to a tree structure containing the above
+    information.  When changes to the structure are made, this class
+    will produce git-fast-import commands to update that tree structure
+    accordingly.
+
+    IMPORTANT: No changes to the CVSRevisionMap will be stored unless
+    self.commit_map() is called with a valid GitFastImport instance.
+
+    NOTES: Mark numbers are only transient references bound to the
+    current "git fast-import" process (assumed to be running alongside
+    this process).  Therefore, when the "git fast-import" process ends,
+    it must write out a mark number -> SHA1 mapping (see the
+    "--export-marks" argument to "git fast-import").  Subsequently,
+    this mapping must be parsed, and the mark numbers in this
+    CVSRevisionMap must be resolved into persistent SHA1 names.
+    In order to quickly find all the unresolved mark number entries in
+    the data structure, and index mapping mark numbers to revisions is
+    kept in a separate file in the tree structure.
+    See the loadMarksFile() method for more details.
+
+    """
+
+    MarkNumIndexFile = "CVS/marks"  # Invalid CVS path name
+
+    def __init__ (self, git_ref, obj_fetcher):
+        """Create a new CVS revision map bound to the given Git ref."""
+        self.git_ref = git_ref
+        self.obj_fetcher = obj_fetcher
+
+        # The following data structure is a cache of the persistent
+        # data structure found at self.git_ref.
+        # It is structured as follows:
+        # - self.d is a mapping from CVS paths to _CVSPathInfo objects.
+        #   _CVSPathInfo object have two fields: revs, mode:
+        #   - mode is either 644 (non-executable) or 755 (executable).
+        #   - revs is a dict, mapping CVS revision numbers (CVSNum
+        #     instances) to _CVSRevInfo objects.  _CVSRevInfo objects
+        #     have two fields: blob, commits:
+        #     - blob is the name of the Git blob object holding the
+        #       contents of that revision.
+        #     - commits is a collection of zero or more Git commit
+        #       names where the commit contains this revision.
+        self.d = {}  # dict: path -> _CVSPathInfo
+        self.mods = set()  # (path, revnum) pairs for all modified revs
+        self.marknum_idx = {}  # dict: mark num -> [(path, revnum), ...]
+        self._load_marknum_idx()
+
+    def __del__ (self):
+        """Verify that self.commit_map() was called before destruction."""
+        if self.mods:
+            error("Missing call to self.commit_map().")
+            error("%i revision changes are lost!", len(self.mods))
+
+    def __nonzero__ (self):
+        """Return True iff any information is currently stored here."""
+        return bool(self.d)
+
+    # Private methods:
+
+    def _add_to_marknum_idx(self, marknum, path, revnum):
+        """Add the given marknum -> (path, revnum) association."""
+        entry = self.marknum_idx.setdefault(marknum, [])
+        entry.append((path, revnum))
+
+    def _load_marknum_idx(self):
+        """Load contents of MarkNumIndexFile into self.marknum_idx."""
+        blobspec = "%s:%s" % (self.git_ref, self.MarkNumIndexFile)
+        try:
+            f = self.obj_fetcher.open_obj(blobspec)
+        except KeyError:
+            return  # Missing object; nothing to do
+
+        for line in f:
+            # Format of line is "<marknum> <path>:<revnum>"
+            mark, rest = line.strip().split(' ', 1)
+            path, rev = rest.rsplit(':', 1)
+            self._add_to_marknum_idx(int(mark), path, CVSNum(rev))
+        f.close()
+
+    def _save_marknum_idx(self):
+        """Prepare data for storage into MarkNumIndexFile.
+
+        The returned string contains the current contents of
+        self.marknum_idx, formatted to be stored verbatim in
+        self.MarkNumIndexFile.
+
+        """
+        lines = []
+        for marknum, revs in sorted(self.marknum_idx.iteritems()):
+            for path, revnum in revs:
+                # Format of line is "<marknum> <path>:<revnum>"
+                line = "%i %s:%s\n" % (marknum, path, revnum)
+                lines.append(line)
+        return "".join(lines)
+
+    def _save_rev(self, path, revnum):
+        """Return blob data for storing the given revision persistently.
+
+        Generate the blob contents that will reconstitute the same
+        revision entry when read back in with _fetch_path().
+
+        """
+        lines = []
+        rev_info = self.d[path].revs[revnum]
+        lines.append("blob %s\n" % (rev_info.blob))
+        for commit in rev_info.commits:
+            lines.append("commit %s\n" % (commit))
+        return "".join(lines)
+
+    @staticmethod
+    def _valid_objname (objname):
+        """Return the argument as a SHA1 (string) or mark num (int)."""
+        # Blob names are either 40-char SHA1 strings, or mark numbers
+        if isinstance(objname, int) or len(objname) != 40:  # Mark number
+            return int(objname)
+        return objname
+
+    def _load (self, path, mode, data):
+        """GitObjectFetcher.walk_tree() callback."""
+        assert mode in (644, 755)
+        cvs_path, revnum = os.path.split(path)
+        revnum = CVSNum(revnum)
+        if cvs_path in self.d:
+            assert mode == self.d[cvs_path].mode
+        else:
+            self.d[cvs_path] = _CVSPathInfo(mode)
+        assert revnum not in self.d[cvs_path].revs
+        rev_info = None
+        for line in data.split("\n"):
+            if not line:
+                continue
+            t, objname = line.split()
+            objname = self._valid_objname(objname)
+            if t == "blob":
+                assert rev_info is None
+                rev_info = _CVSRevInfo(objname)
+            elif t == "commit":
+                assert rev_info is not None
+                rev_info.commits.append(objname)
+            else:
+                assert False, "Unknown type '%s'" % (t)
+        assert rev_info.commits  # Each rev is in at least one commit
+        self.d[cvs_path].revs[revnum] = rev_info
+
+    def _fetch_path (self, path):
+        """If the given path exists, create a path entry in self.d."""
+        tree_spec = "%s:%s" % (self.git_ref, path)
+        self.obj_fetcher.walk_tree(tree_spec, self._load, path)
+        # TODO: Don't load entire tree at once?
+
+    # Public methods:
+
+    def has_path (self, path):
+        """Return True iff the given path is present."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return path in self.d
+
+    def has_rev (self, path, revnum):
+        """Return True iff the given path:revnum is present."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return path in self.d and revnum in self.d[path].revs
+
+    def get_mode (self, path):
+        """Return mode bits for the given path."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return self.d[path].mode
+
+    def get_blob (self, path, revnum):
+        """Return the blob name for the given revision."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return self.d[path].revs[revnum].blob
+
+    def get_commits (self, path, revnum):
+        """Return the commit names containing the given revision."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return self.d[path].revs[revnum].commits
+
+    def has_unresolved_marks (self):
+        """Return True iff there are mark numbers in the data structure."""
+        return self.marknum_idx
+
+    # Public non-const methods
+
+    def add_path (self, path, mode):
+        """Add the given path and associated mode bits to this map."""
+        if path not in self.d:
+            self._fetch_path(path)
+        if path in self.d:
+            if self.d[path].mode:
+                assert mode == self.d[path].mode, \
+                    "The mode of %s has changed from %s " \
+                    "to %s since the last import.  This " \
+                    "is not supported." % (
+                        path, self.d[path].mode, mode)
+            else:
+                self.d[path].mode = mode
+        else:
+            self.d[path] = _CVSPathInfo(mode)
+        # Do not add to self.mods yet, as we expect revisions to be
+        # added before commit_map() is called
+
+    def add_blob (self, path, revnum, blobname):
+        """Add the given path:revnum -> blobname association."""
+        assert blobname
+        if path not in self.d:
+            self._fetch_path(path)
+        blobname = self._valid_objname(blobname)
+        if isinstance(blobname, int):  # Mark number
+            self._add_to_marknum_idx(blobname, path, revnum)
+        entry = self.d.setdefault(path, _CVSPathInfo())
+        assert revnum not in entry.revs
+        entry.revs[revnum] = _CVSRevInfo(blobname)
+        self.mods.add((path, revnum))
+
+    def add_commit (self, path, revnum, commitname):
+        """Add the given path:revnum -> commitname association."""
+        if path not in self.d:
+            self._fetch_path(path)
+        commitname = self._valid_objname(commitname)
+        if isinstance(commitname, int):  # Mark number
+            self._add_to_marknum_idx(commitname, path, revnum)
+        assert revnum in self.d[path].revs
+        assert commitname not in self.d[path].revs[revnum].commits
+        self.d[path].revs[revnum].commits.append(commitname)
+        self.mods.add((path, revnum))
+
+    @file_reader_method(missing_ok = True)
+    def load_marks_file (self, f):
+        """Load mark -> SHA1 mappings from git-fast-import marks file.
+
+        Replace all mark numbers with proper SHA1 names in this data
+        structure (using self.marknum_idx to find them quickly).
+
+        """
+        if not f:
+            return 0
+        marks = {}
+        last_mark = 0
+        for line in f:
+            (mark, sha1) = line.strip().split()
+            assert mark.startswith(":")
+            mark = int(mark[1:])
+            assert mark not in marks
+            marks[mark] = sha1
+            if mark > last_mark:
+                last_mark = mark
+        for marknum, revs in self.marknum_idx.iteritems():
+            sha1 = marks[marknum]
+            for path, revnum in revs:
+                if path not in self.d:
+                    self._fetch_path(path)
+                rev_info = self.d[path].revs[revnum]
+                if rev_info.blob == marknum:  # Replace blobname
+                    rev_info.blob = sha1
+                else:  # Replace commitname
+                    assert marknum in rev_info.commits
+                    i = rev_info.commits.index(marknum)
+                    rev_info.commits[i] = sha1
+                    assert marknum not in rev_info.commits
+                self.mods.add((path, revnum))
+        self.marknum_idx = {}  # Resolved all transient mark numbers
+        return last_mark
+
+    def sync_modeinfo_from_cvs (self, cvs):
+        """Update with mode information from current CVS checkout.
+
+        This method will retrieve mode information on all paths in the
+        current CVS checkout, and update this data structure
+        correspondingly.  In the case where mode information is already
+        present for a given CVS path, this method will verify that such
+        information is correct.
+
+        """
+        for path, mode in cvs.get_modeinfo().iteritems():
+            self.add_path(path, mode)
+
+    def commit_map (self, gfi, author, message):
+        """Produce git-fast-import commands for storing changes.
+
+        The given GitFastImport object is used to produce a commit
+        making the changes done to this data structure persistent.
+
+        """
+        now = CVSDate("now")
+        commitdata = GitFICommit(
+            author[0], author[1], now.ts, now.tz_str(), message)
+
+        # Add updated MarkNumIndexFile to commit
+        mark = gfi.blob(self._save_marknum_idx())
+        commitdata.modify(644, mark, self.MarkNumIndexFile)
+
+        for path, revnum in self.mods:
+            mark = gfi.blob(self._save_rev(path, revnum))
+            mode = self.d[path].mode
+            assert mode in (644, 755)
+            commitdata.modify(mode, mark, "%s/%s" % (path, revnum))
+
+        gfi.commit(self.git_ref, commitdata)
+        self.mods = set()
+
+
+class CVSStateMap(object):
+
+    """Map CVSState object to the commit names which produces that state."""
+
+    def __init__ (self, cvs_rev_map):
+        """Create a CVSStateMap object, bound to the given CVS revision map."""
+        self.cvs_rev_map = cvs_rev_map
+
+    def get_commits (self, state):
+        """Map the given CVSState to commits that contain this state.
+
+        Return all commits where the given state is a subset of the
+        state produced by that commit.
+
+        Returns a set of commit names.  The set may be empty.
+
+        """
+        candidate_commits = None
+        for path, revnum in state:
+            commits = self.cvs_rev_map.get_commits(path, revnum)
+            if candidate_commits is None:
+                candidate_commits = set(commits)
+            else:
+                candidate_commits.intersection_update(commits)
+        return candidate_commits
+
+    def get_exact_commit (self, state, commit_map):
+        """Map the given CVSState to the commit with this exact state.
+
+        The given commit_map must be a CommitStates object.
+
+        Return the only commit (if any) that produces the exact given
+        CVSState.
+
+        Returns a commit name, or None if no matching commit is found.
+
+        """
+        commits = self.get_commits(state)
+        for c in commits:
+            if state == commit_map.get(c):
+                return c
+        return None
diff --git a/git_remote_cvs/git.py b/git_remote_cvs/git.py
new file mode 100644
index 0000000..cf037e3
--- /dev/null
+++ b/git_remote_cvs/git.py
@@ -0,0 +1,680 @@
+#!/usr/bin/env python
+
+"""Functionality for interacting with Git repositories.
+
+This module provides classes for interfacing with a Git repository.
+
+"""
+
+import os
+import re
+from binascii import hexlify
+from cStringIO import StringIO
+import unittest
+
+from git_remote_cvs.util import debug, error, die, start_command, run_command
+from git_remote_cvs.cvs import CVSDate
+
+
+def get_git_dir ():
+    """Return the path to the GIT_DIR for this repo."""
+    args = ("git", "rev-parse", "--git-dir")
+    exit_code, output, errors = run_command(args)
+    if exit_code:
+        die("Failed to retrieve git dir")
+    assert not errors
+    return output.strip()
+
+
+def parse_git_config ():
+    """Return a dict containing the parsed version of 'git config -l'."""
+    exit_code, output, errors = run_command(("git", "config", "-z", "-l"))
+    if exit_code:
+        die("Failed to retrieve git configuration")
+    assert not errors
+    return dict([e.split('\n', 1) for e in output.split("\0") if e])
+
+
+def git_config_bool (value):
+    """Convert the given git config string value to True or False.
+
+    Raise ValueError if the given string was not recognized as a
+    boolean value.
+
+    """
+    norm_value = str(value).strip().lower()
+    if norm_value in ("true", "1", "yes", "on", ""):
+        return True
+    if norm_value in ("false", "0", "no", "off", "none"):
+        return False
+    raise ValueError("Failed to parse '%s' into a boolean value" % (value))
+
+
+def valid_git_ref (ref_name):
+    """Return True iff the given ref name is a valid git ref name."""
+    # The following is a reimplementation of the git check-ref-format
+    # command.  The rules were derived from the git check-ref-format(1)
+    # manual page.  This code should be replaced by a call to
+    # check_ref_format() in the git library, when such is available.
+    if ref_name.endswith('/') or \
+       ref_name.startswith('.') or \
+       ref_name.count('/.') or \
+       ref_name.count('..') or \
+       ref_name.endswith('.lock'):
+        return False
+    for c in ref_name:
+        if ord(c) < 0x20 or ord(c) == 0x7f or c in " ~^:?*[":
+            return False
+    return True
+
+
+class GitObjectFetcher(object):
+
+    """Provide parsed access to 'git cat-file --batch'.
+
+    This provides a read-only interface to the Git object database.
+
+    """
+
+    def __init__ (self):
+        """Initiate a 'git cat-file --batch' session."""
+        self.queue = []  # List of object names to be submitted
+        self.in_transit = None  # Object name currently in transit
+
+        # 'git cat-file --batch' produces binary output which is likely
+        # to be corrupted by the default "rU"-mode pipe opened by
+        # start_command.  (Mode == "rU" does universal new-line
+        # conversion, which mangles carriage returns.) Therefore, we
+        # open an explicitly binary-safe pipe for transferring the
+        # output from 'git cat-file --batch'.
+        pipe_r_fd, pipe_w_fd = os.pipe()
+        pipe_r = os.fdopen(pipe_r_fd, "rb")
+        pipe_w = os.fdopen(pipe_w_fd, "wb")
+        self.proc = start_command(("git", "cat-file", "--batch"),
+                                  stdout = pipe_w)
+        self.f = pipe_r
+
+    def __del__ (self):
+        """Verify completed communication with 'git cat-file --batch'."""
+        assert not self.queue
+        assert self.in_transit is None
+        self.proc.stdin.close()
+        assert self.proc.wait() == 0  # Zero exit code
+        assert self.f.read() == ""  # No remaining output
+
+    def _submit_next_object (self):
+        """Submit queue items to the 'git cat-file --batch' process.
+
+        If there are items in the queue, and there is currently no item
+        currently in 'transit', then pop the first item off the queue,
+        and submit it.
+
+        """
+        if self.queue and self.in_transit is None:
+            self.in_transit = self.queue.pop(0)
+            print >> self.proc.stdin, self.in_transit[0]
+
+    def push (self, obj, callback):
+        """Push the given object name onto the queue.
+
+        The given callback function will at some point in the future
+        be called exactly once with the following arguments:
+        - self - this GitObjectFetcher instance
+        - obj  - the object name provided to push()
+        - sha1 - the SHA1 of the object, if 'None' obj is missing
+        - t    - the type of the object (tag/commit/tree/blob)
+        - size - the size of the object in bytes
+        - data - the object contents
+
+        """
+        self.queue.append((obj, callback))
+        self._submit_next_object()  # (Re)start queue processing
+
+    def process_next_entry (self):
+        """Read the next entry off the queue and invoke callback."""
+        obj, cb = self.in_transit
+        self.in_transit = None
+        header = self.f.readline()
+        if header == "%s missing\n" % (obj):
+            cb(self, obj, None, None, None, None)
+            return
+        sha1, t, size = header.split(" ")
+        assert len(sha1) == 40
+        assert t in ("tag", "commit", "tree", "blob")
+        assert size.endswith("\n")
+        size = int(size.strip())
+        data = self.f.read(size)
+        assert self.f.read(1) == "\n"
+        cb(self, obj, sha1, t, size, data)
+        self._submit_next_object()
+
+    def process (self):
+        """Process the current queue until empty."""
+        while self.in_transit is not None:
+            self.process_next_entry()
+
+    # High-level convenience methods:
+
+    def get_sha1 (self, objspec):
+        """Return the SHA1 of the object specified by 'objspec'.
+
+        Return None if 'objspec' does not specify an existing object.
+
+        """
+        class _ObjHandler(object):
+            """Helper class for getting the returned SHA1."""
+            def __init__ (self, parser):
+                self.parser = parser
+                self.sha1 = None
+
+            def __call__ (self, parser, obj, sha1, t, size, data):
+                # FIXME: Many unused arguments. Could this be cheaper?
+                assert parser == self.parser
+                self.sha1 = sha1
+
+        handler = _ObjHandler(self)
+        self.push(objspec, handler)
+        self.process()
+        return handler.sha1
+
+    def open_obj (self, objspec):
+        """Return a file object wrapping the contents of a named object.
+
+        The caller is responsible for calling .close() on the returned
+        file object.
+
+        Raise KeyError if 'objspec' does not exist in the repo.
+
+        """
+        class _ObjHandler(object):
+            """Helper class for parsing the returned git object."""
+            def __init__ (self, parser):
+                """Set up helper."""
+                self.parser = parser
+                self.contents = StringIO()
+                self.err = None
+
+            def __call__ (self, parser, obj, sha1, t, size, data):
+                """Git object callback (see GitObjectFetcher documentation)."""
+                assert parser == self.parser
+                if not sha1:  # Missing object
+                    self.err = "Missing object '%s'" % obj
+                else:
+                    assert size == len(data)
+                    self.contents.write(data)
+
+        handler = _ObjHandler(self)
+        self.push(objspec, handler)
+        self.process()
+        if handler.err:
+            raise KeyError(handler.err)
+        handler.contents.seek(0)
+        return handler.contents
+
+    def walk_tree (self, tree_objspec, callback, prefix = ""):
+        """Recursively walk the given Git tree object.
+
+        Recursively walk all subtrees of the given tree object, and
+        invoke the given callback passing three arguments:
+        (path, mode, data) with the path, permission bits, and contents
+        of all the blobs found in the entire tree structure.
+
+        """
+        class _ObjHandler(object):
+            """Helper class for walking a git tree structure."""
+            def __init__ (self, parser, cb, path, mode = None):
+                """Set up helper."""
+                self.parser = parser
+                self.cb = cb
+                self.path = path
+                self.mode = mode
+                self.err = None
+
+            def parse_tree (self, treedata):
+                """Parse tree object data, yield tree entries.
+
+                Each tree entry is a 3-tuple (mode, sha1, path)
+
+                self.path is prepended to all paths yielded
+                from this method.
+
+                """
+                while treedata:
+                    mode = int(treedata[:6], 10)
+                    # Turn 100xxx into xxx
+                    if mode > 100000:
+                        mode -= 100000
+                    assert treedata[6] == " "
+                    i = treedata.find("\0", 7)
+                    assert i > 0
+                    path = treedata[7:i]
+                    sha1 = hexlify(treedata[i + 1: i + 21])
+                    yield (mode, sha1, self.path + path)
+                    treedata = treedata[i + 21:]
+
+            def __call__ (self, parser, obj, sha1, t, size, data):
+                """Git object callback (see GitObjectFetcher documentation)."""
+                assert parser == self.parser
+                if not sha1:  # Missing object
+                    self.err = "Missing object '%s'" % (obj)
+                    return
+                assert size == len(data)
+                if t == "tree":
+                    if self.path:
+                        self.path += "/"
+                    # Recurse into all blobs and subtrees
+                    for m, s, p in self.parse_tree(data):
+                        parser.push(s,
+                                    self.__class__(self.parser, self.cb, p, m))
+                elif t == "blob":
+                    self.cb(self.path, self.mode, data)
+                else:
+                    raise ValueError("Unknown object type '%s'" % (t))
+
+        self.push(tree_objspec, _ObjHandler(self, callback, prefix))
+        self.process()
+
+
+class GitRefMap(object):
+
+    """Map Git ref names to the Git object names they currently point to.
+
+    Behaves like a dictionary of Git ref names -> Git object names.
+
+    """
+
+    def __init__ (self, obj_fetcher):
+        """Create a new Git ref -> object map."""
+        self.obj_fetcher = obj_fetcher
+        self._cache = {}  # dict: refname -> objname
+
+    def _load (self, ref):
+        """Retrieve the object currently bound to the given ref.
+
+        The name of the object pointed to by the given ref is stored
+        into this mapping, and also returned.
+
+        """
+        if ref not in self._cache:
+            self._cache[ref] = self.obj_fetcher.get_sha1(ref)
+        return self._cache[ref]
+
+    def __contains__ (self, refname):
+        """Return True if the given refname is present in this cache."""
+        return bool(self._load(refname))
+
+    def __getitem__ (self, refname):
+        """Return the git object name pointed to by the given refname."""
+        commit = self._load(refname)
+        if commit is None:
+            raise KeyError("Unknown ref '%s'" % (refname))
+        return commit
+
+    def get (self, refname, default = None):
+        """Return the git object name pointed to by the given refname."""
+        commit = self._load(refname)
+        if commit is None:
+            return default
+        return commit
+
+
+class GitFICommit(object):
+
+    """Encapsulate the data in a Git fast-import commit command."""
+
+    SHA1RE = re.compile(r'^[0-9a-f]{40}$')
+
+    @classmethod
+    def parse_mode (cls, mode):
+        """Verify the given git file mode, and return it as a string."""
+        assert mode in (644, 755, 100644, 100755, 120000)
+        return "%i" % (mode)
+
+    @classmethod
+    def parse_objname (cls, objname):
+        """Return the given object name (or mark number) as a string."""
+        if isinstance(objname, int):  # Object name is a mark number
+            assert objname > 0
+            return ":%i" % (objname)
+
+        # No existence check is done, only checks for valid format
+        assert cls.SHA1RE.match(objname)  # Object name is valid SHA1
+        return objname
+
+    @classmethod
+    def quote_path (cls, path):
+        """Return a quoted version of the given path."""
+        path = path.replace("\\", "\\\\")
+        path = path.replace("\n", "\\n")
+        path = path.replace('"', '\\"')
+        return '"%s"' % (path)
+
+    @classmethod
+    def parse_path (cls, path):
+        """Verify that the given path is valid, and quote it, if needed."""
+        assert not isinstance(path, int)  # Cannot be a mark number
+
+        # These checks verify the rules on the fast-import man page
+        assert not path.count("//")
+        assert not path.endswith("/")
+        assert not path.startswith("/")
+        assert not path.count("/./")
+        assert not path.count("/../")
+        assert not path.endswith("/.")
+        assert not path.endswith("/..")
+        assert not path.startswith("./")
+        assert not path.startswith("../")
+
+        if path.count('"') + path.count('\n') + path.count('\\'):
+            return cls.quote_path(path)
+        return path
+
+    def __init__ (self, name, email, timestamp, timezone, message):
+        """Create a new Git fast-import commit, with the given metadata."""
+        self.name = name
+        self.email = email
+        self.timestamp = timestamp
+        self.timezone = timezone
+        self.message = message
+        self.pathops = []  # List of path operations in this commit
+
+    def modify (self, mode, blobname, path):
+        """Add a file modification to this Git fast-import commit."""
+        self.pathops.append(("M",
+                             self.parse_mode(mode),
+                             self.parse_objname(blobname),
+                             self.parse_path(path)))
+
+    def delete (self, path):
+        """Add a file deletion to this Git fast-import commit."""
+        self.pathops.append(("D", self.parse_path(path)))
+
+    def copy (self, path, newpath):
+        """Add a file copy to this Git fast-import commit."""
+        self.pathops.append(("C",
+                             self.parse_path(path),
+                             self.parse_path(newpath)))
+
+    def rename (self, path, newpath):
+        """Add a file rename to this Git fast-import commit."""
+        self.pathops.append(("R",
+                             self.parse_path(path),
+                             self.parse_path(newpath)))
+
+    def note (self, blobname, commit):
+        """Add a note object to this Git fast-import commit."""
+        self.pathops.append(("N",
+                             self.parse_objname(blobname),
+                             self.parse_objname(commit)))
+
+    def deleteall (self):
+        """Delete all files in this Git fast-import commit."""
+        self.pathops.append("deleteall")
+
+
+class TestGitFICommit(unittest.TestCase):
+
+    """GitFICommit selftests."""
+
+    def test_basic (self):
+        """GitFICommit basic selftests."""
+
+        def expect_fail (method, data):
+            """Verify that the method(data) raises an AssertionError."""
+            try:
+                method(data)
+            except AssertionError:
+                return
+            raise AssertionError("Failed test for invalid data '%s(%s)'" %
+                                 (method.__name__, repr(data)))
+
+    def test_parse_mode (self):
+        """GitFICommit.parse_mode() selftests."""
+        self.assertEqual(GitFICommit.parse_mode(644), "644")
+        self.assertEqual(GitFICommit.parse_mode(755), "755")
+        self.assertEqual(GitFICommit.parse_mode(100644), "100644")
+        self.assertEqual(GitFICommit.parse_mode(100755), "100755")
+        self.assertEqual(GitFICommit.parse_mode(120000), "120000")
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, 0)
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, 123)
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, 600)
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, "644")
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, "abc")
+
+    def test_parse_objname (self):
+        """GitFICommit.parse_objname() selftests."""
+        self.assertEqual(GitFICommit.parse_objname(1), ":1")
+        self.assertRaises(AssertionError, GitFICommit.parse_objname, 0)
+        self.assertRaises(AssertionError, GitFICommit.parse_objname, -1)
+        self.assertEqual(GitFICommit.parse_objname("0123456789" * 4),
+                         "0123456789" * 4)
+        self.assertEqual(GitFICommit.parse_objname("2468abcdef" * 4),
+                         "2468abcdef" * 4)
+        self.assertRaises(AssertionError, GitFICommit.parse_objname,
+                          "abcdefghij" * 4)
+
+    def test_parse_path (self):
+        """GitFICommit.parse_path() selftests."""
+        self.assertEqual(GitFICommit.parse_path("foo/bar"), "foo/bar")
+        self.assertEqual(GitFICommit.parse_path("path/with\n and \" in it"),
+                         '"path/with\\n and \\" in it"')
+        self.assertRaises(AssertionError, GitFICommit.parse_path, 1)
+        self.assertRaises(AssertionError, GitFICommit.parse_path, 0)
+        self.assertRaises(AssertionError, GitFICommit.parse_path, -1)
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo//bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/bar/")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "/foo/bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/./bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/../bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/bar/.")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/bar/..")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "./foo/bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "../foo/bar")
+
+
+class GitFastImport(object):
+
+    """Encapsulate communication with git fast-import."""
+
+    def __init__ (self, f, obj_fetcher, last_mark = 0):
+        """Set up self to communicate with a fast-import process through f."""
+        self.f = f  # File object where fast-import stream is written
+        self.obj_fetcher = obj_fetcher  # GitObjectFetcher instance
+        self.next_mark = last_mark + 1  # Next mark number
+        self.refs = set()  # Keep track of the refnames we've seen
+
+    def comment (self, s):
+        """Write the given comment in the fast-import stream."""
+        assert "\n" not in s, "Malformed comment: '%s'" % (s)
+        self.f.write("# %s\n" % (s))
+
+    def commit (self, ref, commitdata):
+        """Make a commit on the given ref, with the given GitFICommit.
+
+        Return the mark number identifying this commit.
+
+        """
+        self.f.write("""\
+commit %(ref)s
+mark :%(mark)i
+committer %(name)s <%(email)s> %(timestamp)i %(timezone)s
+data %(msgLength)i
+%(msg)s
+""" % {
+    'ref': ref,
+    'mark': self.next_mark,
+    'name': commitdata.name,
+    'email': commitdata.email,
+    'timestamp': commitdata.timestamp,
+    'timezone': commitdata.timezone,
+    'msgLength': len(commitdata.message),
+    'msg': commitdata.message,
+})
+
+        if ref not in self.refs:
+            self.refs.add(ref)
+            parent = ref + "^0"
+            if self.obj_fetcher.get_sha1(parent):
+                self.f.write("from %s\n" % (parent))
+
+        for op in commitdata.pathops:
+            self.f.write(" ".join(op))
+            self.f.write("\n")
+        self.f.write("\n")
+        retval = self.next_mark
+        self.next_mark += 1
+        return retval
+
+    def blob (self, data):
+        """Import the given blob.
+
+        Return the mark number identifying this blob.
+
+        """
+        self.f.write("blob\nmark :%i\ndata %i\n%s\n" %
+                     (self.next_mark, len(data), data))
+        retval = self.next_mark
+        self.next_mark += 1
+        return retval
+
+    def reset (self, ref, objname):
+        """Reset the given ref to point at the given Git object."""
+        self.f.write("reset %s\nfrom %s\n\n" %
+                     (ref, GitFICommit.parse_objname(objname)))
+        if ref not in self.refs:
+            self.refs.add(ref)
+
+
+class GitNotes(object):
+
+    """Encapsulate access to Git notes.
+
+    Simulates a dictionary of object name (SHA1) -> Git note mappings.
+
+    """
+
+    def __init__ (self, notes_ref, obj_fetcher):
+        """Create a new Git notes interface, bound to the given notes ref."""
+        self.notes_ref = notes_ref
+        self.obj_fetcher = obj_fetcher  # Used to get objects from repo
+        self.imports = []  # list: (objname, note data blob name) tuples
+
+    def __del__ (self):
+        """Verify that self.commit_notes() was called before destruction."""
+        if self.imports:
+            error("Missing call to self.commit_notes().")
+            error("%i notes are not committed!", len(self.imports))
+
+    def _load (self, objname):
+        """Return the note data associated with the given git object.
+
+        The note data is returned in string form. If no note is found
+        for the given object, None is returned.
+
+        """
+        try:
+            f = self.obj_fetcher.open_obj("%s:%s" % (self.notes_ref, objname))
+            ret = f.read()
+            f.close()
+        except KeyError:
+            ret = None
+        return ret
+
+    def __getitem__ (self, objname):
+        """Return the note contents associated with the given object.
+
+        Raise KeyError if given object has no associated note.
+
+        """
+        blobdata = self._load(objname)
+        if blobdata is None:
+            raise KeyError("Object '%s' has no note" % (objname))
+        return blobdata
+
+    def get (self, objname, default = None):
+        """Return the note contents associated with the given object.
+
+        Return given default if given object has no associated note.
+
+        """
+        blobdata = self._load(objname)
+        if blobdata is None:
+            return default
+        return blobdata
+
+    def import_note (self, objname, data, gfi):
+        """Tell git fast-import to store data as a note for objname.
+
+        This method uses the given GitFastImport object to create a
+        blob containing the given note data.  Also an entry mapping the
+        given object name to the created blob is stored until
+        commit_notes() is called.
+
+        Note that this method only works if it is later followed by a
+        call to self.commit_notes() (which produces the note commit
+        that refers to the blob produced here).
+
+        """
+        if not data.endswith("\n"):
+            data += "\n"
+        gfi.comment("Importing note for object %s" % (objname))
+        mark = gfi.blob(data)
+        self.imports.append((objname, mark))
+
+    def commit_notes (self, gfi, author, message):
+        """Produce a git fast-import note commit for the imported notes.
+
+        This method uses the given GitFastImport object to create a
+        commit on the notes ref, introducing the notes previously
+        submitted to import_note().
+
+        """
+        if not self.imports:
+            return
+        now = CVSDate("now")
+        commitdata = GitFICommit(author[0], author[1],
+                                 now.ts, now.tz_str(), message)
+        for objname, blobname in self.imports:
+            assert isinstance(objname, int) and objname > 0
+            assert isinstance(blobname, int) and blobname > 0
+            commitdata.note(blobname, objname)
+        gfi.commit(self.notes_ref, commitdata)
+        self.imports = []
+
+
+class GitCachedNotes(GitNotes):
+
+    """Encapsulate access to Git notes (cached version).
+
+    Only use this class if no caching is done at a higher level.
+
+    Simulates a dictionary of object name (SHA1) -> Git note mappings.
+
+    """
+
+    def __init__ (self, notes_ref, obj_fetcher):
+        """Set up a caching wrapper around GitNotes."""
+        GitNotes.__init__(self, notes_ref, obj_fetcher)
+        self._cache = {}  # Cache: object name -> note data
+
+    def __del__ (self):
+        """Verify that GitNotes' destructor is called."""
+        GitNotes.__del__(self)
+
+    def _load (self, objname):
+        """Extend GitNotes._load() with a local objname -> note cache."""
+        if objname not in self._cache:
+            self._cache[objname] = GitNotes._load(self, objname)
+        return self._cache[objname]
+
+    def import_note (self, objname, data, gfi):
+        """Extend GitNotes.import_note() with a local objname -> note cache."""
+        if not data.endswith("\n"):
+            data += "\n"
+        assert objname not in self._cache
+        self._cache[objname] = data
+        GitNotes.import_note(self, objname, data, gfi)
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/git_remote_cvs/setup.py b/git_remote_cvs/setup.py
new file mode 100644
index 0000000..21f521a
--- /dev/null
+++ b/git_remote_cvs/setup.py
@@ -0,0 +1,17 @@
+#!/usr/bin/env python
+
+"""Distutils build/install script for the git_remote_cvs package."""
+
+from distutils.core import setup
+
+setup(
+    name = 'git_remote_cvs',
+    version = '0.1.0',
+    description = 'Git remote helper program for CVS repositories',
+    license = 'GPLv2',
+    author = 'The Git Community',
+    author_email = 'git@vger.kernel.org',
+    url = 'http://www.git-scm.com/',
+    package_dir = {'git_remote_cvs': ''},
+    packages = ['git_remote_cvs'],
+)
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 12/19] 1/2: Add Python support library for CVS remote helper
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland, David Aguilar, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

This patch introduces parts of a Python package called "git_remote_cvs"
containing the building blocks of the CVS remote helper.
The CVS remote helper itself is NOT part of this patch.

This patch has been improved by the following contributions:
- David Aguilar: Lots of Python coding style fixes

Cc: David Aguilar <davvid@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	This has my patch to util.py squashed in.

 git_remote_cvs/changeset.py        |  126 +++++
 git_remote_cvs/cvs.py              |  998 ++++++++++++++++++++++++++++++++++++
 git_remote_cvs/cvs_symbol_cache.py |  313 +++++++++++
 git_remote_cvs/util.py             |  194 +++++++
 4 files changed, 1631 insertions(+), 0 deletions(-)
 create mode 100644 git_remote_cvs/changeset.py
 create mode 100644 git_remote_cvs/cvs.py
 create mode 100644 git_remote_cvs/cvs_symbol_cache.py
 create mode 100644 git_remote_cvs/util.py

diff --git a/git_remote_cvs/changeset.py b/git_remote_cvs/changeset.py
new file mode 100644
index 0000000..9eea9d2
--- /dev/null
+++ b/git_remote_cvs/changeset.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python
+
+"""Code for collecting individual CVS revisions into "changesets"
+
+A changeset is a collection of CVSRev objects that belong together in
+the same "commit".  This is a somewhat artificial construct on top of
+CVS, which only stores changes at the per-file level.  Normally, CVS
+users create several CVS revisions simultaneously by applying the
+"cvs commit" command to several files with related changes.  This
+module tries to reconstruct this notion of related revisions.
+
+"""
+
+from git_remote_cvs.util import debug, error, die
+
+
+class Changeset(object):
+
+    """Encapsulate a single changeset/commit."""
+
+    __slots__ = ('revs', 'date', 'author', 'message')
+
+    # The maximum time between the changeset's date, and the date of a
+    # rev to included in that changeset.
+    MaxSecondsBetweenRevs = 8 * 60 * 60  # 8 hours
+
+    @classmethod
+    def from_rev (cls, rev):
+        """Return a Changeset based on the given CVSRev object."""
+        c = cls(rev.date, rev.author, rev.message)
+        result = c.add(rev)
+        assert result
+        return c
+
+    def __init__ (self, date, author, message):
+        """Create a new Changeset with the given metadata."""
+        self.revs = {}  # dict: path -> CVSRev object
+        self.date = date  # CVSDate object
+        self.author = author
+        self.message = message  # Lines of commit message
+
+    def __str__ (self):
+        """Stringify this Changeset object."""
+        msg = self.message[0]  # First line only
+        # Limit message to 25 chars
+        if len(msg) > 25:
+            msg = msg[:22] + "..."
+        return ("<Changeset @(%s) by %s (%s) updating %i files>" %
+                (self.date, self.author, msg, len(self.revs)))
+
+    def __iter__ (self):
+        """Return iterator traversing the CVSRevs in this Changeset."""
+        return self.revs.itervalues()
+
+    def __getitem__ (self, path):
+        """Look up a specific CVSRev in this Changeset."""
+        return self.revs[path]
+
+    def within_time_window (self, rev):
+        """Return True iff the rev is within the time window of self."""
+        return abs(rev.date.diff(self.date)) <= self.MaxSecondsBetweenRevs
+
+    def add (self, rev):
+        """Add the given CVSRev to this Changeset.
+
+        The addition will only succeed if the following holds:
+          - rev.author == self.author
+          - rev.message == self.message
+          - rev.path is not in self.revs
+          - rev.date is within MaxSecondsBetweenRevs of self.date
+        If the addition succeeds, True is returned; otherwise False.
+
+        """
+        if rev.author != self.author or \
+           rev.message != self.message or \
+           rev.path in self.revs or \
+           not self.within_time_window(rev):
+            return False
+
+        self.revs[rev.path] = rev
+        return True
+
+
+def build_changesets_from_revs (cvs_revs):
+    """Organize CVSRev objects into a chronological list of Changesets."""
+    # Construct chronological list of CVSRev objects
+    chron_revs = []
+    for path, d in cvs_revs.iteritems():
+        i = 0  # Current index into chronRevs
+        for revnum, cvsrev in sorted(d.iteritems()):
+            assert path == cvsrev.path
+            assert revnum == cvsrev.num
+            while i < len(chron_revs) and cvsrev.date > chron_revs[i].date:
+                i += 1
+            # Insert cvsRev at position i in chronRevs
+            chron_revs.insert(i, cvsrev)
+            i += 1
+
+    changesets = []  # Chronological list of Changeset objects
+    while len(chron_revs):
+        # There are still more revs to be added to Changesets
+        # Create Changeset based on the first rev in chronRevs
+        changeset = Changeset.from_rev(chron_revs.pop(0))
+        # Keep adding revs chronologically until MaxSecondsBetweenRevs
+        rejects = []  # Revs that cannot be added to this changeset
+        while len(chron_revs):
+            rev = chron_revs.pop(0)
+            reject = False
+            # First, if we have one of rev's parents in rejects, we
+            # must also reject rev
+            for r in rejects:
+                if r.path == rev.path:
+                    reject = True
+                    break
+            # Next, add rev to changeset, reject if add fails
+            if not reject:
+                reject = not changeset.add(rev)
+            if reject:
+                rejects.append(rev)
+                # stop trying when rev is too far in the future
+                if not changeset.within_time_window(rev):
+                    break
+        chron_revs = rejects + chron_revs  # Reconstruct remaining revs
+        changesets.append(changeset)
+
+    return changesets
diff --git a/git_remote_cvs/cvs.py b/git_remote_cvs/cvs.py
new file mode 100644
index 0000000..f870ae0
--- /dev/null
+++ b/git_remote_cvs/cvs.py
@@ -0,0 +1,998 @@
+#!/usr/bin/env python
+
+"""Functionality for interacting with CVS repositories.
+
+This module provides classes for interrogating a CVS repository via a
+CVS working directory (aka. checkout), or via direct queries using the
+"cvs rlog" command.
+
+Also, classes for encapsulating fundamental CVS concepts (like CVS
+revision/branch numbers) are provided.
+"""
+
+import sys
+import os
+import shutil
+import time
+from calendar import timegm
+import unittest
+
+from git_remote_cvs.util import (debug, error, die, ProgressIndicator,
+                                 start_command, run_command,
+                                 file_reader_method, file_writer_method)
+
+
+class CVSNum(object):
+
+    """Encapsulate a single CVS revision/branch number.
+
+    Provides functionality for common operations on CVS numbers.
+
+    A CVS number consists of a list of components separated by periods
+    ('.'), where each component is a decimal number.  Inspecting the
+    components from left to right, the odd-numbered (1st, 3rd, 5th,
+    etc.) components represent branches in the CVS history tree, while
+    the even-numbered (2nd, 4th, 6th, etc.) components represent
+    revisions on the branch specified in the previous position.
+    Thus "1.2" denotes the second revision on the first branch
+    (aka. trunk), while "1.2.4.6" denotes the sixth revision of the
+    fourth branch started from revision "1.2".
+
+    Therefore, in general, a CVS number with an even number of
+    components denotes a revision (we call this a "revision number"),
+    while an odd number of components denotes a branch (called a
+    "branch number").
+
+    There are a few complicating peculiarities: If there is an even
+    number of components, and the second-last component is 0, the
+    number is not a revision number, but is rather equivalent to the
+    branch number we get by removing the 0-component.  I.e. "1.2.0.4"
+    is equivalent to "1.2.4".
+
+    A branch number (except the trunk: "1") always has a "branch point"
+    revision, i.e. the revision from which the branch was started.
+    This revision is found by removing the last component of the branch
+    number.  For example the branch point of "1.2.4" is "1.2".
+
+    Conversely, all revision numbers belong to a corresponding branch,
+    whose branch number is found by removing the last component.
+    Examples: The "1.2.4.6" revision belong to the "1.2.4" branch,
+    while the "1.2" revision belongs to the "1" branch (the "trunk").
+
+    From this we can programatically determine the ancestry of any
+    revision number, by decrementing the last revision component until
+    it equals 1, and then trim off the last two components to get to
+    the branch point, and repeat the process from there until we reach
+    the initial revision (typically "1.1").  For example, recursively
+    enumerating the parent revisions of "1.2.4.6" yields the following
+    revisions:
+    "1.2.4.5", "1.2.4.4", "1.2.4.3", "1.2.4.2", "1.2.4.1", "1.2", "1.1"
+
+    """
+
+    __slots__ = ('c',)
+
+    @staticmethod
+    def decompose (cvsnum):
+        """Split the given CVS number into a list of int components.
+
+        Branch numbers are normalized to the odd-numbered components
+        form (i.e. removing the second last '0' component)
+
+        Examples:
+          '1.2.4.8' -> [1, 2, 4, 8]
+          '1.2.3'   -> [1, 2, 3]
+          '1.2.0.5' -> [1, 2, 5]
+
+        """
+        if cvsnum:
+            r = map(int, cvsnum.split('.'))
+        else:
+            r = []
+        if len(r) >= 2 and r[-2] == 0:
+            del r[-2]
+        if r[-1] == 0:
+            raise ValueError(cvsnum)
+        return tuple(r)
+
+    @staticmethod
+    def compose (c):
+        """Join the given list of integer components into a CVS number.
+
+        E.g.: (1, 2, 4, 8) -> '1.2.4.8'
+
+        """
+        if c[-1] == 0:
+            raise ValueError(str(c))
+        return ".".join(map(str, c))
+
+    @classmethod
+    def from_components (cls, args):
+        """Create a CVSNum from the given list of numerical components."""
+        return cls(cls.compose(args))
+
+    @classmethod
+    def disjoint (cls, a, b):
+        """Return True iff the CVS numbers are historically disjoint.
+
+        Two CVS numbers are disjoint if they do not share the same
+        historical line back to the initial revision.  In other words:
+        the two numbers are disjoint if the history (i.e. set of parent
+        revisions all the way back to the intial (1.1) revision) of
+        neither number is a superset of the other's history.
+        See test_disjoint() for practical examples:
+
+        """
+        if a.is_branch():
+            a = cls.from_components(a.c + (1,))
+        if b.is_branch():
+            b = cls.from_components(b.c + (1,))
+        if len(a.c) > len(b.c):
+            a, b = b, a  # a is now shortest
+        pairs = zip(a.c, b.c)
+        for pa, pb in pairs[:-1]:
+            if pa != pb:
+                return True
+        if len(a) == len(b):
+            return False
+        common_len = len(a)
+        if a.c[common_len - 1] <= b.c[common_len - 1]:
+            return False
+        return True
+
+
+    def __init__ (self, cvsnum):
+        """Create a CVSNum object from the given CVS number string."""
+        self.c = self.decompose(str(cvsnum))
+
+    def __repr__ (self):
+        """Return a string representation of this object."""
+        return self.compose(self.c)
+
+    def __str__ (self):
+        """Return a string representation of this object."""
+        return repr(self)
+
+    def __hash__ (self):
+        """Create a hash value for this CVS number."""
+        return hash(repr(self))
+
+    def __len__ (self):
+        """Return number of components in this CVS number."""
+        return len(self.c)
+
+    def __cmp__ (self, other):
+        """Comparison method for CVS numbers."""
+        try:
+            return cmp(self.c, other.c)
+        except AttributeError:
+            return 1
+
+    def __getitem__ (self, key):
+        """Return the Xth component of this CVS number."""
+        return self.c[key]
+
+    def is_rev (self):
+        """Return True iff this number is a CVS revision number."""
+        return len(self.c) % 2 == 0 and len(self.c) >= 2 and self.c[-2] != 0
+
+    def is_branch (self):
+        """Return True iff this number is a CVS branch number."""
+        return len(self.c) % 2 != 0 or (len(self.c) >= 2 and self.c[-2] == 0)
+
+    def components (self):
+        """Return a list of integer components in this CVS number."""
+        return list(self.c)
+
+    def branch (self):
+        """Return the branch on which the given number lives.
+
+        Revisions: chop the last component to find the branch, e.g.:
+            1.2.4.6 -> 1.2.4
+            1.1 -> 1
+        Branches: unchanged
+
+        """
+        if self.is_rev():
+            return self.from_components(self.c[:-1])
+        return self
+
+    def parent (self):
+        """Return the parent/previous revision number to this number.
+
+        For revisions, this is the previous revision, e.g.:
+            1.2.4.6 -> 1.2.4.5
+            1.2.4.1 -> 1.2
+            1.1 -> None
+            2.1 -> None
+        For branches, this is the branch point, e.g.:
+            1.2.4 -> 1.2
+            1 -> None
+            2 -> None
+
+        """
+        if len(self.c) < 2:
+            return None
+        elif len(self.c) % 2:  # Branch number
+            return self.from_components(self.c[:-1])
+        else:  # Revision number
+            assert self.c[-1] > 0
+            result = self.components()
+            result[-1] -= 1  # Decrement final component
+            if result[-1] == 0:  # We're at the start of the branch
+                del result[-2:]  # Make into branch point
+                if not result:
+                    return None
+            return self.from_components(result)
+
+    def follows (self, other):
+        """Return True iff self historically follows the given rev.
+
+        This iterates through the parents of self, and returns True iff
+        any of them equals the given rev.  Otherwise, it returns False.
+
+        """
+        assert other.is_rev()
+        cur = self
+        while cur:
+            if cur == other:
+                return True
+            cur = cur.parent()
+        return False
+
+    def on_branch (self, branch):
+        """Return True iff this rev is on the given branch.
+
+        The revs considered to be "on" a branch X also includes the
+        branch point of branch X.
+
+        """
+        return branch == self.branch() or branch.parent() == self
+
+
+class TestCVSNum(unittest.TestCase):
+
+    """CVSNum selftests."""
+
+    def test_basic (self):
+        """CVSNum basic selftests."""
+        self.assertEqual(CVSNum("1.2.4"), CVSNum("1.2.0.4"))
+        self.assert_(CVSNum("1.2.4").is_branch())
+        self.assert_(CVSNum("1.2").is_rev())
+        self.assert_(CVSNum("1").is_branch())
+        self.assert_(CVSNum("1.2.4.6").is_rev())
+        self.assertEqual(CVSNum("1.2.4.6").components(), [1, 2, 4, 6])
+        self.assertEqual(CVSNum.from_components([1, 2, 4, 6]),
+                         CVSNum("1.2.4.6"))
+        self.assertEqual(str(CVSNum.from_components([1, 2, 4, 6])), "1.2.4.6")
+        self.assertEqual(len(CVSNum("1.2.4.6")), 4)
+        self.assertEqual(CVSNum("1.2.4.6").branch(), CVSNum("1.2.4"))
+        self.assertEqual(CVSNum("1.2.4").branch(), CVSNum("1.2.4"))
+        self.assertEqual(CVSNum("1.1").branch(), CVSNum("1"))
+        self.assertEqual(CVSNum("1").branch(), CVSNum("1"))
+        self.assertEqual(CVSNum("1.2.4.6").parent(), CVSNum("1.2.4.5"))
+        self.assertEqual(CVSNum("1.2.4.1").parent(), CVSNum("1.2"))
+        self.assertEqual(CVSNum("1.2").parent(), CVSNum("1.1"))
+        self.assert_(CVSNum("1.1").parent() is None)
+        self.assert_(CVSNum("2.1").parent() is None)
+        self.assertEqual(CVSNum("1.2.4").parent(), CVSNum("1.2"))
+        self.assert_(CVSNum("1").parent() is None)
+        self.assert_(CVSNum("2").parent() is None)
+
+    def test_follows (self):
+        """CVSNum.follows() selftests."""
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.1")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.1")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.2")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.3")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.4")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.5")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.6")))
+        self.assertFalse(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.7")))
+        self.assertFalse(CVSNum("1.2.4.6").follows(CVSNum("1.3")))
+        self.assertFalse(CVSNum("1.1").follows(CVSNum("1.2.4.6")))
+
+    def test_disjoint (self):
+        """CVSNum.disjoint() selftests."""
+        tests = [
+            ("1.2", "1.1", False),
+            ("1.2", "1.2", False),
+            ("1.2", "1.3", False),
+            ("1.2", "1.1.2", True),
+            ("1.2", "1.1.2.3", True),
+            ("1.2.4", "1.1", False),
+            ("1.2.4", "1.2", False),
+            ("1.2.4", "1.3", True),
+            ("1.2.4", "1.2.2", True),
+            ("1.2.4", "1.2.4", False),
+            ("1.2.4", "1.2.6", True),
+            ("1.2.4", "1.2.2.4", True),
+            ("1.2.4", "1.2.4.4", False),
+            ("1.2.4", "1.2.6.4", True),
+            ("1.2.4.6", "1.1", False),
+            ("1.2.4.6", "1.2", False),
+            ("1.2.4.6", "1.3", True),
+            ("1.2.4.6", "1.2.2", True),
+            ("1.2.4.6", "1.2.2.1", True),
+            ("1.2.4.6", "1.2.4", False),
+            ("1.2.4.6", "1.2.4.5", False),
+            ("1.2.4.6", "1.2.4.6", False),
+            ("1.2.4.6", "1.2.4.7", False),
+            ("1.2.4.6.8.10", "1.2.4.5", False),
+            ("1.2.4.6.8.10", "1.2.4.6", False),
+            ("1.2.4.6.8.10", "1.2.4.7", True),
+        ]
+        for a, b, result in tests:
+            self.assertEqual(CVSNum.disjoint(CVSNum(a), CVSNum(b)), result)
+            self.assertEqual(CVSNum.disjoint(CVSNum(b), CVSNum(a)), result)
+
+
+class CVSState(object):
+
+    """Encapsulate a historical state in CVS (a set of paths and nums).
+
+    This class is a container of CVS pathnames and associated CVSNum
+    objects.
+
+    No communication with a CVS working directory or repository is done
+    in this class, hence only basic sanity checks are performed:
+      - A path may only appear once in a CVSState.
+      - When adding a path:num pair, path may not already exist in self
+      - When replacing a path:num pair, path must already exist in self
+      - When removing a path:num pair, both path and num must be given
+
+    IMPORTANT: Objects of this class are hash()able (to support being
+    used as keys in a dict), but they are also mutable.  It is
+    therefore up to the caller to make sure that the object is not
+    changed after being stored in a data structure indexed by its hash
+    value.
+
+    """
+
+    __slots__ = ('revs', '_hash')
+
+    def __init__ (self):
+        """Create a new, empty CVSState."""
+        self.revs = {}  # dict: path -> CVSNum object
+        self._hash = None
+
+    def __iter__ (self):
+        """Return iterator traversing the (path, CVSNum)s in this CVSState."""
+        return self.revs.iteritems()
+
+    def __cmp__ (self, other):
+        """Comparison method for CVSState objects."""
+        return cmp(self.revs, other.revs)
+
+    def __str__ (self):
+        """Stringify this CVSState by listing the contained revisions."""
+        return "".join(["%s:%s\n" % (p, n) for p, n in sorted(self)])
+
+    def __hash__ (self):
+        """Create a hash value for this CVSState."""
+        if self._hash is None:
+            self._hash = hash(str(self))
+        return self._hash
+
+    def __getitem__ (self, path):
+        """Return the CVSNum associated with the given path in self."""
+        return self.revs[path]
+
+    def get (self, path, default = None):
+        """Return the CVSNum associated with the given path in self."""
+        return self.revs.get(path, default)
+
+    def paths (self):
+        """Return the path names contained within this CVSState."""
+        return self.revs.iterkeys()
+
+    def add (self, path, revnum):
+        """Add the given path:revnum to this CVSState."""
+        assert path not in self.revs
+        self._hash = None
+        self.revs[path] = revnum
+
+    def replace (self, path, revnum):
+        """Replace the revnum associated with the given path."""
+        assert path in self.revs
+        self._hash = None
+        self.revs[path] = revnum
+
+    def remove (self, path, revnum):
+        """Remove the given path:revnum association from this CVSState."""
+        assert path in self.revs and self.revs[path] == revnum
+        self._hash = None
+        del self.revs[path]
+
+    def copy (self):
+        """Create and return a copy of this object."""
+        ret = CVSState()
+        ret.revs = self.revs.copy()
+        ret._hash = self._hash
+        return ret
+
+    def load_data (self, note_data):
+        """Load note data as formatted by self.__str__()."""
+        for line in note_data.split("\n"):
+            line = line.strip()
+            if not line:
+                continue
+            path, num = line.rsplit(':', 1)
+            self.add(path, CVSNum(num))
+        self._hash = hash(note_data)
+
+    def print_members (self, f = sys.stdout, prefix = ""):
+        """Write the members of this CVSState to the given file object."""
+        for path, num in sorted(self):
+            print >> f, "%s%s:%s" % (prefix, path, num)
+
+    @file_reader_method(missing_ok = True)
+    def load (self, f):
+        """Load CVS state from the given file name/object."""
+        if f:
+            self.load_data(f.read())
+
+    @file_writer_method
+    def save (self, f):
+        """Save CVS state to the given file name/object."""
+        assert f
+        print >> f, str(self),
+
+
+class CVSDate(object):
+
+    """Encapsulate a timestamp, as reported by CVS.
+
+    The internal representation of a timestamp is two integers, the
+    first representing the timestamp as #seconds since epoch (UTC),
+    and the second representing the timezone as #minutes offset from
+    UTC.
+
+    Example: "2007-09-05 17:26:28 -0200" is converted to
+             (1189013188, -120)
+
+    """
+
+    __slots__ = ('ts', 'tz')
+
+    def __init__ (self, date_str = None, in_utc = False):
+        """Convert CVS date string into a CVSDate object.
+
+        A CVS timestamp string has one of the following forms:
+          - "YYYY-MM-DD hh:mm:ss SZZZZ"
+          - "YYYY/MM/DD hh:mm:ss" (with timezone assumed to be UTC)
+        The in_utc parameter determines whether the timestamp part of
+        the given string (the "YYYY-MM-DD hh:mm:ss" part) is given in
+        local time or UTC (normally CVS dates are given in local time.
+        If given in local time, the timezone offset is subtracted from
+        the timestamp in order to make the time in UTC format.
+
+        """
+        if date_str is None:
+            self.ts, self.tz = 0, 0
+            return
+        if date_str == "now":
+            self.ts, self.tz = time.time(), 0
+            return
+        date_str = date_str.strip()
+        # Set up self.ts and self.tz
+        if date_str.count(" ") == 2:
+            # Assume format "YYYY-MM-DD hh:mm:ss SZZZZ"
+            t, z = date_str.rsplit(" ", 1)
+            # Convert timestamp to #secs since epoch (UTC)
+            self.ts = timegm(time.strptime(t, "%Y-%m-%d %H:%M:%S"))
+            # Convert timezone into #mins offset from UTC
+            self.tz = int(z[1:3]) * 60 + int(z[3:5])
+            # Incorporate timezone sign
+            if z[0] == '-':
+                self.tz *= -1
+        else:
+            assert date_str.count(" ") == 1
+            # Assume format "YYYY/MM/DD hh:mm:ss"
+            self.ts = timegm(time.strptime(date_str, "%Y/%m/%d %H:%M:%S"))
+            self.tz = 0
+        # Adjust timestamp if not already in UTC
+        if not in_utc:
+            self.ts -= self.tz * 60
+
+    def tz_str (self):
+        """Return timezone part of self in string format."""
+        sign = '+'
+        if self.tz < 0:
+            sign = '-'
+        hours, minutes = divmod(abs(self.tz), 60)
+        return "%s%02d%02d" % (sign, hours, minutes)
+
+    def __str__ (self):
+        """Reconstruct date string from members."""
+        s = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(self.ts))
+        return "%s %s" % (s, self.tz_str())
+
+    def __repr__ (self):
+        """Create a string representation of self."""
+        return "CVSDate('%s')" % (str(self))
+
+    def __hash__ (self):
+        """Create a hash value from self."""
+        return hash((self.ts, self.tz))
+
+    def __nonzero__ (self):
+        """Provide interpretation of self in a boolean context."""
+        return bool(self.ts or self.tz)
+
+    def __cmp__ (self, other):
+        """Comparison method for CVSDate objects."""
+        return cmp(self.ts, other.ts) or cmp(self.tz, other.tz)
+
+    def __eq__ (self, other):
+        """Return True iff self and other is considered equal."""
+        return self.ts == other.ts and self.tz == other.tz
+
+    def diff (self, other):
+        """Return difference between self and other in #seconds.
+
+        Invariant: self == other.add(self.diff(other))
+
+        """
+        return self.ts - other.ts
+
+
+class TestCVSDate(unittest.TestCase):
+
+    """CVSDate selftests."""
+
+    def test_basic (self):
+        """CVSDate basic selftests."""
+        a = CVSDate("2009-05-10 14:34:56 +0200")
+        b = CVSDate("2009/05/10 12:34:56")
+        self.assert_(a)
+        self.assert_(b)
+        self.assertEqual(str(a), "2009-05-10 12:34:56 +0200", str(a))
+        self.assertEqual(str(b), "2009-05-10 12:34:56 +0000", str(b))
+        self.assertNotEqual(a, b)
+        self.assertEqual(a.diff(b), 0)
+        c = CVSDate("2009-05-10 16:34:56 +0200")
+        self.assert_(c)
+        self.assertEqual(str(c), "2009-05-10 14:34:56 +0200", str(c))
+        self.assertNotEqual(c, a)
+        self.assertEqual(c.diff(a), 2 * 60 * 60)
+        self.assertEqual(a.diff(c), -2 * 60 * 60)
+
+
+class CVSRev(object):
+
+    """Encapsulate metadata on a CVS revision."""
+
+    __slots__ = ('path', 'num', 'date', 'author', 'deleted', 'message')
+
+    def __init__ (self, path, num):
+        """Create a CVSRev object for the given path:num revision."""
+        self.path = path
+        self.num = num
+        self.date = None  # CVSDate object
+        self.author = ""
+        self.deleted = None  # True or False
+        self.message = []  # Lines of commit message
+
+    def __str__ (self):
+        """Return a string listing the metadata in this CVS revision."""
+        return ("<%s:%s on %s by %s%s>" %
+                (self.path, self.num, self.date, self.author,
+                 self.deleted and ", deleted" or ""))
+
+    def __cmp__ (self, other):
+        """Comparison method for CVSRev objects."""
+        return cmp(self.path, other.path) or cmp(self.num, other.num)
+
+
+class CVSWorkDir(object):
+
+    """Encapsulate a CVS working directory.
+
+    This class auto-creates a CVS workdir/checkout in the directory
+    given to the constructor, and provides various methods for
+    interacting with this workdir.
+
+    """
+
+    def __init__ (self, workdir, cvs_repo):
+        """Create a new CVSWorkDir.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+
+        """
+        self.d = workdir
+        self.cvs_root, self.cvs_module = cvs_repo
+        parent_dir = os.path.dirname(self.d)
+        if not os.path.isdir(parent_dir):
+            os.makedirs(parent_dir)
+        self._valid = None
+
+    def makepath(self, *args):
+        """Create path relative to working directory."""
+        return os.path.join(self.d, *args)
+
+    def valid (self):
+        """Return True iff this workdir is present and valid."""
+        if self._valid is not None:
+            return self._valid
+        try:
+            f = open(self.makepath("CVS", "Root"), 'r')
+            assert f.read().strip() == self.cvs_root
+            f.close()
+            f = open(self.makepath("CVS", "Repository"), 'r')
+            assert f.read().strip() == self.cvs_module
+            f.close()
+            self._valid = True
+        except (IOError, AssertionError):
+            self._valid = False
+        return self._valid
+
+    def remove (self):
+        """Remove this checkout."""
+        shutil.rmtree(self.d, True)
+        assert not os.path.exists(self.d)
+        self._valid = False
+
+    def checkout (self, revision = "HEAD"):
+        """Create a checkout of the given revision."""
+        self.remove()
+        parent_dir, co_dir = os.path.split(self.d)
+        args = ["cvs", "-f", "-Q", "-d", self.cvs_root, "checkout"]
+        if str(revision) != "HEAD":
+            args.extend(["-r", str(revision)])
+        args.extend(["-d", co_dir, self.cvs_module])
+        exit_code, output, errors = run_command(args, cwd = parent_dir)
+        if exit_code:
+            die("Failed to checkout CVS working directory")
+        assert not errors
+        assert not output, "output = '%s'" % (output)
+        self._valid = None
+        assert self.valid()
+
+    def update (self, revision = "HEAD", paths = None):
+        """Update the given paths to the given revision."""
+        if not self.valid():
+            self.checkout()
+        args = ["cvs", "-f", "-Q", "update", "-kk"]
+        if str(revision) == "HEAD":
+            args.append("-A")
+        else:
+            args.extend(["-r", str(revision)])
+        if paths is not None:
+            args.extend(paths)
+        exit_code, output, errors = run_command(args, cwd = self.d)
+        if exit_code:
+            die("Failed to checkout CVS working directory")
+        assert not errors
+        assert not output, "output = '%s'" % (output)
+
+    def get_revision_data (self, path, revision):
+        """Return the contents of the given CVS path:revision."""
+        if not self.valid():
+            self.checkout()
+        args = ["cvs", "-f", "-Q", "update", "-p", "-kk"]
+        if str(revision) == "HEAD":
+            args.append("-A")
+        else:
+            args.extend(["-r", str(revision)])
+        args.append(path)
+        exit_code, output, errors = run_command(args, cwd = self.d)
+        if exit_code:
+            die("Failed to checkout CVS working directory")
+        assert not errors
+        return output
+
+    def get_modeinfo (self, paths = None):
+        """Return mode information for the given paths.
+
+        Returns a dict of path -> mode number mappings.  If paths are
+        not specified, mode information for all files in the current
+        checkout will be returned.  No checkout/update will be done.
+
+        """
+        result = {}
+        if paths is not None:
+            for path in paths:
+                fullpath = os.path.join(self.d, path)
+                mode = 644
+                if os.access(fullpath, os.X_OK):
+                    mode = 755
+                assert path not in result
+                result[path] = mode
+        else:  # Return mode information for all paths
+            for dirpath, dirnames, filenames in os.walk(self.d):
+                # Don't descend into CVS subdirs
+                try:
+                    dirnames.remove('CVS')
+                except ValueError:
+                    pass
+                assert dirpath.startswith(self.d)
+                directory = dirpath[len(self.d):].lstrip("/")
+                for fname in filenames:
+                    path = os.path.join(directory, fname)
+                    fullpath = os.path.join(dirpath, fname)
+                    mode = 644
+                    if os.access(fullpath, os.X_OK):
+                        mode = 755
+                    assert path not in result
+                    result[path] = mode
+        return result
+
+    @classmethod
+    def parse_entries (cls, entries, prefix, directory = ""):
+        """Recursively parse CVS/Entries files.
+
+        Return a dict of CVS paths found by parsing the CVS/Entries
+        files rooted at the given directory.
+
+        See http://ximbiot.com/cvs/manual/feature/cvs_2.html#SEC19 for
+        information on the format of the CVS/Entries file.
+
+        """
+        fname = os.path.join(prefix, directory, "CVS", "Entries")
+        subdirs = []
+        f = open(fname, 'r')
+        for line in f:
+            line = line.strip()
+            if line == "D":
+                continue  # There are no subdirectories
+            t, path, revnum, date, options, tag = line.split("/")
+            if t == "D":
+                subdirs.append(path)
+                continue
+            assert line.startswith("/")
+            path = os.path.join(directory, path)
+            revnum = CVSNum(revnum)
+            assert path not in entries
+            entries[path] = (revnum, date, options, tag)
+        f.close()
+        for d in subdirs:
+            d = os.path.join(directory, d)
+            cls.parse_entries(entries, prefix, d)
+
+    def get_state (self):
+        """Return CVSState reflecting current state of this checkout.
+
+        Note that the resulting CVSState will never contain any
+        deleted/dead files.  Other CVSStates to be compared to the one
+        returned from here should remove deleted/dead entries first.
+
+        """
+        assert self.valid()
+        entries = {}
+        result = CVSState()
+        self.parse_entries(entries, self.d)
+        for path, info in entries.iteritems():
+            result.add(path, info[0])
+        return result
+
+
+class CVSLogParser(object):
+
+    """Encapsulate the execution of a "cvs rlog" command."""
+
+    def __init__ (self, cvs_repo):
+        """Create a new CVSLogParser.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+
+        """
+        self.cvs_root, self.cvs_module = cvs_repo
+
+    def cleanup_path (self, cvs_path):
+        """Utility method for parsing CVS paths from CVS log."""
+        cvsprefix = "/".join((self.cvs_root[self.cvs_root.index("/"):],
+                              self.cvs_module))
+        assert cvs_path.startswith(cvsprefix)
+        assert cvs_path.endswith(",v")
+        # Drop cvsprefix and ,v-extension
+        cvs_path = cvs_path[len(cvsprefix):-2]
+        # Split the remaining path into components
+        path_comps = filter(None, cvs_path.strip().split('/'))
+        # Remove 'Attic' from CVS paths
+        if len(path_comps) >= 2 and path_comps[-2] == "Attic":
+            del path_comps[-2]
+        # Reconstruct resulting "cleaned" path
+        return "/".join(path_comps)
+
+    def __call__ (self, line):
+        """Parse the given line from the CVS log.
+
+        Must be reimplemented by subclass
+
+        """
+        pass
+
+    def finish (self):
+        """This method is invoked after the last line has been parsed.
+
+        May be reimplemented by subclass
+
+        """
+        pass
+
+    def run (self, paths = None, no_symbols = False, revisions = None):
+        """Execute "cvs rlog" with the given arguments.
+
+        self.__call__() is invoked once for each line in the CVS log.
+        self.finish() is invoked exactly once after the CVS log.
+
+        """
+        args = ["cvs", "-f", "-q", "-d", self.cvs_root, "rlog"]
+        if no_symbols:
+            args.append("-N")
+        if revisions:
+            args.append("-r%s" % (revisions))
+        if paths is not None:
+            for p in paths:
+                args.append("%s/%s" % (self.cvs_module, p))
+        else:
+            args.append(self.cvs_module)
+        proc = start_command(args)
+        proc.stdin.close()
+        while True:
+            for line in proc.stdout:
+                self(line.rstrip())  # Call self's line parser
+            if proc.poll() is not None:
+                break
+        assert proc.stdout.read() == ""
+        self.finish()  # Notify subclass that parsing is finished
+        exit_code = proc.returncode
+        if exit_code:
+            error("'%s' returned exit code %i, and errors:\n---\n%s---",
+                  " ".join(args), exit_code, proc.stderr.read())
+        return exit_code
+
+
+class CVSRevLister(CVSLogParser):
+
+    """Extract CVSRev objects (with revision metadata) from a CVS log."""
+
+    def __init__ (self, cvs_repo, show_progress = False):
+        """Create a new CVSRevLister.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+        show_progress determines whether progress indication is shown.
+
+        """
+        super(CVSRevLister, self).__init__(cvs_repo)
+        self.cur_file = None  # Current CVS file being processed
+        self.cur_file_numrevs = 0  # #revs in current CVS file
+        self.cur_rev = None  # Current CVSRev under construction
+        self.progress = None
+        if show_progress:
+            self.progress = ProgressIndicator("\t", sys.stderr)
+        # Store found revs in a two-level dict structure:
+        # filename -> revnum -> CVSRev
+        self.revs = {}
+        # Possible states:
+        # - BeforeRevs  - waiting for "total revisions:"
+        # - BetweenRevs - waiting for "----------------------------"
+        # - ReadingRev  - reading CVS revision details
+        self.state = 'BeforeRevs'
+
+    def __call__ (self, line):
+        """Line parser; this method is invoked for each line in the log."""
+        if self.state == 'BeforeRevs':
+            if line.startswith("RCS file: "):
+                self.cur_file = self.cleanup_path(line[10:])
+                assert self.cur_file not in self.revs
+                self.revs[self.cur_file] = {}
+            elif line.startswith("total revisions: "):
+                assert self.cur_file
+                totalrevs_unused, selectedrevs = line.split(";")
+                self.cur_file_numrevs = int(selectedrevs.split(":")[1].strip())
+                self.state = 'BetweenRevs'
+        elif self.state == 'BetweenRevs':
+            if (line == "----------------------------" or
+                line == "======================================"
+                        "======================================="):
+                if self.cur_rev:
+                    # Finished current revision
+                    f = self.revs[self.cur_file]
+                    assert self.cur_rev.num not in f
+                    f[self.cur_rev.num] = self.cur_rev
+                    self.cur_rev = None
+                    if self.progress:
+                        self.progress()
+                if line == "----------------------------":
+                    self.state = 'ReadingRev'
+                else:
+                    # Finalize current CVS file
+                    assert len(self.revs[self.cur_file]) == \
+                           self.cur_file_numrevs
+                    self.cur_file = None
+                    self.state = 'BeforeRevs'
+            elif self.cur_rev:
+                # Currently in the middle of a revision.
+                if line.startswith("branches:  %s" % (self.cur_rev.num)) and \
+                   line.endswith(";"):
+                    return  # Skip 'branches:' lines
+                # This line is part of the commit message.
+                self.cur_rev.message.append(line)
+        elif self.state == 'ReadingRev':
+            if line.startswith("revision "):
+                self.cur_rev = CVSRev(self.cur_file, CVSNum(line.split()[1]))
+            else:
+                date, author, state, dummy = line.split(";", 3)
+                assert date.startswith("date: ")
+                self.cur_rev.date = CVSDate(date[6:])
+                assert author.strip().startswith("author: ")
+                self.cur_rev.author = author.strip()[8:]
+                assert state.strip().startswith("state: ")
+                state = state.strip()[7:]
+                self.cur_rev.deleted = state == "dead"
+                self.state = 'BetweenRevs'
+
+    def finish (self):
+        """This method is invoked after the last line has been parsed."""
+        assert self.state == 'BeforeRevs'
+        if self.progress:
+            self.progress.finish("Parsed %i revs in %i files" %
+                                 (self.progress.n, len(self.revs)))
+
+
+def fetch_revs (path, from_rev, to_rev, symbol, cvs_repo):
+    """Fetch CVSRevs for each rev in <path:from_rev, path:symbol].
+
+    Return a dict of CVSRev objects (revnum -> CVSRev), where each
+    CVSRev encapsulates a CVS revision in the range from
+    path:from_rev to path:symbol (inclusive).  If symbol currently
+    refers to from_rev (i.e. nothing has happened since the last
+    import), the returned dict will have exactly one entry (from_rev).
+    If there is no valid revision range between from_rev and symbol,
+    the returned dict will be empty.  Situations in which an empty dict
+    is returned, include:
+    - symbol is no longer defined on this path
+    - symbol refers to a revision that is disjoint from from_rev
+
+    from_rev may be None, meaning that all revisions from the initial
+    version of path up to the revision currently referenced by symbol
+    should be fetched.
+
+    If the revision currently referenced by symbol is disjoint from
+    from_rev, the returned dict will be empty.
+
+    Note that there is lots of unexpected behaviour in the handling of
+    the 'cvs rlog -r' parameter: Say you have a branch, called
+    'my_branch', that points to branch number 1.1.2 of a file.  Say
+    there are 3 revisions on this branch: 1.1.2.1 -> 1.1.2.3 (in
+    additions to the branch point 1.1).  Now, observe the following
+    'cvs rlog' executions:
+    - cvs rlog -r0:my_branch ... returns 1.1, 1.1.2.1, 1.1.2.2, 1.1.2.3
+    - cvs rlog -r1.1:my_branch ... returns the same revs
+    - cvs rlog -rmy_branch ... returns 1.1.2.1, 1.1.2.2, 1.1.2.3
+    - cvs rlog -rmy_branch: ... returns the same revs
+    - cvs rlog -r:my_branch ... returns the same revs
+    - cvs rlog -r::my_branch ... returns the same revs
+    - cvs rlog -r1.1.2.1: ... returns the same revs
+    Here is where it gets really weird:
+    - cvs rlog -r1.1.2.1:my_branch ... returns 1.1.2.1 only
+    - cvs rlog -r1.1.2.2:my_branch ... returns 1.1.2.1, 1.1.2.2
+    - cvs rlog -r1.1.2.3:my_branch ... returns 1.1.2.1, 1.1.2.2, 1.1.2.3
+
+    In other words the 'cvs rlog -rfrom_rev:symbol' scheme that we
+    normally use will not work in the case where from_rev is _on_ the
+    branch pointed at by the symbol.
+
+    Therefore, we need an extra parameter, to_rev, which we can use to:
+    1. Detect when this situation is present.
+    2. Work around by using 'cvs rlog -rfrom_ref:to_rev' instead.
+
+    """
+    if from_rev is None:  # Initial import
+        from_rev = "0"  # "cvs rlog -r0:X" fetches from initial revision
+    elif to_rev and to_rev.branch() == from_rev.branch():
+        symbol = to_rev  # Use to_rev instead of given symbol
+    # Run 'cvs rlog' on range [from_rev, symbol] and parse CVSRev objects
+    parser = CVSRevLister(cvs_repo)
+    parser.run((path,), True, "%s:%s" % (from_rev, symbol))
+    assert len(parser.revs) == 1
+    assert path in parser.revs
+    return parser.revs[path]
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/git_remote_cvs/cvs_symbol_cache.py b/git_remote_cvs/cvs_symbol_cache.py
new file mode 100644
index 0000000..cc8d88b
--- /dev/null
+++ b/git_remote_cvs/cvs_symbol_cache.py
@@ -0,0 +1,313 @@
+#!/usr/bin/env python
+
+"""Implementation of a local CVS symbol cache.
+
+A CVS symbol cache stores a list of CVS symbols and the CVS state
+associated with each of those CVS symbols at some point in time.
+
+Keeping a local cache of CVS symbols is often needed because the
+design of CVS makes it potentially very expensive to query the CVS
+server directly for CVS symbols and associated states.
+
+In these cases, a local CVS symbol cache can provide equivalent
+(although possibly out-of-date) information immediatele.
+
+Synchronization with the current state on the CVS server can be
+done on a symbol-by-symbol basis (by checking out a given symbol
+and extracting the CVS state from the CVS work tree), or by
+synchronizing _all_ CVS symbols in one operation (by executing
+'cvs rlog' and parsing CVS states from its output).
+
+"""
+
+import sys
+import os
+
+from git_remote_cvs.util import debug, error, die, ProgressIndicator
+from git_remote_cvs.cvs import CVSNum, CVSState, CVSLogParser
+
+
+class CVSSymbolStateLister(CVSLogParser):
+
+    """Extract current CVSStates for all CVS symbols from a CVS log."""
+
+    def __init__ (self, cvs_repo, show_progress = False):
+        """Create a new CVSSymbolStateLister.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+        show_progress determines whether a progress indicator should
+        be displayed.
+
+        """
+        super(CVSSymbolStateLister, self).__init__(cvs_repo)
+        self.symbols = {}  # CVS symbol name -> CVSState object
+        self.cur_file = None  # current CVS file being processed
+        self.cur_file_numrevs = 0  # #revs in current CVS file
+        self.cur_revnum = None  # current revision number
+        self.rev2syms = {}  # CVSNum -> [CVS symbol names]
+        self.cur_revs = {}  # CVSNum -> True/False (deleted)
+        self.head_num = None  # CVSNum of the HEAD rev or branch
+
+        # Possible states:
+        # - BeforeSymbols - waiting for "symbolic names:"
+        # - WithinSymbols - reading CVS symbol names
+        # - BeforeRevs  - waiting for "total revisions:"
+        # - BetweenRevs - waiting for "----------------------------"
+        # - ReadingRev  - reading CVS revision details
+        self.state = 'BeforeSymbols'
+
+        self.progress = None
+        if show_progress:
+            self.progress = ProgressIndicator("\t", sys.stderr)
+
+    def finalize_symbol_states (self):
+        """Adjust CVSStates in self.symbols based on revision data.
+
+        Based on the information found in self.rev2syms and
+        self.cur_revs, remove deleted revisions and turn branch numbers
+        into corresponding revisions in the CVSStates found in
+        self.symbols.
+
+        """
+        # Create a mapping from branch numbers to the last existing
+        # revision number on those branches
+        branch2lastrev = {}  # branch number -> revision number
+        for revnum in self.cur_revs.iterkeys():
+            branchnum = revnum.branch()
+            if (branchnum not in branch2lastrev) or \
+               (revnum > branch2lastrev[branchnum]):
+                branch2lastrev[branchnum] = revnum
+
+        for cvsnum, symbols in self.rev2syms.iteritems():
+            if cvsnum.is_branch():
+                # Turn into corresponding revision number
+                revnum = branch2lastrev.get(cvsnum, cvsnum.parent())
+                for s in symbols:
+                    state = self.symbols[s]
+                    assert state[self.cur_file] == cvsnum
+                    state.replace(self.cur_file, revnum)
+                cvsnum = revnum
+            assert cvsnum.is_rev()
+            assert cvsnum in self.cur_revs
+            if self.cur_revs[cvsnum]:  # cvsnum is a deleted rev
+                # Remove from CVSStates
+                for s in symbols:
+                    state = self.symbols[s]
+                    state.remove(self.cur_file, cvsnum)
+
+        self.rev2syms = {}
+        self.cur_revs = {}
+        self.cur_file = None
+
+    def __call__ (self, line):
+        """Line parser; this method is invoked for each line in the log."""
+        if self.state == 'BeforeSymbols':
+            if line.startswith("RCS file: "):
+                self.cur_file = self.cleanup_path(line[10:])
+                if self.progress:
+                    self.progress("%5i symbols found - Parsing CVS file #%i: "
+                                  "%s " % (len(self.symbols), self.progress.n,
+                                           self.cur_file,))
+            if line.startswith("head: "):
+                self.head_num = CVSNum(line[6:])
+            if line.startswith("branch: "):
+                self.head_num = CVSNum(line[8:])
+            elif line == "symbolic names:":
+                assert self.head_num
+                s = self.symbols.setdefault("HEAD", CVSState())
+                s.add(self.cur_file, self.head_num)
+                r = self.rev2syms.setdefault(self.head_num, [])
+                r.append("HEAD")
+                self.head_num = None
+                self.state = 'WithinSymbols'
+        elif self.state == 'WithinSymbols':
+            if line.startswith("\t"):
+                symbol, cvsnum = line.split(":", 1)
+                symbol = symbol.strip()
+                cvsnum = CVSNum(cvsnum)
+                s = self.symbols.setdefault(symbol, CVSState())
+                s.add(self.cur_file, cvsnum)
+                r = self.rev2syms.setdefault(cvsnum, [])
+                r.append(symbol)
+            else:
+                self.state = 'BeforeRevs'
+        elif self.state == 'BeforeRevs':
+            if line.startswith("total revisions: "):
+                assert self.cur_file
+                totalrevs_unused, selectedrevs = line.split(";")
+                self.cur_file_numrevs = int(selectedrevs.split(":")[1].strip())
+                self.state = 'BetweenRevs'
+        elif self.state == 'BetweenRevs':
+            if (line == "----------------------------" or
+                line == "======================================"
+                        "======================================="):
+                if self.cur_revnum:
+                    assert self.cur_revnum in self.cur_revs
+                    self.cur_revnum = None
+                if line == "----------------------------":
+                    self.state = 'ReadingRev'
+                else:
+                    # Finalize current CVS file
+                    assert len(self.cur_revs) == self.cur_file_numrevs
+                    self.finalize_symbol_states()
+                    self.state = 'BeforeSymbols'
+        elif self.state == 'ReadingRev':
+            if line.startswith("revision "):
+                self.cur_revnum = CVSNum(line.split()[1])
+            else:
+                date, author, state, dummy = line.split(";", 3)
+                assert date.startswith("date: ")
+                assert author.strip().startswith("author: ")
+                assert state.strip().startswith("state: ")
+                state = state.strip()[7:]
+                assert self.cur_revnum not in self.cur_revs
+                deleted = state == "dead"
+                self.cur_revs[self.cur_revnum] = deleted
+                self.state = 'BetweenRevs'
+
+    def finish (self):
+        """This method is invoked after the last line has been parsed."""
+        assert self.state == 'BeforeSymbols'
+        if self.progress:
+            self.progress.finish("Parsed %i symbols in %i files" %
+                                 (len(self.symbols), self.progress.n))
+
+
+class CVSSymbolCache(object):
+
+    """Local cache of the current CVSState of CVS symbols.
+
+    Simulates a dictionary of CVS symbol -> CVSState mappings.
+
+    """
+
+    def __init__ (self, symbols_dir):
+        """Create a new CVS symbol cache, located in the given directory."""
+        self.symbols_dir = symbols_dir
+        if not os.path.isdir(self.symbols_dir):
+            os.makedirs(self.symbols_dir)
+
+    def __len__ (self):
+        """Return the number of CVS symbols stored in this cache."""
+        return len(os.listdir(self.symbols_dir))
+
+    def __iter__ (self):
+        """Return an iterator traversing symbol names stored in this cache."""
+        for filename in os.listdir(self.symbols_dir):
+            yield filename
+
+    def __contains__ (self, symbol):
+        """Return True if the given symbol is present in this cache."""
+        return os.access(os.path.join(self.symbols_dir, symbol),
+                         os.F_OK | os.R_OK)
+
+    def __getitem__ (self, symbol):
+        """Return the cached CVSState of the given CVS symbol."""
+        try:
+            f = open(os.path.join(self.symbols_dir, symbol), 'r')
+        except IOError:
+            raise KeyError("'%s'" % (symbol))
+        ret = CVSState()
+        ret.load(f)
+        f.close()
+        return ret
+
+    def __setitem__ (self, symbol, cvs_state):
+        """Store the given CVS symbol and CVSState into the cache."""
+        cvs_state.save(os.path.join(self.symbols_dir, symbol))
+
+    def __delitem__ (self, symbol):
+        """Remove the the given CVS symbol from the cache."""
+        os.remove(os.path.join(self.symbols_dir, symbol))
+
+    def get (self, symbol, default = None):
+        """Return the cached CVSState of the given CVS symbol."""
+        try:
+            return self[symbol]
+        except KeyError:
+            return default
+
+    def items (self):
+        """Return list of (CVS symbol, CVSState) tuples saved in this cache."""
+        for filename in self:
+            yield (filename, self[filename])
+
+    def clear (self):
+        """Remove all entries from this CVS symbol cache."""
+        for filename in os.listdir(self.symbols_dir):
+            os.remove(os.path.join(self.symbols_dir, filename))
+
+    def sync_symbol (self, symbol, cvs, progress):
+        """Synchronize the given CVS symbol with the CVS server.
+
+        The given CVS workdir is used for the synchronization.
+        The retrieved CVSState is also returned
+
+        """
+        progress("Retrieving state of CVS symbol '%s'..." % (symbol))
+        cvs.update(symbol)
+        state = cvs.get_state()
+
+        progress("Saving state of '%s' to symbol cache..." % (symbol))
+        self[symbol] = state
+
+    def sync_all_symbols (self, cvs_repo, progress, symbol_filter = None):
+        """Synchronize this entire CVS symbol cache with the CVS server.
+
+        This may be very expensive if the CVS repository is large, or
+        has many symbols.  After this method returns, the symbol cache
+        will be in sync with the current state on the server.
+
+        This method returns a dict with the keys 'unchanged',
+        'changed', 'added', and 'deleted', where each map to a list of
+        CVS symbols.  Each CVS symbol appears in exactly one of these
+        lists.
+
+        If symbol_filter is given, it specifies functions that takes
+        one parameter - a CVS symbol name - and returns True if that
+        symbol should be synchronized, and False if that symbol should
+        be skipped.  Otherwise all CVS symbols are synchronized.
+
+        """
+        if symbol_filter is None:
+            symbol_filter = lambda symbol: True
+
+        # Run cvs rlog to fetch current CVSState for all CVS symbols
+        progress("Retrieving current state of all CVS symbols from CVS "
+                 "server...", lf = True)
+        parser = CVSSymbolStateLister(cvs_repo, True)
+        retcode = parser.run()
+        if retcode:
+            raise EnvironmentError(retcode, "cvs rlog exit code %i" % retcode)
+
+        # Update symbol cache with new states from the CVS server
+        progress("Updating symbol cache with current CVS state...")
+        results = {}
+        result_keys = ("unchanged", "changed", "added", "deleted")
+        for k in result_keys:
+            results[k] = []
+        # Classify existing symbols as unchanged, changed, or deleted
+        for symbol in filter(symbol_filter, self):
+            if symbol not in parser.symbols:  # Deleted
+                results["deleted"].append(symbol)
+                del self[symbol]
+            elif self[symbol] != parser.symbols[symbol]:  # Changed
+                results["changed"].append(symbol)
+                self[symbol] = parser.symbols[symbol]
+            else:  # Unchanged
+                results["unchanged"].append(symbol)
+            progress()
+        # Add symbols that are not in self
+        for symbol, state in parser.symbols.iteritems():
+            if not symbol_filter(symbol):
+                debug("Skipping CVS symbol '%s'...", symbol)
+            elif symbol in self:
+                assert state == self[symbol]
+            else:  # Added
+                results["added"].append(symbol)
+                self[symbol] = state
+            progress()
+        progress("Synchronized local symbol cache (%s)" %
+                 (", ".join(["%i %s" % (len(results[k]), k)
+                             for k in result_keys])), True)
+        return results
diff --git a/git_remote_cvs/util.py b/git_remote_cvs/util.py
new file mode 100644
index 0000000..d3ca487
--- /dev/null
+++ b/git_remote_cvs/util.py
@@ -0,0 +1,194 @@
+#!/usr/bin/env python
+
+"""Misc. useful functionality used by the rest of this package.
+
+This module provides common functionality used by the other modules in
+this package.
+
+"""
+
+import sys
+import os
+import subprocess
+
+
+# Whether or not to show debug messages
+DEBUG = False
+
+def notify(msg, *args):
+	"""Print a message to stderr."""
+	print >> sys.stderr, msg % args
+
+def debug (msg, *args):
+    """Print a debug message to stderr when DEBUG is enabled."""
+    if DEBUG:
+        print >> sys.stderr, msg % args
+
+def error (msg, *args):
+    """Print an error message to stderr."""
+    print >> sys.stderr, "ERROR:", msg % args
+
+def warn(msg, *args):
+	"""Print a warning message to stderr."""
+	print >> sys.stderr, "warning:", msg % args
+
+def die (msg, *args):
+    """Print as error message to stderr and exit the program."""
+    error(msg, *args)
+    sys.exit(1)
+
+
+class ProgressIndicator(object):
+
+    """Simple progress indicator.
+
+    Displayed as a spinning character by default, but can be customized
+    by passing custom messages that overrides the spinning character.
+
+    """
+
+    States = ("|", "/", "-", "\\")
+
+    def __init__ (self, prefix = "", f = sys.stdout):
+        """Create a new ProgressIndicator, bound to the given file object."""
+        self.n = 0  # Simple progress counter
+        self.f = f  # Progress is written to this file object
+        self.prev_len = 0  # Length of previous msg (to be overwritten)
+        self.prefix = prefix  # Prefix prepended to each progress message
+        self.prefix_lens = [] # Stack of prefix string lengths
+
+    def pushprefix (self, prefix):
+        """Append the given prefix onto the prefix stack."""
+        self.prefix_lens.append(len(self.prefix))
+        self.prefix += prefix
+
+    def popprefix (self):
+        """Remove the last prefix from the prefix stack."""
+        prev_len = self.prefix_lens.pop()
+        self.prefix = self.prefix[:prev_len]
+
+    def __call__ (self, msg = None, lf = False):
+        """Indicate progress, possibly with a custom message."""
+        if msg is None:
+            msg = self.States[self.n % len(self.States)]
+        msg = self.prefix + msg
+        print >> self.f, "\r%-*s" % (self.prev_len, msg),
+        self.prev_len = len(msg.expandtabs())
+        if lf:
+            print >> self.f
+            self.prev_len = 0
+        self.n += 1
+
+    def finish (self, msg = "done", noprefix = False):
+        """Finalize progress indication with the given message."""
+        if noprefix:
+            self.prefix = ""
+        self(msg, True)
+
+
+def start_command (args, cwd = None, shell = False, add_env = None,
+                   stdin = subprocess.PIPE, stdout = subprocess.PIPE,
+                   stderr = subprocess.PIPE):
+    """Start the given command, and return a subprocess object.
+
+    This provides a simpler interface to the subprocess module.
+
+    """
+    env = None
+    if add_env is not None:
+        env = os.environ.copy()
+        env.update(add_env)
+    return subprocess.Popen(args, bufsize = 1, stdin = stdin, stdout = stdout,
+                            stderr = stderr, cwd = cwd, shell = shell,
+                            env = env, universal_newlines = True)
+
+
+def run_command (args, cwd = None, shell = False, add_env = None,
+                 flag_error = True):
+    """Run the given command to completion, and return its results.
+
+    This provides a simpler interface to the subprocess module.
+
+    The results are formatted as a 3-tuple: (exit_code, output, errors)
+
+    If flag_error is enabled, Error messages will be produced if the
+    subprocess terminated with a non-zero exit code and/or stderr
+    output.
+
+    The other arguments are passed on to start_command().
+
+    """
+    process = start_command(args, cwd, shell, add_env)
+    (output, errors) = process.communicate()
+    exit_code = process.returncode
+    if flag_error and errors:
+        error("'%s' returned errors:\n---\n%s---", " ".join(args), errors)
+    if flag_error and exit_code:
+        error("'%s' returned exit code %i", " ".join(args), exit_code)
+    return (exit_code, output, errors)
+
+
+def file_reader_method (missing_ok = False):
+    """Decorator for simplifying reading of files.
+
+    If missing_ok is True, a failure to open a file for reading will
+    not raise the usual IOError, but instead the wrapped method will be
+    called with f == None.  The method must in this case properly
+    handle f == None.
+
+    """
+    def _wrap (method):
+        """Teach given method to handle both filenames and file objects.
+
+        The given method must take a file object as its second argument
+        (the first argument being 'self', of course).  This decorator
+        will take a filename given as the second argument and promote
+        it to a file object.
+
+        """
+        def _wrapped_method (self, filename, *args, **kwargs):
+            if isinstance(filename, file):
+                f = filename
+            else:
+                try:
+                    f = open(filename, 'r')
+                except IOError:
+                    if missing_ok:
+                        f = None
+                    else:
+                        raise
+            try:
+                return method(self, f, *args, **kwargs)
+            finally:
+                if not isinstance(filename, file) and f:
+                    f.close()
+        return _wrapped_method
+    return _wrap
+
+
+def file_writer_method (method):
+    """Decorator for simplifying writing of files.
+
+    Enables the given method to handle both filenames and file objects.
+
+    The given method must take a file object as its second argument
+    (the first argument being 'self', of course).  This decorator will
+    take a filename given as the second argument and promote it to a
+    file object.
+
+    """
+    def _new_method (self, filename, *args, **kwargs):
+        if isinstance(filename, file):
+            f = filename
+        else:
+            # Make sure the containing directory exists
+            parent_dir = os.path.dirname(filename)
+            if not os.path.isdir(parent_dir):
+                os.makedirs(parent_dir)
+            f = open(filename, 'w')
+        try:
+            return method(self, f, *args, **kwargs)
+        finally:
+            if not isinstance(filename, file):
+                f.close()
+    return _new_method
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 05/19] Add support for "import" helper command
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

This command, supported if the "import" capability is advertized,
allows a helper to support fetching by outputting a git-fast-import
stream.

If both "fetch" and "import" are advertized, git itself will use
"fetch" (although other users may use "import" in this case).

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	This has my symref patch squashed in.

 Documentation/git-remote-helpers.txt |   10 ++++++
 transport-helper.c                   |   52 ++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-remote-helpers.txt b/Documentation/git-remote-helpers.txt
index 173ee23..e9aa67e 100644
--- a/Documentation/git-remote-helpers.txt
+++ b/Documentation/git-remote-helpers.txt
@@ -43,6 +43,13 @@ Commands are given by the caller on the helper's standard input, one per line.
 +
 Supported if the helper has the "fetch" capability.
 
+'import' <name>::
+	Produces a fast-import stream which imports the current value
+	of the named ref. It may additionally import other refs as
+	needed to construct the history efficiently.
++
+Supported if the helper has the "import" capability.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
@@ -57,6 +64,9 @@ CAPABILITIES
 'fetch'::
 	This helper supports the 'fetch' command.
 
+'import'::
+	This helper supports the 'import' command.
+
 REF LIST ATTRIBUTES
 -------------------
 
diff --git a/transport-helper.c b/transport-helper.c
index a0e4219..f840842 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -11,6 +11,7 @@ struct helper_data
 	const char *name;
 	struct child_process *helper;
 	unsigned fetch : 1;
+	unsigned import : 1;
 };
 
 static struct child_process *get_helper(struct transport *transport)
@@ -48,6 +49,8 @@ static struct child_process *get_helper(struct transport *transport)
 			break;
 		if (!strcmp(buf.buf, "fetch"))
 			data->fetch = 1;
+		if (!strcmp(buf.buf, "import"))
+			data->import = 1;
 	}
 	return data->helper;
 }
@@ -91,6 +94,52 @@ static int fetch_with_fetch(struct transport *transport,
 	return 0;
 }
 
+static int get_importer(struct transport *transport, struct child_process *fastimport)
+{
+	struct child_process *helper = get_helper(transport);
+	memset(fastimport, 0, sizeof(*fastimport));
+	fastimport->in = helper->out;
+	fastimport->argv = xcalloc(5, sizeof(*fastimport->argv));
+	fastimport->argv[0] = "fast-import";
+	fastimport->argv[1] = "--quiet";
+
+	fastimport->git_cmd = 1;
+	return start_command(fastimport);
+}
+
+static int fetch_with_import(struct transport *transport,
+			     int nr_heads, struct ref **to_fetch)
+{
+	struct child_process fastimport;
+	struct child_process *helper = get_helper(transport);
+	int i;
+	struct ref *posn;
+	struct strbuf buf = STRBUF_INIT;
+
+	if (get_importer(transport, &fastimport))
+		die("Couldn't run fast-import");
+
+	for (i = 0; i < nr_heads; i++) {
+		posn = to_fetch[i];
+		if (posn->status & REF_STATUS_UPTODATE)
+			continue;
+
+		strbuf_addf(&buf, "import %s\n", posn->name);
+		write_in_full(helper->in, buf.buf, buf.len);
+		strbuf_reset(&buf);
+	}
+	disconnect_helper(transport);
+	finish_command(&fastimport);
+
+	for (i = 0; i < nr_heads; i++) {
+		posn = to_fetch[i];
+		if (posn->status & REF_STATUS_UPTODATE)
+			continue;
+		read_ref(posn->symref ? posn->symref : posn->name, posn->old_sha1);
+	}
+	return 0;
+}
+
 static int fetch(struct transport *transport,
 		 int nr_heads, struct ref **to_fetch)
 {
@@ -108,6 +157,9 @@ static int fetch(struct transport *transport,
 	if (data->fetch)
 		return fetch_with_fetch(transport, nr_heads, to_fetch);
 
+	if (data->import)
+		return fetch_with_import(transport, nr_heads, to_fetch);
+
 	return -1;
 }
 
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 11/19] Basic build infrastructure for Python scripts
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

This patch adds basic boilerplate support (based on corresponding Perl
sections) for enabling the building and installation Python scripts.

There are currently no Python scripts being built, and when Python
scripts are added in future patches, their building and installation
can be disabled by defining NO_PYTHON.

Signed-off-by: Johan Herland <johan@herland.net>
---

	Unchanged.

 Makefile      |   13 +++++++++++++
 configure.ac  |    3 +++
 t/test-lib.sh |    1 +
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index 268aede..b27a7d6 100644
--- a/Makefile
+++ b/Makefile
@@ -164,6 +164,8 @@ all::
 #
 # Define NO_PERL if you do not want Perl scripts or libraries at all.
 #
+# Define NO_PYTHON if you do not want Python scripts or libraries at all.
+#
 # Define NO_TCLTK if you do not want Tcl/Tk GUI.
 #
 # The TCL_PATH variable governs the location of the Tcl interpreter
@@ -308,6 +310,7 @@ LIB_H =
 LIB_OBJS =
 PROGRAMS =
 SCRIPT_PERL =
+SCRIPT_PYTHON =
 SCRIPT_SH =
 TEST_PROGRAMS =
 
@@ -345,6 +348,7 @@ SCRIPT_PERL += git-svn.perl
 
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
+	  $(patsubst %.py,%,$(SCRIPT_PYTHON)) \
 	  git-instaweb
 
 # Empty...
@@ -398,8 +402,12 @@ endif
 ifndef PERL_PATH
 	PERL_PATH = /usr/bin/perl
 endif
+ifndef PYTHON_PATH
+	PYTHON_PATH = /usr/bin/python
+endif
 
 export PERL_PATH
+export PYTHON_PATH
 
 LIB_FILE=libgit.a
 XDIFF_LIB=xdiff/lib.a
@@ -1308,6 +1316,10 @@ ifeq ($(PERL_PATH),)
 NO_PERL=NoThanks
 endif
 
+ifeq ($(PYTHON_PATH),)
+NO_PYTHON=NoThanks
+endif
+
 QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
 QUIET_SUBDIR1  =
 
@@ -1355,6 +1367,7 @@ prefix_SQ = $(subst ','\'',$(prefix))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
+PYTHON_PATH_SQ = $(subst ','\'',$(PYTHON_PATH))
 TCLTK_PATH_SQ = $(subst ','\'',$(TCLTK_PATH))
 
 LIBS = $(GITLIBS) $(EXTLIBS)
diff --git a/configure.ac b/configure.ac
index b09b8e4..84b6cf4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -233,6 +233,9 @@ GIT_ARG_SET_PATH(shell)
 # Define PERL_PATH to provide path to Perl.
 GIT_ARG_SET_PATH(perl)
 #
+# Define PYTHON_PATH to provide path to Python.
+GIT_ARG_SET_PATH(python)
+#
 # Define ZLIB_PATH to provide path to zlib.
 GIT_ARG_SET_PATH(zlib)
 #
diff --git a/t/test-lib.sh b/t/test-lib.sh
index f2ca536..0b991db 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -729,6 +729,7 @@ case $(uname -s) in
 esac
 
 test -z "$NO_PERL" && test_set_prereq PERL
+test -z "$NO_PYTHON" && test_set_prereq PYTHON
 
 # test whether the filesystem supports symbolic links
 ln -s x y 2>/dev/null && test -h y 2>/dev/null && test_set_prereq SYMLINKS
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 09/19] Finally make remote helper support useful
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johannes Schindelin, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The common case for remote helpers will be to import some repository which
can be specified by a single URL.  Rather than supporting those who say
that Git is really complicated, let's be nice to users so that they will
be able to say:

	git clone hg::https://soc.googlecode.com/hg/ soc

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	I decided to not allow 'hg+https' format, since clone does
	not deal well with that syntax (it checks to see whether
	there is a ':' in the url or not).

 transport.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/transport.c b/transport.c
index 741a3a7..e6a00b2 100644
--- a/transport.c
+++ b/transport.c
@@ -818,6 +818,16 @@ struct transport *transport_get(struct remote *remote, const char *url)
 		url = remote->url[0];
 	ret->url = url;
 
+	/* maybe it is a foreign URL? */
+	if (url) {
+		const char *p = url;
+
+		while (isalnum(*p))
+			p++;
+		if (!prefixcmp(p, "::"))
+			remote->foreign_vcs = xstrndup(url, p - url);
+	}
+
 	if (remote && remote->foreign_vcs) {
 		transport_helper_init(ret, remote->foreign_vcs);
 		return ret;
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 10/19] Allow helpers to request marks for fast-import
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Johan Herland, Daniel Barkalow
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Johan Herland <johan@herland.net>

The 'marks' capability is reported by the remote helper if it requires
the fast-import marks database to loaded/saved by any git-fast-import
process that is provided by the transport machinery. The feature is
advertised along with exactly one argument: the location of the file
containing the marks database.

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
---

	Unchanged.

 Documentation/git-remote-helpers.txt |    8 ++++++++
 transport-helper.c                   |   15 +++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-remote-helpers.txt b/Documentation/git-remote-helpers.txt
index 2c5130f..9a3c5bc 100644
--- a/Documentation/git-remote-helpers.txt
+++ b/Documentation/git-remote-helpers.txt
@@ -67,6 +67,14 @@ CAPABILITIES
 'import'::
 	This helper supports the 'import' command.
 
+'marks' filename::
+	Helper requires the marks from a git-fast-import run to be
+	loaded from, and saved to, the given filename. When this
+	"feature" is advertised, each git-fast-import run must load
+	and save the internal marks database (see the --import-marks
+	and --export-marks option to git-fast-import for more details)
+	located at the given filename.
+
 REF LIST ATTRIBUTES
 -------------------
 
diff --git a/transport-helper.c b/transport-helper.c
index 36a265d..cd3f0df 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -10,6 +10,7 @@ struct helper_data
 {
 	const char *name;
 	struct child_process *helper;
+	char *marks_filename;
 	unsigned fetch : 1;
 	unsigned import : 1;
 };
@@ -51,6 +52,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->fetch = 1;
 		if (!strcmp(buf.buf, "import"))
 			data->import = 1;
+		if (!prefixcmp(buf.buf, "marks "))
+			data->marks_filename = xstrdup(buf.buf + 6);
 	}
 	return data->helper;
 }
@@ -104,11 +107,21 @@ static int fetch_with_fetch(struct transport *transport,
 static int get_importer(struct transport *transport, struct child_process *fastimport)
 {
 	struct child_process *helper = get_helper(transport);
+	struct helper_data *data = transport->data;
 	memset(fastimport, 0, sizeof(*fastimport));
 	fastimport->in = helper->out;
 	fastimport->argv = xcalloc(5, sizeof(*fastimport->argv));
 	fastimport->argv[0] = "fast-import";
 	fastimport->argv[1] = "--quiet";
+	if (data->marks_filename) {
+		struct strbuf buf = STRBUF_INIT;
+		strbuf_addf(&buf, "--export-marks=%s", data->marks_filename);
+		fastimport->argv[2] = strbuf_detach(&buf, 0);
+		if (!access(data->marks_filename, R_OK)) {
+			strbuf_addf(&buf, "--import-marks=%s", data->marks_filename);
+			fastimport->argv[3] = strbuf_detach(&buf, 0);
+		}
+	}
 
 	fastimport->git_cmd = 1;
 	return start_command(fastimport);
@@ -137,6 +150,8 @@ static int fetch_with_import(struct transport *transport,
 	}
 	disconnect_helper(transport);
 	finish_command(&fastimport);
+	free((char *) fastimport.argv[2]);
+	free((char *) fastimport.argv[3]);
 
 	for (i = 0; i < nr_heads; i++) {
 		posn = to_fetch[i];
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 08/19] Fix memory leak in helper method for disconnect
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

Since some cases may need to disconnect from the helper and reconnect,
wrap the function that just disconnects in a function that also frees
transport->data.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
---

	Unchanged.

 transport-helper.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index e093d05..36a265d 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -70,6 +70,13 @@ static int disconnect_helper(struct transport *transport)
 	return 0;
 }
 
+static int release_helper(struct transport *transport)
+{
+	disconnect_helper(transport);
+	free(transport->data);
+	return 0;
+}
+
 static int fetch_with_fetch(struct transport *transport,
 			    int nr_heads, struct ref **to_fetch)
 {
@@ -252,6 +259,6 @@ int transport_helper_init(struct transport *transport, const char *name)
 	transport->get_refs_list = get_refs_list;
 	transport->fetch = fetch;
 	transport->update_refs = update_refs;
-	transport->disconnect = disconnect_helper;
+	transport->disconnect = release_helper;
 	return 0;
 }
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 07/19] Allow helpers to report in "list" command that the ref is unchanged
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

Helpers may use a line like "? name unchanged" to specify that there
is nothing new at that name, without any git-specific code to
determine the correct response.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
---

	Unchanged.

 Documentation/git-remote-helpers.txt |    4 +++-
 transport-helper.c                   |   22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-remote-helpers.txt b/Documentation/git-remote-helpers.txt
index e9aa67e..2c5130f 100644
--- a/Documentation/git-remote-helpers.txt
+++ b/Documentation/git-remote-helpers.txt
@@ -70,7 +70,9 @@ CAPABILITIES
 REF LIST ATTRIBUTES
 -------------------
 
-None are defined yet, but the caller must accept any which are supplied.
+'unchanged'::
+	This ref is unchanged since the last import or fetch, although
+	the helper cannot necessarily determine what value that produced.
 
 Documentation
 -------------
diff --git a/transport-helper.c b/transport-helper.c
index ab40a9a..e093d05 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -163,6 +163,22 @@ static int fetch(struct transport *transport,
 	return -1;
 }
 
+static int has_attribute(const char *attrs, const char *attr) {
+	int len;
+	if (!attrs)
+		return 0;
+
+	len = strlen(attr);
+	for (;;) {
+		const char *space = strchrnul(attrs, ' ');
+		if (len == space - attrs && !strncmp(attrs, attr, len))
+			return 1;
+		if (!*space)
+			return 0;
+		attrs = space + 1;
+	}
+}
+
 static struct ref *get_refs_list(struct transport *transport, int for_push)
 {
 	struct child_process *helper;
@@ -197,6 +213,12 @@ static struct ref *get_refs_list(struct transport *transport, int for_push)
 			(*tail)->symref = xstrdup(buf.buf + 1);
 		else if (buf.buf[0] != '?')
 			get_sha1_hex(buf.buf, (*tail)->old_sha1);
+		if (eon) {
+			if (has_attribute(eon + 1, "unchanged")) {
+				(*tail)->status |= REF_STATUS_UPTODATE;
+				read_ref((*tail)->name, (*tail)->old_sha1);
+			}
+		}
 		tail = &((*tail)->next);
 	}
 	strbuf_release(&buf);
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [UNSTABLE PATCH 03/19] Allow programs to not depend on remotes having urls
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

For fetch and ls-remote, which use the first url of a remote, have
transport_get() determine this by passing a remote and passing NULL
for the url. For push, which uses every url of a remote, use each url
in turn if there are any, and use NULL if there are none.

This will allow the transport code to do something different if the
location is not specified with a url.

Also, have the message for a fetch say "foreign" if there is no url.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	I rebased this on master and had major conflicts with the
	recent 'advice' series, Daniel, please have a look at this to
	see whether it is sane at all.

 builtin-fetch.c     |    7 +++-
 builtin-ls-remote.c |    4 +-
 builtin-push.c      |   68 +++++++++++++++++++++++++++++++-------------------
 transport.c         |    3 ++
 4 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/builtin-fetch.c b/builtin-fetch.c
index a35a6f8..013a6ba 100644
--- a/builtin-fetch.c
+++ b/builtin-fetch.c
@@ -309,7 +309,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	if (!fp)
 		return error("cannot open %s: %s\n", filename, strerror(errno));
 
-	url = transport_anonymize_url(raw_url);
+	if (raw_url)
+		url = transport_anonymize_url(raw_url);
+	else
+		url = xstrdup("foreign");
 	for (rm = ref_map; rm; rm = rm->next) {
 		struct ref *ref = NULL;
 
@@ -704,7 +707,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (!remote)
 		die("Where do you want to fetch from today?");
 
-	transport = transport_get(remote, remote->url[0]);
+	transport = transport_get(remote, NULL);
 	if (verbosity >= 2)
 		transport->verbose = 1;
 	if (verbosity < 0)
diff --git a/builtin-ls-remote.c b/builtin-ls-remote.c
index 78a88f7..4c6fc58 100644
--- a/builtin-ls-remote.c
+++ b/builtin-ls-remote.c
@@ -87,9 +87,9 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		}
 	}
 	remote = nongit ? NULL : remote_get(dest);
-	if (remote && !remote->url_nr)
+	if (!nongit && !remote)
 		die("remote %s has no configured URL", dest);
-	transport = transport_get(remote, remote ? remote->url[0] : dest);
+	transport = transport_get(remote, remote ? NULL : dest);
 	if (uploadpack != NULL)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK, uploadpack);
 
diff --git a/builtin-push.c b/builtin-push.c
index b5cd2cd..d86b93d 100644
--- a/builtin-push.c
+++ b/builtin-push.c
@@ -88,6 +88,36 @@ static void setup_default_push_refspecs(void)
 	}
 }
 
+static int push_with_options(struct transport *transport, int flags)
+{
+	int err;
+	int nonfastforward;
+	if (receivepack)
+		transport_set_option(transport,
+				     TRANS_OPT_RECEIVEPACK, receivepack);
+	if (thin)
+		transport_set_option(transport, TRANS_OPT_THIN, "yes");
+
+	if (flags & TRANSPORT_PUSH_VERBOSE)
+		fprintf(stderr, "Pushing to %s\n", transport->url);
+	err = transport_push(transport, refspec_nr, refspec, flags,
+			     &nonfastforward);
+	err |= transport_disconnect(transport);
+
+	if (!err)
+		return 0;
+
+	error("failed to push some refs to '%s'", transport->url);
+
+	if (nonfastforward && advice_push_nonfastforward) {
+		printf("To prevent you from losing history, non-fast-forward updates were rejected\n"
+		       "Merge the remote changes before pushing again.  See the 'non-fast forward'\n"
+		       "section of 'git push --help' for details.\n");
+	}
+
+	return 1;
+}
+
 static int do_push(const char *repo, int flags)
 {
 	int i, errs;
@@ -136,33 +166,19 @@ static int do_push(const char *repo, int flags)
 		url = remote->url;
 		url_nr = remote->url_nr;
 	}
-	for (i = 0; i < url_nr; i++) {
-		struct transport *transport =
-			transport_get(remote, url[i]);
-		int err;
-		int nonfastforward;
-		if (receivepack)
-			transport_set_option(transport,
-					     TRANS_OPT_RECEIVEPACK, receivepack);
-		if (thin)
-			transport_set_option(transport, TRANS_OPT_THIN, "yes");
-
-		if (flags & TRANSPORT_PUSH_VERBOSE)
-			fprintf(stderr, "Pushing to %s\n", url[i]);
-		err = transport_push(transport, refspec_nr, refspec, flags,
-				     &nonfastforward);
-		err |= transport_disconnect(transport);
-
-		if (!err)
-			continue;
-
-		error("failed to push some refs to '%s'", url[i]);
-		if (nonfastforward && advice_push_nonfastforward) {
-			printf("To prevent you from losing history, non-fast-forward updates were rejected\n"
-			       "Merge the remote changes before pushing again.  See the 'non-fast forward'\n"
-			       "section of 'git push --help' for details.\n");
+	if (url_nr) {
+		for (i = 0; i < url_nr; i++) {
+			struct transport *transport =
+				transport_get(remote, url[i]);
+			if (push_with_options(transport, flags))
+				errs++;
 		}
-		errs++;
+	} else {
+		struct transport *transport =
+			transport_get(remote, NULL);
+
+		if (push_with_options(transport, flags))
+			errs++;
 	}
 	return !!errs;
 }
diff --git a/transport.c b/transport.c
index 3822fc7..5ae8db6 100644
--- a/transport.c
+++ b/transport.c
@@ -813,6 +813,9 @@ struct transport *transport_get(struct remote *remote, const char *url)
 	struct transport *ret = xcalloc(1, sizeof(*ret));
 
 	ret->remote = remote;
+
+	if (!url && remote && remote->url)
+		url = remote->url[0];
 	ret->url = url;
 
 	if (!prefixcmp(url, "rsync:")) {
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [RFC PATCH 06/19] Factor ref updating out of fetch_with_import
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

Also allow the new update_refs to actually update the refs set, this
way the remote helper can set the value of previously unknown refs.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	Daniel, if we can get wanted_peer_refs to keep HEAD as a
	wanted ref somehow this patch could be a lot simpler.

 builtin-clone.c    |    7 +++++++
 builtin-fetch.c    |    3 +++
 transport-helper.c |   15 +++++++++++++++
 transport.c        |    6 ++++++
 transport.h        |    8 ++++++--
 5 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/builtin-clone.c b/builtin-clone.c
index 0042bee..7c90ce2 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -529,6 +529,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			struct ref *ref_cpy = wanted_peer_refs(refs, refspec);
 			mapped_refs = ref_cpy;
 			transport_fetch_refs(transport, ref_cpy);
+			if (transport->update_refs)
+			{
+				ref_cpy = copy_ref_list(refs);
+				transport_update_refs(transport, ref_cpy);
+				refs = ref_cpy;
+				mapped_refs = wanted_peer_refs(refs, refspec);
+			}
 		}
 	}
 
diff --git a/builtin-fetch.c b/builtin-fetch.c
index 013a6ba..c35188b 100644
--- a/builtin-fetch.c
+++ b/builtin-fetch.c
@@ -479,7 +479,10 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
 {
 	int ret = quickfetch(ref_map);
 	if (ret)
+	{
 		ret = transport_fetch_refs(transport, ref_map);
+		transport_update_refs(transport, ref_map);
+	}
 	if (!ret)
 		ret |= store_updated_refs(transport->url,
 				transport->remote->name,
diff --git a/transport-helper.c b/transport-helper.c
index f840842..ab40a9a 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -207,6 +207,20 @@ static struct ref *get_refs_list(struct transport *transport, int for_push)
 	return ret;
 }
 
+static int update_refs(struct transport *transport, struct ref *refs)
+{
+	struct ref *ref = refs;
+
+	while (ref) {
+		if (ref->status & REF_STATUS_UPTODATE)
+			continue;
+		read_ref(ref->symref ? ref->symref : ref->name, ref->old_sha1);
+		ref = ref->next;
+	}
+
+	return 0;
+}
+
 int transport_helper_init(struct transport *transport, const char *name)
 {
 	struct helper_data *data = xcalloc(sizeof(*data), 1);
@@ -215,6 +229,7 @@ int transport_helper_init(struct transport *transport, const char *name)
 	transport->data = data;
 	transport->get_refs_list = get_refs_list;
 	transport->fetch = fetch;
+	transport->update_refs = update_refs;
 	transport->disconnect = disconnect_helper;
 	return 0;
 }
diff --git a/transport.c b/transport.c
index 13bab4e..741a3a7 100644
--- a/transport.c
+++ b/transport.c
@@ -966,6 +966,12 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 	return rc;
 }
 
+void transport_update_refs(struct transport *transport, struct ref *refs)
+{
+	if (transport->update_refs)
+		transport->update_refs(transport, refs);
+}
+
 void transport_unlock_pack(struct transport *transport)
 {
 	if (transport->pack_lockfile) {
diff --git a/transport.h b/transport.h
index 503db11..1aba16c 100644
--- a/transport.h
+++ b/transport.h
@@ -32,12 +32,15 @@ struct transport {
 	/**
 	 * Fetch the objects for the given refs. Note that this gets
 	 * an array, and should ignore the list structure.
-	 *
+	 **/
+	int (*fetch)(struct transport *transport, int refs_nr, struct ref **refs);
+
+	/**
 	 * If the transport did not get hashes for refs in
 	 * get_refs_list(), it should set the old_sha1 fields in the
 	 * provided refs now.
 	 **/
-	int (*fetch)(struct transport *transport, int refs_nr, struct ref **refs);
+	int (*update_refs)(struct transport *transport, struct ref *refs);
 
 	/**
 	 * Push the objects and refs. Send the necessary objects, and
@@ -112,6 +115,7 @@ int transport_push(struct transport *connection,
 const struct ref *transport_get_remote_refs(struct transport *transport);
 
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
+void transport_update_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);
 char *transport_anonymize_url(const char *url);
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 04/19] Add a config option for remotes to specify a foreign vcs
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

If this is set, the url is not required, and the transport always uses
a helper named "git-remote-<value>".

It is a separate configuration option in order to allow a sensible
configuration for foreign systems which either have no meaningful urls
for repositories or which require urls that do not specify the system
used by the repository at that location. However, this only affects
how the name of the helper is determined, not anything about the
interaction with the helper, and the contruction is such that, if the
foreign scm does happen to use a co-named url method, a url with that
method may be used directly.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	This has my patch to fix valid_remote to allow for both url
	vcs squashed in.

 Documentation/config.txt |    4 ++++
 remote.c                 |    4 +++-
 remote.h                 |    2 ++
 transport.c              |    5 +++++
 4 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d1e2120..0d9d369 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1408,6 +1408,10 @@ remote.<name>.tagopt::
 	Setting this value to \--no-tags disables automatic tag following when
 	fetching from remote <name>
 
+remote.<name>.vcs::
+	Setting this to a value <vcs> will cause git to interact with
+	the remote with the git-remote-<vcs> helper.
+
 remotes.<group>::
 	The list of remotes which are fetched by "git remote update
 	<group>".  See linkgit:git-remote[1].
diff --git a/remote.c b/remote.c
index 15c9cec..09bb79c 100644
--- a/remote.c
+++ b/remote.c
@@ -54,7 +54,7 @@ static char buffer[BUF_SIZE];
 
 static int valid_remote(const struct remote *remote)
 {
-	return !!remote->url;
+	return (!!remote->url) || (!!remote->foreign_vcs);
 }
 
 static const char *alias_url(const char *url, struct rewrites *r)
@@ -444,6 +444,8 @@ static int handle_config(const char *key, const char *value, void *cb)
 	} else if (!strcmp(subkey, ".proxy")) {
 		return git_config_string((const char **)&remote->http_proxy,
 					 key, value);
+	} else if (!strcmp(subkey, ".vcs")) {
+		return git_config_string(&remote->foreign_vcs, key, value);
 	}
 	return 0;
 }
diff --git a/remote.h b/remote.h
index 5db8420..ac0ce2f 100644
--- a/remote.h
+++ b/remote.h
@@ -11,6 +11,8 @@ struct remote {
 	const char *name;
 	int origin;
 
+	const char *foreign_vcs;
+
 	const char **url;
 	int url_nr;
 	int url_alloc;
diff --git a/transport.c b/transport.c
index 5ae8db6..13bab4e 100644
--- a/transport.c
+++ b/transport.c
@@ -818,6 +818,11 @@ struct transport *transport_get(struct remote *remote, const char *url)
 		url = remote->url[0];
 	ret->url = url;
 
+	if (remote && remote->foreign_vcs) {
+		transport_helper_init(ret, remote->foreign_vcs);
+		return ret;
+	}
+
 	if (!prefixcmp(url, "rsync:")) {
 		ret->get_refs_list = get_refs_via_rsync;
 		ret->fetch = fetch_objs_via_rsync;
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 02/19] Allow fetch to modify refs
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

This allows the transport to use the null sha1 for a ref reported to
be present in the remote repository to indicate that a ref exists but
its actual value is presently unknown and will be set if the objects
are fetched.

Also adds documentation to the API to specify exactly what the methods
should do and how they should interpret arguments.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	Signed off by me because I rebased it on master and fixed the
	conflicts with nico's patch.

 builtin-clone.c    |    5 +++--
 transport-helper.c |    4 ++--
 transport.c        |   13 +++++++------
 transport.h        |   41 +++++++++++++++++++++++++++++++++++++++--
 4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/builtin-clone.c b/builtin-clone.c
index 5762a6f..0042bee 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -526,8 +526,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 		refs = transport_get_remote_refs(transport);
 		if (refs) {
-			mapped_refs = wanted_peer_refs(refs, refspec);
-			transport_fetch_refs(transport, mapped_refs);
+			struct ref *ref_cpy = wanted_peer_refs(refs, refspec);
+			mapped_refs = ref_cpy;
+			transport_fetch_refs(transport, ref_cpy);
 		}
 	}
 
diff --git a/transport-helper.c b/transport-helper.c
index f57e84c..a0e4219 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -68,7 +68,7 @@ static int disconnect_helper(struct transport *transport)
 }
 
 static int fetch_with_fetch(struct transport *transport,
-			    int nr_heads, const struct ref **to_fetch)
+			    int nr_heads, struct ref **to_fetch)
 {
 	struct child_process *helper = get_helper(transport);
 	FILE *file = xfdopen(helper->out, "r");
@@ -92,7 +92,7 @@ static int fetch_with_fetch(struct transport *transport,
 }
 
 static int fetch(struct transport *transport,
-		 int nr_heads, const struct ref **to_fetch)
+		 int nr_heads, struct ref **to_fetch)
 {
 	struct helper_data *data = transport->data;
 	int i, count;
diff --git a/transport.c b/transport.c
index 644a30a..3822fc7 100644
--- a/transport.c
+++ b/transport.c
@@ -204,7 +204,7 @@ static struct ref *get_refs_via_rsync(struct transport *transport, int for_push)
 }
 
 static int fetch_objs_via_rsync(struct transport *transport,
-				int nr_objs, const struct ref **to_fetch)
+				int nr_objs, struct ref **to_fetch)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct child_process rsync;
@@ -408,7 +408,7 @@ static struct ref *get_refs_from_bundle(struct transport *transport, int for_pus
 }
 
 static int fetch_refs_from_bundle(struct transport *transport,
-			       int nr_heads, const struct ref **to_fetch)
+			       int nr_heads, struct ref **to_fetch)
 {
 	struct bundle_transport_data *data = transport->data;
 	return unbundle(&data->header, data->fd);
@@ -486,7 +486,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 }
 
 static int fetch_refs_via_pack(struct transport *transport,
-			       int nr_heads, const struct ref **to_fetch)
+			       int nr_heads, struct ref **to_fetch)
 {
 	struct git_transport_data *data = transport->data;
 	char **heads = xmalloc(nr_heads * sizeof(*heads));
@@ -923,16 +923,17 @@ const struct ref *transport_get_remote_refs(struct transport *transport)
 	return transport->remote_refs;
 }
 
-int transport_fetch_refs(struct transport *transport, const struct ref *refs)
+int transport_fetch_refs(struct transport *transport, struct ref *refs)
 {
 	int rc;
 	int nr_heads = 0, nr_alloc = 0, nr_refs = 0;
-	const struct ref **heads = NULL;
-	const struct ref *rm;
+	struct ref **heads = NULL;
+	struct ref *rm;
 
 	for (rm = refs; rm; rm = rm->next) {
 		nr_refs++;
 		if (rm->peer_ref &&
+		    !is_null_sha1(rm->old_sha1) &&
 		    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
 			continue;
 		ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
diff --git a/transport.h b/transport.h
index c14da6f..503db11 100644
--- a/transport.h
+++ b/transport.h
@@ -18,11 +18,48 @@ struct transport {
 	int (*set_option)(struct transport *connection, const char *name,
 			  const char *value);
 
+	/**
+	 * Returns a list of the remote side's refs. In order to allow
+	 * the transport to try to share connections, for_push is a
+	 * hint as to whether the ultimate operation is a push or a fetch.
+	 *
+	 * If the transport is able to determine the remote hash for
+	 * the ref without a huge amount of effort, it should store it
+	 * in the ref's old_sha1 field; otherwise it should be all 0.
+	 **/
 	struct ref *(*get_refs_list)(struct transport *transport, int for_push);
-	int (*fetch)(struct transport *transport, int refs_nr, const struct ref **refs);
+
+	/**
+	 * Fetch the objects for the given refs. Note that this gets
+	 * an array, and should ignore the list structure.
+	 *
+	 * If the transport did not get hashes for refs in
+	 * get_refs_list(), it should set the old_sha1 fields in the
+	 * provided refs now.
+	 **/
+	int (*fetch)(struct transport *transport, int refs_nr, struct ref **refs);
+
+	/**
+	 * Push the objects and refs. Send the necessary objects, and
+	 * then, for any refs where peer_ref is set and
+	 * peer_ref->new_sha1 is different from old_sha1, tell the
+	 * remote side to update each ref in the list from old_sha1 to
+	 * peer_ref->new_sha1.
+	 *
+	 * Where possible, set the status for each ref appropriately.
+	 *
+	 * The transport must modify new_sha1 in the ref to the new
+	 * value if the remote accepted the change. Note that this
+	 * could be a different value from peer_ref->new_sha1 if the
+	 * process involved generating new commits.
+	 **/
 	int (*push_refs)(struct transport *transport, struct ref *refs, int flags);
 	int (*push)(struct transport *connection, int refspec_nr, const char **refspec, int flags);
 
+	/** get_refs_list(), fetch(), and push_refs() can keep
+	 * resources (such as a connection) reserved for futher
+	 * use. disconnect() releases these resources.
+	 **/
 	int (*disconnect)(struct transport *connection);
 	char *pack_lockfile;
 	signed verbose : 2;
@@ -74,7 +111,7 @@ int transport_push(struct transport *connection,
 
 const struct ref *transport_get_remote_refs(struct transport *transport);
 
-int transport_fetch_refs(struct transport *transport, const struct ref *refs);
+int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);
 char *transport_anonymize_url(const char *url);
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 01/19] Use a function to determine whether a remote is valid
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland
  Cc: Daniel Barkalow, Sverre Rabbelier
In-Reply-To: <1256839287-19016-1-git-send-email-srabbelier@gmail.com>

From: Daniel Barkalow <barkalow@iabervon.org>

Currently, it only checks url, but it will allow other things in the future.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
---

	Signed off by me because I fixed the conflict with alias_url.

 remote.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/remote.c b/remote.c
index 73d33f2..15c9cec 100644
--- a/remote.c
+++ b/remote.c
@@ -52,6 +52,11 @@ static struct rewrites rewrites_push;
 #define BUF_SIZE (2048)
 static char buffer[BUF_SIZE];
 
+static int valid_remote(const struct remote *remote)
+{
+	return !!remote->url;
+}
+
 static const char *alias_url(const char *url, struct rewrites *r)
 {
 	int i, j;
@@ -688,14 +693,14 @@ struct remote *remote_get(const char *name)
 
 	ret = make_remote(name, 0);
 	if (valid_remote_nick(name)) {
-		if (!ret->url)
+		if (!valid_remote(ret))
 			read_remotes_file(ret);
-		if (!ret->url)
+		if (!valid_remote(ret))
 			read_branches_file(ret);
 	}
-	if (name_given && !ret->url)
+	if (name_given && !valid_remote(ret))
 		add_url_alias(ret, name);
-	if (!ret->url)
+	if (!valid_remote(ret))
 		return NULL;
 	ret->fetch = parse_fetch_refspec(ret->fetch_refspec_nr, ret->fetch_refspec);
 	ret->push = parse_push_refspec(ret->push_refspec_nr, ret->push_refspec);
-- 
1.6.5.2.291.gf76a3

^ permalink raw reply related

* [PATCH 0/19] Reroll of the remote-vcs-helper series
From: Sverre Rabbelier @ 2009-10-29 18:01 UTC (permalink / raw)
  To: Git List, Johannes Schindelin, Daniel Barkalow, Johan Herland

Reroll of the entire series that is currently in pu, rebased on
current master.

I did not squash the cvs makefile fixes since my Makefile foo is not
that great.

Daniel, please have a close look at the UNSTABLE patch, as I am not
sure at all that I rebased it properly.

I did not have time to work on my git-remote-hg patches yet, but I
might send those two out later today before I get on my flight back
home. If not it'll be tomorrow somewhere :).

Daniel Barkalow (7):
      Use a function to determine whether a remote is valid
      Allow fetch to modify refs
      Allow programs to not depend on remotes having urls
      Add a config option for remotes to specify a foreign vcs
      Add support for "import" helper command
      Allow helpers to report in "list" command that the ref is unchanged
      Fix memory leak in helper method for disconnect

Johan Herland (8):
      Allow helpers to request marks for fast-import
      Basic build infrastructure for Python scripts
      1/2: Add Python support library for CVS remote helper
      2/2: Add Python support library for CVS remote helper
      git-remote-cvs: Remote helper program for CVS repositories
      Add simple selftests of git-remote-cvs functionality
      Fix the Makefile-generated path to the git_remote_cvs package in git-remote-cvs
      More fixes to the git-remote-cvs installation procedure

Johannes Schindelin (1):
      Finally make remote helper support useful

Sverre Rabbelier (3):
      Factor ref updating out of fetch_with_import
      Refactor git_remote_cvs to a more generic git_remote_helpers
      .gitignore: add git-remote-cvs

 .gitignore                              |    1 +
 Documentation/config.txt                |    4 +
 Documentation/git-remote-cvs.txt        |   85 +++
 Documentation/git-remote-helpers.txt    |   22 +-
 Makefile                                |   53 ++
 builtin-clone.c                         |   12 +-
 builtin-fetch.c                         |   10 +-
 builtin-ls-remote.c                     |    4 +-
 builtin-push.c                          |   68 ++-
 configure.ac                            |    3 +
 git-remote-cvs.py                       |  683 +++++++++++++++++++++
 git_remote_helpers/.gitignore           |    2 +
 git_remote_helpers/Makefile             |   35 ++
 git_remote_helpers/__init__.py          |   27 +
 git_remote_helpers/cvs/changeset.py     |  126 ++++
 git_remote_helpers/cvs/commit_states.py |   62 ++
 git_remote_helpers/cvs/cvs.py           |  998 +++++++++++++++++++++++++++++++
 git_remote_helpers/cvs/revision_map.py  |  418 +++++++++++++
 git_remote_helpers/cvs/symbol_cache.py  |  313 ++++++++++
 git_remote_helpers/git/git.py           |  680 +++++++++++++++++++++
 git_remote_helpers/setup.py             |   17 +
 git_remote_helpers/util.py              |  194 ++++++
 remote.c                                |   15 +-
 remote.h                                |    2 +
 t/t9800-remote-cvs-basic.sh             |  524 ++++++++++++++++
 t/t9801-remote-cvs-fetch.sh             |  291 +++++++++
 t/test-lib.sh                           |   10 +
 transport-helper.c                      |  117 ++++-
 transport.c                             |   37 +-
 transport.h                             |   45 ++-
 30 files changed, 4810 insertions(+), 48 deletions(-)

^ permalink raw reply

* [PATCH] Explicitly truncate bswap operand to uint32_t
From: Benjamin Kramer @ 2009-10-29 17:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

There are some places in git where a long is passed to htonl/ntohl. llvm
doesn't support matching operands of different bitwidths intentionally.
This patch fixes the build with llvm-gcc (and clang) on x86_64.

Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
---
See also:
* http://llvm.org/bugs/show_bug.cgi?id=3373
* http://lkml.org/lkml/2009/1/23/261

 compat/bswap.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 5cc4acb..52fc7b6 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -24,7 +24,7 @@ static inline uint32_t default_swab32(uint32_t val)
 	if (__builtin_constant_p(x)) { \
 		__res = default_swab32(x); \
 	} else { \
-		__asm__("bswap %0" : "=r" (__res) : "0" (x)); \
+		__asm__("bswap %0" : "=r" (__res) : "0" ((uint32_t)(x))); \
 	} \
 	__res; })
 
-- 
1.6.5.2.102.g442a9

^ permalink raw reply related

* Re: [PATCH 7/7] Factor ref updating out of fetch_with_import
From: Sverre Rabbelier @ 2009-10-29 17:39 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Shawn O. Pearce, Git List, Johannes Schindelin, Johan Herland
In-Reply-To: <alpine.LNX.2.00.0910291311570.14365@iabervon.org>

Heya,

On Thu, Oct 29, 2009 at 10:22, Daniel Barkalow <barkalow@iabervon.org> wrote:
> (a) if HEAD is determined to be a symref, and we care about HEAD, we care
> about whatever HEAD is a symref to; wanted_peer_refs() shouldn't be
> filtering such things out.

It seems that wanted_peer_refs filters out HEAD no matter what though?

> (b) we don't want to make a whole bunch of copies of the ref list to avoid
> updating the mutable ref list that we will look for updated values in; the
> merge of my series with Nico's should replace my copy_ref_list with his
> wanted_peer_refs, not include the copy afterwards. Correcting this should
> lead to the value that matters in the list that will be used having been
> updated directly by transport_fetch_refs().

This I can and will fix. Patch-bomb incoming.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply

* [PATCH] Update packfile transfer protocol documentation
From: Scott Chacon @ 2009-10-29 17:35 UTC (permalink / raw)
  To: git list; +Cc: Shawn O. Pearce

The technical documentation for the packfile protocol is both sparse and
incorrect.  This documents the fetch-pack/upload-pack and send-pack/
receive-pack protocols much more fully.

Add documentation from Shawn's upcoming http-protocol docs that is shared
by the packfile protocol. protocol-common.txt describes ABNF notation
amendments, refname rules and the packet line format.

Add documentation on the various capabilities supported by the
upload-pack and receive-pack protocols. protocol-capabilities.txt describes
multi-ack, thin-pack, side-band[-64k], ofs-delta, shallow, no-progress
and include-tag.

Signed-Off-By: Scott Chacon <schacon@gmail.com>
---

Some of this documentation is in RFC style and some of it is really
not - I apologize for that.  However, I think it's a good starting
point - certainly better than the docs in there now.  Some of this was
taken from the gitserver-rfc stuff I did several months ago, and I've
tried to include updates I got from Jakub and others on issues with
that document.  The protocol-common.txt is taken from Shawns
http-protocol.txt that was in common with packfile protocol.  Shawn,
is that the best way to share that info?

 Documentation/technical/pack-protocol.txt         |  502 +++++++++++++++++++--
 Documentation/technical/protocol-capabilities.txt |  146 ++++++
 Documentation/technical/protocol-common.txt       |   97 ++++
 3 files changed, 704 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/technical/protocol-capabilities.txt
 create mode 100644 Documentation/technical/protocol-common.txt

diff --git a/Documentation/technical/pack-protocol.txt
b/Documentation/technical/pack-protocol.txt
index 9cd48b4..9222a10 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -1,41 +1,461 @@
-Pack transfer protocols
-=======================
-
-There are two Pack push-pull protocols.
-
-upload-pack (S) | fetch/clone-pack (C) protocol:
-
-	# Tell the puller what commits we have and what their names are
-	S: SHA1 name
-	S: ...
-	S: SHA1 name
-	S: # flush -- it's your turn
-	# Tell the pusher what commits we want, and what we have
-	C: want name
-	C: ..
-	C: want name
-	C: have SHA1
-	C: have SHA1
-	C: ...
-	C: # flush -- occasionally ask "had enough?"
-	S: NAK
-	C: have SHA1
-	C: ...
-	C: have SHA1
-	S: ACK
-	C: done
-	S: XXXXXXX -- packfile contents.
-
-send-pack | receive-pack protocol.
-
-	# Tell the pusher what commits we have and what their names are
-	C: SHA1 name
-	C: ...
-	C: SHA1 name
-	C: # flush -- it's your turn
-	# Tell the puller what the pusher has
-	S: old-SHA1 new-SHA1 name
-	S: old-SHA1 new-SHA1 name
-	S: ...
-	S: # flush -- done with the list
-	S: XXXXXXX --- packfile contents.
+Packfile transfer protocols
+===========================
+
+Git supports transferring data in packfiles over the ssh://, git:// and
+file:// transports.  There exist two sets of protocols, one for pushing
+data from a client to a server and another for fetching data from a
+server to a client.  All three transports (ssh, git, file) use the same
+protocol to transfer data.
+
+The processes invoked in the canonical Git implementation are 'upload-pack'
+on the server side and 'fetch-pack' on the client side for fetching data;
+then 'receive-pack' on the server and 'send-pack' on the client for pushing
+data.  The protocol functions to have a server tell a client what is
+currently on the server, then for the two to negotiate the smallest amount
+of data to send in order to fully update one or the other.
+
+Transports
+----------
+There are three transports over which the packfile protocol is
+initiated.  The Git transport is a simple, unauthenticated server that
+simply takes the command (almost always 'upload-pack', though Git
+servers can be configured to be globally writable, in which 'receive-
+pack' initiation is also allowed) with which the client wishes to
+communicate and executes it and connects it to the requesting
+process.
+
+In the SSH transport, the client basically just runs the 'upload-pack'
+or 'receive-pack' process on the server over the SSH protocol and then
+communicates with that invoked process over the SSH connection.
+
+The file:// transport simply runs the 'upload-pack' or 'receive-pack'
+process locally.
+
+Git Protocol
+------------
+
+The Git protocol starts off by sending "git-receive-pack 'repo.git'"
+on the wire using the pkt-line format, followed by a null byte and a
+hostname paramater, terminated by a null byte.
+
+   0032git-upload-pack /project.git\0host=myserver.com\0
+
+--
+   git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+   request-command   = 'git-upload-pack' / 'git-receive-pack' /
+                       'git-upload-archive'   ; case sensitive
+   pathname          = *( %x01-ff ) ; exclude NUL
+   host-parameter    = 'host' "=" hostname [ ":" port ]
+--
+
+Currently only 'host' is allowed in the extra information.  It's
+for the git-daemon name based virtual hosting.  See --interpolated-
+path option to git daemon, with the %H/%CH format characters.
+
+Basically what the Git client is doing to connect to an 'upload-pack'
+process on the server side over the Git protocol is this:
+
+   $ echo -e -n \
+     "0039git-upload-pack /schacon/gitbook.git\0host=github.com\0" |
+     nc -v github.com 9418
+
+
+SSH Protocol
+------------
+
+Initiating the upload-pack or receive-pack processes over SSH is
+simply executing the binary on the server via SSH remote execution.
+It is basically equivalent to running this:
+
+   $ ssh git.example.com 'git-upload-pack /project.git'
+
+For a server to support Git pushing and pulling for a given user over
+SSH, that user needs to be able to execute one or both of those
+commands via the SSH shell that they are provided on login.  On some
+systems, that shell access is limited to only being able to run those
+two commands, or even just one of them.
+
+In an ssh:// format URI, it's absolute in the URI, so the '/' after
+the host name (or port number) is sent as an argument, which is then
+read by the remote git-upload-pack exactly as is, so it's effectively
+an absolute path in the remote filesystem.
+
+          git clone ssh://user@example.com/project.git
+                            |
+                            v
+       ssh user@example.com 'git-upload-pack /project.git'
+
+In a "user@host:path" format URI, its relative to the user's home
+directory, because the Git client will run:
+
+             git clone user@example.com:project.git
+                              |
+                              v
+        ssh user@example.com 'git-upload-pack project.git'
+
+
+
+Fetching Data From a Server
+===========================
+
+When one Git repository wants to get all the data that a second
+repository has, the first can 'fetch' from the second.  This
+operation determines what data the server has that the client does
+not then streams that data down to the client in packfile format.
+
+The server side binaries need to be executable as 'git-upload-pack'
+for fetching and 'git-receive-pack' for pushing over SSH, since the
+Git clients will connect to the server and attempt to run that command
+directly.  Over the Git protocol, one could write their own daemon
+that sees that the client is trying to invoke those commands and
+simply handle the requests.
+
+
+Reference Discovery
+-------------------
+
+When the client initially connects the server will immediately respond
+with a listing of each reference it has (all branches and tags) along
+with the commit SHA that each reference currently points to.
+
+   $ echo -e -n \
+     "0039git-upload-pack /schacon/gitbook.git\0host=github.com\0" |
+      nc -v github.com 9418
+   00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack \
+     thin-pack side-band side-band-64k ofs-delta shallow no-progress \
+     include-tag
+   00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
+   003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master
+   003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9
+   003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0
+   003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
+   0000
+
+Server SHOULD terminate each non-flush line
+using LF ("\n") terminator; client MUST NOT complain if there is no
+terminator.
+
+The returned response is a pkt-line stream describing each ref and
+its known value.  The stream SHOULD be sorted by name according to
+the C locale ordering.  The stream SHOULD include the default ref
+named 'HEAD' as the first ref.  The stream MUST include capability
+declarations behind a NUL on the first ref.
+
+HEAD is not included if its detached - that is, if HEAD is not a
+symbolic reference, a pointer to another branch, it is not included
+in the initial server response.  The client pattern matches the
+advertisements against the fetch refspec, which is "refs/heads/
+*:refs/remotes/origin/*" by default.  HEAD doesn't match the LHS, so
+it doesn't get wanted by the client.
+
+----
+	advertised-refs  =  (no-refs / list-of-refs)
+	                    flush-pkt
+
+	no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
+	                             NUL capability-list LF)
+
+	list-of-refs     =  first-ref *other-ref
+	first-ref        =  PKT-LINE(obj-id SP refname
+	                             NUL capability-list LF)
+
+	other-ref        =  PKT-LINE(other-tip / other-peeled)
+	other-tip        =  obj-id SP refname LF
+	other-peeled     =  obj-id SP refname "^{}" LF
+
+	capability-list  =  capability *(SP capability)
+    capability       =  1*(ALPHA / DIGIT / "-" / "_")
+----
+
+Server and client SHOULD use lowercase for SHA1, both MUST treat SHA1
+as case-insensitive.
+
+See protocol-capabilities.txt for a list of allowed server capabilities
+and descriptions.
+
+Packfile Negotiation
+--------------------
+After reference and capabilities discovery, the client can decide
+to terminate the connection (as happens with the ls-remote command)
+or it can enter the negotiation phase, where the client and server
+determine what the minimal packfile necessary for transport is.
+
+Once the client has the initial list of references that the server
+has, as well as the list of capabilities, it will begin telling the
+server what objects it wants and what objects it has, so the server
+can make a packfile that only has the objects that the client needs.
+The client will also send a list of the capabilities it supports out
+of what the server said it could do with the first 'want' statement.
+
+----
+	upload-request    =  want-list
+	                     have-list
+	                     compute-end
+
+	want-list         =  first-want
+	                     *additional-want
+						 flush-pkt
+
+	first-want        =  PKT-LINE("want" SP obj-id SP capability-list LF)
+	additional-want   =  PKT-LINE("want" SP obj-id LF)
+
+	have-list         =  *have-line
+	have-line         =  PKT-LINE("have" SP obj-id LF)
+	compute-end       =  flush-pkt / PKT-LINE("done")
+----
+
+Clients MUST send all the SHAs it wants from the reference
+discovery phase as 'want' lines. Clients MUST send at least one
+'want' command in the request body. Clients MUST NOT mention an
+obj-id in a 'want' command which did not appear in the response
+obtained through ref discovery.
+
+If client is requesting a shallow clone, it will now send a 'deepen'
+command with the depth it is requesting.
+
+Once all the "want"s (and optional 'deepen') are transferred,
+clients MUST send a flush. If the client has all the references on
+the server, client simply flushes and disconnects.
+
+TODO: shallow/unshallow response
+
+Now the client will send a list of the obj-ids it has.  In multi-ack
+mode, the canonical implementation will send up to 32 of these at a
+time, then will flush and wait for the server to respond.  If the
+client has no objects (as in the case of a non-referencing clone),
+it will skip this phase, just send it's 'done' and wait for the
+packfile.
+
+If the server reads 'have' lines, it then will respond by ACKing any
+of the obj-ids the client said it had that the server also has.  Or,
+once the server has found an acceptable common base commit and is
+ready to make a packfile, it will blindly ACK all 'have' obj-ids back
+to the client.  Then it will send a 'NACK' and then wait for
+another response from the client - either a 'done' or another list of
+'have' lines.
+
+In multi-ack mode, the server will respond with 'ACK obj-id continue'
+for common commits, otherwise it will just respond with 'ACK obj-id'
+lines.  In multi-ack-detailed mode, it will differentiate the ACKs
+where it is simply signaling that it is ready to send data with
+'ACK obj-id ready' lines, and signals the identified common commits
+with 'ACK obj-id common' lines.
+
+After the client has gotten 'ACK obj-id' responses for all it's
+references, or has sent more than 256 references and decides to give
+up, it will send a 'done' command, which signals to the server that it
+is ready to receive it's packfile data.
+
+Once the 'done' line is read from the client, the server will either
+send a final 'ACK obj-id' line if it is in multi-ack mode and has found
+a common base, or it will send a 'NAK' if it has still not found a common
+base; then the server will start sending it's packfile data.
+
+----
+	server-response   =  *acks
+	                     nack
+
+    acks              =  *ack
+	ack               =  PKT-LINE("ACK" SP obj-id continue LF)
+	nack              =  PKT-LINE("NACK" LF)
+----
+
+A simple clone may look like this (with no 'have' statements):
+
+----
+   C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d\0multi_ack \
+     side-band-64k ofs-delta\n
+   C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+   C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+   C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+   C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   C: 0000
+   C: 0009done\n
+
+   S: 0008NAK\n
+   S: [PACKFILE]
+----
+
+An incremental update (fetch) response might look like this:
+
+----
+   C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d\0multi_ack \
+     side-band-64k ofs-delta\n
+   C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+   C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+   C: 0000
+   C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+   C: [30 more have lines]
+   C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   C: 0000
+
+   S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n
+   S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n
+   S: 0008NAK\n
+
+   C: 0009done\n
+
+   S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   S: [PACKFILE]
+----
+
+
+Packfile Data
+-------------
+
+Now that the client and server have done some negotiation about what
+the minimal amount of data that can be sent to the client is, the server
+will construct and send the required data in packfile format.
+
+See pack-format.txt for what the packfile itself actually looks like.
+
+If 'side-band' or 'side-band-64k' capabilities have been specified by
+the client, the server will send the packfile data multiplexed - it
+will be sent in packets of either 1000 bytes or 64k, depending on which
+sideband type was specified, with each packet starting with the packet-line
+format of the amount of data that follows, followed by a single byte
+specifying the sideband the following data is coming in on.
+
+The sideband byte will be either a '1' or a '2'. Sideband '1' will contain
+packfile data, sideband '2' will be used for progress information that the
+client will generally print to stderr.
+
+If no 'side-band' capability was specified, the server will simply
+stream the entire packfile.
+
+
+Pushing Data To a Server
+========================
+
+Pushing data to a server will invoke the 'receive-pack' process on the
+server, which will allow the client to tell it which references it should
+update and then send all the data the server will need for those new
+references to be complete.  Once all the data is received and validated,
+the server will then update it's references to what the client specified.
+
+Authentication
+--------------
+
+The protocol itself contains no authentication mechanisms.  That is to be
+handled by the transport, such as SSH, before the 'receive-pack' process is
+invoked.  If 'receive-pack' is configured over the Git transport, those
+repositories will be writable by anyone who can access that port (9418) as
+that transport is unauthenticated.
+
+Reference Discovery
+-------------------
+
+The reference discovery phase is done nearly the same way as it is in the
+fetching protocol. Each reference obj-id and name on the server is sent
+in packet-line format to the client, followed by a flush packet.  The only
+real difference is that the capability listing is different - the only
+possible values are 'report-status', 'delete-refs' and 'ofs-delta', and
+instead of following a null byte, the capabilities follow a space.
+
+----
+	advertised-refs  =  (no-refs / list-of-refs)
+	                    flush-pkt
+
+	no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
+	                             NUL capability-list LF)
+
+	list-of-refs     =  first-ref *other-ref
+	first-ref        =  PKT-LINE(obj-id SP refname
+	                             SP capability-list LF)
+
+	other-ref        =  PKT-LINE(other-tip / other-peeled)
+	other-tip        =  obj-id SP refname LF
+	other-peeled     =  obj-id SP refname "^{}" LF
+
+	capability-list  =  capability *(SP capability)
+    capability       =  1*(ALPHA / DIGIT / "-" / "_")
+----
+
+Reference Update Request and Packfile Transfer
+----------------------------------------------
+
+Once the client knows what references the server is at, it can send a
+list of reference update requests.  For each reference on the server
+that it wants to update, it sends a line listing the obj-id currently on
+the server, the obj-id the client would like to update it to and the name
+of the reference.
+
+This list is followed by a flush packet and then the packfile that should
+contain all the objects that the server will need to complete the new
+references.
+
+----
+	update-request    =  command-list pack-file
+
+	command-list      =  PKT-LINE(command NUL capability-list LF)
+	                     *PKT-LINE(command LF)
+	                     flush-pkt
+
+	command           =  create / delete / update
+	create            =  zero-id SP new-id  SP name
+	delete            =  old-id  SP zero-id SP name
+	update            =  old-id  SP new-id  SP name
+
+	old-id            =  obj-id
+	new-id            =  obj-id
+
+	pack-file         = "PACK" 24*(OCTET)
+----
+
+The server will receive the packfile, unpack it, then validate each
+reference that is being updated that it hasn't changed while the request
+was being processed (the obj-id is still the same as the old-id), and
+it will run any update hooks to make sure that the update is acceptable.
+If all of that is fine, the server will then update the references.
+
+Report Status
+-------------
+
+If the 'report-status' capability is sent by the client, then the server
+will send a short report of what happened in that update.  It will first
+list the status of the packfile unpacking as either 'unpack ok' or
+'unpack [error]'.  Then it will list the status for each of the references
+that it tried to update.  Each line be either 'ok [refname]' if the
+update was successful, or 'ng [refname] [error]' if the update was not.
+
+----
+	report-status     = unpack-status
+	                    1*(command-status)
+	                    flush-pkt
+
+	unpack-status     = PKT-LINE("unpack" SP unpack-result LF)
+	unpack-result     = "ok" / error-msg
+
+	command-status    = command-ok / command-fail
+	command-ok        = PKT-LINE("ok" SP refname LF)
+	command-fail      = PKT-LINE("ng" SP refname SP error-msg LF)
+
+	error-msg         = 1*(OCTECT) ; where not "ok"
+----
+
+Updates can be unsuccessful for a number of reasons.  The reference can have
+changed since the reference discovery phase was originally sent, meaning
+someone pushed in the meantime.  The reference being pushed could be a
+non-fast-forward reference and the update hooks or configuration could be
+set to not allow that, etc.  Also, some references can be updated while others
+can be rejected.
+
+An example client/server communication might look like this:
+
+----
+   S: 007c74730d410fcb6603ace96f1dc55ea6196122532d HEAD report-status
delete-refs ofs-delta\n
+   S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
+   S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
+   S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
+   S: 0000
+
+   C: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe
74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
+   C: 003e74730d410fcb6603ace96f1dc55ea6196122532d
5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
+   C: 0000
+   C: [PACKDATA]
+
+   S: 000aunpack ok\n
+   S: 000aok refs/heads/debug\n
+   S: 000ang refs/heads/master non-fast-forward\n
+----
+
diff --git a/Documentation/technical/protocol-capabilities.txt
b/Documentation/technical/protocol-capabilities.txt
new file mode 100644
index 0000000..dad7d16
--- /dev/null
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -0,0 +1,146 @@
+Git Protocol Capabilities
+=========================
+
+On the very first line of the initial server response, the first
+reference is followed by a null byte and then a list of space
+delimited server capabilities.  These allow the server to declare
+what it can and cannot do to the client.
+
+Client sends space separated list of capabilities it wants.  It
+SHOULD send a subset of server capabilities, i.e do not send
+capabilities served does not advertise.  The client SHOULD NOT ask
+for capabilities the server did not say it supports.
+
+Server MUST ignore capabilities it does not understand.  Server MUST
+NOT ignore capabilities that client requested and server advertised.
+
+multi-ack
+---------
+
+The 'multi-ack' capability allows the server to return "ACK $SHA1
+continue" as soon as it finds a commit that it can use as a common
+base, between the client's wants and the client's have set.
+
+By sending this early, the server can potentially head off the client
+from walking any further down that particular branch of the client's
+repository history.  The client may still need to walk down other
+branches, sending have lines for those, until the server has a
+complete cut across the DAG, or the client has said "done".
+
+Without multi_ack, a client sends have lines in --date-order until
+the server has found a common base.  That means the client will send
+have lines that are already known by the server to be common, because
+they overlap in time with another branch that the server hasn't found
+a common base on yet.
+
+The client has things in caps that the server doesn't; server has
+things in lower case.
+
+                         +---- u ---------------------- x
+                        /             +----- y
+                       /             /
+                      a -- b -- c -- d -- E -- F
+                       \
+                        +--- Q -- R -- S
+
+If the client wants x,y and starts out by saying have F,S, the server
+doesn't know what F,S is.  Eventually the client says "have d" and
+the server sends "ACK d continue" to let the client know to stop
+walking down that line (so don't send c-b-a), but its not done yet,
+it needs a base for X. The client keeps going with S-R-Q, until a
+gets reached, at which point the server has a clear base and it all
+ends.
+
+Without multi_ack the client would have sent that c-b-a chain anyway,
+interleaved with S-R-Q.
+
+thin-pack
+---------
+
+Server can send thin packs, i.e. packs which do not contain base
+elements, if those base elements are available on clients side.
+Client requests thin-pack capability when it understands how to "thicken"
+them adding required delta bases making them independent.
+
+Client MUST NOT request 'thin-pack' capability if it
+cannot turn thin packs into proper independent packs.
+
+
+side-band, side-band-64k
+------------------------
+
+This means that server can send, and client understand multiplexed
+(muxed) progress reports and error info interleaved with the packfile
+itself.
+
+These two options are mutually exclusive.  A client should ask for
+only one of them, and a modern client always favors side-band-64k.
+
+The 'side-band' capability allows up to 1000 bytes per packet.  But
+the packet length field is 4 bytes, in hex, so 16 bits worth of
+information space.  Limiting it to only 1000 bytes for a large 800
+MiB binary pack file on initial clone is really quite poor usage of
+the data stream space.
+
+The "side-band-64k" capability came about as a way for newer clients
+that can handle much larger packets to request packets that are
+actually crammed nearly full (65520 bytes), while maintaining
+backward compatibility for the older clients.
+
+The client MUST send only maximum of one of "side-band" and "side-
+band-64k".  Server MUST favor side-band-64k if client requests both.
+
+ofs-delta
+---------
+
+Server can send, and client understand PACKv2 with delta refering to
+its base by position in pack rather than by SHA-1.  Its that they can
+send/read OBJ_OFS_DELTA, aka type 6 in a pack file.
+
+shallow
+-------
+
+Server can send shallow clone (git clone --depth ...).
+
+no-progress
+-----------
+
+The client was started with "git clone -q" or something, and doesn't
+want that side brand 2.  Basically the client just says "I do not
+wish to receive stream 2 on sideband, so do not send it to me, and if
+you did, I will drop it on the floor anyway".  However, the sideband
+channel 3 is still used for error responses.
+
+include-tag
+-----------
+
+The 'include-tag' capability is about sending tags if we are sending
+objects they point to.  If we pack an object to the client, and a tag
+points exactly at that object, we pack the tag too.  In general this
+allows a client to get all new tags when it fetches a branch, in a
+single network connection.
+
+Clients MAY always send include-tag, hardcoding it into a request.
+The decision for a client to request include-tag only has to do with
+the client's desires for tag data, whether or not a server had
+advertised objects in the refs/tags/* namespace.
+
+Clients SHOULD NOT send include-tag if remote.name.tagopt was set to
+--no-tags, as the client doesn't want tag data.
+
+Servers MUST accept include-tag without error or warning, even if the
+server does not understand or support the option.
+
+Servers SHOULD pack the tags if their referrant is packed and the
+client has requested include-tag.
+
+Clients MUST be prepared for the case where a server has ignored
+include-tag and has not actually sent tags in the pack.  In such
+cases the client SHOULD issue a subsequent fetch to acquire the tags
+that include-tag would have otherwise given the client.
+
+The server SHOULD send include-tag, if it supports it, irregardless
+of whether or not there are tags available.
+
+Servers SHOULD support all capabilities defined in this document.
+
diff --git a/Documentation/technical/protocol-common.txt
b/Documentation/technical/protocol-common.txt
new file mode 100644
index 0000000..d28ad98
--- /dev/null
+++ b/Documentation/technical/protocol-common.txt
@@ -0,0 +1,97 @@
+Documentation Common to Pack and Http Protocols
+===============================================
+
+ABNF Notation
+-------------
+
+ABNF notation as described by RFC 5234 is used within the protocol documents,
+except the following replacement core rules are used:
+----
+	HEXDIG    =  DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
+----
+
+We also define the following common rules:
+----
+	NUL       =  %x00
+	zero-id   =  40*"0"
+	obj-id    =  40*(HEXDIGIT)
+
+	refname  =  "HEAD"
+	refname /=  "refs/" <see discussion below>
+----
+
+A refname is a hierarichal octet string beginning with "refs/" and
+not violating the 'git-check-ref-format' command's validation rules.
+More generally, they:
+
+. They can include slash `/` for hierarchical (directory)
+  grouping, but no slash-separated component can begin with a
+  dot `.`.
+
+. They must contain at least one `/`. This enforces the presence of a
+  category like `heads/`, `tags/` etc. but the actual names are not
+  restricted.
+
+. They cannot have two consecutive dots `..` anywhere.
+
+. They cannot have ASCII control characters (i.e. bytes whose
+  values are lower than \040, or \177 `DEL`), space, tilde `~`,
+  caret `{caret}`, colon `:`, question-mark `?`, asterisk `*`,
+  or open bracket `[` anywhere.
+
+. They cannot end with a slash `/` nor a dot `.`.
+
+. They cannot end with the sequence `.lock`.
+
+. They cannot contain a sequence `@{`.
+
+. They cannot contain a `\\`.
+
+
+pkt-line Format
+---------------
+
+Much (but not all) of the payload is described around pkt-lines.
+
+A pkt-line is a variable length binary string.  The first four bytes
+of the line, the pkt-len, indicates the total length of the line,
+in hexadecimal.  The pkt-len includes the 4 bytes used to contain
+the length's hexadecimal representation.
+
+A pkt-line MAY contain binary data, so implementors MUST ensure
+pkt-line parsing/formatting routines are 8-bit clean.
+
+A non-binary line SHOULD BE terminated by an LF, which if present
+MUST be included in the total length.
+
+The maximum length of a pkt-line's data component is 65520 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65524
+(65520 bytes of payload + 4 bytes of length data).
+
+Implementations SHOULD NOT send an empty pkt-line ("0004").
+
+A pkt-line with a length field of 0 ("0000"), called a flush-pkt,
+is a special case and MUST be handled differently than an empty
+pkt-line ("0004").
+
+----
+	pkt-line     =  data-pkt / flush-pkt
+
+	data-pkt     =  pkt-len pkt-payload
+    pkt-len      =  4*(HEXDIG)
+	pkt-payload  =  (pkt-len - 4)*(OCTET)
+
+	flush-pkt    = "0000"
+----
+
+Examples (as C-style strings):
+
+----
+  pkt-line          actual value
+  ---------------------------------
+  "0006a\n"         "a\n"
+  "0005a"           "a"
+  "000bfoobar\n"    "foobar\n"
+  "0004"            ""
+----
+
-- 
1.6.5.2.75.gad2f8

^ permalink raw reply related

* Re: [PATCH 7/7] Factor ref updating out of fetch_with_import
From: Daniel Barkalow @ 2009-10-29 17:22 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Shawn O. Pearce, Git List, Johannes Schindelin, Johan Herland
In-Reply-To: <fabb9a1e0910290932u45c9c416m4d0ba0a8b2f5b01d@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2196 bytes --]

On Thu, 29 Oct 2009, Sverre Rabbelier wrote:

> Heya,
> 
> On Thu, Oct 29, 2009 at 08:56, Shawn O. Pearce <spearce@spearce.org> wrote:
> >> > I certainly have to wonder... if this is done in both fetch and
> >> > clone, why isn't it just part of fetch_refs?
> >>
> >> Because clone does not use fetch_refs, or am I missing something?
> >
> > Hmmph.  Weird.  Its been a while since I last looked at this code,
> > maybe I misunderstood it.
> 
> Unless you mean transport_fetch_refs? It can't be done in
> transport_fetch_refs because the foreign helper transport needs to
> update all refs, including those that wanted_peer_refs decided not to
> fetch. Otherwise the 'HEAD' ref will not be updated, and it is left at
> all zeros. It is not as obvious that that is a problem in this patch,
> but when Junio merges with Nico's [5bdc32d3e "make 'git clone' ask the
> remote only for objects it cares about"] the code looks like this:
> 
> 		if (refs) {
> 			struct ref *ref_cpy;
> 			mapped_refs = wanted_peer_refs(refs, refspec);
> 			ref_cpy = copy_ref_list(mapped_refs);
> 			transport_fetch_refs(transport, ref_cpy);
> 			if (transport->update_refs)
> 			{
> 				ref_cpy = copy_ref_list(refs);
> 				transport->update_refs(transport, ref_cpy);
> 				refs = ref_cpy;
> 				mapped_refs = wanted_peer_refs(refs, refspec);
> 			}
> 		}
> 
> Does it make more sense now? transport_fetch_refs gets only a limited
> view of the refs, so it cannot pass all the refs to
> transport_update_refs as needed.

I think there are a bunch of bugs that you're working around:

(a) if HEAD is determined to be a symref, and we care about HEAD, we care 
about whatever HEAD is a symref to; wanted_peer_refs() shouldn't be 
filtering such things out.

(b) we don't want to make a whole bunch of copies of the ref list to avoid 
updating the mutable ref list that we will look for updated values in; the 
merge of my series with Nico's should replace my copy_ref_list with his 
wanted_peer_refs, not include the copy afterwards. Correcting this should 
lead to the value that matters in the list that will be used having been 
updated directly by transport_fetch_refs().

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply

* [PATCH] Documentation/gitk.txt: order options alphabetically
From: jari.aalto @ 2009-10-29 16:59 UTC (permalink / raw)
  To: git; +Cc: Jari Aalto

From: Jari Aalto <jari.aalto@cante.net>

Signed-off-by: Jari Aalto <jari.aalto@cante.net>
---
 Documentation/gitk.txt |   38 +++++++++++++++++++-------------------
 1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/Documentation/gitk.txt b/Documentation/gitk.txt
index cf465cb..08ff926 100644
--- a/Documentation/gitk.txt
+++ b/Documentation/gitk.txt
@@ -26,23 +26,17 @@ the 'git-rev-list' command (see linkgit:git-rev-list[1]).
 This manual page describes only the most
 frequently used options.
 
--n <number>::
---max-count=<number>::
-
-	Limits the number of commits to show.
-
---since=<date>::
-
-	Show commits more recent than a specific date.
-
---until=<date>::
-
-	Show commits older than a specific date.
-
 --all::
 
 	Show all branches.
 
+--argscmd=<command>::
+	Command to be run each time gitk has to determine the list of
+	<revs> to show.  The command is expected to print on its standard
+	output a list of additional revs to be shown, one per line.
+	Use this instead of explicitly specifying <revs> if the set of
+	commits to show may vary between refreshes.
+
 --merge::
 
 	After an attempt to merge stops with conflicts, show the commits on
@@ -50,18 +44,24 @@ frequently used options.
 	that modify the conflicted files and do not exist on all the heads
 	being merged.
 
---argscmd=<command>::
-	Command to be run each time gitk has to determine the list of
-	<revs> to show.  The command is expected to print on its standard
-	output a list of additional revs to be shown, one per line.
-	Use this instead of explicitly specifying <revs> if the set of
-	commits to show may vary between refreshes.
+-n <number>::
+--max-count=<number>::
+
+	Limits the number of commits to show.
 
 --select-commit=<ref>::
 
 	Automatically select the specified commit after loading the graph.
 	Default behavior is equivalent to specifying '--select-commit=HEAD'.
 
+--since=<date>::
+
+	Show commits more recent than a specific date.
+
+--until=<date>::
+
+	Show commits older than a specific date.
+
 <revs>::
 
 	Limit the revisions to show. This can be either a single revision
-- 
1.6.5

^ permalink raw reply related

* Re: date change of commit?
From: Thomas Rast @ 2009-10-29 16:43 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Alex K, Miklos Vajna, Matthieu Moy, git
In-Reply-To: <4AE9AAE7.2080404@dawes.za.net>

Rogan Dawes wrote:
> Alex K wrote:
> > And how do you actually set those variables? Sorry to ask such a
> > trivial question but it's been an hour that i'm going through the doc
> > for such a simple feature. I thought those were environment variables
> 
> As you suspected, they are environment variables.
> 
> i.e. :
> 
> $ GIT_AUTHOR_DATE="1112911993 -0700" git commit x

Since this was in the context of git-filter-branch, I should point out
that you'll have to use the variables with --env-filter and (like the
manpage says) make sure you export them.

So to change the date of a single commit with SHA1 <sha1>, you could
say

git filter-branch --env-filter '
  if [ $GIT_COMMIT = <sha1> ]; then
    export GIT_AUTHOR_DATE="1112911993 -0700"
    export GIT_COMMITTER_DATE="1112911993 -0700"
  fi
' <other_arguments>

Alex K wrote:
> for such a simple feature

I'm not sure if you're referring to setting the variables or actually
rewriting the commits here, but anyway:

The reason why the latter is so hard is that rewriting "old" history
is a _huge_ hassle for people who already have the previous version of
that history.

That being said, _why_?  The dates are never a guarantee that
something happened earlier or later precisely because they can easily
be modified at commit time.  So nobody should take them as more than a
hint, and there's little use in faking hints.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply

* [PATCH 1/2] Provide a build time default-pager setting
From: Ben Walton @ 2009-10-29 16:42 UTC (permalink / raw)
  To: gitster, jrnieder, git; +Cc: Ben Walton
In-Reply-To: <1256834565-19443-1-git-send-email-bwalton@artsci.utoronto.ca>

From: Junio C Hamano <gitster@pobox.com>

On (old) solaris systems, /usr/bin/less (typically the first less
found) doesn't understand the default arguments (FXRS), which
forces users to alter their environment (PATH, GIT_PAGER, LESS,
etc) or have a local or global gitconfig before paging works as
expected.

On Debian systems, by policy packages must fall back to the
'pager' command, so that changing the target of the
/usr/bin/pager symlink changes the default pager for all packages
at once.

Provide a DEFAULT_PAGER knob so packagers can set the fallback
pager to something appropriate during the build.

This puts the "less" default in the Makefile instead of pager.c, since
it is needed for git-svn and git-am, too.  This means that the
DEFAULT_PAGER preprocessor token _has_ to be defined on the command
line for git to build.  I was worried about this for a moment, but
GIT_VERSION already works this way without trouble.

Probably the DEFAULT_PAGER setting should be added to something
like TRACK_CFLAGS as well.  Actually, some other settings that
can change without forcing files to be rebuilt (e.g. SHELL_PATH),
too.  This should be probably be addressed separately.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ben Walton <bwalton@artsci.utoronto.ca>
---
 Makefile     |    9 +++++++++
 git-am.sh    |    2 +-
 git-svn.perl |    5 +++--
 pager.c      |    2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 42b7d60..1d26800 100644
--- a/Makefile
+++ b/Makefile
@@ -200,6 +200,9 @@ all::
 # memory allocators with the nedmalloc allocator written by Niall Douglas.
 #
 # Define NO_REGEX if you have no or inferior regex support in your C library.
+#
+# Define DEFAULT_PAGER to the path of a sensible pager (defaults to "less") if
+# you want to use something different.
 
 GIT-VERSION-FILE: .FORCE-GIT-VERSION-FILE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1294,6 +1297,10 @@ ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
 endif
+ifndef DEFAULT_PAGER
+	DEFAULT_PAGER = less
+endif
+BASIC_CFLAGS += -DDEFAULT_PAGER='"$(DEFAULT_PAGER)"'
 
 ifdef USE_NED_ALLOCATOR
        COMPAT_CFLAGS += -DUSE_NED_ALLOCATOR -DOVERRIDE_STRDUP -DNDEBUG -DREPLACE_SYSTEM_ALLOCATOR -Icompat/nedmalloc
@@ -1428,6 +1435,7 @@ $(patsubst %.sh,%,$(SCRIPT_SH)) : % : %.sh
 	    -e 's|@SHELL_PATH@|$(SHELL_PATH_SQ)|' \
 	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
 	    -e 's/@@NO_CURL@@/$(NO_CURL)/g' \
+	    -e 's|@@DEFAULT_PAGER@@|$(DEFAULT_PAGER)|g' \
 	    -e $(BROKEN_PATH_FIX) \
 	    $@.sh >$@+ && \
 	chmod +x $@+ && \
@@ -1451,6 +1459,7 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
 	    -e '}' \
 	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
 	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
+	    -e 's|@@DEFAULT_PAGER@@|$(DEFAULT_PAGER)|g' \
 	    $@.perl >$@+ && \
 	chmod +x $@+ && \
 	mv $@+ $@
diff --git a/git-am.sh b/git-am.sh
index c132f50..a194a4e 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -649,7 +649,7 @@ do
 		[eE]*) git_editor "$dotest/final-commit"
 		       action=again ;;
 		[vV]*) action=again
-		       LESS=-S ${PAGER:-less} "$dotest/patch" ;;
+		       LESS=-S ${PAGER:-@@DEFAULT_PAGER@@} "$dotest/patch" ;;
 		*)     action=again ;;
 		esac
 	    done
diff --git a/git-svn.perl b/git-svn.perl
index eb4b75a..a61ec55 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3,11 +3,12 @@
 # License: GPL v2 or later
 use warnings;
 use strict;
-use vars qw/	$AUTHOR $VERSION
+use vars qw/	$AUTHOR $VERSION $DEFAULT_PAGER
 		$sha1 $sha1_short $_revision $_repository
 		$_q $_authors $_authors_prog %users/;
 $AUTHOR = 'Eric Wong <normalperson@yhbt.net>';
 $VERSION = '@@GIT_VERSION@@';
+$DEFAULT_PAGER = '@@DEFAULT_PAGER@@';
 
 # From which subdir have we been invoked?
 my $cmd_dir_prefix = eval {
@@ -5031,7 +5032,7 @@ sub git_svn_log_cmd {
 sub config_pager {
 	$pager ||= $ENV{GIT_PAGER} || $ENV{PAGER};
 	if (!defined $pager) {
-		$pager = 'less';
+		$pager = $DEFAULT_PAGER;
 	} elsif (length $pager == 0 || $pager eq 'cat') {
 		$pager = undef;
 	}
diff --git a/pager.c b/pager.c
index 86facec..416a796 100644
--- a/pager.c
+++ b/pager.c
@@ -58,7 +58,7 @@ void setup_pager(void)
 	if (!pager)
 		pager = getenv("PAGER");
 	if (!pager)
-		pager = "less";
+		pager = DEFAULT_PAGER;
 	else if (!*pager || !strcmp(pager, "cat"))
 		return;
 
-- 
1.6.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox