Git development

Git development
 help / color / mirror / Atom feed

* gitview: Use git ls-remote to find the tag and branch details
From: Aneesh Kumar K.V @ 2006-02-22 16:07 UTC (permalink / raw)
  To: git, Junio C Hamano, aneesh.kumar

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: 0002-gitview-Use-git-ls-remote-to-find-the-tag-and-branch-details.txt --]
[-- Type: text/plain, Size: 2707 bytes --]


From: Junio C Hamano <junkio@cox.net>
This fix the below bug

Junio C Hamano <junkio@cox.net> writes:

>
> It does not work in my repository, since you do not seem to
> handle branch and tag names with slashes in them.  All of my
> topic branches live in directories with two-letter names
> (e.g. ak/gitview).

Also use ${GIT_DIR} directly  so that it works with below environment setup

GIT_DIR=/home/opensource/Test\ Output/git-devel/.git

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>

---

 contrib/gitview/gitview |   54 ++++++++++++-----------------------------------
 1 files changed, 14 insertions(+), 40 deletions(-)

ecd82bdd8399b84ada1f4fc0c720a88ae0735a94
diff --git a/contrib/gitview/gitview b/contrib/gitview/gitview
index 0e52c78..76a1b67 100755
--- a/contrib/gitview/gitview
+++ b/contrib/gitview/gitview
@@ -56,20 +56,6 @@ def show_date(epoch, tz):
 
 	return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(secs))
 
-def get_sha1_from_tags(line):
-	fp = os.popen("git cat-file -t " + line)
-	entry = string.strip(fp.readline())
-	fp.close()
-	if (entry == "commit"):
-		return line
-	elif (entry == "tag"):
-		fp = os.popen("git cat-file tag "+ line)
-		entry = string.strip(fp.readline())
-		fp.close()
-		obj = re.split(" ", entry)
-		if (obj[0] == "object"):
-			return obj[1]
-	return None
 
 class CellRendererGraph(gtk.GenericCellRenderer):
 	"""Cell renderer for directed graph.
@@ -467,32 +453,20 @@ class GitView:
 		respective sha1 details """
 
 		self.bt_sha1 = { }
-		git_dir = os.getenv("GIT_DIR")
-		if (git_dir == None):
-			git_dir = ".git"
-
-		#FIXME the path seperator
-		ref_files = os.listdir(git_dir + "/refs/tags")
-		for file in ref_files:
-			fp = open(git_dir + "/refs/tags/"+file)
-			sha1 = get_sha1_from_tags(string.strip(fp.readline()))
-			try:
-				self.bt_sha1[sha1].append(file)
-			except KeyError:
-				self.bt_sha1[sha1] = [file]
-			fp.close()
-
-
-		#FIXME the path seperator
-		ref_files = os.listdir(git_dir + "/refs/heads")
-		for file in ref_files:
-			fp = open(git_dir + "/refs/heads/" + file)
-			sha1 = get_sha1_from_tags(string.strip(fp.readline()))
-			try:
-				self.bt_sha1[sha1].append(file)
-			except KeyError:
-				self.bt_sha1[sha1] = [file]
-			fp.close()
+		ls_remote = re.compile('^(.{40})\trefs/([^^]+)(?:\\^(..))?$');
+		fp = os.popen('git ls-remote "${GIT_DIR:-.git}"')
+		while 1:
+			line = string.strip(fp.readline())
+			if line == '':
+				break
+			m = ls_remote.match(line)
+			if not m:
+				continue
+			(sha1, name) = (m.group(1), m.group(2))
+			if not self.bt_sha1.has_key(sha1):
+				self.bt_sha1[sha1] = []
+			self.bt_sha1[sha1].append(name)
+		fp.close()
 
 
 	def construct(self):
-- 
1.2.0-dirty


^ permalink raw reply related

* gitview: Use monospace font to draw the branch and tag name
From: Aneesh Kumar K.V @ 2006-02-22 16:06 UTC (permalink / raw)
  To: git, Junio C Hamano, aneesh.kumar

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: 0001-gitview-Use-monospace-font-to-draw-the-branch-and-tag-name.txt --]
[-- Type: text/plain, Size: 1527 bytes --]


This patch address the below:
Use monospace font to draw branch and tag name
set the font size to 13.
Make the graph column resizable. This helps to accommodate large tag names

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>

---

 contrib/gitview/gitview |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

725c7d29cbe4efd0a7f7d9f218dc12e36f5920de
diff --git a/contrib/gitview/gitview b/contrib/gitview/gitview
index 5862fcc..0e52c78 100755
--- a/contrib/gitview/gitview
+++ b/contrib/gitview/gitview
@@ -174,9 +174,9 @@ class CellRendererGraph(gtk.GenericCellR
 		names_len = 0
 		if (len(names) != 0):
 			for item in names:
-				names_len += len(item)/3
+				names_len += len(item)
 
-		width = box_size * (cols + 1 + names_len )
+		width = box_size * (cols + 1 ) + names_len 
 		height = box_size
 
 		# FIXME I have no idea how to use cell_area properly
@@ -258,6 +258,8 @@ class CellRendererGraph(gtk.GenericCellR
 			for item in names:
 				name = name + item + " "
 
+			ctx.select_font_face("Monospace")
+			ctx.set_font_size(13)
 			ctx.text_path(name)
 
 		self.set_colour(ctx, colour, 0.0, 0.5)
@@ -537,8 +539,8 @@ class GitView:
 
 		cell = CellRendererGraph()
 		column = gtk.TreeViewColumn()
-		column.set_resizable(False)
-		column.pack_start(cell, expand=False)
+		column.set_resizable(True)
+		column.pack_start(cell, expand=True)
 		column.add_attribute(cell, "node", 1)
 		column.add_attribute(cell, "in-lines", 2)
 		column.add_attribute(cell, "out-lines", 3)
-- 
1.2.0-dirty


^ permalink raw reply related

* Re: [PATCH] relax delta selection filtering in pack-objects
From: Nicolas Pitre @ 2006-02-22 16:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vhd6rsvqd.fsf@assigned-by-dhcp.cox.net>

On Tue, 21 Feb 2006, Junio C Hamano wrote:

> Junio C Hamano <junkio@cox.net> writes:
> 
> > I haven't dug into the issue yet, but these four delta series
> > seem to break the testsuite.
> 
> I bisected.  It is the adler32 one -- since it makes the
> generated delta much smaller, it is understandable that it would
> interact with the break/rename heuristics.  It is not strictly
> breakage in that sense -- we just need to readjust the
> heuristics thresholds for those algorithms.

I had a quick look and that code rather looks like black magic to me atm.
I however found a memory leak in diffcore-rename.c:estimate_similarity():

        delta = diff_delta(src->data, src->size,
                           dst->data, dst->size,
                           &delta_size, delta_limit);
        if (!delta)
                /* If delta_limit is exceeded, we have too much differences */
                return 0;

        /* A delta that has a lot of literal additions would have
         * big delta_size no matter what else it does.
         */
        if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
                return 0;
                   \________ delta memory is leaked.


Nicolas

^ permalink raw reply

* Re: What's in git.git
From: Alex Riesen @ 2006-02-22 13:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xs3pt8c.fsf@assigned-by-dhcp.cox.net>

On 2/22/06, Junio C Hamano <junkio@cox.net> wrote:
>  - Perl 5.6 backward compatibility.
>
>    Alex Riesen volunteered to further work on porting the
>    "allegedly 5.6 compatible" constructs these patches use to
>    work on ActiveState (whose 5.8 does not grok them).  We'll
>    see what happens.

Well, I'll have to.
The alternative is Perforce which is too much pain to explain...

^ permalink raw reply

* What's in git.git
From: Junio C Hamano @ 2006-02-22 10:45 UTC (permalink / raw)
  To: git

Tonight's installment is a fairly huge update.  I am getting a
feeling that we should declare 1.3 real soon, once what's
cooking in the 'next' branch stabilizes.

* The 'master' branch has these since the last announcement.

 - Updates to gitview by Aneesh Kumar (contrib/)

   This is not shellquote safe, so please be careful if your
   GIT_DIR has shell metacharacters in it.

 - Various fixes and cleanups from Carl Worth, Jason Riedy and me.

 - Solaris 9+ portability bits by Paul Jakma.

 - Portability bits by Johannes Schindelin.

   "make -j" breakage was fixed.  Thanks.

 - gitk "fix find" update.  I failed to pull this one for some
   time.

 - "Assume unchanged git".

 - Reusing data from existing packs.

 - Perl 5.6 backward compatibility.

   Alex Riesen volunteered to further work on porting the
   "allegedly 5.6 compatible" constructs these patches use to
   work on ActiveState (whose 5.8 does not grok them).  We'll
   see what happens.

* The 'next' branch, in addition, has these.

 - send-pack workaround for insanely large number of refs.

 - git-cvsserver by Martyn and Martin (and The Open University UK folks).

 - Delta updates by Nicolas Pitre.

   One part out of this four patch series seems to break diff
   tests so that one sits in 'pu' for now.

 - Other bits I mentioned in the previous message:

   - "Thin pack" bandwidth reduction for "git push" and "git
     fetch".

   - git-blame by Fredrik.

   - git-annotate by Ryan, with some enhancements from Johannes
     to make it usable with git-cvsserver.

* The 'pu' branch, in addition, has these.

 - git-rm by Carl Worth and Krzysiek Pawlik.

   This is not shellquote safe, so please be careful if you
   have files with shell metacharacters in their names.

 - Delta updates by Nicolas Pitre.

 - "Bound commit" lowlevel machinery for subprojects support.

^ permalink raw reply

* [PATCH] send-pack: do not give up when remote has insanely large number of refs.
From: Junio C Hamano @ 2006-02-22  9:51 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git
In-Reply-To: <7virr8t82n.fsf@assigned-by-dhcp.cox.net>

Stephen C. Tweedie noticed that we give up running rev-list when
we see too many refs on the remote side.  Limit the number of
negative references we give to rev-list and continue.

Not sending any negative references to rev-list is very bad --
we may be pushing a ref that is new to the other end.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

  Junio C Hamano <junkio@cox.net> writes:

  > That is, something like this.
  > -- >8 --
  > Do not give up...

  Ah, crap.  Sorry about that.  This one has been tested at least.

 send-pack.c |   38 ++++++++++++++++++++++++++++----------
 1 files changed, 28 insertions(+), 10 deletions(-)

797656e58ddbd82ac461a5142ed726db3a4d0ac0
diff --git a/send-pack.c b/send-pack.c
index 990be3f..b58bbab 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -37,26 +37,44 @@ static void exec_pack_objects(void)
 
 static void exec_rev_list(struct ref *refs)
 {
+	struct ref *ref;
 	static char *args[1000];
-	int i = 0;
+	int i = 0, j;
 
 	args[i++] = "rev-list";	/* 0 */
 	args[i++] = "--objects";	/* 1 */
-	while (refs) {
-		char *buf = malloc(100);
-		if (i > 900)
+
+	/* First send the ones we care about most */
+	for (ref = refs; ref; ref = ref->next) {
+		if (900 < i)
 			die("git-rev-list environment overflow");
-		if (!is_zero_sha1(refs->old_sha1) &&
-		    has_sha1_file(refs->old_sha1)) {
+		if (!is_zero_sha1(ref->new_sha1)) {
+			char *buf = malloc(100);
 			args[i++] = buf;
-			snprintf(buf, 50, "^%s", sha1_to_hex(refs->old_sha1));
+			snprintf(buf, 50, "%s", sha1_to_hex(ref->new_sha1));
 			buf += 50;
+			if (!is_zero_sha1(ref->old_sha1) &&
+			    has_sha1_file(ref->old_sha1)) {
+				args[i++] = buf;
+				snprintf(buf, 50, "^%s",
+					 sha1_to_hex(ref->old_sha1));
+			}
 		}
-		if (!is_zero_sha1(refs->new_sha1)) {
+	}
+
+	/* Then a handful of the remainder
+	 * NEEDSWORK: we would be better off if used the newer ones first.
+	 */
+	for (ref = refs, j = i + 16;
+	     i < 900 && i < j && ref;
+	     ref = ref->next) {
+		if (is_zero_sha1(ref->new_sha1) &&
+		    !is_zero_sha1(ref->old_sha1) &&
+		    has_sha1_file(ref->old_sha1)) {
+			char *buf = malloc(42);
 			args[i++] = buf;
-			snprintf(buf, 50, "%s", sha1_to_hex(refs->new_sha1));
+			snprintf(buf, 42, "^%s", sha1_to_hex(ref->old_sha1));
 		}
-		refs = refs->next;
 	}
 	args[i] = NULL;
 	execv_git_cmd(args);
-- 
1.2.2.g882e

^ permalink raw reply related

* [PATCH] Introducing git-cvsserver -- a CVS emulator for git.
From: Martin Langhoff @ 2006-02-22  9:50 UTC (permalink / raw)
  To: git; +Cc: Martin Langhoff

git-cvsserver is highly functional. However, not all methods are implemented,
and for those methods that are implemented, not all switches are implemented.
All the common read operations are implemented, and add/remove/commit are
supported.

Testing has been done using both the CLI CVS client, and the Eclipse CVS
plugin. Most functionality works fine with both of these clients.

Currently git-cvsserver only works over SSH connections, see the
Documentation for more details on how to configure your client. It
does not support pserver for anonymous access but it should not be
hard to implement. Anonymous access will need tighter input validation.

In our very informal tests, it seems to be significantly faster than a real
CVS server.

This utility depends on a version of git-cvsannotate that supports -S and on
DBD::SQLite.

Licensed under GPLv2. Copyright The Open University UK.
Authors: Martyn Smith <martyn@catalyst.net.nz>
         Martin Langhoff <martin@catalyst.net.nz>

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>


---

 Documentation/git-cvsserver.txt |   89 +
 Makefile                        |    2 
 git-cvsserver.perl              | 2449 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 2539 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/git-cvsserver.txt
 create mode 100755 git-cvsserver.perl

1ceecd39c82c5bd469bfebc9744f8ef6dbafc7d6
diff --git a/Documentation/git-cvsserver.txt b/Documentation/git-cvsserver.txt
new file mode 100644
index 0000000..f8b52f8
--- /dev/null
+++ b/Documentation/git-cvsserver.txt
@@ -0,0 +1,89 @@
+git-cvsserver(1)
+================
+
+NAME
+----
+git-cvsserver - A CVS server emulator for git
+
+
+SYNOPSIS
+--------
+[verse]
+export CVS_SERVER=git-cvsserver
+'cvs' -d :ext:user@server/path/repo.git co <HEAD_name>
+
+
+DESCRIPTION
+-----------
+
+This application is a CVS emulation layer for git.
+
+It is highly functional. However, not all methods are implemented, 
+and for those methods that are implemented,
+not all switches are implemented.  
+
+Testing has been done using both the CLI CVS client, and the Eclipse CVS
+plugin. Most functionality works fine with both of these clients.
+
+LIMITATIONS
+-----------
+Currently gitcvs only works over ssh connections.
+
+
+INSTALLATION
+------------
+1. Put server.pl somewhere useful on the same machine that is hosting your git repos
+
+2. For each repo that you want accessible from CVS you need to edit config in
+   the repo and add the following section.
+
+   [gitcvs]
+        enabled=1
+        logfile=/path/to/logfile
+
+   n.b. you need to ensure each user that is going to invoke server.pl has
+   write access to the log file.
+
+5. On each client machine you need to set the following variables.
+     CVSROOT should be set as per normal, but the directory should point at the
+             appropriate git repo.
+     CVS_SERVER should be set to the server.pl script that has been put on the
+                remote machine.
+
+6. Clients should now be able to check out modules (where modules are the names
+   of branches in git).
+     $ cvs co -d mylocaldir master
+
+Operations supported
+--------------------
+
+All the operations required for normal use are supported, including
+checkout, diff, status, update, log, add, remove, commit.
+Legacy monitoring operations are not supported (edit, watch and related). 
+Exports and tagging (tags and branches) are not supported at this stage. 
+
+The server will set the -k mode to binary when relevant. In proper GIT
+tradition, the contents of the files are always respected. 
+No keyword expansion or newline munging is supported. 
+
+Dependencies
+------------
+
+git-cvsserver depends on DBD::SQLite. 
+
+Copyright and Authors
+---------------------
+
+This program is copyright The Open University UK - 2006.
+
+Authors: Martyn Smith    <martyn@catalyst.net.nz>
+         Martin Langhoff <martin@catalyst.net.nz>
+         with ideas and patches from participants of the git-list <git@vger.kernel.org>.
+
+Documentation
+--------------
+Documentation by Martyn Smith <martyn@catalyst.net.nz> and Martin Langhoff <martin@catalyst.net.nz>Matthias Urlichs <smurf@smurf.noris.de>.
+
+GIT
+---
+Part of the gitlink:git[7] suite
diff --git a/Makefile b/Makefile
index d8f7d03..bf2436a 100644
--- a/Makefile
+++ b/Makefile
@@ -130,7 +130,7 @@ SCRIPT_SH = \
 SCRIPT_PERL = \
 	git-archimport.perl git-cvsimport.perl git-relink.perl \
 	git-shortlog.perl git-fmt-merge-msg.perl git-rerere.perl \
-	git-annotate.perl \
+	git-annotate.perl git-cvsserver.perl \
 	git-svnimport.perl git-mv.perl git-cvsexportcommit.perl
 
 SCRIPT_PYTHON = \
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
new file mode 100755
index 0000000..a32cd9a
--- /dev/null
+++ b/git-cvsserver.perl
@@ -0,0 +1,2449 @@
+#!/usr/bin/perl
+
+####
+#### This application is a CVS emulation layer for git.
+#### It is intended for clients to connect over SSH.
+#### See the documentation for more details. 
+####
+#### Copyright The Open University UK - 2006.
+####
+#### Authors: Martyn Smith    <martyn@catalyst.net.nz>
+####          Martin Langhoff <martin@catalyst.net.nz>
+####
+####
+#### Released under the GNU Public License, version 2.
+####
+####
+
+use strict;
+use warnings;
+
+use Fcntl;
+use File::Temp qw/tempdir tempfile/;
+use File::Basename;
+
+my $log = GITCVS::log->new();
+my $cfg;
+
+my $DATE_LIST = {
+    Jan => "01",
+    Feb => "02",
+    Mar => "03",
+    Apr => "04",
+    May => "05",
+    Jun => "06",
+    Jul => "07",
+    Aug => "08",
+    Sep => "09",
+    Oct => "10",
+    Nov => "11",
+    Dec => "12",
+};
+
+# Enable autoflush for STDOUT (otherwise the whole thing falls apart)
+$| = 1;
+
+#### Definition and mappings of functions ####
+
+my $methods = {
+    'Root'            => \&req_Root,
+    'Valid-responses' => \&req_Validresponses,
+    'valid-requests'  => \&req_validrequests,
+    'Directory'       => \&req_Directory,
+    'Entry'           => \&req_Entry,
+    'Modified'        => \&req_Modified,
+    'Unchanged'       => \&req_Unchanged,
+    'Argument'        => \&req_Argument,
+    'Argumentx'       => \&req_Argument,
+    'expand-modules'  => \&req_expandmodules,
+    'add'             => \&req_add,
+    'remove'          => \&req_remove,
+    'co'              => \&req_co,
+    'update'          => \&req_update,
+    'ci'              => \&req_ci,
+    'diff'            => \&req_diff,
+    'log'             => \&req_log,
+    'tag'             => \&req_CATCHALL,
+    'status'          => \&req_status,
+    'admin'           => \&req_CATCHALL,
+    'history'         => \&req_CATCHALL,
+    'watchers'        => \&req_CATCHALL,
+    'editors'         => \&req_CATCHALL,
+    'annotate'        => \&req_annotate,
+    'Global_option'   => \&req_Globaloption,
+    #'annotate'        => \&req_CATCHALL,
+};
+
+##############################################
+
+
+# $state holds all the bits of information the clients sends us that could
+# potentially be useful when it comes to actually _doing_ something.
+my $state = {};
+$log->info("--------------- STARTING -----------------");
+
+my $TEMP_DIR = tempdir( CLEANUP => 1 );
+$log->debug("Temporary directory is '$TEMP_DIR'");
+
+# Keep going until the client closes the connection
+while (<STDIN>)
+{
+    chomp;
+
+    # Check to see if we've seen this method, and call appropiate function.
+    if ( /^([\w-]+)(?:\s+(.*))?$/ and defined($methods->{$1}) )
+    {
+        # use the $methods hash to call the appropriate sub for this command
+        #$log->info("Method : $1");
+        &{$methods->{$1}}($1,$2);
+    } else {
+        # log fatal because we don't understand this function. If this happens
+        # we're fairly screwed because we don't know if the client is expecting
+        # a response. If it is, the client will hang, we'll hang, and the whole
+        # thing will be custard.
+        $log->fatal("Don't understand command $_\n");
+        die("Unknown command $_");
+    }
+}
+
+$log->debug("Processing time : user=" . (times)[0] . " system=" . (times)[1]);
+$log->info("--------------- FINISH -----------------");
+
+# Magic catchall method.
+#    This is the method that will handle all commands we haven't yet
+#    implemented. It simply sends a warning to the log file indicating a
+#    command that hasn't been implemented has been invoked.
+sub req_CATCHALL
+{
+    my ( $cmd, $data ) = @_;
+    $log->warn("Unhandled command : req_$cmd : $data");
+}
+
+
+# Root pathname \n 
+#     Response expected: no. Tell the server which CVSROOT to use. Note that
+#     pathname is a local directory and not a fully qualified CVSROOT variable.
+#     pathname must already exist; if creating a new root, use the init
+#     request, not Root. pathname does not include the hostname of the server,
+#     how to access the server, etc.; by the time the CVS protocol is in use,
+#     connection, authentication, etc., are already taken care of. The Root
+#     request must be sent only once, and it must be sent before any requests
+#     other than Valid-responses, valid-requests, UseUnchanged, Set or init.
+sub req_Root
+{
+    my ( $cmd, $data ) = @_;
+    $log->debug("req_Root : $data");
+
+    $state->{CVSROOT} = $data;
+
+    $ENV{GIT_DIR} = $state->{CVSROOT} . "/";
+
+    foreach my $line ( `git-var -l` )
+    {
+        next unless ( $line =~ /^(.*?)\.(.*?)=(.*)$/ );
+        $cfg->{$1}{$2} = $3;
+    }
+
+    unless ( defined ( $cfg->{gitcvs}{enabled} ) and $cfg->{gitcvs}{enabled} =~ /^\s*(1|true|yes)\s*$/i )
+    {
+        print "E GITCVS emulation needs to be enabled on this repo\n";
+        print "E the repo config file needs a [gitcvs] section added, and the parameter 'enabled' set to 1\n";
+        print "E \n";
+        print "error 1 GITCVS emulation disabled\n";
+    }
+
+    if ( defined ( $cfg->{gitcvs}{logfile} ) )
+    {
+        $log->setfile($cfg->{gitcvs}{logfile});
+    } else {
+        $log->nofile();
+    }
+}
+
+# Global_option option \n 
+#     Response expected: no. Transmit one of the global options `-q', `-Q',
+#     `-l', `-t', `-r', or `-n'. option must be one of those strings, no
+#     variations (such as combining of options) are allowed. For graceful
+#     handling of valid-requests, it is probably better to make new global
+#     options separate requests, rather than trying to add them to this
+#     request.
+sub req_Globaloption
+{
+    my ( $cmd, $data ) = @_;
+    $log->debug("req_Globaloption : $data");
+
+    # TODO : is this data useful ???
+}
+
+# Valid-responses request-list \n 
+#     Response expected: no. Tell the server what responses the client will
+#     accept. request-list is a space separated list of tokens.
+sub req_Validresponses
+{
+    my ( $cmd, $data ) = @_;
+    $log->debug("req_Validrepsonses : $data");
+
+    # TODO : re-enable this, currently it's not particularly useful
+    #$state->{validresponses} = [ split /\s+/, $data ];
+}
+
+# valid-requests \n 
+#     Response expected: yes. Ask the server to send back a Valid-requests
+#     response.
+sub req_validrequests
+{
+    my ( $cmd, $data ) = @_;
+
+    $log->debug("req_validrequests");
+    
+    $log->debug("SEND : Valid-requests " . join(" ",keys %$methods));
+    $log->debug("SEND : ok");
+
+    print "Valid-requests " . join(" ",keys %$methods) . "\n";
+    print "ok\n";
+}
+
+# Directory local-directory \n 
+#     Additional data: repository \n. Response expected: no. Tell the server
+#     what directory to use. The repository should be a directory name from a
+#     previous server response. Note that this both gives a default for Entry
+#     and Modified and also for ci and the other commands; normal usage is to
+#     send Directory for each directory in which there will be an Entry or
+#     Modified, and then a final Directory for the original directory, then the
+#     command. The local-directory is relative to the top level at which the
+#     command is occurring (i.e. the last Directory which is sent before the
+#     command); to indicate that top level, `.' should be sent for
+#     local-directory.
+sub req_Directory
+{
+    my ( $cmd, $data ) = @_;
+
+    my $repository = <STDIN>;
+    chomp $repository;
+
+
+    $state->{localdir} = $data;
+    $state->{repository} = $repository;
+    $state->{directory} = $repository;
+    $state->{directory} =~ s/^$state->{CVSROOT}\///;
+    $state->{module} = $1 if ($state->{directory} =~ s/^(.*?)(\/|$)//);
+    $state->{directory} .= "/" if ( $state->{directory} =~ /\S/ );
+
+    $log->debug("req_Directory : localdir=$data repository=$repository directory=$state->{directory} module=$state->{module}");
+}
+
+# Entry entry-line \n 
+#     Response expected: no. Tell the server what version of a file is on the
+#     local machine. The name in entry-line is a name relative to the directory
+#     most recently specified with Directory. If the user is operating on only
+#     some files in a directory, Entry requests for only those files need be
+#     included. If an Entry request is sent without Modified, Is-modified, or
+#     Unchanged, it means the file is lost (does not exist in the working
+#     directory). If both Entry and one of Modified, Is-modified, or Unchanged
+#     are sent for the same file, Entry must be sent first. For a given file,
+#     one can send Modified, Is-modified, or Unchanged, but not more than one
+#     of these three.
+sub req_Entry
+{
+    my ( $cmd, $data ) = @_;
+
+    $log->debug("req_Entry : $data");
+
+    my @data = split(/\//, $data);
+
+    $state->{entries}{$state->{directory}.$data[1]} = {
+        revision    => $data[2],
+        conflict    => $data[3],
+        options     => $data[4],
+        tag_or_date => $data[5],
+    };
+}
+
+# add \n 
+#     Response expected: yes. Add a file or directory. This uses any previous
+#     Argument, Directory, Entry, or Modified requests, if they have been sent.
+#     The last Directory sent specifies the working directory at the time of
+#     the operation. To add a directory, send the directory to be added using
+#     Directory and Argument requests.
+sub req_add
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("add");
+
+    my $addcount = 0;
+
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        unless ( defined ( $state->{entries}{$filename}{modified_filename} ) )
+        {
+            print "E cvs add: nothing known about `$filename'\n";
+            next;
+        }
+        # TODO : check we're not squashing an already existing file
+        if ( defined ( $state->{entries}{$filename}{revision} ) )
+        {
+            print "E cvs add: `$filename' has already been entered\n";
+            next;
+        }
+
+
+        my ( $filepart, $dirpart ) = filenamesplit($filename);
+
+        print "E cvs add: scheduling file `$filename' for addition\n";
+
+        print "Checked-in $dirpart\n";
+        print "$filename\n";
+        print "/$filepart/0///\n";
+
+        $addcount++;
+    }
+
+    if ( $addcount == 1 )
+    {
+        print "E cvs add: use `cvs commit' to add this file permanently\n";
+    }
+    elsif ( $addcount > 1 )
+    {
+        print "E cvs add: use `cvs commit' to add these files permanently\n";
+    }
+
+    print "ok\n";
+}
+
+# remove \n 
+#     Response expected: yes. Remove a file. This uses any previous Argument,
+#     Directory, Entry, or Modified requests, if they have been sent. The last
+#     Directory sent specifies the working directory at the time of the
+#     operation. Note that this request does not actually do anything to the
+#     repository; the only effect of a successful remove request is to supply
+#     the client with a new entries line containing `-' to indicate a removed
+#     file. In fact, the client probably could perform this operation without
+#     contacting the server, although using remove may cause the server to
+#     perform a few more checks. The client sends a subsequent ci request to
+#     actually record the removal in the repository.
+sub req_remove
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("remove");
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+    $updater->update();
+
+    #$log->debug("add state : " . Dumper($state));
+
+    my $rmcount = 0;
+
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        if ( defined ( $state->{entries}{$filename}{unchanged} ) or defined ( $state->{entries}{$filename}{modified_filename} ) )
+        {
+            print "E cvs remove: file `$filename' still in working directory\n";
+            next;
+        }
+
+        my $meta = $updater->getmeta($filename);
+        my $wrev = revparse($filename);
+
+        unless ( defined ( $wrev ) )
+        {
+            print "E cvs remove: nothing known about `$filename'\n";
+            next;
+        }
+
+        if ( defined($wrev) and $wrev < 0 )
+        {
+            print "E cvs remove: file `$filename' already scheduled for removal\n";
+            next;
+        }
+
+        unless ( $wrev == $meta->{revision} )
+        {
+            # TODO : not sure if the format of this message is quite correct.
+            print "E cvs remove: Up to date check failed for `$filename'\n";
+            next;
+        }
+
+
+        my ( $filepart, $dirpart ) = filenamesplit($filename);
+
+        print "E cvs remove: scheduling `$filename' for removal\n";
+
+        print "Checked-in $dirpart\n";
+        print "$filename\n";
+        print "/$filepart/-1.$wrev///\n";
+
+        $rmcount++;
+    }
+
+    if ( $rmcount == 1 )
+    {
+        print "E cvs remove: use `cvs commit' to remove this file permanently\n";
+    }
+    elsif ( $rmcount > 1 )
+    {
+        print "E cvs remove: use `cvs commit' to remove these files permanently\n";
+    }
+
+    print "ok\n";
+}
+
+# Modified filename \n 
+#     Response expected: no. Additional data: mode, \n, file transmission. Send
+#     the server a copy of one locally modified file. filename is a file within
+#     the most recent directory sent with Directory; it must not contain `/'.
+#     If the user is operating on only some files in a directory, only those
+#     files need to be included. This can also be sent without Entry, if there
+#     is no entry for the file.
+sub req_Modified
+{
+    my ( $cmd, $data ) = @_;
+
+    my $mode = <STDIN>;
+    chomp $mode;
+    my $size = <STDIN>;
+    chomp $size;
+
+    # Grab config information
+    my $blocksize = 8192;
+    my $bytesleft = $size;
+    my $tmp;
+
+    # Get a filehandle/name to write it to
+    my ( $fh, $filename ) = tempfile( DIR => $TEMP_DIR );
+
+    # Loop over file data writing out to temporary file.
+    while ( $bytesleft )
+    {
+        $blocksize = $bytesleft if ( $bytesleft < $blocksize );
+        read STDIN, $tmp, $blocksize;
+        print $fh $tmp;
+        $bytesleft -= $blocksize;
+    }
+
+    close $fh;
+
+    # Ensure we have something sensible for the file mode
+    if ( $mode =~ /u=(\w+)/ )
+    {
+        $mode = $1;
+    } else {
+        $mode = "rw";
+    }
+
+    # Save the file data in $state
+    $state->{entries}{$state->{directory}.$data}{modified_filename} = $filename;
+    $state->{entries}{$state->{directory}.$data}{modified_mode} = $mode;
+    $state->{entries}{$state->{directory}.$data}{modified_hash} = `git-hash-object $filename`;
+    $state->{entries}{$state->{directory}.$data}{modified_hash} =~ s/\s.*$//s;
+
+    #$log->debug("req_Modified : file=$data mode=$mode size=$size");
+}
+
+# Unchanged filename \n 
+#     Response expected: no. Tell the server that filename has not been
+#     modified in the checked out directory. The filename is a file within the
+#     most recent directory sent with Directory; it must not contain `/'.
+sub req_Unchanged
+{
+    my ( $cmd, $data ) = @_;
+
+    $state->{entries}{$state->{directory}.$data}{unchanged} = 1;
+
+    #$log->debug("req_Unchanged : $data");
+}
+
+# Argument text \n 
+#     Response expected: no. Save argument for use in a subsequent command.
+#     Arguments accumulate until an argument-using command is given, at which
+#     point they are forgotten.
+# Argumentx text \n 
+#     Response expected: no. Append \n followed by text to the current argument
+#     being saved.
+sub req_Argument
+{
+    my ( $cmd, $data ) = @_;
+
+    # TODO :  Not quite sure how Argument and Argumentx differ, but I assume
+    # it's for multi-line arguments ... somehow ...
+
+    $log->debug("$cmd : $data");
+
+    push @{$state->{arguments}}, $data;
+}
+
+# expand-modules \n 
+#     Response expected: yes. Expand the modules which are specified in the
+#     arguments. Returns the data in Module-expansion responses. Note that the
+#     server can assume that this is checkout or export, not rtag or rdiff; the
+#     latter do not access the working directory and thus have no need to
+#     expand modules on the client side. Expand may not be the best word for
+#     what this request does. It does not necessarily tell you all the files
+#     contained in a module, for example. Basically it is a way of telling you
+#     which working directories the server needs to know about in order to
+#     handle a checkout of the specified modules. For example, suppose that the
+#     server has a module defined by 
+#   aliasmodule -a 1dir
+#     That is, one can check out aliasmodule and it will take 1dir in the
+#     repository and check it out to 1dir in the working directory. Now suppose
+#     the client already has this module checked out and is planning on using
+#     the co request to update it. Without using expand-modules, the client
+#     would have two bad choices: it could either send information about all
+#     working directories under the current directory, which could be
+#     unnecessarily slow, or it could be ignorant of the fact that aliasmodule
+#     stands for 1dir, and neglect to send information for 1dir, which would
+#     lead to incorrect operation. With expand-modules, the client would first
+#     ask for the module to be expanded:
+sub req_expandmodules
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit();
+
+    $log->debug("req_expandmodules : " . ( defined($data) ? $data : "[NULL]" ) );
+
+    unless ( ref $state->{arguments} eq "ARRAY" )
+    {
+        print "ok\n";
+        return;
+    }
+
+    foreach my $module ( @{$state->{arguments}} )
+    {
+        $log->debug("SEND : Module-expansion $module");
+        print "Module-expansion $module\n";
+    }
+
+    print "ok\n";
+    statecleanup();
+}
+
+# co \n 
+#     Response expected: yes. Get files from the repository. This uses any
+#     previous Argument, Directory, Entry, or Modified requests, if they have
+#     been sent. Arguments to this command are module names; the client cannot
+#     know what directories they correspond to except by (1) just sending the
+#     co request, and then seeing what directory names the server sends back in
+#     its responses, and (2) the expand-modules request.
+sub req_co
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("co");
+
+    my $module = $state->{args}[0];
+    my $checkout_path = $module;
+
+    # use the user specified directory if we're given it
+    $checkout_path = $state->{opt}{d} if ( exists ( $state->{opt}{d} ) );
+
+    $log->debug("req_co : " . ( defined($data) ? $data : "[NULL]" ) );
+
+    $log->info("Checking out module '$module' ($state->{CVSROOT}) to '$checkout_path'");
+
+    $ENV{GIT_DIR} = $state->{CVSROOT} . "/";
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $module, $log);
+    $updater->update();
+
+    # instruct the client that we're checking out to $checkout_path
+    print "E cvs server: updating $checkout_path\n";
+
+    foreach my $git ( @{$updater->gethead} )
+    {
+        # Don't want to check out deleted files
+        next if ( $git->{filehash} eq "deleted" );
+
+        ( $git->{name}, $git->{dir} ) = filenamesplit($git->{name});
+
+        # modification time of this file
+        print "Mod-time $git->{modified}\n";
+
+        # print some information to the client
+        print "MT +updated\n";
+        print "MT text U\n";
+        if ( defined ( $git->{dir} ) and $git->{dir} ne "./" )
+        {
+            print "MT fname $checkout_path/$git->{dir}$git->{name}\n";
+        } else {
+            print "MT fname $checkout_path/$git->{name}\n";
+        }
+        print "MT newline\n";
+        print "MT -updated\n";
+
+        # instruct client we're sending a file to put in this path
+        print "Created $checkout_path/" . ( defined ( $git->{dir} ) ? $git->{dir} . "/" : "" ) . "\n";
+
+        print $state->{CVSROOT} . "/$module/" . ( defined ( $git->{dir} ) ? $git->{dir} . "/" : "" ) . "$git->{name}\n";
+
+        # this is an "entries" line
+        print "/$git->{name}/1.$git->{revision}///\n";
+        # permissions
+        print "u=$git->{mode},g=$git->{mode},o=$git->{mode}\n";
+
+        # transmit file
+        transmitfile($git->{filehash});
+    }
+
+    print "ok\n";
+
+    statecleanup();
+}
+
+# update \n 
+#     Response expected: yes. Actually do a cvs update command. This uses any
+#     previous Argument, Directory, Entry, or Modified requests, if they have
+#     been sent. The last Directory sent specifies the working directory at the
+#     time of the operation. The -I option is not used--files which the client
+#     can decide whether to ignore are not mentioned and the client sends the
+#     Questionable request for others.
+sub req_update
+{
+    my ( $cmd, $data ) = @_;
+
+    $log->debug("req_update : " . ( defined($data) ? $data : "[NULL]" ));
+
+    argsplit("update");
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+
+    $updater->update();
+
+    # if no files were specified, we need to work out what files we should be providing status on ...
+    argsfromdir($updater) if ( scalar ( @{$state->{args}} ) == 0 );
+
+    #$log->debug("update state : " . Dumper($state));
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        # if we have a -C we should pretend we never saw modified stuff
+        if ( exists ( $state->{opt}{C} ) )
+        {
+            delete $state->{entries}{$filename}{modified_hash};
+            delete $state->{entries}{$filename}{modified_filename};
+            $state->{entries}{$filename}{unchanged} = 1;
+        }
+
+        my $meta;
+        if ( defined($state->{opt}{r}) and $state->{opt}{r} =~ /^1\.(\d+)/ )
+        {
+            $meta = $updater->getmeta($filename, $1);
+        } else {
+            $meta = $updater->getmeta($filename);
+        }
+
+        next unless ( $meta->{revision} );
+
+        my $oldmeta = $meta;
+
+        my $wrev = revparse($filename);
+
+        # If the working copy is an old revision, lets get that version too for comparison.
+        if ( defined($wrev) and $wrev != $meta->{revision} )
+        {
+            $oldmeta = $updater->getmeta($filename, $wrev);
+        }
+
+        #$log->debug("Target revision is $meta->{revision}, current working revision is $wrev");
+
+        # Files are up to date if the working copy and repo copy have the same revision, and the working copy is unmodified _and_ the user hasn't specified -C
+        next if ( defined ( $wrev ) and defined($meta->{revision}) and $wrev == $meta->{revision} and $state->{entries}{$filename}{unchanged} and not exists ( $state->{opt}{C} ) );
+
+        if ( $meta->{filehash} eq "deleted" )
+        {
+            my ( $filepart, $dirpart ) = filenamesplit($filename);
+
+            $log->info("Removing '$filename' from working copy (no longer in the repo)");
+
+            print "E cvs update: `$filename' is no longer in the repository\n";
+            print "Removed $dirpart\n";
+            print "$filepart\n";
+        }
+        elsif ( not defined ( $state->{entries}{$filename}{modified_hash} ) or $state->{entries}{$filename}{modified_hash} eq $oldmeta->{filehash} )
+        {
+            $log->info("Updating '$filename'");
+            # normal update, just send the new revision (either U=Update, or A=Add, or R=Remove)
+            print "MT +updated\n";
+            print "MT text U\n";
+            print "MT fname $filename\n";
+            print "MT newline\n";
+            print "MT -updated\n";
+
+            my ( $filepart, $dirpart ) = filenamesplit($filename);
+            $dirpart =~ s/^$state->{directory}//;
+
+            if ( defined ( $wrev ) )
+            {
+                # instruct client we're sending a file to put in this path as a replacement
+                print "Update-existing $dirpart\n";
+                $log->debug("Updating existing file 'Update-existing $dirpart'");
+            } else {
+                # instruct client we're sending a file to put in this path as a new file
+                print "Created $dirpart\n";
+                $log->debug("Creating new file 'Created $dirpart'");
+            }
+            print $state->{CVSROOT} . "/$state->{module}/$filename\n";
+
+            # this is an "entries" line
+            $log->debug("/$filepart/1.$meta->{revision}///");
+            print "/$filepart/1.$meta->{revision}///\n";
+
+            # permissions
+            $log->debug("SEND : u=$meta->{mode},g=$meta->{mode},o=$meta->{mode}");
+            print "u=$meta->{mode},g=$meta->{mode},o=$meta->{mode}\n";
+
+            # transmit file
+            transmitfile($meta->{filehash});
+        } else {
+            my ( $filepart, $dirpart ) = filenamesplit($meta->{name});
+
+            my $dir = tempdir( DIR => $TEMP_DIR, CLEANUP => 1 ) . "/";
+
+            chdir $dir;
+            my $file_local = $filepart . ".mine";
+            system("ln","-s",$state->{entries}{$filename}{modified_filename}, $file_local);
+            my $file_old = $filepart . "." . $oldmeta->{revision};
+            transmitfile($oldmeta->{filehash}, $file_old);
+            my $file_new = $filepart . "." . $meta->{revision};
+            transmitfile($meta->{filehash}, $file_new);
+
+            # we need to merge with the local changes ( M=successful merge, C=conflict merge )
+            $log->info("Merging $file_local, $file_old, $file_new");
+
+            $log->debug("Temporary directory for merge is $dir");
+
+            my $return = system("merge", $file_local, $file_old, $file_new);
+            $return >>= 8;
+
+            if ( $return == 0 )
+            {
+                $log->info("Merged successfully");
+                print "M M $filename\n";
+                $log->debug("Update-existing $dirpart");
+                print "Update-existing $dirpart\n";
+                $log->debug($state->{CVSROOT} . "/$state->{module}/$filename");
+                print $state->{CVSROOT} . "/$state->{module}/$filename\n";
+                $log->debug("/$filepart/1.$meta->{revision}///");
+                print "/$filepart/1.$meta->{revision}///\n";
+            }
+            elsif ( $return == 1 )
+            {
+                $log->info("Merged with conflicts");
+                print "M C $filename\n";
+                print "Update-existing $dirpart\n";
+                print $state->{CVSROOT} . "/$state->{module}/$filename\n";
+                print "/$filepart/1.$meta->{revision}/+//\n";
+            }
+            else
+            {
+                $log->warn("Merge failed");
+                next;
+            }
+
+            # permissions
+            $log->debug("SEND : u=$meta->{mode},g=$meta->{mode},o=$meta->{mode}");
+            print "u=$meta->{mode},g=$meta->{mode},o=$meta->{mode}\n";
+
+            # transmit file, format is single integer on a line by itself (file
+            # size) followed by the file contents
+            # TODO : we should copy files in blocks
+            my $data = `cat $file_local`;
+            $log->debug("File size : " . length($data));
+            print length($data) . "\n";
+            print $data;
+
+            chdir "/";
+        }
+
+    }
+
+    print "ok\n";
+}
+
+sub req_ci
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("ci");
+
+    #$log->debug("State : " . Dumper($state));
+
+    $log->info("req_ci : " . ( defined($data) ? $data : "[NULL]" ));
+
+    if ( -e $state->{CVSROOT} . "/index" )
+    {
+        print "error 1 Index already exists in git repo\n";
+        exit;
+    }
+
+    my $lockfile = "$state->{CVSROOT}/refs/heads/$state->{module}.lock";
+    unless ( sysopen(LOCKFILE,$lockfile,O_EXCL|O_CREAT|O_WRONLY) )
+    {
+        print "error 1 Lock file '$lockfile' already exists, please try again\n";
+        exit;
+    }
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+    $updater->update();
+
+    my $tmpdir = tempdir ( DIR => $TEMP_DIR );
+    my ( undef, $file_index ) = tempfile ( DIR => $TEMP_DIR, OPEN => 0 );
+    $log->info("Lock successful, basing commit on '$tmpdir', index file is '$file_index'");
+
+    $ENV{GIT_DIR} = $state->{CVSROOT} . "/";
+    $ENV{GIT_INDEX_FILE} = $file_index;
+
+    chdir $tmpdir;
+
+    # populate the temporary index based
+    system("git-read-tree", $state->{module});
+    unless ($? == 0)
+    {
+	die "Error running git-read-tree $state->{module} $file_index $!";
+    }
+    $log->info("Created index '$file_index' with for head $state->{module} - exit status $?");
+
+
+    my @committedfiles = ();
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        next unless ( exists $state->{entries}{$filename}{modified_filename} or not $state->{entries}{$filename}{unchanged} );
+
+        my $meta = $updater->getmeta($filename);
+
+        my $wrev = revparse($filename);
+
+        my ( $filepart, $dirpart ) = filenamesplit($filename);
+
+        # do a checkout of the file if it part of this tree
+        if ($wrev) {
+            system('git-checkout-index', '-f', '-u', $filename);
+            unless ($? == 0) {
+                die "Error running git-checkout-index -f -u $filename : $!";
+            }
+        }
+
+        my $addflag = 0;
+        my $rmflag = 0;
+        $rmflag = 1 if ( defined($wrev) and $wrev < 0 );
+        $addflag = 1 unless ( -e $filename );
+
+        # Do up to date checking
+        unless ( $addflag or $wrev == $meta->{revision} or ( $rmflag and -$wrev == $meta->{revision} ) )
+        {
+            # fail everything if an up to date check fails
+            print "error 1 Up to date check failed for $filename\n";
+            close LOCKFILE;
+            unlink($lockfile);
+            chdir "/";
+            exit;
+        }
+
+        push @committedfiles, $filename;
+        $log->info("Committing $filename");
+
+        system("mkdir","-p",$dirpart) unless ( -d $dirpart );
+
+        unless ( $rmflag )
+        {
+            $log->debug("rename $state->{entries}{$filename}{modified_filename} $filename");
+            rename $state->{entries}{$filename}{modified_filename},$filename;
+
+            # Calculate modes to remove
+            my $invmode = "";
+            foreach ( qw (r w x) ) { $invmode .= $_ unless ( $state->{entries}{$filename}{modified_mode} =~ /$_/ ); }
+
+            $log->debug("chmod u+" . $state->{entries}{$filename}{modified_mode} . "-" . $invmode . " $filename");
+            system("chmod","u+" .  $state->{entries}{$filename}{modified_mode} . "-" . $invmode, $filename);
+        }
+
+        if ( $rmflag )
+        {
+            $log->info("Removing file '$filename'");
+            unlink($filename);
+            system("git-update-index", "--remove", $filename);
+        }
+        elsif ( $addflag )
+        {
+            $log->info("Adding file '$filename'");
+            system("git-update-index", "--add", $filename);
+        } else {
+            $log->info("Updating file '$filename'");
+            system("git-update-index", $filename);
+        }
+    }
+
+    unless ( scalar(@committedfiles) > 0 )
+    {
+        print "E No files to commit\n";
+        print "ok\n";
+        close LOCKFILE;
+        unlink($lockfile);
+        chdir "/";
+        return;
+    }
+
+    my $treehash = `git-write-tree`;
+    my $parenthash = `cat $ENV{GIT_DIR}refs/heads/$state->{module}`;
+    chomp $treehash;
+    chomp $parenthash;
+
+    $log->debug("Treehash : $treehash, Parenthash : $parenthash");
+
+    # write our commit message out if we have one ...
+    my ( $msg_fh, $msg_filename ) = tempfile( DIR => $TEMP_DIR );
+    print $msg_fh $state->{opt}{m};# if ( exists ( $state->{opt}{m} ) );
+    print $msg_fh "\n\nvia git-CVS emulator\n";
+    close $msg_fh;
+
+    my $commithash = `git-commit-tree $treehash -p $parenthash < $msg_filename`;
+    $log->info("Commit hash : $commithash");
+
+    unless ( $commithash =~ /[a-zA-Z0-9]{40}/ )
+    {
+        $log->warn("Commit failed (Invalid commit hash)");
+        print "error 1 Commit failed (unknown reason)\n";
+        close LOCKFILE;
+        unlink($lockfile);
+        chdir "/";
+        exit;
+    }
+
+    open FILE, ">", "$ENV{GIT_DIR}refs/heads/$state->{module}";
+    print FILE $commithash;
+    close FILE;
+
+    $updater->update();
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @committedfiles )
+    {
+        $filename = filecleanup($filename);
+
+        my $meta = $updater->getmeta($filename);
+
+        my ( $filepart, $dirpart ) = filenamesplit($filename);
+
+        $log->debug("Checked-in $dirpart : $filename");
+
+        if ( $meta->{filehash} eq "deleted" )
+        {
+            print "Remove-entry $dirpart\n";
+            print "$filename\n";
+        } else {
+            print "Checked-in $dirpart\n";
+            print "$filename\n";
+            print "/$filepart/1.$meta->{revision}///\n";
+        }
+    }
+
+    close LOCKFILE;
+    unlink($lockfile);
+    chdir "/";
+
+    print "ok\n";
+}
+
+sub req_status
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("status");
+
+    $log->info("req_status : " . ( defined($data) ? $data : "[NULL]" ));
+    #$log->debug("status state : " . Dumper($state));
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+    $updater->update();
+
+    # if no files were specified, we need to work out what files we should be providing status on ...
+    argsfromdir($updater) if ( scalar ( @{$state->{args}} ) == 0 );
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        my $meta = $updater->getmeta($filename);
+        my $oldmeta = $meta;
+
+        my $wrev = revparse($filename);
+
+        # If the working copy is an old revision, lets get that version too for comparison.
+        if ( defined($wrev) and $wrev != $meta->{revision} )
+        {
+            $oldmeta = $updater->getmeta($filename, $wrev);
+        }
+
+        # TODO : All possible statuses aren't yet implemented
+        my $status;
+        # Files are up to date if the working copy and repo copy have the same revision, and the working copy is unmodified
+        $status = "Up-to-date" if ( defined ( $wrev ) and defined($meta->{revision}) and $wrev == $meta->{revision}
+                                    and
+                                    ( ( $state->{entries}{$filename}{unchanged} and ( not defined ( $state->{entries}{$filename}{conflict} ) or $state->{entries}{$filename}{conflict} !~ /^\+=/ ) )
+                                      or ( defined($state->{entries}{$filename}{modified_hash}) and $state->{entries}{$filename}{modified_hash} eq $meta->{filehash} ) )
+                                   );
+
+        # Need checkout if the working copy has an older revision than the repo copy, and the working copy is unmodified
+        $status ||= "Needs Checkout" if ( defined ( $wrev ) and defined ( $meta->{revision} ) and $meta->{revision} > $wrev
+                                          and
+                                          ( $state->{entries}{$filename}{unchanged}
+                                            or ( defined($state->{entries}{$filename}{modified_hash}) and $state->{entries}{$filename}{modified_hash} eq $oldmeta->{filehash} ) )
+                                        );
+
+        # Need checkout if it exists in the repo but doesn't have a working copy
+        $status ||= "Needs Checkout" if ( not defined ( $wrev ) and defined ( $meta->{revision} ) );
+
+        # Locally modified if working copy and repo copy have the same revision but there are local changes
+        $status ||= "Locally Modified" if ( defined ( $wrev ) and defined($meta->{revision}) and $wrev == $meta->{revision} and $state->{entries}{$filename}{modified_filename} );
+
+        # Needs Merge if working copy revision is less than repo copy and there are local changes
+        $status ||= "Needs Merge" if ( defined ( $wrev ) and defined ( $meta->{revision} ) and $meta->{revision} > $wrev and $state->{entries}{$filename}{modified_filename} );
+
+        $status ||= "Locally Added" if ( defined ( $state->{entries}{$filename}{revision} ) and not defined ( $meta->{revision} ) );
+        $status ||= "Locally Removed" if ( defined ( $wrev ) and defined ( $meta->{revision} ) and -$wrev == $meta->{revision} );
+        $status ||= "Unresolved Conflict" if ( defined ( $state->{entries}{$filename}{conflict} ) and $state->{entries}{$filename}{conflict} =~ /^\+=/ );
+        $status ||= "File had conflicts on merge" if ( 0 );
+
+        $status ||= "Unknown";
+
+        print "M ===================================================================\n";
+        print "M File: $filename\tStatus: $status\n";
+        if ( defined($state->{entries}{$filename}{revision}) )
+        {
+            print "M Working revision:\t" . $state->{entries}{$filename}{revision} . "\n";
+        } else {
+            print "M Working revision:\tNo entry for $filename\n";
+        }
+        if ( defined($meta->{revision}) )
+        {
+            print "M Repository revision:\t1." . $meta->{revision} . "\t$state->{repository}/$filename,v\n";
+            print "M Sticky Tag:\t\t(none)\n";
+            print "M Sticky Date:\t\t(none)\n";
+            print "M Sticky Options:\t\t(none)\n";
+        } else {
+            print "M Repository revision:\tNo revision control file\n";
+        }
+        print "M\n";
+    }
+
+    print "ok\n";
+}
+
+sub req_diff
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("diff");
+
+    $log->debug("req_diff : " . ( defined($data) ? $data : "[NULL]" ));
+    #$log->debug("status state : " . Dumper($state));
+
+    my ($revision1, $revision2);
+    if ( defined ( $state->{opt}{r} ) and ref $state->{opt}{r} eq "ARRAY" )
+    {
+        $revision1 = $state->{opt}{r}[0];
+        $revision2 = $state->{opt}{r}[1];
+    } else {
+        $revision1 = $state->{opt}{r};
+    }
+
+    $revision1 =~ s/^1\.// if ( defined ( $revision1 ) );
+    $revision2 =~ s/^1\.// if ( defined ( $revision2 ) );
+
+    $log->debug("Diffing revisions " . ( defined($revision1) ? $revision1 : "[NULL]" ) . " and " . ( defined($revision2) ? $revision2 : "[NULL]" ) );
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+    $updater->update();
+
+    # if no files were specified, we need to work out what files we should be providing status on ...
+    argsfromdir($updater) if ( scalar ( @{$state->{args}} ) == 0 );
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        my ( $fh, $file1, $file2, $meta1, $meta2, $filediff );
+
+        my $wrev = revparse($filename);
+
+        # We need _something_ to diff against
+        next unless ( defined ( $wrev ) );
+
+        # if we have a -r switch, use it
+        if ( defined ( $revision1 ) )
+        {
+            ( undef, $file1 ) = tempfile( DIR => $TEMP_DIR, OPEN => 0 );
+            $meta1 = $updater->getmeta($filename, $revision1);
+            unless ( defined ( $meta1 ) and $meta1->{filehash} ne "deleted" )
+            {
+                print "E File $filename at revision 1.$revision1 doesn't exist\n";
+                next;
+            }
+            transmitfile($meta1->{filehash}, $file1);
+        }
+        # otherwise we just use the working copy revision
+        else
+        {
+            ( undef, $file1 ) = tempfile( DIR => $TEMP_DIR, OPEN => 0 );
+            $meta1 = $updater->getmeta($filename, $wrev);
+            transmitfile($meta1->{filehash}, $file1);
+        }
+
+        # if we have a second -r switch, use it too
+        if ( defined ( $revision2 ) )
+        {
+            ( undef, $file2 ) = tempfile( DIR => $TEMP_DIR, OPEN => 0 );
+            $meta2 = $updater->getmeta($filename, $revision2);
+
+            unless ( defined ( $meta2 ) and $meta2->{filehash} ne "deleted" )
+            {
+                print "E File $filename at revision 1.$revision2 doesn't exist\n";
+                next;
+            }
+
+            transmitfile($meta2->{filehash}, $file2);
+        }
+        # otherwise we just use the working copy
+        else
+        {
+            $file2 = $state->{entries}{$filename}{modified_filename};
+        }
+
+        # if we have been given -r, and we don't have a $file2 yet, lets get one
+        if ( defined ( $revision1 ) and not defined ( $file2 ) )
+        {
+            ( undef, $file2 ) = tempfile( DIR => $TEMP_DIR, OPEN => 0 );
+            $meta2 = $updater->getmeta($filename, $wrev);
+            transmitfile($meta2->{filehash}, $file2);
+        }
+
+        # We need to have retrieved something useful
+        next unless ( defined ( $meta1 ) );
+
+        # Files to date if the working copy and repo copy have the same revision, and the working copy is unmodified
+        next if ( not defined ( $meta2 ) and $wrev == $meta1->{revision}
+                  and
+                   ( ( $state->{entries}{$filename}{unchanged} and ( not defined ( $state->{entries}{$filename}{conflict} ) or $state->{entries}{$filename}{conflict} !~ /^\+=/ ) )
+                     or ( defined($state->{entries}{$filename}{modified_hash}) and $state->{entries}{$filename}{modified_hash} eq $meta1->{filehash} ) )
+                  );
+
+        # Apparently we only show diffs for locally modified files
+        next unless ( defined($meta2) or defined ( $state->{entries}{$filename}{modified_filename} ) );
+
+        print "M Index: $filename\n";
+        print "M ===================================================================\n";
+        print "M RCS file: $state->{CVSROOT}/$state->{module}/$filename,v\n";
+        print "M retrieving revision 1.$meta1->{revision}\n" if ( defined ( $meta1 ) );
+        print "M retrieving revision 1.$meta2->{revision}\n" if ( defined ( $meta2 ) );
+        print "M diff ";
+        foreach my $opt ( keys %{$state->{opt}} )
+        {
+            if ( ref $state->{opt}{$opt} eq "ARRAY" )
+            {
+                foreach my $value ( @{$state->{opt}{$opt}} )
+                {
+                    print "-$opt $value ";
+                }
+            } else {
+                print "-$opt ";
+                print "$state->{opt}{$opt} " if ( defined ( $state->{opt}{$opt} ) );
+            }
+        }
+        print "$filename\n";
+
+        $log->info("Diffing $filename -r $meta1->{revision} -r " . ( $meta2->{revision} or "workingcopy" ));
+
+        ( $fh, $filediff ) = tempfile ( DIR => $TEMP_DIR );
+
+        if ( exists $state->{opt}{u} )
+        {
+            system("diff -u -L '$filename revision 1.$meta1->{revision}' -L '$filename " . ( defined($meta2->{revision}) ? "revision 1.$meta2->{revision}" : "working copy" ) . "' $file1 $file2 > $filediff");
+        } else {
+            system("diff $file1 $file2 > $filediff");
+        }
+
+        while ( <$fh> )
+        {
+            print "M $_";
+        }
+        close $fh;
+    }
+
+    print "ok\n";
+}
+
+sub req_log
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("log");
+
+    $log->debug("req_log : " . ( defined($data) ? $data : "[NULL]" ));
+    #$log->debug("log state : " . Dumper($state));
+
+    my ( $minrev, $maxrev );
+    if ( defined ( $state->{opt}{r} ) and $state->{opt}{r} =~ /([\d.]+)?(::?)([\d.]+)?/ )
+    {
+        my $control = $2;
+        $minrev = $1;
+        $maxrev = $3;
+        $minrev =~ s/^1\.// if ( defined ( $minrev ) );
+        $maxrev =~ s/^1\.// if ( defined ( $maxrev ) );
+        $minrev++ if ( defined($minrev) and $control eq "::" );
+    }
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+    $updater->update();
+
+    # if no files were specified, we need to work out what files we should be providing status on ...
+    argsfromdir($updater) if ( scalar ( @{$state->{args}} ) == 0 );
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        my $headmeta = $updater->getmeta($filename);
+
+        my $revisions = $updater->getlog($filename);
+        my $totalrevisions = scalar(@$revisions);
+
+        if ( defined ( $minrev ) )
+        {
+            $log->debug("Removing revisions less than $minrev");
+            while ( scalar(@$revisions) > 0 and $revisions->[-1]{revision} < $minrev )
+            {
+                pop @$revisions;
+            }
+        }
+        if ( defined ( $maxrev ) )
+        {
+            $log->debug("Removing revisions greater than $maxrev");
+            while ( scalar(@$revisions) > 0 and $revisions->[0]{revision} > $maxrev )
+            {
+                shift @$revisions;
+            }
+        }
+
+        next unless ( scalar(@$revisions) );
+
+        print "M \n";
+        print "M RCS file: $state->{CVSROOT}/$state->{module}/$filename,v\n";
+        print "M Working file: $filename\n";
+        print "M head: 1.$headmeta->{revision}\n";
+        print "M branch:\n";
+        print "M locks: strict\n";
+        print "M access list:\n";
+        print "M symbolic names:\n";
+        print "M keyword substitution: kv\n";
+        print "M total revisions: $totalrevisions;\tselected revisions: " . scalar(@$revisions) . "\n";
+        print "M description:\n";
+
+        foreach my $revision ( @$revisions )
+        {
+            print "M ----------------------------\n";
+            print "M revision 1.$revision->{revision}\n";
+            # reformat the date for log output
+            $revision->{modified} = sprintf('%04d/%02d/%02d %s', $3, $DATE_LIST->{$2}, $1, $4 ) if ( $revision->{modified} =~ /(\d+)\s+(\w+)\s+(\d+)\s+(\S+)/ and defined($DATE_LIST->{$2}) );
+            $revision->{author} =~ s/\s+.*//;
+            $revision->{author} =~ s/^(.{8}).*/$1/;
+            print "M date: $revision->{modified};  author: $revision->{author};  state: " . ( $revision->{filehash} eq "deleted" ? "dead" : "Exp" ) . ";  lines: +2 -3\n";
+            my $commitmessage = $updater->commitmessage($revision->{commithash});
+            $commitmessage =~ s/^/M /mg;
+            print $commitmessage . "\n";
+        }
+        print "M =============================================================================\n";
+    }
+
+    print "ok\n";
+}
+
+sub req_annotate
+{
+    my ( $cmd, $data ) = @_;
+
+    argsplit("annotate");
+
+    $log->info("req_annotate : " . ( defined($data) ? $data : "[NULL]" ));
+    #$log->debug("status state : " . Dumper($state));
+
+    # Grab a handle to the SQLite db and do any necessary updates
+    my $updater = GITCVS::updater->new($state->{CVSROOT}, $state->{module}, $log);
+    $updater->update();
+
+    # if no files were specified, we need to work out what files we should be providing annotate on ...
+    argsfromdir($updater) if ( scalar ( @{$state->{args}} ) == 0 );
+
+    # we'll need a temporary checkout dir
+    my $tmpdir = tempdir ( DIR => $TEMP_DIR );
+    my ( undef, $file_index ) = tempfile ( DIR => $TEMP_DIR, OPEN => 0 );
+    $log->info("Temp checkoutdir creation successful, basing annotate session work on '$tmpdir', index file is '$file_index'");
+
+    $ENV{GIT_DIR} = $state->{CVSROOT} . "/";
+    $ENV{GIT_INDEX_FILE} = $file_index;
+
+    chdir $tmpdir;
+
+    # foreach file specified on the commandline ...
+    foreach my $filename ( @{$state->{args}} )
+    {
+        $filename = filecleanup($filename);
+
+        my $meta = $updater->getmeta($filename);
+
+        next unless ( $meta->{revision} );
+
+	# get all the commits that this file was in
+	# in dense format -- aka skip dead revisions
+        my $revisions   = $updater->gethistorydense($filename);
+	my $lastseenin  = $revisions->[0][2];
+
+	# populate the temporary index based on the latest commit were we saw
+	# the file -- but do it cheaply without checking out any files
+	# TODO: if we got a revision from the client, use that instead
+	# to look up the commithash in sqlite (still good to default to 
+	# the current head as we do now)
+	system("git-read-tree", $lastseenin);
+	unless ($? == 0)
+	{
+	    die "Error running git-read-tree $lastseenin $file_index $!";
+	}
+	$log->info("Created index '$file_index' with commit $lastseenin - exit status $?");
+
+        # do a checkout of the file
+        system('git-checkout-index', '-f', '-u', $filename);
+        unless ($? == 0) {
+            die "Error running git-checkout-index -f -u $filename : $!";
+        }
+
+        $log->info("Annotate $filename");
+
+        # Prepare a file with the commits from the linearized
+        # history that annotate should know about. This prevents
+        # git-jsannotate telling us about commits we are hiding
+        # from the client.
+
+        open(ANNOTATEHINTS, ">$tmpdir/.annotate_hints") or die "Error opening > $tmpdir/.annotate_hints $!";
+        for (my $i=0; $i < @$revisions; $i++)
+        {
+            print ANNOTATEHINTS $revisions->[$i][2];
+            if ($i+1 < @$revisions) { # have we got a parent?
+                print ANNOTATEHINTS ' ' . $revisions->[$i+1][2];
+            }
+            print ANNOTATEHINTS "\n";
+        }
+
+        print ANNOTATEHINTS "\n";
+        close ANNOTATEHINTS;
+
+        my $annotatecmd = 'git-annotate';
+        open(ANNOTATE, "-|", $annotatecmd, '-l', '-S', "$tmpdir/.annotate_hints", $filename) 
+	    or die "Error invoking $annotatecmd -l -S $tmpdir/.annotate_hints $filename : $!";
+        my $metadata = {};
+        print "E Annotations for $filename\n";
+        print "E ***************\n";
+        while ( <ANNOTATE> )
+        {
+            if (m/^([a-zA-Z0-9]{40})\t\([^\)]*\)(.*)$/i)
+            {
+                my $commithash = $1;
+                my $data = $2;
+                unless ( defined ( $metadata->{$commithash} ) )
+                {
+                    $metadata->{$commithash} = $updater->getmeta($filename, $commithash);
+                    $metadata->{$commithash}{author} =~ s/\s+.*//;
+                    $metadata->{$commithash}{author} =~ s/^(.{8}).*/$1/;
+                    $metadata->{$commithash}{modified} = sprintf("%02d-%s-%02d", $1, $2, $3) if ( $metadata->{$commithash}{modified} =~ /^(\d+)\s(\w+)\s\d\d(\d\d)/ );
+                }
+                printf("M 1.%-5d      (%-8s %10s): %s\n",
+                    $metadata->{$commithash}{revision},
+                    $metadata->{$commithash}{author},
+                    $metadata->{$commithash}{modified},
+                    $data
+                );
+            } else {
+                $log->warn("Error in annotate output! LINE: $_");
+                print "E Annotate error \n";
+                next;
+            }
+        }
+        close ANNOTATE;
+    }
+
+    # done; get out of the tempdir
+    chdir "/";
+
+    print "ok\n";
+
+}
+
+# This method takes the state->{arguments} array and produces two new arrays.
+# The first is $state->{args} which is everything before the '--' argument, and
+# the second is $state->{files} which is everything after it.
+sub argsplit
+{
+    return unless( defined($state->{arguments}) and ref $state->{arguments} eq "ARRAY" );
+
+    my $type = shift;
+
+    $state->{args} = [];
+    $state->{files} = [];
+    $state->{opt} = {};
+
+    if ( defined($type) )
+    {
+        my $opt = {};
+        $opt = { A => 0, N => 0, P => 0, R => 0, c => 0, f => 0, l => 0, n => 0, p => 0, s => 0, r => 1, D => 1, d => 1, k => 1, j => 1, } if ( $type eq "co" );
+        $opt = { v => 0, l => 0, R => 0 } if ( $type eq "status" );
+        $opt = { A => 0, P => 0, C => 0, d => 0, f => 0, l => 0, R => 0, p => 0, k => 1, r => 1, D => 1, j => 1, I => 1, W => 1 } if ( $type eq "update" );
+        $opt = { l => 0, R => 0, k => 1, D => 1, D => 1, r => 2 } if ( $type eq "diff" );
+        $opt = { c => 0, R => 0, l => 0, f => 0, F => 1, m => 1, r => 1 } if ( $type eq "ci" );
+        $opt = { k => 1, m => 1 } if ( $type eq "add" );
+        $opt = { f => 0, l => 0, R => 0 } if ( $type eq "remove" );
+        $opt = { l => 0, b => 0, h => 0, R => 0, t => 0, N => 0, S => 0, r => 1, d => 1, s => 1, w => 1 } if ( $type eq "log" );
+
+
+        while ( scalar ( @{$state->{arguments}} ) > 0 )
+        {
+            my $arg = shift @{$state->{arguments}};
+
+            next if ( $arg eq "--" );
+            next unless ( $arg =~ /\S/ );
+
+            # if the argument looks like a switch
+            if ( $arg =~ /^-(\w)(.*)/ )
+            {
+                # if it's a switch that takes an argument
+                if ( $opt->{$1} )
+                {
+                    # If this switch has already been provided
+                    if ( $opt->{$1} > 1 and exists ( $state->{opt}{$1} ) )
+                    {
+                        $state->{opt}{$1} = [ $state->{opt}{$1} ];
+                        if ( length($2) > 0 )
+                        {
+                            push @{$state->{opt}{$1}},$2;
+                        } else {
+                            push @{$state->{opt}{$1}}, shift @{$state->{arguments}};
+                        }
+                    } else {
+                        # if there's extra data in the arg, use that as the argument for the switch
+                        if ( length($2) > 0 )
+                        {
+                            $state->{opt}{$1} = $2;
+                        } else {
+                            $state->{opt}{$1} = shift @{$state->{arguments}};
+                        }
+                    }
+                } else {
+                    $state->{opt}{$1} = undef;
+                }
+            }
+            else
+            {
+                push @{$state->{args}}, $arg;
+            }
+        }
+    }
+    else 
+    {
+        my $mode = 0;
+
+        foreach my $value ( @{$state->{arguments}} )
+        {
+            if ( $value eq "--" )
+            {
+                $mode++;
+                next;
+            }
+            push @{$state->{args}}, $value if ( $mode == 0 );
+            push @{$state->{files}}, $value if ( $mode == 1 );
+        }
+    }
+}
+
+# This method uses $state->{directory} to populate $state->{args} with a list of filenames
+sub argsfromdir
+{
+    my $updater = shift;
+
+    $state->{args} = [];
+
+    foreach my $file ( @{$updater->gethead} )
+    {
+        next if ( $file->{filehash} eq "deleted" and not defined ( $state->{entries}{$file->{name}} ) );
+        next unless ( $file->{name} =~ s/^$state->{directory}// );
+        push @{$state->{args}}, $file->{name};
+    }
+}
+
+# This method cleans up the $state variable after a command that uses arguments has run
+sub statecleanup
+{
+    $state->{files} = [];
+    $state->{args} = [];
+    $state->{arguments} = [];
+    $state->{entries} = {};
+}
+
+sub revparse
+{
+    my $filename = shift;
+
+    return undef unless ( defined ( $state->{entries}{$filename}{revision} ) );
+
+    return $1 if ( $state->{entries}{$filename}{revision} =~ /^1\.(\d+)/ );
+    return -$1 if ( $state->{entries}{$filename}{revision} =~ /^-1\.(\d+)/ );
+
+    return undef;
+}
+
+# This method takes a file hash and does a CVS "file transfer" which transmits the
+# size of the file, and then the file contents.
+# If a second argument $targetfile is given, the file is instead written out to
+# a file by the name of $targetfile
+sub transmitfile
+{
+    my $filehash = shift;
+    my $targetfile = shift;
+
+    if ( defined ( $filehash ) and $filehash eq "deleted" )
+    {
+        $log->warn("filehash is 'deleted'");
+        return;
+    }
+
+    die "Need filehash" unless ( defined ( $filehash ) and $filehash =~ /^[a-zA-Z0-9]{40}$/ );
+
+    my $type = `git-cat-file -t $filehash`;
+    chomp $type;
+
+    die ( "Invalid type '$type' (expected 'blob')" ) unless ( defined ( $type ) and $type eq "blob" );
+
+    my $size = `git-cat-file -s $filehash`;
+    chomp $size;
+
+    $log->debug("transmitfile($filehash) size=$size, type=$type");
+
+    if ( open my $fh, '-|', "git-cat-file", "blob", $filehash )
+    {
+        if ( defined ( $targetfile ) )
+        {
+            open NEWFILE, ">", $targetfile or die("Couldn't open '$targetfile' for writing : $!");
+            print NEWFILE $_ while ( <$fh> );
+            close NEWFILE;
+        } else {
+            print "$size\n";
+            print while ( <$fh> );
+        }
+        close $fh or die ("Couldn't close filehandle for transmitfile()");
+    } else {
+        die("Couldn't execute git-cat-file");
+    }
+}
+
+# This method takes a file name, and returns ( $dirpart, $filepart ) which
+# refers to the directory porition and the file portion of the filename
+# respectively
+sub filenamesplit
+{
+    my $filename = shift;
+
+    my ( $filepart, $dirpart ) = ( $filename, "." );
+    ( $filepart, $dirpart ) = ( $2, $1 ) if ( $filename =~ /(.*)\/(.*)/ );
+    $dirpart .= "/";
+
+    return ( $filepart, $dirpart );
+}
+
+sub filecleanup
+{
+    my $filename = shift;
+
+    return undef unless(defined($filename));
+    if ( $filename =~ /^\// )
+    {
+        print "E absolute filenames '$filename' not supported by server\n";
+        return undef;
+    }
+
+    $filename =~ s/^\.\///g;
+    $filename = $state->{directory} . $filename;
+
+    return $filename;
+}
+
+package GITCVS::log;
+
+####
+#### Copyright The Open University UK - 2006.
+####
+#### Authors: Martyn Smith    <martyn@catalyst.net.nz>
+####          Martin Langhoff <martin@catalyst.net.nz>
+####
+####
+
+use strict;
+use warnings;
+
+=head1 NAME
+
+GITCVS::log
+
+=head1 DESCRIPTION
+
+This module provides very crude logging with a similar interface to
+Log::Log4perl
+
+=head1 METHODS
+
+=cut
+
+=head2 new
+
+Creates a new log object, optionally you can specify a filename here to
+indicate the file to log to. If no log file is specified, you can specifiy one
+later with method setfile, or indicate you no longer want logging with method
+nofile.
+
+Until one of these methods is called, all log calls will buffer messages ready
+to write out.
+
+=cut
+sub new
+{
+    my $class = shift;
+    my $filename = shift;
+
+    my $self = {};
+
+    bless $self, $class;
+
+    if ( defined ( $filename ) )
+    {
+        open $self->{fh}, ">>", $filename or die("Couldn't open '$filename' for writing : $!");
+    }
+
+    return $self;
+}
+
+=head2 setfile
+
+This methods takes a filename, and attempts to open that file as the log file.
+If successful, all buffered data is written out to the file, and any further
+logging is written directly to the file.
+
+=cut
+sub setfile
+{
+    my $self = shift;
+    my $filename = shift;
+
+    if ( defined ( $filename ) )
+    {
+        open $self->{fh}, ">>", $filename or die("Couldn't open '$filename' for writing : $!");
+    }
+
+    return unless ( defined ( $self->{buffer} ) and ref $self->{buffer} eq "ARRAY" );
+
+    while ( my $line = shift @{$self->{buffer}} )
+    {
+        print {$self->{fh}} $line;
+    }
+}
+
+=head2 nofile
+
+This method indicates no logging is going to be used. It flushes any entries in
+the internal buffer, and sets a flag to ensure no further data is put there.
+
+=cut
+sub nofile
+{
+    my $self = shift;
+
+    $self->{nolog} = 1;
+
+    return unless ( defined ( $self->{buffer} ) and ref $self->{buffer} eq "ARRAY" );
+
+    $self->{buffer} = [];
+}
+
+=head2 _logopen
+
+Internal method. Returns true if the log file is open, false otherwise.
+
+=cut
+sub _logopen
+{
+    my $self = shift;
+
+    return 1 if ( defined ( $self->{fh} ) and ref $self->{fh} eq "GLOB" );
+    return 0;
+}
+
+=head2 debug info warn fatal
+
+These four methods are wrappers to _log. They provide the actual interface for
+logging data.
+
+=cut
+sub debug { my $self = shift; $self->_log("debug", @_); }
+sub info  { my $self = shift; $self->_log("info" , @_); }
+sub warn  { my $self = shift; $self->_log("warn" , @_); }
+sub fatal { my $self = shift; $self->_log("fatal", @_); }
+
+=head2 _log
+
+This is an internal method called by the logging functions. It generates a
+timestamp and pushes the logged line either to file, or internal buffer.
+
+=cut
+sub _log
+{
+    my $self = shift;
+    my $level = shift;
+
+    return if ( $self->{nolog} );
+
+    my @time = localtime;
+    my $timestring = sprintf("%4d-%02d-%02d %02d:%02d:%02d : %-5s",
+        $time[5] + 1900,
+        $time[4] + 1,
+        $time[3],
+        $time[2],
+        $time[1],
+        $time[0],
+        uc $level,
+    );
+
+    if ( $self->_logopen )
+    {
+        print {$self->{fh}} $timestring . " - " . join(" ",@_) . "\n";
+    } else {
+        push @{$self->{buffer}}, $timestring . " - " . join(" ",@_) . "\n";
+    }
+}
+
+=head2 DESTROY
+
+This method simply closes the file handle if one is open
+
+=cut
+sub DESTROY
+{
+    my $self = shift;
+
+    if ( $self->_logopen )
+    {
+        close $self->{fh};
+    }
+}
+
+package GITCVS::updater;
+
+####
+#### Copyright The Open University UK - 2006.
+####
+#### Authors: Martyn Smith    <martyn@catalyst.net.nz>
+####          Martin Langhoff <martin@catalyst.net.nz>
+####
+####
+
+use strict;
+use warnings;
+use DBI;
+
+=head1 METHODS
+
+=cut
+
+=head2 new
+
+=cut
+sub new
+{
+    my $class = shift;
+    my $config = shift;
+    my $module = shift;
+    my $log = shift;
+
+    die "Need to specify a git repository" unless ( defined($config) and -d $config );
+    die "Need to specify a module" unless ( defined($module) );
+
+    $class = ref($class) || $class;
+
+    my $self = {};
+
+    bless $self, $class;
+
+    $self->{dbdir} = $config . "/";
+    die "Database dir '$self->{dbdir}' isn't a directory" unless ( defined($self->{dbdir}) and -d $self->{dbdir} );
+
+    $self->{module} = $module;
+    $self->{file} = $self->{dbdir} . "/gitcvs.$module.sqlite";
+
+    $self->{git_path} = $config . "/";
+
+    $self->{log} = $log;
+
+    die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );
+
+    $self->{dbh} = DBI->connect("dbi:SQLite:dbname=" . $self->{file},"","");
+
+    $self->{tables} = {};
+    foreach my $table ( $self->{dbh}->tables )
+    {
+        $table =~ s/^"//;
+        $table =~ s/"$//;
+        $self->{tables}{$table} = 1;
+    }
+
+    # Construct the revision table if required
+    unless ( $self->{tables}{revision} )
+    {
+        $self->{dbh}->do("
+            CREATE TABLE revision (
+                name       TEXT NOT NULL,
+                revision   INTEGER NOT NULL,
+                filehash   TEXT NOT NULL,
+                commithash TEXT NOT NULL,
+                author     TEXT NOT NULL,
+                modified   TEXT NOT NULL,
+                mode       TEXT NOT NULL
+            )
+        ");
+    }
+
+    # Construct the revision table if required
+    unless ( $self->{tables}{head} )
+    {
+        $self->{dbh}->do("
+            CREATE TABLE head (
+                name       TEXT NOT NULL,
+                revision   INTEGER NOT NULL,
+                filehash   TEXT NOT NULL,
+                commithash TEXT NOT NULL,
+                author     TEXT NOT NULL,
+                modified   TEXT NOT NULL,
+                mode       TEXT NOT NULL
+            )
+        ");
+    }
+
+    # Construct the properties table if required
+    unless ( $self->{tables}{properties} )
+    {
+        $self->{dbh}->do("
+            CREATE TABLE properties (
+                key        TEXT NOT NULL PRIMARY KEY,
+                value      TEXT
+            )
+        ");
+    }
+
+    # Construct the commitmsgs table if required
+    unless ( $self->{tables}{commitmsgs} )
+    {
+        $self->{dbh}->do("
+            CREATE TABLE commitmsgs (
+                key        TEXT NOT NULL PRIMARY KEY,
+                value      TEXT
+            )
+        ");
+    }
+
+    return $self;
+}
+
+=head2 update
+
+=cut
+sub update
+{
+    my $self = shift;
+
+    # first lets get the commit list
+    $ENV{GIT_DIR} = $self->{git_path};
+
+    # prepare database queries
+    my $db_insert_rev = $self->{dbh}->prepare_cached("INSERT INTO revision (name, revision, filehash, commithash, modified, author, mode) VALUES (?,?,?,?,?,?,?)",{},1);
+    my $db_insert_mergelog = $self->{dbh}->prepare_cached("INSERT INTO commitmsgs (key, value) VALUES (?,?)",{},1);
+    my $db_delete_head = $self->{dbh}->prepare_cached("DELETE FROM head",{},1);
+    my $db_insert_head = $self->{dbh}->prepare_cached("INSERT INTO head (name, revision, filehash, commithash, modified, author, mode) VALUES (?,?,?,?,?,?,?)",{},1);
+
+    my $commitinfo = `git-cat-file commit $self->{module} 2>&1`;
+    unless ( $commitinfo =~ /tree\s+[a-zA-Z0-9]{40}/ )
+    {
+        die("Invalid module '$self->{module}'");
+    }
+
+
+    my $git_log;
+    my $lastcommit = $self->_get_prop("last_commit");
+
+    # Start exclusive lock here...
+    $self->{dbh}->begin_work() or die "Cannot lock database for BEGIN";
+
+    # TODO: log processing is memory bound
+    # if we can parse into a 2nd file that is in reverse order
+    # we can probably do something really efficient
+    my @git_log_params = ('--parents', '--topo-order'); 
+
+    if (defined $lastcommit) {
+        push @git_log_params, "$lastcommit..$self->{module}";
+    } else {
+        push @git_log_params, $self->{module};
+    }
+    open(GITLOG, '-|', 'git-log', @git_log_params) or die "Cannot call git-log: $!";
+
+    my @commits;
+
+    my %commit = ();
+
+    while ( <GITLOG> )
+    {
+        chomp;
+        if (m/^commit\s+(.*)$/) {
+            # on ^commit lines put the just seen commit in the stack
+            # and prime things for the next one
+            if (keys %commit) {
+                my %copy = %commit;
+                unshift @commits, \%copy;
+                %commit = ();
+            }
+            my @parents = split(m/\s+/, $1);
+            $commit{hash} = shift @parents;
+            $commit{parents} = \@parents;
+        } elsif (m/^(\w+?):\s+(.*)$/ && !exists($commit{message})) {
+            # on rfc822-like lines seen before we see any message,
+            # lowercase the entry and put it in the hash as key-value
+            $commit{lc($1)} = $2;
+        } else {
+            # message lines - skip initial empty line
+            # and trim whitespace
+            if (!exists($commit{message}) && m/^\s*$/) {
+                # define it to mark the end of headers
+                $commit{message} = ''; 
+                next;
+            }
+            s/^\s+//; s/\s+$//; # trim ws
+            $commit{message} .= $_ . "\n";
+        }
+    }
+    close GITLOG;
+
+    unshift @commits, \%commit if ( keys %commit );
+
+    # Now all the commits are in the @commits bucket
+    # ordered by time DESC. for each commit that needs processing,
+    # determine whether it's following the last head we've seen or if
+    # it's on its own branch, grab a file list, and add whatever's changed
+    # NOTE: $lastcommit refers to the last commit from previous run
+    #       $lastpicked is the last commit we picked in this run
+    my $lastpicked;
+    my $head = {};
+    if (defined $lastcommit) {
+        $lastpicked = $lastcommit;
+    }
+
+    my $committotal = scalar(@commits);
+    my $commitcount = 0;
+
+    # Load the head table into $head (for cached lookups during the update process)
+    foreach my $file ( @{$self->gethead()} )
+    {
+        $head->{$file->{name}} = $file;
+    }
+
+    foreach my $commit ( @commits )
+    {
+        $self->{log}->debug("GITCVS::updater - Processing commit $commit->{hash} (" . (++$commitcount) . " of $committotal)");
+        if (defined $lastpicked) 
+        {
+            if (!in_array($lastpicked, @{$commit->{parents}}))
+            {
+                # skip, we'll see this delta 
+                # as part of a merge later
+                # warn "skipping off-track  $commit->{hash}\n";
+                next;
+            } elsif (@{$commit->{parents}} > 1) {
+                # it is a merge commit, for each parent that is
+                # not $lastpicked, see if we can get a log
+                # from the merge-base to that parent to put it
+                # in the message as a merge summary.
+                my @parents = @{$commit->{parents}};
+                foreach my $parent (@parents) {
+                    # git-merge-base can potentially (but rarely) throw
+                    # several candidate merge bases. let's assume
+                    # that the first one is the best one.
+                    if ($parent eq $lastpicked) {
+                        next;
+                    }
+                    open my $p, 'git-merge-base '. $lastpicked . ' '
+                    . $parent . '|';
+                    my @output = (<$p>);
+                    close $p;
+                    my $base = join('', @output);
+                    chomp $base;
+                    if ($base) {
+                        my @merged;
+                        # print "want to log between  $base $parent \n";
+                        open(GITLOG, '-|', 'git-log', "$base..$parent")
+                        or die "Cannot call git-log: $!";
+                        my $mergedhash;
+                        while (<GITLOG>) {
+                            chomp;
+                            if (!defined $mergedhash) {
+                                if (m/^commit\s+(.+)$/) {
+                                    $mergedhash = $1;
+                                } else {
+                                    next;
+                                }
+                            } else {
+                                # grab the first line that looks non-rfc822
+                                # aka has content after leading space
+                                if (m/^\s+(\S.*)$/) {
+                                    my $title = $1;
+                                    $title = substr($title,0,100); # truncate
+                                    unshift @merged, "$mergedhash $title";
+                                    undef $mergedhash;
+                                }
+                            }
+                        }
+                        close GITLOG;
+                        if (@merged) {
+                            $commit->{mergemsg} = $commit->{message};
+                            $commit->{mergemsg} .= "\nSummary of merged commits:\n\n";
+                            foreach my $summary (@merged) {
+                                $commit->{mergemsg} .= "\t$summary\n";
+                            }
+                            $commit->{mergemsg} .= "\n\n";
+                            # print "Message for $commit->{hash} \n$commit->{mergemsg}";
+                        }
+                    }
+                }
+            }
+        }
+
+        # convert the date to CVS-happy format
+        $commit->{date} = "$2 $1 $4 $3 $5" if ( $commit->{date} =~ /^\w+\s+(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\d+)\s+([+-]\d+)$/ );
+
+        if ( defined ( $lastpicked ) )
+        {
+            my $filepipe = open(FILELIST, '-|', 'git-diff-tree', '-r', $lastpicked, $commit->{hash}) or die("Cannot call git-diff-tree : $!");
+            while ( <FILELIST> )
+            {
+                unless ( /^:\d{6}\s+\d{3}(\d)\d{2}\s+[a-zA-Z0-9]{40}\s+([a-zA-Z0-9]{40})\s+(\w)\s+(.*)$/o )
+                {
+                    die("Couldn't process git-diff-tree line : $_");
+                }
+
+                # $log->debug("File mode=$1, hash=$2, change=$3, name=$4");
+
+                my $git_perms = "";
+                $git_perms .= "r" if ( $1 & 4 );
+                $git_perms .= "w" if ( $1 & 2 );
+                $git_perms .= "x" if ( $1 & 1 );
+                $git_perms = "rw" if ( $git_perms eq "" );
+
+                if ( $3 eq "D" )
+                {
+                    #$log->debug("DELETE   $4");
+                    $head->{$4} = {
+                        name => $4,
+                        revision => $head->{$4}{revision} + 1,
+                        filehash => "deleted",
+                        commithash => $commit->{hash},
+                        modified => $commit->{date},
+                        author => $commit->{author},
+                        mode => $git_perms,
+                    };
+                    $db_insert_rev->execute($4, $head->{$4}{revision}, $2, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                }
+                elsif ( $3 eq "M" )
+                {
+                    #$log->debug("MODIFIED $4");
+                    $head->{$4} = {
+                        name => $4,
+                        revision => $head->{$4}{revision} + 1,
+                        filehash => $2,
+                        commithash => $commit->{hash},
+                        modified => $commit->{date},
+                        author => $commit->{author},
+                        mode => $git_perms,
+                    };
+                    $db_insert_rev->execute($4, $head->{$4}{revision}, $2, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                }
+                elsif ( $3 eq "A" )
+                {
+                    #$log->debug("ADDED    $4");
+                    $head->{$4} = {
+                        name => $4,
+                        revision => 1,
+                        filehash => $2,
+                        commithash => $commit->{hash},
+                        modified => $commit->{date},
+                        author => $commit->{author},
+                        mode => $git_perms,
+                    };
+                    $db_insert_rev->execute($4, $head->{$4}{revision}, $2, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                }
+                else
+                {
+                    $log->warn("UNKNOWN FILE CHANGE mode=$1, hash=$2, change=$3, name=$4");
+                    die;
+                }
+            }
+            close FILELIST;
+        } else {
+            # this is used to detect files removed from the repo
+            my $seen_files = {};
+
+            my $filepipe = open(FILELIST, '-|', 'git-ls-tree', '-r', $commit->{hash}) or die("Cannot call git-ls-tree : $!");
+            while ( <FILELIST> )
+            {
+                unless ( /^(\d+)\s+(\w+)\s+([a-zA-Z0-9]+)\s+(.*)$/o )
+                {
+                    die("Couldn't process git-ls-tree line : $_");
+                }
+
+                my ( $git_perms, $git_type, $git_hash, $git_filename ) = ( $1, $2, $3, $4 );
+
+                $seen_files->{$git_filename} = 1;
+
+                my ( $oldhash, $oldrevision, $oldmode ) = (
+                    $head->{$git_filename}{filehash},
+                    $head->{$git_filename}{revision},
+                    $head->{$git_filename}{mode}
+                );
+
+                if ( $git_perms =~ /^\d\d\d(\d)\d\d/o )
+                {
+                    $git_perms = "";
+                    $git_perms .= "r" if ( $1 & 4 );
+                    $git_perms .= "w" if ( $1 & 2 );
+                    $git_perms .= "x" if ( $1 & 1 );
+                } else {
+                    $git_perms = "rw";
+                }
+
+                # unless the file exists with the same hash, we need to update it ...
+                unless ( defined($oldhash) and $oldhash eq $git_hash and defined($oldmode) and $oldmode eq $git_perms )
+                {
+                    my $newrevision = ( $oldrevision or 0 ) + 1;
+
+                    $head->{$git_filename} = {
+                        name => $git_filename,
+                        revision => $newrevision,
+                        filehash => $git_hash,
+                        commithash => $commit->{hash},
+                        modified => $commit->{date},
+                        author => $commit->{author},
+                        mode => $git_perms,
+                    };
+
+
+                    $db_insert_rev->execute($git_filename, $newrevision, $git_hash, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                }
+            }
+            close FILELIST;
+
+            # Detect deleted files
+            foreach my $file ( keys %$head )
+            {
+                unless ( exists $seen_files->{$file} or $head->{$file}{filehash} eq "deleted" )
+                {
+                    $head->{$file}{revision}++;
+                    $head->{$file}{filehash} = "deleted";
+                    $head->{$file}{commithash} = $commit->{hash};
+                    $head->{$file}{modified} = $commit->{date};
+                    $head->{$file}{author} = $commit->{author};
+
+                    $db_insert_rev->execute($file, $head->{$file}{revision}, $head->{$file}{filehash}, $commit->{hash}, $commit->{date}, $commit->{author}, $head->{$file}{mode});
+                }
+            }
+            # END : "Detect deleted files"
+        }
+
+
+        if (exists $commit->{mergemsg})
+        {
+            $db_insert_mergelog->execute($commit->{hash}, $commit->{mergemsg});
+        }
+
+        $lastpicked = $commit->{hash};
+
+        $self->_set_prop("last_commit", $commit->{hash});
+    }
+
+    $db_delete_head->execute();
+    foreach my $file ( keys %$head )
+    {
+        $db_insert_head->execute(
+            $file,
+            $head->{$file}{revision},
+            $head->{$file}{filehash},
+            $head->{$file}{commithash},
+            $head->{$file}{modified},
+            $head->{$file}{author},
+            $head->{$file}{mode},
+        );
+    }
+    # invalidate the gethead cache
+    $self->{gethead_cache} = undef;
+
+
+    # Ending exclusive lock here
+    $self->{dbh}->commit() or die "Failed to commit changes to SQLite";
+}
+
+sub _headrev
+{
+    my $self = shift;
+    my $filename = shift;
+
+    my $db_query = $self->{dbh}->prepare_cached("SELECT filehash, revision, mode FROM head WHERE name=?",{},1);
+    $db_query->execute($filename);
+    my ( $hash, $revision, $mode ) = $db_query->fetchrow_array;
+
+    return ( $hash, $revision, $mode );
+}
+
+sub _get_prop
+{
+    my $self = shift;
+    my $key = shift;
+
+    my $db_query = $self->{dbh}->prepare_cached("SELECT value FROM properties WHERE key=?",{},1);
+    $db_query->execute($key);
+    my ( $value ) = $db_query->fetchrow_array;
+
+    return $value;
+}
+
+sub _set_prop
+{
+    my $self = shift;
+    my $key = shift;
+    my $value = shift;
+
+    my $db_query = $self->{dbh}->prepare_cached("UPDATE properties SET value=? WHERE key=?",{},1);
+    $db_query->execute($value, $key);
+
+    unless ( $db_query->rows )
+    {
+        $db_query = $self->{dbh}->prepare_cached("INSERT INTO properties (key, value) VALUES (?,?)",{},1);
+        $db_query->execute($key, $value);
+    }
+
+    return $value;
+}
+
+=head2 gethead
+
+=cut
+
+sub gethead
+{
+    my $self = shift;
+
+    return $self->{gethead_cache} if ( defined ( $self->{gethead_cache} ) );
+
+    my $db_query = $self->{dbh}->prepare_cached("SELECT name, filehash, mode, revision, modified, commithash, author FROM head",{},1);
+    $db_query->execute();
+
+    my $tree = [];
+    while ( my $file = $db_query->fetchrow_hashref )
+    {
+        push @$tree, $file;
+    }
+
+    $self->{gethead_cache} = $tree;
+
+    return $tree;
+}
+
+=head2 getlog
+
+=cut
+
+sub getlog
+{
+    my $self = shift;
+    my $filename = shift;
+
+    my $db_query = $self->{dbh}->prepare_cached("SELECT name, filehash, author, mode, revision, modified, commithash FROM revision WHERE name=? ORDER BY revision DESC",{},1);
+    $db_query->execute($filename);
+
+    my $tree = [];
+    while ( my $file = $db_query->fetchrow_hashref )
+    {
+        push @$tree, $file;
+    }
+
+    return $tree;
+}
+
+=head2 getmeta
+
+This function takes a filename (with path) argument and returns a hashref of
+metadata for that file.
+
+=cut
+
+sub getmeta
+{
+    my $self = shift;
+    my $filename = shift;
+    my $revision = shift;
+
+    my $db_query;
+    if ( defined($revision) and $revision =~ /^\d+$/ )
+    {
+        $db_query = $self->{dbh}->prepare_cached("SELECT * FROM revision WHERE name=? AND revision=?",{},1);
+        $db_query->execute($filename, $revision);
+    }
+    elsif ( defined($revision) and $revision =~ /^[a-zA-Z0-9]{40}$/ )
+    {
+        $db_query = $self->{dbh}->prepare_cached("SELECT * FROM revision WHERE name=? AND commithash=?",{},1);
+        $db_query->execute($filename, $revision);
+    } else {
+        $db_query = $self->{dbh}->prepare_cached("SELECT * FROM head WHERE name=?",{},1);
+        $db_query->execute($filename);
+    }
+
+    return $db_query->fetchrow_hashref;
+}
+
+=head2 commitmessage
+
+this function takes a commithash and returns the commit message for that commit
+
+=cut
+sub commitmessage
+{
+    my $self = shift;
+    my $commithash = shift;
+
+    die("Need commithash") unless ( defined($commithash) and $commithash =~ /^[a-zA-Z0-9]{40}$/ );
+
+    my $db_query;
+    $db_query = $self->{dbh}->prepare_cached("SELECT value FROM commitmsgs WHERE key=?",{},1);
+    $db_query->execute($commithash);
+
+    my ( $message ) = $db_query->fetchrow_array;
+
+    if ( defined ( $message ) )
+    {
+        $message .= " " if ( $message =~ /\n$/ );
+        return $message;
+    }
+
+    my @lines = safe_pipe_capture("git-cat-file", "commit", $commithash);
+    shift @lines while ( $lines[0] =~ /\S/ );
+    $message = join("",@lines);
+    $message .= " " if ( $message =~ /\n$/ );
+    return $message;
+}
+
+=head2 gethistory
+
+This function takes a filename (with path) argument and returns an arrayofarrays 
+containing revision,filehash,commithash ordered by revision descending
+
+=cut
+sub gethistory
+{
+    my $self = shift;
+    my $filename = shift;
+
+    my $db_query;
+    $db_query = $self->{dbh}->prepare_cached("SELECT revision, filehash, commithash FROM revision WHERE name=? ORDER BY revision DESC",{},1);
+    $db_query->execute($filename);
+
+    return $db_query->fetchall_arrayref;
+}
+
+=head2 gethistorydense
+
+This function takes a filename (with path) argument and returns an arrayofarrays 
+containing revision,filehash,commithash ordered by revision descending.
+
+This version of gethistory skips deleted entries -- so it is useful for annotate.
+The 'dense' part is a reference to a '--dense' option available for git-rev-list 
+and other git tools that depend on it.
+
+=cut
+sub gethistorydense
+{
+    my $self = shift;
+    my $filename = shift;
+
+    my $db_query;
+    $db_query = $self->{dbh}->prepare_cached("SELECT revision, filehash, commithash FROM revision WHERE name=? AND filehash!='deleted' ORDER BY revision DESC",{},1);
+    $db_query->execute($filename);
+
+    return $db_query->fetchall_arrayref;
+}
+
+=head2 in_array()
+
+from Array::PAT - mimics the in_array() function
+found in PHP. Yuck but works for small arrays.
+
+=cut
+sub in_array
+{
+    my ($check, @array) = @_;
+    my $retval = 0;
+    foreach my $test (@array){
+        if($check eq $test){
+            $retval =  1;
+        }
+    }
+    return $retval;
+}
+
+=head2 safe_pipe_capture
+
+an alterative to `command` that allows input to be passed as an array
+to work around shell problems with weird characters in arguments
+
+=cut
+sub safe_pipe_capture {
+
+    my @output;
+
+    if (my $pid = open my $child, '-|') {
+        @output = (<$child>);
+        close $child or die join(' ',@_).": $! $?";
+    } else {
+        exec(@_) or die "$! $?"; # exec() can fail the executable can't be found
+    }
+    return wantarray ? @output : join('',@output);
+}
+
+
+1;
-- 
1.2.2.gd9a1-dirty

^ permalink raw reply related

* Re: [PATCH] Add new git-rm command with documentation
From: Junio C Hamano @ 2006-02-22  8:19 UTC (permalink / raw)
  To: Carl Worth, Krzysiek Pawlik; +Cc: git
In-Reply-To: <87r75ws48c.wl%cworth@cworth.org>

Carl Worth <cworth@cworth.org> writes:

> +files=$(
> +    if test -f "$GIT_DIR/info/exclude" ; then
> +	git-ls-files \
> +	    --exclude-from="$GIT_DIR/info/exclude" \
> +	    --exclude-per-directory=.gitignore -- "$@"
> +    else
> +	git-ls-files \
> +	--exclude-per-directory=.gitignore -- "$@"
> +    fi | sort | uniq
> +)

Note you are not using -z, which means we will c-quote the funny
characters in the output...

> +case "$show_only" in
> +true)
> +	echo $files
> +	;;

And here $files lack surrounding double quote.  For human
consumption it might be OK, but I somehow care about a bit of
details like this.

> +*)
> +	[[ "$remove_files" = "true" ]] && rm -- $files

Same here. What happens to filenames with IFS letters in them?
"git-add" does not use -z and xargs -0 without a good reason.

> +	git-update-index $index_remove_option $verbose $files
> +	;;
> +esac

Even if rm -- $files were quoted correctly, and tried to remove
the right files, if some of the files failed to disappear for
whatever reason, what happens?

^ permalink raw reply

* Re: [PATCH] relax delta selection filtering in pack-objects
From: Junio C Hamano @ 2006-02-22  7:23 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <7vpslgrkr0.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> I haven't dug into the issue yet, but these four delta series
> seem to break the testsuite.

I bisected.  It is the adler32 one -- since it makes the
generated delta much smaller, it is understandable that it would
interact with the break/rename heuristics.  It is not strictly
breakage in that sense -- we just need to readjust the
heuristics thresholds for those algorithms.

^ permalink raw reply

* Re: [PATCH] relax delta selection filtering in pack-objects
From: Junio C Hamano @ 2006-02-22  6:05 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602212034180.5606@localhost.localdomain>

I haven't dug into the issue yet, but these four delta series
seem to break the testsuite.

^ permalink raw reply

* Re: What does this error message mean?
From: Alan Chandler @ 2006-02-22  4:42 UTC (permalink / raw)
  To: git
In-Reply-To: <20060222004637.1a066569.tihirvon@gmail.com>

On Tuesday 21 February 2006 22:46, Timo Hirvonen wrote:
> On Tue, 21 Feb 2006 22:06:36 +0000
>
> Alan Chandler <alan@chandlerfamily.org.uk> wrote:
> > alan@kanger usermgr[master]$ git commit -a
> > fatal: empty ident  <alan@chandlerfamily.org.uk> not allowed
> >
> > Suddenly started happening, possibly after upgrade (via debian) to git
> > 1.2.1
>
> Your GIT_AUTHOR_NAME is empty?

Yes - seems it was.

Although it has been like that for some months, whereas this is the first time 
git-commit failed.
-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Open Source. It's the difference between trust and antitrust.

^ permalink raw reply

* [PATCH] git-push: Update documentation to describe the no-refspec behavior.
From: Carl Worth @ 2006-02-22  4:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1413 bytes --]

It turns out that the git-push documentation didn't describe what it
would do when not given a refspec, (not on the command line, nor in a
remotes file). This is fairly important for the user who is trying to
understand operations such as:

	git clone git://something/some/where
	# hack, hack, hack
	git push origin

I tracked the mystery behavior down to git-send-pack and lifted the
relevant portion of its documentation up to git-push, (namely that all
refs existing both locally and remotely are updated).

Signed-off-by: Carl Worth <cworth@cworth.org>

---

 Documentation/git-push.txt |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

a42f22171f6f3004e524b45b16a9c5cf0386ccf3
diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 5b89110..6f4a48a 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -43,6 +43,12 @@ to fast forward the remote ref that matc
 the optional plus `+` is used, the remote ref is updated
 even if it does not result in a fast forward update.
 +
+Note: If no explicit refspec is found, (that is neither
+on the command line nor in any Push line of the
+corresponding remotes file---see below), then all the
+refs that exist both on the local side and on the remote
+side are updated.
++
 Some short-cut notations are also supported.
 +
 * `tag <tag>` means the same as `refs/tags/<tag>:refs/tags/<tag>`.
-- 
1.2.2.gfdc0

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply related

* Re: How to not download objects more than needed?
From: Junio C Hamano @ 2006-02-22  3:22 UTC (permalink / raw)
  To: Jan Harkes; +Cc: git
In-Reply-To: <20060222031136.GN5000@delft.aura.cs.cmu.edu>

Jan Harkes <jaharkes@cs.cmu.edu> writes:

> Neat, it only fetches tags that refer to things we already have. Hadn't
> checked what the automatic tag fetcher was doing.
>
> So either introduce temporary local refs that can be removed once the
> tags have been fetched,...

I think it is enough just to disable tag following when you are
promiscuously fetching.  That is, do the tag following only if
the main fetch is going to store a ref because it has tracking
branch for the remote side.  Otherwise the remote tags do not
matter and if you really care about them you can ask with --tags.

^ permalink raw reply

* Re: How to not download objects more than needed?
From: Jan Harkes @ 2006-02-22  3:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3bicupgb.fsf@assigned-by-dhcp.cox.net>

On Tue, Feb 21, 2006 at 05:55:48PM -0800, Junio C Hamano wrote:
> Jan Harkes <jaharkes@cs.cmu.edu> writes:
> > On Tue, Feb 21, 2006 at 04:42:34PM -0800, Linus Torvalds wrote:
> >> 
> >> 	git pull git://git.kernel.org/....
> >> 
> >> and the automatic tag following kicks in, it will first have fetched the 
> >> objects once, and then when it tries to fetch the tag objects, it will 
> >> fetch the objects it already fetched _again_ (plus the tags), because it 
> >> will do the same object pull, but the temporary branch (to be merged) will 
> >> never have been written as a branch head.
> >
> > Isn't this easily avoided by fetching the tags first?
> 
> I do not think so.
> 
> Notice how the tag following code uses cat-file to determine if
> the main fetch likely has slurped the object they point at.

Neat, it only fetches tags that refer to things we already have. Hadn't
checked what the automatic tag fetcher was doing.

So either introduce temporary local refs that can be removed once the
tags have been fetched, or else fix it in fetch-pack with the following
change that might do the trick for this case as well. However that one
already got shot down because of possible consistency problems.

    http://marc.theaimsgroup.com/?l=git&m=113030081014456&w=2

Jan

^ permalink raw reply

* Re: Git cannot push to repository with too many tags/heads
From: Junio C Hamano @ 2006-02-22  2:56 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git
In-Reply-To: <7vwtfotaq3.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> "Stephen C. Tweedie" <sct@redhat.com> writes:
>
>> send_pack()
>> then skips all other refs by doing a
>>
>> 		if (!ref->peer_ref)
>> 			continue;
>>
>> Unfortunately, exec_rev_list() is missing this, and it tries to ask
>> git-rev-list for the commit objects of *every* ref on the remote_refs
>> list, even if they have been explicitly excluded by match_refs() and
>> have no peer_ref.  So with this huge repository, I can't even push a
>> single refspec without bumping into the limit of 900 refs.
>
> IIRC, the distinction was deliberate.  send_pack() excludes
> what did not match because it does not want to send them.
> rev_list() adds what we know they have to "do not bother to
> send" list to make the resulting pack smaller.  The time where
> it matters most is when you are pushing a new branch head (or a
> tag).
>
> I think the exec_rev_list logic should be taught to first
> include all the positive refs (i.e. the ones we are going to
> send), and then as many the negative refs (i.e. the ones we know
> they have), from newer to older, as they fit without triggering
> "argument list too long".

That is, something like this.
-- >8 --
Do not give up running rev-list when remote has insanely large number of refs.

---
cd /opt/packrat/playpen/public/in-place/git/git.junio/
git diff
diff --git a/send-pack.c b/send-pack.c
index 990be3f..cfd0eeb 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -37,24 +37,27 @@ static void exec_pack_objects(void)
 
 static void exec_rev_list(struct ref *refs)
 {
+	struct ref *ref;
 	static char *args[1000];
 	int i = 0;
 
 	args[i++] = "rev-list";	/* 0 */
 	args[i++] = "--objects";	/* 1 */
-	while (refs) {
-		char *buf = malloc(100);
+	for (ref = refs; refs; refs = refs->next) {
+		char *buf = malloc(41);
 		if (i > 900)
 			die("git-rev-list environment overflow");
-		if (!is_zero_sha1(refs->old_sha1) &&
-		    has_sha1_file(refs->old_sha1)) {
+		if (!is_zero_sha1(refs->new_sha1)) {
 			args[i++] = buf;
-			snprintf(buf, 50, "^%s", sha1_to_hex(refs->old_sha1));
-			buf += 50;
+			snprintf(buf, 41, "%s", sha1_to_hex(refs->new_sha1));
 		}
-		if (!is_zero_sha1(refs->new_sha1)) {
+	}
+	for (ref = refs; i < 900 && refs; refs = refs->next) {
+		char *buf = malloc(42);
+		if (!is_zero_sha1(refs->old_sha1) &&
+		    has_sha1_file(refs->old_sha1)) {
 			args[i++] = buf;
-			snprintf(buf, 50, "%s", sha1_to_hex(refs->new_sha1));
+			snprintf(buf, 42, "^%s", sha1_to_hex(refs->old_sha1));
 		}
 		refs = refs->next;
 	}

Compilation finished at Tue Feb 21 18:52:49

^ permalink raw reply related

* Re: [PATCH] git-ls-files: Fix, document, and add test for --error-unmatch option.
From: Junio C Hamano @ 2006-02-22  1:59 UTC (permalink / raw)
  To: Carl Worth; +Cc: git
In-Reply-To: <87vev8sajl.wl%cworth@cworth.org>

Carl Worth <cworth@cworth.org> writes:

>  I'm still not sure what the easiest way is for me to provide changes
>  to you. I've been doing it here on the list, like with the current
>  message. But would it be easier for me to send pull requests?

You can decide what is easier for _you_ ;-).

But for me, emailed patches are easier to work on than pull
requests.  I will need to read and understand most of the
changes anyway, unless the change is to an isolated corner of
the system that would affect only one class of users and
breakage will be noticed either immediately or can be left
broken if nobody uses that (e.g. things like contrib/ and some
foreign SCM interfaces).  I would like others on the list to be
able to review the same changes that might hit my tree and
provide extra sets of eyeballs to spot things I might miss
myself alone.

>  For example, with the git-clone failure cleanup I recently did, it
>  seems the new test case I wrote didn't land in your tree.

Sorry, I think I just forgot to apply that one.  I still have
the message so no need to resend.  Thanks for reminding.

^ permalink raw reply

* Re: Git cannot push to repository with too many tags/heads
From: Junio C Hamano @ 2006-02-22  1:59 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: git
In-Reply-To: <1140547568.5509.21.camel@orbit.scot.redhat.com>

"Stephen C. Tweedie" <sct@redhat.com> writes:

> send_pack()
> then skips all other refs by doing a
>
> 		if (!ref->peer_ref)
> 			continue;
>
> Unfortunately, exec_rev_list() is missing this, and it tries to ask
> git-rev-list for the commit objects of *every* ref on the remote_refs
> list, even if they have been explicitly excluded by match_refs() and
> have no peer_ref.  So with this huge repository, I can't even push a
> single refspec without bumping into the limit of 900 refs.

IIRC, the distinction was deliberate.  send_pack() excludes
what did not match because it does not want to send them.
rev_list() adds what we know they have to "do not bother to
send" list to make the resulting pack smaller.  The time where
it matters most is when you are pushing a new branch head (or a
tag).

I think the exec_rev_list logic should be taught to first
include all the positive refs (i.e. the ones we are going to
send), and then as many the negative refs (i.e. the ones we know
they have), from newer to older, as they fit without triggering
"argument list too long".

Another, probably conceptually cleaner alternative might be to
allow rev-list to read from its stdin so that we do not have to
worry about the argument list issues.

^ permalink raw reply

* Re: How to not download objects more than needed?
From: Junio C Hamano @ 2006-02-22  1:55 UTC (permalink / raw)
  To: Jan Harkes; +Cc: git
In-Reply-To: <20060222011338.GL5000@delft.aura.cs.cmu.edu>

Jan Harkes <jaharkes@cs.cmu.edu> writes:

> On Tue, Feb 21, 2006 at 04:42:34PM -0800, Linus Torvalds wrote:
>> 
>> 	git pull git://git.kernel.org/....
>> 
>> and the automatic tag following kicks in, it will first have fetched the 
>> objects once, and then when it tries to fetch the tag objects, it will 
>> fetch the objects it already fetched _again_ (plus the tags), because it 
>> will do the same object pull, but the temporary branch (to be merged) will 
>> never have been written as a branch head.
>
> Isn't this easily avoided by fetching the tags first?

I do not think so.

Notice how the tag following code uses cat-file to determine if
the main fetch likely has slurped the object they point at.

^ permalink raw reply

* [PATCH] diff-delta: produce optimal pack data
From: Nicolas Pitre @ 2006-02-22  1:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


Indexing based on adler32 has a match precision based on the block size 
(currently 16).  Lowering the block size would produce smaller deltas 
but the indexing memory and computing cost increases significantly.

For optimal delta result the indexing block size should be 3 with an 
increment of 1 (instead of 16 and 16).  With such low params the adler32 
becomes a clear overhead increasing the time for git-repack by a factor 
of 3.  And with such small blocks the adler 32 is not very useful as the 
whole of the block bits can be used directly.

This patch replaces the adler32 with an open coded index value based on 
3 characters directly.  This gives sufficient bits for hashing and 
allows for optimal delta with reasonable CPU cycles.

The resulting packs are 6% smaller on average.  The increase in CPU time 
is about 25%.  But this cost is now hidden by the delta reuse patch 
while the saving on data transfers is always there.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

 diff-delta.c |   77 +++++++++++++++++++++++-----------------------------------
 1 files changed, 30 insertions(+), 47 deletions(-)

54aa50fb403981a9292453b76d894a79da9698de
diff --git a/diff-delta.c b/diff-delta.c
index 2ed5984..27f83a0 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -20,21 +20,11 @@
 
 #include <stdlib.h>
 #include <string.h>
-#include <zlib.h>
 #include "delta.h"
 
 
-/* block size: min = 16, max = 64k, power of 2 */
-#define BLK_SIZE 16
-
-#define MIN(a, b) ((a) < (b) ? (a) : (b))
-
-#define GR_PRIME 0x9e370001
-#define HASH(v, shift) (((unsigned int)(v) * GR_PRIME) >> (shift))
-
 struct index {
 	const unsigned char *ptr;
-	unsigned int val;
 	struct index *next;
 };
 
@@ -42,21 +32,21 @@ static struct index ** delta_index(const
 				   unsigned long bufsize,
 				   unsigned int *hash_shift)
 {
-	unsigned int hsize, hshift, entries, blksize, i;
+	unsigned long hsize;
+	unsigned int hshift, i;
 	const unsigned char *data;
 	struct index *entry, **hash;
 	void *mem;
 
 	/* determine index hash size */
-	entries = (bufsize + BLK_SIZE - 1) / BLK_SIZE;
-	hsize = entries / 4;
-	for (i = 4; (1 << i) < hsize && i < 16; i++);
+	hsize = bufsize / 4;
+	for (i = 8; (1 << i) < hsize && i < 16; i++);
 	hsize = 1 << i;
-	hshift = 32 - i;
+	hshift = i - 8;
 	*hash_shift = hshift;
 
 	/* allocate lookup index */
-	mem = malloc(hsize * sizeof(*hash) + entries * sizeof(*entry));
+	mem = malloc(hsize * sizeof(*hash) + bufsize * sizeof(*entry));
 	if (!mem)
 		return NULL;
 	hash = mem;
@@ -64,17 +54,12 @@ static struct index ** delta_index(const
 	memset(hash, 0, hsize * sizeof(*hash));
 
 	/* then populate it */
-	data = buf + entries * BLK_SIZE - BLK_SIZE;
-	blksize = bufsize - (data - buf);
-	while (data >= buf) {
-		unsigned int val = adler32(0, data, blksize);
-		i = HASH(val, hshift);
-		entry->ptr = data;
-		entry->val = val;
+	data = buf + bufsize - 2;
+	while (data > buf) {
+		entry->ptr = --data;
+		i = data[0] ^ data[1] ^ (data[2] << hshift);
 		entry->next = hash[i];
 		hash[i] = entry++;
-		blksize = BLK_SIZE;
-		data -= BLK_SIZE;
  	}
 
 	return hash;
@@ -141,29 +126,27 @@ void *diff_delta(void *from_buf, unsigne
 
 	while (data < top) {
 		unsigned int moff = 0, msize = 0;
-		unsigned int blksize = MIN(top - data, BLK_SIZE);
-		unsigned int val = adler32(0, data, blksize);
-		i = HASH(val, hash_shift);
-		for (entry = hash[i]; entry; entry = entry->next) {
-			const unsigned char *ref = entry->ptr;
-			const unsigned char *src = data;
-			unsigned int ref_size = ref_top - ref;
-			if (entry->val != val)
-				continue;
-			if (ref_size > top - src)
-				ref_size = top - src;
-			while (ref_size && *src++ == *ref) {
-				ref++;
-				ref_size--;
-			}
-			ref_size = ref - entry->ptr;
-			if (ref_size > msize) {
-				/* this is our best match so far */
-				moff = entry->ptr - ref_data;
-				msize = ref_size;
-				if (msize >= 0x10000) {
-					msize = 0x10000;
+		if (data + 2 < top) {
+			i = data[0] ^ data[1] ^ (data[2] << hash_shift);
+			for (entry = hash[i]; entry; entry = entry->next) {
+				const unsigned char *ref = entry->ptr;
+				const unsigned char *src = data;
+				unsigned int ref_size = ref_top - ref;
+				if (ref_size > top - src)
+					ref_size = top - src;
+				if (ref_size > 0x10000)
+					ref_size = 0x10000;
+				if (ref_size <= msize)
 					break;
+				while (ref_size && *src++ == *ref) {
+					ref++;
+					ref_size--;
+				}
+				ref_size = ref - entry->ptr;
+				if (msize < ref - entry->ptr) {
+					/* this is our best match so far */
+					msize = ref - entry->ptr;
+					moff = entry->ptr - ref_data;
 				}
 			}
 		}
-- 
1.2.2.g6643-dirty

^ permalink raw reply related

* [PATCH] diff-delta: big code simplification
From: Nicolas Pitre @ 2006-02-22  1:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


This is much smaller and hopefully clearer code now.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

 diff-delta.c |  235 +++++++++++++++++++++-------------------------------------
 1 files changed, 87 insertions(+), 148 deletions(-)

167f85034d634944e90f6bff683d74ec680bb331
diff --git a/diff-delta.c b/diff-delta.c
index ac992e2..2ed5984 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -19,8 +19,9 @@
  */
 
 #include <stdlib.h>
+#include <string.h>
+#include <zlib.h>
 #include "delta.h"
-#include "zlib.h"
 
 
 /* block size: min = 16, max = 64k, power of 2 */
@@ -29,123 +30,54 @@
 #define MIN(a, b) ((a) < (b) ? (a) : (b))
 
 #define GR_PRIME 0x9e370001
-#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
-	
-static unsigned int hashbits(unsigned int size)
-{
-	unsigned int val = 1, bits = 0;
-	while (val < size && bits < 32) {
-		val <<= 1;
-	       	bits++;
-	}
-	return bits ? bits: 1;
-}
-
-typedef struct s_chanode {
-	struct s_chanode *next;
-	int icurr;
-} chanode_t;
-
-typedef struct s_chastore {
-	int isize, nsize;
-	chanode_t *ancur;
-} chastore_t;
-
-static void cha_init(chastore_t *cha, int isize, int icount)
-{
-	cha->isize = isize;
-	cha->nsize = icount * isize;
-	cha->ancur = NULL;
-}
-
-static void *cha_alloc(chastore_t *cha)
-{
-	chanode_t *ancur;
-	void *data;
-
-	ancur = cha->ancur;
-	if (!ancur || ancur->icurr == cha->nsize) {
-		ancur = malloc(sizeof(chanode_t) + cha->nsize);
-		if (!ancur)
-			return NULL;
-		ancur->icurr = 0;
-		ancur->next = cha->ancur;
-		cha->ancur = ancur;
-	}
-
-	data = (void *)ancur + sizeof(chanode_t) + ancur->icurr;
-	ancur->icurr += cha->isize;
-	return data;
-}
-
-static void cha_free(chastore_t *cha)
-{
-	chanode_t *cur = cha->ancur;
-	while (cur) {
-		chanode_t *tmp = cur;
-		cur = cur->next;
-		free(tmp);
-	}
-}
+#define HASH(v, shift) (((unsigned int)(v) * GR_PRIME) >> (shift))
 
-typedef struct s_bdrecord {
-	struct s_bdrecord *next;
-	unsigned int fp;
+struct index {
 	const unsigned char *ptr;
-} bdrecord_t;
-
-typedef struct s_bdfile {
-	chastore_t cha;
-	unsigned int fphbits;
-	bdrecord_t **fphash;
-} bdfile_t;
-
-static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
-{
-	unsigned int fphbits;
-	int i, hsize;
-	const unsigned char *data, *top;
-	bdrecord_t *brec;
-	bdrecord_t **fphash;
-
-	fphbits = hashbits(bufsize / BLK_SIZE + 1);
-	hsize = 1 << fphbits;
-	fphash = malloc(hsize * sizeof(bdrecord_t *));
-	if (!fphash)
-		return -1;
-	for (i = 0; i < hsize; i++)
-		fphash[i] = NULL;
-	cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
-
-	top = buf + bufsize;
-	data = buf + (bufsize / BLK_SIZE) * BLK_SIZE;
-	if (data == top)
+	unsigned int val;
+	struct index *next;
+};
+
+static struct index ** delta_index(const unsigned char *buf,
+				   unsigned long bufsize,
+				   unsigned int *hash_shift)
+{
+	unsigned int hsize, hshift, entries, blksize, i;
+	const unsigned char *data;
+	struct index *entry, **hash;
+	void *mem;
+
+	/* determine index hash size */
+	entries = (bufsize + BLK_SIZE - 1) / BLK_SIZE;
+	hsize = entries / 4;
+	for (i = 4; (1 << i) < hsize && i < 16; i++);
+	hsize = 1 << i;
+	hshift = 32 - i;
+	*hash_shift = hshift;
+
+	/* allocate lookup index */
+	mem = malloc(hsize * sizeof(*hash) + entries * sizeof(*entry));
+	if (!mem)
+		return NULL;
+	hash = mem;
+	entry = mem + hsize * sizeof(*hash);
+	memset(hash, 0, hsize * sizeof(*hash));
+
+	/* then populate it */
+	data = buf + entries * BLK_SIZE - BLK_SIZE;
+	blksize = bufsize - (data - buf);
+	while (data >= buf) {
+		unsigned int val = adler32(0, data, blksize);
+		i = HASH(val, hshift);
+		entry->ptr = data;
+		entry->val = val;
+		entry->next = hash[i];
+		hash[i] = entry++;
+		blksize = BLK_SIZE;
 		data -= BLK_SIZE;
+ 	}
 
-	for ( ; data >= buf; data -= BLK_SIZE) {
-		brec = cha_alloc(&bdf->cha);
-		if (!brec) {
-			cha_free(&bdf->cha);
-			free(fphash);
-			return -1;
-		}
-		brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
-		brec->ptr = data;
-		i = HASH(brec->fp, fphbits);
-		brec->next = fphash[i];
-		fphash[i] = brec;
-	}
-
-	bdf->fphbits = fphbits;
-	bdf->fphash = fphash;
-
-	return 0;
-}
-
-static void delta_cleanup(bdfile_t *bdf)
-{
-	free(bdf->fphash);
-	cha_free(&bdf->cha);
+	return hash;
 }
 
 /* provide the size of the copy opcode given the block offset and size */
@@ -161,23 +93,24 @@ void *diff_delta(void *from_buf, unsigne
 		 unsigned long *delta_size,
 		 unsigned long max_size)
 {
-	unsigned int i, outpos, outsize, inscnt, csize, msize, moff;
-	unsigned int fp;
-	const unsigned char *ref_data, *ref_top, *data, *top, *ptr1, *ptr2;
-	unsigned char *out, *orig;
-	bdrecord_t *brec;
-	bdfile_t bdf;
+	unsigned int i, outpos, outsize, inscnt, hash_shift;
+	const unsigned char *ref_data, *ref_top, *data, *top;
+	unsigned char *out;
+	struct index *entry, **hash;
 
-	if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf))
+	if (!from_size || !to_size)
+		return NULL;
+	hash = delta_index(from_buf, from_size, &hash_shift);
+	if (!hash)
 		return NULL;
-	
+
 	outpos = 0;
 	outsize = 8192;
 	if (max_size && outsize >= max_size)
 		outsize = max_size + MAX_OP_SIZE + 1;
 	out = malloc(outsize);
 	if (!out) {
-		delta_cleanup(&bdf);
+		free(hash);
 		return NULL;
 	}
 
@@ -205,28 +138,32 @@ void *diff_delta(void *from_buf, unsigne
 	}
 
 	inscnt = 0;
-	moff = 0;
+
 	while (data < top) {
-		msize = 0;
-		fp = adler32(0, data, MIN(top - data, BLK_SIZE));
-		i = HASH(fp, bdf.fphbits);
-		for (brec = bdf.fphash[i]; brec; brec = brec->next) {
-			if (brec->fp == fp) {
-				csize = ref_top - brec->ptr;
-				if (csize > top - data)
-					csize = top - data;
-				for (ptr1 = brec->ptr, ptr2 = data; 
-				     csize && *ptr1 == *ptr2;
-				     csize--, ptr1++, ptr2++);
-
-				csize = ptr1 - brec->ptr;
-				if (csize > msize) {
-					moff = brec->ptr - ref_data;
-					msize = csize;
-					if (msize >= 0x10000) {
-						msize = 0x10000;
-						break;
-					}
+		unsigned int moff = 0, msize = 0;
+		unsigned int blksize = MIN(top - data, BLK_SIZE);
+		unsigned int val = adler32(0, data, blksize);
+		i = HASH(val, hash_shift);
+		for (entry = hash[i]; entry; entry = entry->next) {
+			const unsigned char *ref = entry->ptr;
+			const unsigned char *src = data;
+			unsigned int ref_size = ref_top - ref;
+			if (entry->val != val)
+				continue;
+			if (ref_size > top - src)
+				ref_size = top - src;
+			while (ref_size && *src++ == *ref) {
+				ref++;
+				ref_size--;
+			}
+			ref_size = ref - entry->ptr;
+			if (ref_size > msize) {
+				/* this is our best match so far */
+				moff = entry->ptr - ref_data;
+				msize = ref_size;
+				if (msize >= 0x10000) {
+					msize = 0x10000;
+					break;
 				}
 			}
 		}
@@ -241,13 +178,15 @@ void *diff_delta(void *from_buf, unsigne
 				inscnt = 0;
 			}
 		} else {
+			unsigned char *op;
+
 			if (inscnt) {
 				out[outpos - inscnt - 1] = inscnt;
 				inscnt = 0;
 			}
 
 			data += msize;
-			orig = out + outpos++;
+			op = out + outpos++;
 			i = 0x80;
 
 			if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
@@ -262,7 +201,7 @@ void *diff_delta(void *from_buf, unsigne
 			msize >>= 8;
 			if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
 
-			*orig = i;
+			*op = i;
 		}
 
 		if (outpos >= outsize - MAX_OP_SIZE) {
@@ -276,7 +215,7 @@ void *diff_delta(void *from_buf, unsigne
 				out = realloc(out, outsize);
 			if (!out) {
 				free(tmp);
-				delta_cleanup(&bdf);
+				free(hash);
 				return NULL;
 			}
 		}
@@ -285,7 +224,7 @@ void *diff_delta(void *from_buf, unsigne
 	if (inscnt)
 		out[outpos - inscnt - 1] = inscnt;
 
-	delta_cleanup(&bdf);
+	free(hash);
 	*delta_size = outpos;
 	return out;
 }
-- 
1.2.2.g6643-dirty

^ permalink raw reply related

* [PATCH] diff-delta: fold two special tests into one plus cleanups
From: Nicolas Pitre @ 2006-02-22  1:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


Testing for realloc and size limit can be done with only one test per 
loop. Make it so and fix a theoretical off-by-one comparison error in 
the process.

The output buffer memory allocation is also bounded by max_size when 
specified.

Finally make some variable unsigned to allow the handling of files up to 
4GB in size instead of 2GB.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

 diff-delta.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

95c1d1f82a8e36ab1e46b8186ecb34f441914961
diff --git a/diff-delta.c b/diff-delta.c
index c2f656a..ac992e2 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -148,16 +148,20 @@ static void delta_cleanup(bdfile_t *bdf)
 	cha_free(&bdf->cha);
 }
 
+/* provide the size of the copy opcode given the block offset and size */
 #define COPYOP_SIZE(o, s) \
     (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
      !!(s & 0xff) + !!(s & 0xff00) + 1)
 
+/* the maximum size for any opcode */
+#define MAX_OP_SIZE COPYOP_SIZE(0xffffffff, 0xffffffff)
+
 void *diff_delta(void *from_buf, unsigned long from_size,
 		 void *to_buf, unsigned long to_size,
 		 unsigned long *delta_size,
 		 unsigned long max_size)
 {
-	int i, outpos, outsize, inscnt, csize, msize, moff;
+	unsigned int i, outpos, outsize, inscnt, csize, msize, moff;
 	unsigned int fp;
 	const unsigned char *ref_data, *ref_top, *data, *top, *ptr1, *ptr2;
 	unsigned char *out, *orig;
@@ -169,6 +173,8 @@ void *diff_delta(void *from_buf, unsigne
 	
 	outpos = 0;
 	outsize = 8192;
+	if (max_size && outsize >= max_size)
+		outsize = max_size + MAX_OP_SIZE + 1;
 	out = malloc(outsize);
 	if (!out) {
 		delta_cleanup(&bdf);
@@ -259,17 +265,15 @@ void *diff_delta(void *from_buf, unsigne
 			*orig = i;
 		}
 
-		if (max_size && outpos > max_size) {
-			free(out);
-			delta_cleanup(&bdf);
-			return NULL;
-		}
-
-		/* next time around the largest possible output is 1 + 4 + 3 */
-		if (outpos > outsize - 8) {
+		if (outpos >= outsize - MAX_OP_SIZE) {
 			void *tmp = out;
 			outsize = outsize * 3 / 2;
-			out = realloc(out, outsize);
+			if (max_size && outsize >= max_size)
+				outsize = max_size + MAX_OP_SIZE + 1;
+			if (max_size && outpos > max_size)
+				out = NULL;
+			else
+				out = realloc(out, outsize);
 			if (!out) {
 				free(tmp);
 				delta_cleanup(&bdf);
-- 
1.2.2.g6643-dirty

^ permalink raw reply related

* [PATCH] relax delta selection filtering in pack-objects
From: Nicolas Pitre @ 2006-02-22  1:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


This change provides a 8% saving on the pack size with a 4% CPU time 
increase for git-repack -a on the current git archive.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

 pack-objects.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

2aed7126f9b44d9ef953e8a1cbeab34356410842
diff --git a/pack-objects.c b/pack-objects.c
index ceb107f..4f8814d 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -748,11 +748,10 @@ static int try_delta(struct unpacked *cu
 	}
 
 	size = cur_entry->size;
-	if (size < 50)
-		return -1;
 	oldsize = old_entry->size;
 	sizediff = oldsize > size ? oldsize - size : size - oldsize;
-	if (sizediff > size / 8)
+
+	if (size < 50)
 		return -1;
 	if (old_entry->depth >= max_depth)
 		return 0;
-- 
1.2.2.g6643-dirty

^ permalink raw reply related

* Re: How to not download objects more than needed?
From: Jan Harkes @ 2006-02-22  1:13 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0602211635450.30245@g5.osdl.org>

On Tue, Feb 21, 2006 at 04:42:34PM -0800, Linus Torvalds wrote:
> 
> 	git pull git://git.kernel.org/....
> 
> and the automatic tag following kicks in, it will first have fetched the 
> objects once, and then when it tries to fetch the tag objects, it will 
> fetch the objects it already fetched _again_ (plus the tags), because it 
> will do the same object pull, but the temporary branch (to be merged) will 
> never have been written as a branch head.

Isn't this easily avoided by fetching the tags first?

Jan


diff --git a/git-fetch.sh b/git-fetch.sh
index b4325d9..9c6748f 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -363,8 +363,6 @@ fetch_main () {
 
 }
 
-fetch_main "$reflist"
-
 # automated tag following
 case "$no_tags$tags" in
 '')
@@ -389,6 +387,8 @@ case "$no_tags$tags" in
 	esac
 esac
 
+fetch_main "$reflist"
+
 # If the original head was empty (i.e. no "master" yet), or
 # if we were told not to worry, we do not have to check.
 case ",$update_head_ok,$orig_head," in

^ permalink raw reply related

* [PATCH] git-rebase: Clarify usage statement and copy it into the actual documentation.
From: Carl Worth @ 2006-02-22  1:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3818 bytes --]

I found a paper thin man page for git-rebase, but was quite happy to
see something much more useful in the usage statement of the script
when I went there to find out how this thing worked. Here it is
cleaned up slightly and expanded a bit into the actual documentation.

Signed-off-by: Carl Worth <cworth@cworth.org>

---

 Drat. I just realized I've been neglecting the Signed-off-by: line in
 my last few patches. Junio, if you'd like me to re-send those with
 that fixed, just let me know.

 Documentation/git-rebase.txt |   44 ++++++++++++++++++++++++++++++++++++++++--
 git-rebase.sh                |   24 +++++++++++++----------
 2 files changed, 56 insertions(+), 12 deletions(-)

f78b6a97afd562d2cf2d30488892ff893dd81a3b
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 16c158f..f037d12 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -7,14 +7,54 @@ git-rebase - Rebase local commits to new
 
 SYNOPSIS
 --------
-'git-rebase' <upstream> [<head>]
+'git-rebase' [--onto <newbase>] <upstream> [<branch>]
 
 DESCRIPTION
 -----------
-Rebases local commits to the new head of the upstream tree.
+git-rebase applies to <upstream> (or optionally to <newbase>) commits
+from <branch> that do not appear in <upstream>. When <branch> is not
+specified it defaults to the current branch (HEAD).
+
+When git-rebase is complete, <branch> will be updated to point to the
+newly created line of commit objects, so the previous line will not be
+accessible unless there are other references to it already.
+
+Assume the following history exists and the current branch is "topic":
+
+          A---B---C topic
+         /
+    D---E---F---G master
+
+From this point, the result of the following commands:
+
+    git-rebase master
+    git-rebase master topic
+
+would be:
+
+                  A'--B'--C' topic
+                 /
+    D---E---F---G master
+
+While, starting from the same point, the result of the following
+commands:
+
+    git-rebase --onto master~1 master
+    git-rebase --onto master~1 master topic
+
+would be:
+
+              A'--B'--C' topic
+             /
+    D---E---F---G master
 
 OPTIONS
 -------
+<newbase>::
+	Starting point at which to create the new commits. If the
+	--onto option is not specified, the starting point is
+	<upstream>.
+
 <upstream>::
 	Upstream branch to compare against.
 
diff --git a/git-rebase.sh b/git-rebase.sh
index 21c3d83..211bf68 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -4,24 +4,28 @@
 #
 
 USAGE='[--onto <newbase>] <upstream> [<branch>]'
-LONG_USAGE='If <branch> is specified, switch to that branch first.  Then,
-extract commits in the current branch that are not in <upstream>,
-and reconstruct the current on top of <upstream>, discarding the original
-development history.  If --onto <newbase> is specified, the history is
-reconstructed on top of <newbase>, instead of <upstream>.  For example,
-while on "topic" branch:
+LONG_USAGE='git-rebase applies to <upstream> (or optionally to <newbase>) commits
+from <branch> that do not appear in <upstream>. When <branch> is not
+specified it defaults to the current branch (HEAD).
+
+When git-rebase is complete, <branch> will be updated to point to the
+newly created line of commit objects, so the previous line will not be
+accessible unless there are other references to it already.
+
+Assuming the following history:
 
           A---B---C topic
          /
     D---E---F---G master
 
-	$ '"$0"' --onto master~1 master topic
+The result of the following command:
 
-would rewrite the history to look like this:
+    git-rebase --onto master~1 master topic
 
+  would be:
 
-	      A'\''--B'\''--C'\'' topic
-	     /
+              A'\''--B'\''--C'\'' topic
+             /
     D---E---F---G master
 '
 
-- 
1.2.2.g0a5f5-dirty


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply related

* [PATCH] cg-* manpages: Handle more than one `cg-cmd` per line
From: Jonas Fonseca @ 2006-02-22  0:48 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

... which is the case for cg-update.

Signed-off-by: Jonas Fonseca <fonseca@diku.dk>

---
author Jonas Fonseca <fonseca@diku.dk> Wed, 22 Feb 2006 01:45:18 +0100

 Documentation/make-cg-asciidoc |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/make-cg-asciidoc b/Documentation/make-cg-asciidoc
index bc31809..126d4eb 100755
--- a/Documentation/make-cg-asciidoc
+++ b/Documentation/make-cg-asciidoc
@@ -42,7 +42,7 @@ CAPTION=$(echo "$HEADER" | head -n 1 | t
 # were referenced as "`cg-command`". This way references from cg-* combos in
 # code listings will be ignored.
 BODY=$(echo "$HEADER" | sed '0,/^$/d' \
-		      | sed 's/`\(cg-[a-z-]\+\)`/gitlink:\1[1]/;s/^\(-.*\)::.*/\1::/')
+		      | sed 's/`\(cg-[a-z-]\+\)`/gitlink:\1[1]/g;s/^\(-.*\)::.*/\1::/')
 
 DESCRIPTION=
 OPTIONS=

-- 
Jonas Fonseca

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox