Git development
 help / color / mirror / Atom feed
* Re: [RFC/PATCH] remote: add new sync command
From: Jeff King @ 2011-11-30  7:01 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git
In-Reply-To: <CAMP44s3StLjb9KgBkRrG4nPqJD_ZjeSyFUwuP4A3b+mJKBgHWQ@mail.gmail.com>

On Tue, Nov 22, 2011 at 01:47:36AM +0200, Felipe Contreras wrote:

> > I didn't mean "you didn't define a mirror in your config". I meant "your
> > question is not well-defined, because you haven't defined the term
> > 'mirror'". IOW, I can't answer your question without knowing exactly
> > what you meant.
> 
> I wasn't the one that brought the mirror up, you did:

Yes, because that is the most related concept in current git. But I
thought you were asking from the perspective of a clueless user, and I
don't know what that clueless user meant by "mirror".

Anyway, I don't think it's important.

I read over your whole message, and I feel like our discussion isn't
really moving towards an actual goal. Junio and I both mentioned that in
the long-term, features like this should be part of "push", not
"remote". Do you want to try revising your patch in that direction?
That will give us something concrete to talk about (and hopefully
apply).

-Peff

^ permalink raw reply

* Re: Uniform branch coloring
From: Jeff King @ 2011-11-30  6:55 UTC (permalink / raw)
  To: Nicolas Dudebout; +Cc: git
In-Reply-To: <CA+TMmK=476h1YyVcegWia2F+1bhOQyNLf-150qF1bnU8Wu3qJA@mail.gmail.com>

On Thu, Nov 17, 2011 at 02:08:38PM -0500, Nicolas Dudebout wrote:

> I was looking at the coloring of git branches in the output of
> different commands and had an idea.
> It seems to me that the color coding for branches should be uniform throughout.

That sounds like a reasonable goal. At the very least all of the spots
should be configurable, so you can make it uniform if you want to.

And then having unset config fallback to a "master" switch (e.g., if
color.branch.current is unset, use color.default.branch or whatever).
Which I think is close to what you write below.

> I have identified ways to address some of these problems:
> [...]

These all looked like reasonable goals to me. You will have to do a
little refactoring, I'm sure.

> What are your thoughts on that?

Go for it. Your ideas sound sane. You might hit some ugly complexities
in the implementation, but you won't know until you dig in and try. And
patches are much easier to review (and apply!) than ideas. :)

-Peff

^ permalink raw reply

* Re: [PATCH] honour GIT_ASKPASS for querying username in git-svn
From: Jeff King @ 2011-11-30  6:44 UTC (permalink / raw)
  To: Sven Strickroth; +Cc: git, gitster
In-Reply-To: <4ED0CE8B.70205@tu-clausthal.de>

On Sat, Nov 26, 2011 at 12:33:31PM +0100, Sven Strickroth wrote:

> diff --git a/git-svn.perl b/git-svn.perl
> index e30df22..d7aeb11 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -4361,7 +4361,14 @@ prompt:
>  	      "(R)eject, accept (t)emporarily or accept (p)ermanently? " :
>  	      "(R)eject or accept (t)emporarily? ";
>  	STDERR->flush;
> -	$choice = lc(substr(<STDIN> || 'R', 0, 1));
> +	if (exists $ENV{GIT_ASKPASS}) {
> +		print STDERR "\n";
> +		open(PH, "-|", $ENV{GIT_ASKPASS}, "Certificate unknown");
> +		$choice = lc(substr(<PH> || 'R', 0, 1));
> +		close(PH);
> +	} else {
> +		$choice = lc(substr(<STDIN> || 'R', 0, 1));
> +	}

Why is the prompt simply "Certificate unknown"? Shouldn't it mention
that the right answers are "(R)eject, accept (t)emporarily, ..."?

That aside, I think this is an improvement over the current code. But
having just been looking at regular git's askpass code, I notice there
are some subtle differences:

  1. Regular git will also respect SSH_ASKPASS

  2. Regular git will ignore an askpass variable that is set but empty.

Perhaps git-svn should be refactored to have a reusable "prompt"
function that respects askpass and tries to behave like C git? It could
even go into the Git perl module.

-Peff

^ permalink raw reply

* Re: log: option "--follow" not the default for a single file?
From: Jeff King @ 2011-11-30  6:37 UTC (permalink / raw)
  To: Ralf Thielow; +Cc: git
In-Reply-To: <CAN0XMOJsiw0c4j_LooRrj80CVVy0omGLUcjDg4QoD4mNS3y1GA@mail.gmail.com>

On Tue, Nov 29, 2011 at 07:25:47PM +0100, Ralf Thielow wrote:

> Why is the option "--follow" not the default if the log-command
> is used with a single file? Many GUI tools don't show me the
> full history of a single file if there was a rename in it.

A few reasons, I think:

  1. There is no such thing as giving log a single file. You give it a
     pathspec. That may happen to match a single file in the current
     revision, but to git it is actually a prefix-limiting pattern, and
     "git log foo" will match "foo" if it is a file, or "foo/bar" if it
     was a directory in a previous revision.

  2. It can be slower than a regular traversal, since we have to do
     rename detection whenever we see a deletion. In practice I don't
     think it is much slower, though (mainly because files don't get
     moved all that much).

  3. The implementation is a bit hacky.  In particular, things like
     history simplification and parent rewriting won't work. So a
     GUI tool that tries to draw a graph will make a very ugly graph.

     Try looking at "gitk $file" and then "gitk --follow $file" in some
     repository (if you don't have gitk, try "git log --graph
     --oneline"). Even with a linear history, things look ugly: there
     are big gaps in the graph. But in a branch-y history, it's
     downright unreadable, as you have parallel disconnected lines of
     history next to each other.

I think (1) and (3) are fixable by improving the implementation (making
--follow respect pathspecs and not single files, and having it work at
the history-simplification level). And then you might argue that (2)
isn't a big deal, and we can turn on --follow all the time.

-Peff

^ permalink raw reply

* [PATCH 3/3] Implement fast hash-collision detection
From: Bill Zaumen @ 2011-11-30  6:30 UTC (permalink / raw)
  To: git; +Cc: gitster

Maintains a database of CRCs of Git objects to allow SHA-1 hash
collisions to be detected with high probability (1 - 1/2^32) and with
little computational overhead.  The CRCs cover the content of Git
objects, but not the header.  For loose objects, these are stored in
subdirectories of GIT_DIRECTORY/objects/crcs, with each subdirectory's
name consisting of the first two hexadecimal digits of the
corresponding object's SHA-1 hash.  For each pack file, FILE.pack, the
CRCs are stored in a FILE.mds, in the same order as the SHA-1 hashes
that appear in FILE.idx.  Checks for hash collisions are made whenever
a new loose object is created.

A new capability, "mds-check" has been added to git-fetch-pack,
git-upload-pack, git-send-pack, and git-receive-pack to allow a CRC to
be used in addition to SHA-1 hash values. For commits, a CRC of the
CRCs of each blob reachable from a commit's tree is also used.  The
result is that hash collisions can be detected during fetch, pull, and
push operations.

A few git commands had additional command-line arguments added:
count-objects, index-pack, and verify-pack.  Please read
Documentation/technical/collision-detect.txt for further details
and the documentation for each command for details on the new
arguments.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 Makefile                  |   32 +++
 builtin/count-objects.c   |   92 +++++++++-
 builtin/fetch-pack.c      |   42 ++++-
 builtin/index-pack.c      |  140 +++++++++++++--
 builtin/init-db.c         |   17 ++-
 builtin/pack-objects.c    |   57 ++++++-
 builtin/pack-redundant.c  |   14 +-
 builtin/prune-packed.c    |   21 ++-
 builtin/prune.c           |    1 +
 builtin/receive-pack.c    |  120 +++++++++++-
 builtin/send-pack.c       |   45 +++++-
 builtin/verify-pack.c     |   14 ++-
 cache.h                   |   61 ++++++-
 commit.c                  |  109 +++++++++++
 commit.h                  |    8 +
 environment.c             |   57 ++++++-
 fast-import.c             |   74 +++++++-
 git-repack.sh             |   12 +-
 git.c                     |   15 ++-
 hex.c                     |   58 ++++++-
 http.c                    |   19 ++-
 pack-write.c              |   95 +++++++++
 pack.h                    |    3 +
 sha1_file.c               |  461 ++++++++++++++++++++++++++++++++++++++++-----
 t/t0000-basic.sh          |   13 +-
 t/t5300-pack-object.sh    |   17 ++-
 t/t5301-sliding-window.sh |   14 +-
 t/t5302-pack-index.sh     |    6 +-
 t/t5304-prune.sh          |   13 +-
 t/t5500-fetch-pack.sh     |   10 +-
 t/t5510-fetch.sh          |   12 +-
 t/t9300-fast-import.sh    |    8 +-
 upload-pack.c             |   28 +++-
 33 files changed, 1550 insertions(+), 138 deletions(-)

diff --git a/Makefile b/Makefile
index b1c80a6..3dda96b 100644
--- a/Makefile
+++ b/Makefile
@@ -260,6 +260,19 @@ all::
 # dependency rules.
 #
 # Define NATIVE_CRLF if your platform uses CRLF for line endings.
+#
+# Define CRCDB to indicate the database type for the DB mapping SHA1
+# values to the CRCs of the objects git stores.  Valid values are 0
+# for storing each local-object crc in its own file, and 1 for storing
+# each local-object crc in its own file and additionally using a
+# GDBM implementation of packdb to store CRCs that are not in their
+# on files as an aid for generating pack mds files.  [more to be added
+# as needed].
+#
+# Note: the values for CRCDB are determined by preprocessor directives
+# defined in crcdb.h
+#
+CRCDB = 0
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -513,6 +526,7 @@ LIB_H += blob.h
 LIB_H += builtin.h
 LIB_H += cache.h
 LIB_H += cache-tree.h
+LIB_H += crcdb.h
 LIB_H += color.h
 LIB_H += commit.h
 LIB_H += compat/bswap.h
@@ -805,6 +819,7 @@ BUILTIN_OBJS += builtin/write-tree.o
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
 EXTLIBS =
 
+
 #
 # Platform specific tweaks
 #
@@ -1662,6 +1677,23 @@ ifeq ($(PYTHON_PATH),)
 NO_PYTHON=NoThanks
 endif
 
+ifdef CRCDB
+BASIC_CFLAGS += -DBLOB_MDS_CHECK
+endif
+
+ifeq ($(CRCDB), 0)
+BASIC_CFLAGS += -DCRCDB=$(CRCDB)
+CRCDB_SRC = objd-crcdb.c
+LIB_OBJS += objd-crcdb.o
+endif
+
+ifeq ($(CRCDB), 1)
+BASIC_CFLAGS += -DCRCDB=$(CRCDB) -DPACKDB
+CRCDB_SRC = objd-crcdb.c gdbm-packdb.c
+LIB_OBJS += objd-crcdb.o gdbm-packdb.o
+EXTLIBS += -lgdbm
+endif
+
 QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
 QUIET_SUBDIR1  =
 
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index c37cb98..898d970 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -8,6 +8,12 @@
 #include "dir.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "crcdb.h"
+
+int mdsmode = 0;
+unsigned long has_loose_mds = 0;
+unsigned long loose_mds_missing = 0;
+
 
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
@@ -53,20 +59,37 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 			continue;
 		}
 		(*loose)++;
-		if (!verbose)
+		if (!verbose) {
+			if (mdsmode) {
+				if (get_sha1_hex(hex, sha1)) {
+					die("internal error");
+				} else if (crcdb_lookup(NULL, sha1, NULL) > 0) {
+					has_loose_mds++;
+				} else {
+					loose_mds_missing++;
+				}
+			}
 			continue;
+		}
 		memcpy(hex, path+len, 2);
 		memcpy(hex+2, ent->d_name, 38);
 		hex[40] = 0;
 		if (get_sha1_hex(hex, sha1))
 			die("internal error");
+		if (mdsmode) {
+			if (crcdb_lookup(NULL, sha1, NULL) > 0) {
+				has_loose_mds++;
+			} else {
+				loose_mds_missing++;
+			}
+		}
 		if (has_sha1_pack(sha1))
 			(*packed_loose)++;
 	}
 }
 
 static char const * const count_objects_usage[] = {
-	"git count-objects [-v]",
+	"git count-objects [-v] [-M]",
 	NULL
 };
 
@@ -80,6 +103,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	off_t loose_size = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, "be verbose"),
+		OPT_BOOLEAN('M', "count-md", &mdsmode,
+			    "count MDs (Message Digests)"),
 		OPT_END(),
 	};
 
@@ -90,6 +115,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	memcpy(path, objdir, len);
 	if (len && objdir[len-1] != '/')
 		path[len++] = '/';
+	crcdb_open(NULL);
 	for (i = 0; i < 256; i++) {
 		DIR *d;
 		sprintf(path + len, "%02x", i);
@@ -100,10 +126,16 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 			      &loose, &loose_size, &packed_loose, &garbage);
 		closedir(d);
 	}
+	crcdb_close(NULL);
 	if (verbose) {
 		struct packed_git *p;
 		unsigned long num_pack = 0;
 		off_t size_pack = 0;
+		unsigned long mds_mismatched = 0;
+		unsigned long missing_mdsfile_count = 0;
+		unsigned long mds_count = 0;
+		int wsize = 0;
+		uint32_t crc;
 		if (!packed_git)
 			prepare_packed_git();
 		for (p = packed_git; p; p = p->next) {
@@ -114,6 +146,40 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 			packed += p->num_objects;
 			size_pack += p->pack_size + p->index_size;
 			num_pack++;
+			if (!mdsmode)
+				continue;
+			if (open_pack_mds(p)) {
+				missing_mdsfile_count++;
+				continue;
+			}
+			/*
+			 * Assume mds version 1 for now. We check that
+			 * the mds file has the right size and record if it
+			 * doesn't.  If it is the right size, we go through
+			 * all the entries and count the number of sha1 hashes
+			 * for which there is a recorded CRC.  We do not
+			 * check if the CRC is the right one for the
+			 * corresponding object: run git pack-verify to do
+			 * that.
+			 */
+			if (p->mds_size > 7) {
+			  wsize = ((unsigned char *)(p->mds_data))[7] * 4;
+			}
+			if (p->mds_size == (size_t)8 +
+			    (((size_t)
+			      ((p->num_objects)/4 + (p->num_objects % 4 != 0))
+			      * (size_t)4 * (size_t)(1 + wsize)) +
+			     (size_t)(20 * 2))) {
+				for (i = 0; i < p->num_objects; i++) {
+					mds_count +=
+					  (nth_packed_object_objcrc32(p,
+								      i,
+								      &crc)
+					   == 1);
+				}
+			} else {
+			  mds_mismatched++;
+			}
 		}
 		printf("count: %lu\n", loose);
 		printf("size: %lu\n", (unsigned long) (loose_size / 1024));
@@ -122,9 +188,31 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		printf("size-pack: %lu\n", (unsigned long) (size_pack / 1024));
 		printf("prune-packable: %lu\n", packed_loose);
 		printf("garbage: %lu\n", garbage);
+		if (mdsmode) {
+			if (missing_mdsfile_count) {
+				printf("missing MD (Message Digest) "
+				       "files: %lu\n",
+					missing_mdsfile_count);
+			}
+			if (mds_mismatched)
+				printf("MD (Message Digest) files with"
+				       " wrong size: %lu "
+				       "(file extension = .mds)\n",
+					mds_mismatched);
+			if (packed != mds_count) {
+				printf("missing MD (Message Digest)"
+				       " count: %lu\n",
+				       packed - mds_count);
+			}
+		}
 	}
 	else
 		printf("%lu objects, %lu kilobytes\n",
 		       loose, (unsigned long) (loose_size / 1024));
+	if (mdsmode && loose_mds_missing) {
+		assert(loose == (loose_mds_missing + has_loose_mds));
+		printf("%lu loose objects with no MD (Message Digest)\n",
+		       loose_mds_missing);
+	}
 	return 0;
 }
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c6bc8eb..7472021 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -18,6 +18,8 @@ static int prefer_ofs_delta = 1;
 static int no_done;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
+static int mds_check = 0;
+
 static struct fetch_pack_args args = {
 	/* .uploadpack = */ "git-upload-pack",
 };
@@ -390,9 +392,38 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	flushes = 0;
 	retval = -1;
 	while ((sha1 = get_rev())) {
-		packet_buf_write(&req_buf, "have %s\n", sha1_to_hex(sha1));
-		if (args.verbose)
-			fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		if (mds_check) {
+			uint32_t objcrc;
+			if (has_sha1_file_crc(sha1, &objcrc)) {
+				uint32_t blobs_crc;
+				int has_blobs_crc = !get_blob_mds(sha1,
+								 &blobs_crc);
+				if (has_blobs_crc)
+					packet_buf_write(&req_buf,
+							 "have "
+							 "%s-%8.8x-%8.8x\n",
+							 sha1_to_hex(sha1),
+							 ntohl(objcrc),
+							 ntohl(blobs_crc));
+				else
+					packet_buf_write(&req_buf,
+							 "have "
+							 "%s-%8.8x\n",
+							 sha1_to_hex(sha1),
+							 ntohl(objcrc));
+
+			} else {
+				packet_buf_write(&req_buf, "have %s\n",
+						 sha1_to_hex(sha1));
+			}
+			if (args.verbose)
+				fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		} else {
+			packet_buf_write(&req_buf, "have %s\n",
+					 sha1_to_hex(sha1));
+			if (args.verbose)
+				fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		}
 		in_vain++;
 		if (flush_at <= ++count) {
 			int ack;
@@ -802,6 +833,11 @@ static struct ref *do_fetch_pack(int fd[2],
 			fprintf(stderr, "Server supports ofs-delta\n");
 	} else
 		prefer_ofs_delta = 0;
+	if (server_supports("mds-check")) {
+		if (args.verbose)
+			fprintf(stderr, "Server supports mds-check\n");
+		mds_check = 1;
+	}
 	if (everything_local(&ref, nr_match, match)) {
 		packet_flush(fd[1]);
 		goto all_done;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0945adb..b49f6ee 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1,3 +1,4 @@
+#include <unistd.h>
 #include "builtin.h"
 #include "delta.h"
 #include "pack.h"
@@ -23,6 +24,14 @@ struct object_entry {
 	int base_object_no;
 };
 
+static int sha1_compare(const void *_a, const void *_b)
+{
+	struct object_entry *a = (struct object_entry *)_a;
+	struct object_entry *b = (struct object_entry *)_b;
+	return hashcmp(a->idx.sha1, b->idx.sha1);
+}
+
+
 union delta_base {
 	unsigned char sha1[20];
 	off_t offset;
@@ -447,9 +456,10 @@ static void find_delta_children(const union delta_base *base,
 }
 
 static void sha1_object(const void *data, unsigned long size,
-			enum object_type type, unsigned char *sha1)
+			enum object_type type, unsigned char *sha1,
+			uint32_t *objcrc32p)
 {
-	hash_sha1_file(data, size, typename(type), sha1);
+	hash_sha1_file_extended(data, size, typename(type), sha1, objcrc32p);
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
@@ -549,7 +559,8 @@ static void resolve_delta(struct object_entry *delta_obj,
 	if (!result->data)
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
-		    delta_obj->idx.sha1);
+		    delta_obj->idx.sha1, &(delta_obj->idx.objcrc32));
+	delta_obj->idx.has_objcrc32 = 1;
 	nr_resolved_deltas++;
 }
 
@@ -643,8 +654,12 @@ static void parse_pack_objects(unsigned char *sha1)
 			nr_deltas++;
 			delta->obj_no = i;
 			delta++;
-		} else
-			sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+		} else {
+		  sha1_object(data, obj->size, obj->type, obj->idx.sha1,
+			      &(obj->idx.objcrc32));
+		  obj->idx.has_objcrc32 = 1;
+		}
+
 		free(data);
 		display_progress(progress, i+1);
 	}
@@ -804,6 +819,7 @@ static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved)
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
+		  const char *final_mds_name, const char *curr_mds_name,
 		  const char *keep_name, const char *keep_msg,
 		  unsigned char *sha1)
 {
@@ -866,6 +882,18 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	} else
 		chmod(final_index_name, 0444);
 
+	if (final_mds_name != curr_mds_name) {
+		if (!final_mds_name) {
+			snprintf(name, sizeof(name), "%s/pack/pack-%s.mds",
+				 get_object_directory(), sha1_to_hex(sha1));
+			final_mds_name = name;
+		}
+		if (move_temp_to_file(curr_mds_name, final_mds_name))
+			die("cannot store mds file");
+	} else
+		chmod(final_mds_name, 0444);
+
+
 	if (!from_stdin) {
 		printf("%s\n", sha1_to_hex(sha1));
 	} else {
@@ -972,18 +1000,46 @@ static void read_idx_option(struct pack_idx_option *opts, const char *pack_name)
 	free(p);
 }
 
-static void show_pack_info(int stat_only)
+static void show_pack_info(int stat, int stat_only, int show_mds,
+			   int mds_file_exists, const char *path)
 {
 	int i, baseobjects = nr_objects - nr_deltas;
 	unsigned long *chain_histogram = NULL;
+	void *data = NULL;
+	size_t mds_size = 0;
+	struct packed_git pg;
+
+	if (mds_file_exists) {
+		int fd = git_open_noatime(path);
+		size_t required_size = 0;
+		struct stat st;
+		if (fd >= 0) {
+			if (fstat(fd, &st)) {
+				close(fd);
+			} else {
+				mds_size = xsize_t(st.st_size);
+				data = xmmap(NULL, mds_size,
+						PROT_READ, MAP_PRIVATE, fd, 0);
+				close(fd);
+				required_size = required_git_packed_mds_size
+					(path, data, nr_objects, mds_size);
+				if (required_size == 0) {
+					munmap(data, mds_size);
+					data = NULL;
+				}
+			}
+		}
+		if (data == NULL) mds_file_exists = 0;
+		pg.mds_data = data;
+	}
 
-	if (deepest_delta)
+
+	if (stat && deepest_delta)
 		chain_histogram = xcalloc(deepest_delta, sizeof(unsigned long));
 
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
-
-		if (is_delta_type(obj->type))
+		if (chain_histogram && is_delta_type(obj->type))
 			chain_histogram[obj->delta_depth - 1]++;
 		if (stat_only)
 			continue;
@@ -992,12 +1048,40 @@ static void show_pack_info(int stat_only)
 		       typename(obj->real_type), obj->size,
 		       (unsigned long)(obj[1].idx.offset - obj->idx.offset),
 		       (uintmax_t)obj->idx.offset);
+		if (show_mds) {
+			if (mds_file_exists) {
+				uint32_t crc;
+				int has_crc = nth_packed_object_objcrc32
+					(&pg, i, &crc);
+				crc = ntohl(crc);
+				if (has_crc) {
+					printf(" md=0x%8.8x", crc);
+					if (obj->idx.has_objcrc32) {
+						uint32_t ecrc =
+						  ntohl(obj->idx.objcrc32);
+						if (ecrc != crc) {
+							printf(" (should "
+							       "be 0x%x) ",
+							       ecrc);
+
+						}
+					}
+				} else {
+					printf(" <no md>      ");
+				}
+			} else {
+				printf(" <no md>      ");
+			}
+		}
 		if (is_delta_type(obj->type)) {
 			struct object_entry *bobj = &objects[obj->base_object_no];
 			printf(" %u %s", obj->delta_depth, sha1_to_hex(bobj->idx.sha1));
 		}
 		putchar('\n');
 	}
+	if (data) munmap(data, mds_size);
+	if (!stat)
+		return;
 
 	if (baseobjects)
 		printf("non delta: %d object%s\n",
@@ -1015,10 +1099,12 @@ static void show_pack_info(int stat_only)
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, stat = 0;
-	const char *curr_pack, *curr_index;
-	const char *index_name = NULL, *pack_name = NULL;
+	int show_mds = 0;
+	const char *curr_pack, *curr_index, *curr_mds;
+	const char *index_name = NULL, *pack_name = NULL, *mds_name = NULL;;
 	const char *keep_name = NULL, *keep_msg = NULL;
 	char *index_name_buf = NULL, *keep_name_buf = NULL;
+	char *mds_name_buf = NULL;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
@@ -1052,6 +1138,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				verify = 1;
 				stat = 1;
 				stat_only = 1;
+			} else if (!strcmp(arg, "-M") ||
+				   !strcmp(arg, "--show-mds")) {
+				verify = 1;
+				show_mds = 1;
 			} else if (!strcmp(arg, "--keep")) {
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
@@ -1075,6 +1165,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				if (index_name || (i+1) >= argc)
 					usage(index_pack_usage);
 				index_name = argv[++i];
+			} else if (!strcmp(arg, "-m")) {
+				if (mds_name || (i+1) >= argc)
+					usage(index_pack_usage);
+				mds_name = argv[++i];
 			} else if (!prefixcmp(arg, "--index-version=")) {
 				char *c;
 				opts.version = strtoul(arg + 16, &c, 10);
@@ -1108,6 +1202,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		strcpy(index_name_buf + len - 5, ".idx");
 		index_name = index_name_buf;
 	}
+	if (!mds_name && pack_name) {
+		int len = strlen(pack_name);
+		if (!has_extension(pack_name, ".pack"))
+			die("packfile name '%s' does not end with '.pack'",
+			    pack_name);
+		mds_name_buf = xmalloc(len);
+		memcpy(mds_name_buf, pack_name, len - 5);
+		strcpy(mds_name_buf + len - 5, ".mds");
+		mds_name = mds_name_buf;
+	}
 	if (keep_msg && !keep_name && pack_name) {
 		int len = strlen(pack_name);
 		if (!has_extension(pack_name, ".pack"))
@@ -1168,24 +1272,34 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		check_objects();
 
-	if (stat)
-		show_pack_info(stat_only);
+	if (stat || show_mds) {
+		int mds_file_exists = !access(mds_name, R_OK);
+		if (mds_file_exists && show_mds) {
+		  qsort (objects, nr_objects, sizeof (struct object_entry),
+			 sha1_compare);
+		}
+		show_pack_info(stat, stat_only, show_mds, mds_file_exists,
+			       mds_name);
+	}
 
 	idx_objects = xmalloc((nr_objects) * sizeof(struct pack_idx_entry *));
 	for (i = 0; i < nr_objects; i++)
 		idx_objects[i] = &objects[i].idx;
 	curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_sha1);
+	curr_mds = write_mds_file(mds_name, idx_objects, nr_objects, &opts,pack_sha1);
 	free(idx_objects);
 
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
+		      mds_name, curr_mds,
 		      keep_name, keep_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
 	free(objects);
 	free(index_name_buf);
+	free(mds_name_buf);
 	free(keep_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index d07554c..9ff2e88 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -7,6 +7,10 @@
 #include "builtin.h"
 #include "exec_cmd.h"
 #include "parse-options.h"
+#include "crcdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 #ifndef DEFAULT_GIT_TEMPLATE_DIR
 #define DEFAULT_GIT_TEMPLATE_DIR "/usr/share/git-core/templates"
@@ -308,7 +312,18 @@ static void create_object_directory(void)
 	safe_create_dir(path, 1);
 	strcpy(path+len, "/info");
 	safe_create_dir(path, 1);
-
+#if (CRCDB == 0) || (CRCDB == 1)
+	strcpy(path+len, "/crcs");
+	safe_create_dir(path, 1);
+#endif
+	/*
+	 * In case the call in environent.c failed to initialize
+	 * (missing directory?) or somehow wasn't called at all.
+	 */
+	crcdb_init();
+#ifdef PACKDB
+	packdb_init();
+#endif
 	free(path);
 }
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 824ecee..3ed1e71 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -17,6 +17,7 @@
 #include "progress.h"
 #include "refs.h"
 #include "thread-utils.h"
+#include "crcdb.h"
 
 static const char pack_usage[] =
   "git pack-objects [ -q | --progress | --all-progress ]\n"
@@ -529,6 +530,8 @@ static struct object_entry **compute_write_order(void)
 		objects[i].filled = 0;
 		objects[i].delta_child = NULL;
 		objects[i].delta_sibling = NULL;
+		objects[i].idx.has_objcrc32 = 0;
+		objects[i].idx.objcrc32 = 0;
 	}
 
 	/*
@@ -663,10 +666,13 @@ static void write_pack_file(void)
 		if (!pack_to_stdout) {
 			struct stat st;
 			const char *idx_tmp_name;
+			const char *mds_tmp_name;
 			char tmpname[PATH_MAX];
 
 			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
 						      &pack_idx_opts, sha1);
+			mds_tmp_name = write_mds_file(NULL, written_list, nr_written,
+						      &pack_idx_opts, sha1);
 
 			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
 				 base_name, sha1_to_hex(sha1));
@@ -704,7 +710,16 @@ static void write_pack_file(void)
 			if (rename(idx_tmp_name, tmpname))
 				die_errno("unable to rename temporary index file");
 
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.mds",
+				 base_name, sha1_to_hex(sha1));
+			if (adjust_shared_perm(mds_tmp_name))
+				die_errno("unable to make temporary mds file readable");
+			if (rename(mds_tmp_name, tmpname))
+				die_errno("unable to rename temporary mds file");
+
+
 			free((void *) idx_tmp_name);
+			free((void *) mds_tmp_name);
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
@@ -821,6 +836,8 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	off_t found_offset = 0;
 	int ix;
 	unsigned hash = name_hash(name);
+	int hasobjcrc32;
+	uint32_t objcrc32;
 
 	ix = nr_objects ? locate_object_entry_hash(sha1) : -1;
 	if (ix >= 0) {
@@ -837,7 +854,10 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 		return 0;
 
 	for (p = packed_git; p; p = p->next) {
-		off_t offset = find_pack_entry_one(sha1, p);
+		hasobjcrc32 = 0;
+		objcrc32 = 0;
+		off_t offset = find_pack_entry_one_extended(sha1, p,
+						   &hasobjcrc32, &objcrc32);
 		if (offset) {
 			if (!found_pack) {
 				if (!is_pack_valid(p)) {
@@ -865,7 +885,37 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 
 	entry = objects + nr_objects++;
 	memset(entry, 0, sizeof(*entry));
+	if (hasobjcrc32 == 0) {
+		/*
+		 * We pick up CRCs for local objects (we already checked the
+		 * pack files).  If that doesn't work, we compute it from
+		 * scratch (which should occur rarely if at all).
+		 */
+		crcdb_open(NULL);
+		switch (crcdb_lookup(NULL, sha1, &objcrc32)) {
+		case 1:
+			hasobjcrc32 = 1;
+			break;
+		default:
+			hasobjcrc32 = 0;
+		}
+		crcdb_close(NULL);
+		if (!hasobjcrc32) {
+		  enum object_type type;
+		  unsigned long size;
+		  unsigned char sbuf[20];
+		  void *buf = read_sha1_file(sha1, &type, &size);
+		  if (buf) {
+			const char *stype = typename(type);
+			hash_sha1_file_extended(buf, size, stype, sbuf,
+						&objcrc32);
+			hasobjcrc32 = 1;
+		  }
+		}
+	}
 	hashcpy(entry->idx.sha1, sha1);
+	entry->idx.has_objcrc32 = hasobjcrc32;
+	entry->idx.objcrc32 = objcrc32;
 	entry->hash = hash;
 	if (type)
 		entry->type = type;
@@ -2148,7 +2198,8 @@ struct in_pack {
 
 static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack)
 {
-	in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->sha1, p);
+	in_pack->array[in_pack->nr].offset =
+		find_pack_entry_one(object->sha1, p);
 	in_pack->array[in_pack->nr].object = object;
 	in_pack->nr++;
 }
@@ -2220,7 +2271,7 @@ static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
 
 	while (p) {
 		if ((!p->pack_local || p->pack_keep) &&
-			find_pack_entry_one(sha1, p)) {
+		     find_pack_entry_one(sha1, p)) {
 			last_found = p;
 			return 1;
 		}
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index f5c6afc..c09397c 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -6,6 +6,7 @@
 *
 */
 
+#include <unistd.h>
 #include "builtin.h"
 
 #define BLKSIZE 512
@@ -682,9 +683,16 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 	}
 	pl = red = pack_list_difference(local_packs, min);
 	while (pl) {
-		printf("%s\n%s\n",
-		       sha1_pack_index_name(pl->pack->sha1),
-		       pl->pack->pack_name);
+		char *mdsfile = sha1_pack_mds_name(pl->pack->sha1);
+		if (!access(mdsfile, F_OK)) {
+		  printf("%s\n%s\n%s\n", mdsfile,
+			       sha1_pack_index_name(pl->pack->sha1),
+			       pl->pack->pack_name);
+		} else {
+			printf("%s\n%s\n",
+			       sha1_pack_index_name(pl->pack->sha1),
+			       pl->pack->pack_name);
+		}
 		pl = pl->next;
 	}
 	if (verbose)
diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index f9463de..5c682dd 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -43,8 +43,11 @@ void prune_packed_objects(int opts)
 {
 	int i;
 	static char pathname[PATH_MAX];
+	static char mds_pathname[PATH_MAX];
 	const char *dir = get_object_directory();
+	const char *mdsdir = get_object_crc_node();
 	int len = strlen(dir);
+	int mdslen = strlen(mdsdir);
 
 	if (opts == VERBOSE)
 		progress = start_progress_delay("Removing duplicate objects",
@@ -55,16 +58,26 @@ void prune_packed_objects(int opts)
 	memcpy(pathname, dir, len);
 	if (len && pathname[len-1] != '/')
 		pathname[len++] = '/';
+	memcpy(mds_pathname, mdsdir, mdslen);
+	if (mdslen && mds_pathname[mdslen-1] != '/')
+		mds_pathname[mdslen++] = '/';
 	for (i = 0; i < 256; i++) {
 		DIR *d;
+		DIR *mds_d;
 
 		display_progress(progress, i + 1);
 		sprintf(pathname + len, "%02x/", i);
 		d = opendir(pathname);
-		if (!d)
-			continue;
-		prune_dir(i, d, pathname, len + 3, opts);
-		closedir(d);
+		sprintf(mds_pathname + len, "%02x/", i);
+		mds_d = opendir(mds_pathname);
+		if (d) {
+			prune_dir(i, d, pathname, len + 3, opts);
+			closedir(d);
+		}
+		if (mds_d) {
+			prune_dir(i, mds_d, mds_pathname, mdslen + 3, opts);
+			closedir(mds_d);
+		}
 	}
 	stop_progress(&progress);
 }
diff --git a/builtin/prune.c b/builtin/prune.c
index e65690b..e9fbc99 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -154,6 +154,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	}
 	mark_reachable_objects(&revs, 1);
 	prune_object_dir(get_object_directory());
+	prune_object_dir(get_object_crc_node());
 
 	prune_packed_objects(show_only);
 	remove_temporary_files(get_object_directory());
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 7ec68a1..a7b6474 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -121,7 +121,8 @@ static int show_ref(const char *path, const unsigned char *sha1, int flag, void
 	else
 		packet_write(1, "%s %s%c%s%s\n",
 			     sha1_to_hex(sha1), path, 0,
-			     " report-status delete-refs side-band-64k",
+			     " report-status delete-refs side-band-64k"
+			     " mds-check",
 			     prefer_ofs_delta ? " ofs-delta" : "");
 	sent_capabilities = 1;
 	return 0;
@@ -712,32 +713,128 @@ static struct command *read_head_info(void)
 		struct command *cmd;
 		char *refname;
 		int len, reflen;
+		int has_old_sha1_crc = 0;
+		int has_new_sha1_crc = 0;
+		int has_old_blob_crc = 0;
+		int has_new_blob_crc = 0;
+		uint32_t old_sha1_crc = 0;
+		uint32_t new_sha1_crc = 0;
+		uint32_t new_blob_crc = 0;
+		uint32_t old_blob_crc = 0;
+		uint32_t objcrc32;
+		int old_hashlen = 40;
+		int new_hashlen = 40;
+		int hashlen = 80;
+		static const int crc_field_len = 9;
 
 		len = packet_read_line(0, line, sizeof(line));
 		if (!len)
 			break;
 		if (line[len-1] == '\n')
 			line[--len] = 0;
-		if (len < 83 ||
-		    line[40] != ' ' ||
-		    line[81] != ' ' ||
-		    get_sha1_hex(line, old_sha1) ||
-		    get_sha1_hex(line + 41, new_sha1))
+		if (len > (old_hashlen + crc_field_len) &&
+		    line[old_hashlen] == '-') {
+			old_hashlen += crc_field_len;
+			hashlen += crc_field_len;
+		}
+		if (len > (old_hashlen + crc_field_len) &&
+		    line[old_hashlen] == '-') {
+			old_hashlen += crc_field_len;
+			hashlen += crc_field_len;
+		}
+		if (len > (hashlen + crc_field_len + 1) &&
+		    line[hashlen+1] == '-') {
+			new_hashlen += crc_field_len;
+			hashlen += crc_field_len;
+		}
+		if (len > (hashlen + crc_field_len + 1) &&
+		    line[hashlen+1] == '-') {
+			new_hashlen += crc_field_len;
+			hashlen += crc_field_len;
+		}
+
+		if (old_hashlen != 40 && old_hashlen != 49
+		    && old_hashlen != 58) {
+			die("protocol error: expected old/new/ref, got '%s'",
+			    line);
+		}
+
+		if (new_hashlen != 40 && new_hashlen != 49 &&
+		    new_hashlen != 58) {
+			die("protocol error: expected old/new/ref, got '%s'",
+			    line);
+		}
+
+		if (len < hashlen + 3 ||
+		    line[old_hashlen] != ' ' ||
+		    line[hashlen + 1] != ' ' ||
+		    get_sha1_hex_crc(line, old_sha1,
+				     &has_old_sha1_crc, &old_sha1_crc,
+				     &has_old_blob_crc,
+				     &old_blob_crc) ||
+		    get_sha1_hex_crc(line + old_hashlen + 1, new_sha1,
+				     &has_new_sha1_crc, &new_sha1_crc,
+				     &has_new_blob_crc,
+				     &new_blob_crc))
 			die("protocol error: expected old/new/ref, got '%s'",
 			    line);
 
-		refname = line + 82;
+		if (has_old_sha1_crc &&
+		    has_sha1_file_crc(old_sha1, &objcrc32)) {
+			if (old_sha1_crc != objcrc32) {
+				die("hash collision for %s",
+				    sha1_to_hex(old_sha1));
+			}
+		}
+
+		if (has_old_sha1_crc && has_old_blob_crc) {
+			uint32_t blobcrc;
+			int has_blob_crc = !get_blob_mds(old_sha1, &blobcrc);
+			if (has_blob_crc) {
+				if (old_blob_crc != blobcrc)
+					die("hash collision for %s",
+					    sha1_to_hex(old_sha1));
+			} else {
+#ifdef BLOB_MDS_CHECK
+				push_mds_check(old_sha1, old_blob_crc);
+#endif
+			}
+		}
+
+		if (has_new_sha1_crc &&
+		    has_sha1_file_crc(new_sha1, &objcrc32)) {
+			if (new_sha1_crc != objcrc32) {
+				die("hash collision for %s",
+				    sha1_to_hex(new_sha1));
+			}
+		}
+
+		if (has_new_sha1_crc && has_new_blob_crc) {
+			uint32_t blobcrc;
+			int has_blob_crc = !get_blob_mds(new_sha1, &blobcrc);
+			if (has_blob_crc) {
+				if (new_blob_crc != blobcrc)
+					die("hash collision for %s",
+					    sha1_to_hex(new_sha1));
+			} else  {
+#ifdef BLOB_MDS_CHECK
+				push_mds_check(new_sha1, new_blob_crc);
+#endif
+			}
+		}
+
+		refname = line + hashlen + 2;
 		reflen = strlen(refname);
-		if (reflen + 82 < len) {
+		if (reflen + hashlen + 2 < len) {
 			if (strstr(refname + reflen + 1, "report-status"))
 				report_status = 1;
 			if (strstr(refname + reflen + 1, "side-band-64k"))
 				use_sideband = LARGE_PACKET_MAX;
 		}
-		cmd = xcalloc(1, sizeof(struct command) + len - 80);
+		cmd = xcalloc(1, sizeof(struct command) + len - hashlen);
 		hashcpy(cmd->old_sha1, old_sha1);
 		hashcpy(cmd->new_sha1, new_sha1);
-		memcpy(cmd->ref_name, line + 82, len - 81);
+		memcpy(cmd->ref_name, line + hashlen + 2, len - (hashlen +1));
 		*p = cmd;
 		p = &cmd->next;
 	}
@@ -966,6 +1063,9 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		if (auto_update_server_info)
 			update_server_info(0);
 	}
+#ifdef BLOB_MDS_CHECK
+	process_mds_checks(rp_warning);
+#endif
 	if (use_sideband)
 		packet_flush(1);
 	return 0;
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e0b8030..a3ce108 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -250,6 +250,7 @@ int send_pack(struct send_pack_args *args,
 	int allow_deleting_refs = 0;
 	int status_report = 0;
 	int use_sideband = 0;
+	int mds_check = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -263,6 +264,8 @@ int send_pack(struct send_pack_args *args,
 		args->use_ofs_delta = 1;
 	if (server_supports("side-band-64k"))
 		use_sideband = 1;
+	if (server_supports("mds-check"))
+		mds_check = 1;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
@@ -298,8 +301,46 @@ int send_pack(struct send_pack_args *args,
 		if (args->dry_run) {
 			ref->status = REF_STATUS_OK;
 		} else {
-			char *old_hex = sha1_to_hex(ref->old_sha1);
-			char *new_hex = sha1_to_hex(ref->new_sha1);
+			char *old_hex, *new_hex;
+			if (mds_check) {
+				uint32_t objcrc32;
+				uint32_t blobcrc;
+				if (has_sha1_file_crc(ref->old_sha1,
+						      &objcrc32)) {
+					if (get_blob_mds(ref->old_sha1,
+							  &blobcrc)) {
+						old_hex =
+						  sha1_to_hex_crc(ref->old_sha1,
+								  objcrc32);
+					} else {
+						old_hex =
+						  sha1_to_hex_crc2
+						  (ref->old_sha1, objcrc32,
+						   blobcrc);
+					}
+				} else {
+					old_hex = sha1_to_hex(ref->old_sha1);
+				}
+				if (has_sha1_file_crc(ref->new_sha1,
+						      &objcrc32)) {
+					if (get_blob_mds(ref->new_sha1,
+						   &objcrc32)) {
+						new_hex =
+						  sha1_to_hex_crc(ref->new_sha1,
+								  objcrc32);
+					} else {
+						new_hex =
+						  sha1_to_hex_crc2
+						  (ref->new_sha1, objcrc32,
+						   blobcrc);
+					}
+				} else {
+					new_hex = sha1_to_hex(ref->new_sha1);
+				}
+			} else {
+				old_hex = sha1_to_hex(ref->old_sha1);
+				new_hex = sha1_to_hex(ref->new_sha1);
+			}
 
 			if (!cmds_sent && (status_report || use_sideband)) {
 				packet_buf_write(&req_buf, "%s %s %s%c%s%s",
diff --git a/builtin/verify-pack.c b/builtin/verify-pack.c
index e841b4a..b94a11e 100644
--- a/builtin/verify-pack.c
+++ b/builtin/verify-pack.c
@@ -5,14 +5,16 @@
 
 #define VERIFY_PACK_VERBOSE 01
 #define VERIFY_PACK_STAT_ONLY 02
+#define SHOW_MDS 04
 
 static int verify_one_pack(const char *path, unsigned int flags)
 {
 	struct child_process index_pack;
-	const char *argv[] = {"index-pack", NULL, NULL, NULL };
+	const char *argv[] = {"index-pack", NULL, NULL, NULL, NULL };
 	struct strbuf arg = STRBUF_INIT;
 	int verbose = flags & VERIFY_PACK_VERBOSE;
 	int stat_only = flags & VERIFY_PACK_STAT_ONLY;
+	int show_mds = ((flags & SHOW_MDS) != 0)  && !stat_only;
 	int err;
 
 	if (stat_only)
@@ -22,6 +24,8 @@ static int verify_one_pack(const char *path, unsigned int flags)
 	else
 		argv[1] = "--verify";
 
+	if (show_mds) argv[2] = "-M";
+
 	/*
 	 * In addition to "foo.pack" we accept "foo.idx" and "foo";
 	 * normalize these forms to "foo.pack" for "index-pack --verify".
@@ -31,7 +35,7 @@ static int verify_one_pack(const char *path, unsigned int flags)
 		strbuf_splice(&arg, arg.len - 3, 3, "pack", 4);
 	else if (!has_extension(arg.buf, ".pack"))
 		strbuf_add(&arg, ".pack", 5);
-	argv[2] = arg.buf;
+	argv[2 + show_mds] = arg.buf;
 
 	memset(&index_pack, 0, sizeof(index_pack));
 	index_pack.argv = argv;
@@ -46,6 +50,10 @@ static int verify_one_pack(const char *path, unsigned int flags)
 			if (!stat_only)
 				printf("%s: ok\n", arg.buf);
 		}
+	} else if (show_mds) {
+		printf("%s: listed (%s)\n-----------------\n", arg.buf,
+		       (err? "bad": "ok"));
+
 	}
 	strbuf_release(&arg);
 
@@ -67,6 +75,8 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
 			VERIFY_PACK_VERBOSE),
 		OPT_BIT('s', "stat-only", &flags, "show statistics only",
 			VERIFY_PACK_STAT_ONLY),
+		OPT_BIT('M', "show-mds", &flags, "show message digests / CRCs",
+			SHOW_MDS),
 		OPT_END()
 	};
 
diff --git a/cache.h b/cache.h
index 2e6ad36..85ecff2 100644
--- a/cache.h
+++ b/cache.h
@@ -433,6 +433,10 @@ extern int is_inside_work_tree(void);
 extern int have_git_dir(void);
 extern const char *get_git_dir(void);
 extern char *get_object_directory(void);
+extern char *get_object_crc_node(void);
+#ifdef PACKDB
+extern char *get_object_packdb_node(void);
+#endif
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
 extern int set_git_dir(const char *path);
@@ -541,8 +545,15 @@ extern int ce_path_match(const struct cache_entry *ce, const struct pathspec *pa
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
-extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
-extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
+
+#define index_fd(sha1,fd,st,type,path,flags)			\
+	index_fd_extended((sha1), NULL, (fd), (st), (type), (path), (flags))
+extern int index_fd_extended(unsigned char *sha1, uint32_t *objcrc32p, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
+
+#define index_path(sha1, path, st, flags) \
+	index_path_extended((sha1), NULL, (path), (st), (flags))
+extern int index_path_extended(unsigned char *sha1, uint32_t *objcrc32p, const char *path
+, struct stat *st, unsigned flags);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
 #define REFRESH_REALLY		0x0001	/* ignore_valid */
@@ -669,6 +680,7 @@ extern char *git_path_submodule(const char *path, const char *fmt, ...)
 extern char *sha1_file_name(const unsigned char *sha1);
 extern char *sha1_pack_name(const unsigned char *sha1);
 extern char *sha1_pack_index_name(const unsigned char *sha1);
+extern char *sha1_pack_mds_name(const unsigned char *sha1);
 extern const char *find_unique_abbrev(const unsigned char *sha1, int);
 extern const unsigned char null_sha1[20];
 
@@ -768,9 +780,18 @@ static inline const unsigned char *lookup_replace_object(const unsigned char *sh
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern int sha1_object_info(const unsigned char *, unsigned long *);
-extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
-extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
-extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
+
+#define hash_sha1_file(buf,len,type,sha1) \
+	hash_sha1_file_extended((buf), (len), (type), (sha1), NULL)
+extern int hash_sha1_file_extended(const void *buf, unsigned long len, const char *type, unsigned char *sha1, uint32_t *objcrc32p);
+
+#define write_sha1_file(buf,len,type,return_sha1) \
+	write_sha1_file_extended((buf), (len), (type), (return_sha1), NULL)
+extern int write_sha1_file_extended(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1, uint32_t *objcrc32p);
+
+#define pretend_sha1_file(buf,len,type,sha1) \
+	pretend_sha1_file_extended((buf), (len), (type), (sha1), NULL)
+extern int pretend_sha1_file_extended(void *, unsigned long, enum object_type, unsigned char *, uint32_t *objcrc32p);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
@@ -782,13 +803,17 @@ extern int do_check_packed_object_crc;
 /* for development: log offset of pack access */
 extern const char *log_pack_access;
 
-extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
+#define check_sha1_signature(sha1,buf,size,type) \
+	check_sha1_signature_extended((sha1), NULL, (buf), (size), (type))
+extern int check_sha1_signature_extended(const unsigned char *sha1, uint32_t *objcrc32p, void *buf, unsigned long size, const char *type);
 
 extern int move_temp_to_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
+extern int has_sha1_file_crc(const unsigned char *sha1, uint32_t *objcrc32p);
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
+extern int has_loose_object_nonlocal_crc(const unsigned char *sha1, uint32_t *objcrc32p);
 
 extern int has_pack_index(const unsigned char *sha1);
 
@@ -830,8 +855,12 @@ static inline int get_sha1_with_context(const char *str, unsigned char *sha1, st
  * null-terminated string.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
+extern int get_sha1_hex_crc(const char *hex, unsigned char *sha1, int *hascrc, uint32_t *crc, int *hasblobmdsp, uint32_t *blobmdsp);
 
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern char *sha1_to_hex_crc(const unsigned char *sha1, const uint32_t objcrc);	/* static buffer result! */
+extern char *sha1_to_hex_crc2(const unsigned char *sha1, const uint32_t objcrc,
+			      const uint32_t blobcrc);	/* static buffer result! */
 extern int read_ref(const char *filename, unsigned char *sha1);
 
 /*
@@ -974,10 +1003,13 @@ extern struct packed_git {
 	off_t pack_size;
 	const void *index_data;
 	size_t index_size;
+	const void *mds_data; /*objcrc32 table*/
+	size_t mds_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
 	unsigned char *bad_object_sha1;
 	int index_version;
+	int mds_version;
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
@@ -992,6 +1024,8 @@ struct pack_entry {
 	off_t offset;
 	unsigned char sha1[20];
 	struct packed_git *p;
+	int has_objcrc32;
+	uint32_t objcrc32;
 };
 
 struct ref {
@@ -1047,6 +1081,11 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 
 extern void pack_report(void);
 extern int open_pack_index(struct packed_git *);
+extern int open_pack_mds(struct packed_git *p);
+extern int git_open_noatime(const char *name);
+extern size_t required_git_packed_mds_size(const char *path,
+					   void *data, uint32_t nobjects,
+					   size_t actual_size);
 extern void close_pack_index(struct packed_git *);
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
@@ -1055,8 +1094,16 @@ extern void free_pack_by_name(const char *);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *, int, int);
 extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
+extern int nth_packed_object_objcrc32(const struct packed_git *p, uint32_t n,
+				      uint32_t *objcrc32p);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
-extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
+
+#define find_pack_entry_one(sha1,p) find_pack_entry_one_extended((sha1),(p), NULL, NULL)
+extern off_t find_pack_entry_one_extended(const unsigned char *,
+					  struct packed_git *,
+					  int *hasojb32crcp,
+					  uint32_t *objcrc32p);
+
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
diff --git a/commit.c b/commit.c
index 73b7e00..815f4a0 100644
--- a/commit.c
+++ b/commit.c
@@ -11,6 +11,115 @@ int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
 
+struct blob_mds_context {
+  unsigned long missing;
+  uint32_t crc;
+};
+
+static int get_blob_mds_f(const unsigned char *sha1,
+			  const char *basebuf, int baselen,
+			  const char *path, unsigned int mode, int stage,
+			  void *context)
+{
+	struct blob_mds_context *c = (struct blob_mds_context *)context;
+	uint32_t crc;
+	unsigned long size;
+	int type;
+
+	if (!has_sha1_file(sha1)) {
+		c->missing++;
+		return -1;
+	}
+	type = sha1_object_info(sha1, &size);
+	switch(type) {
+	case OBJ_TREE:
+	  return ((mode) == 0040000)? READ_TREE_RECURSIVE: 0;
+	case OBJ_BLOB:
+		if (has_sha1_file_crc(sha1, &crc)) {
+			crc = ntohl(crc);
+			c->crc = crc32(c->crc, (unsigned char *)&crc,
+					 sizeof (uint32_t));
+		} else {
+			c->missing++;
+		}
+		return 0;
+	default:
+		if (type <= OBJ_NONE) {
+		  c->missing++;
+		}
+		return 0;
+	}
+}
+
+/*
+ * Works with a tree or a commit sha1 - recursively traverses the trees
+ * and computes the CRC of each blob's CRC.
+ */
+int get_blob_mds(const unsigned char *sha1, uint32_t *blobcrcp)
+{
+	struct blob_mds_context context;
+	context.crc = crc32(0, NULL, 0);
+	context.missing = 0;
+	struct tree *tree = parse_tree_indirect(sha1);
+	struct pathspec ps;
+	if (tree == NULL) {
+		return -1;
+	} else {
+		init_pathspec(&ps, NULL);
+		parse_tree(tree);
+		read_tree_recursive(tree, "", 0, 0, &ps, get_blob_mds_f,
+				    &context);
+		if (blobcrcp) (*blobcrcp) = htonl(context.crc);
+		return ((context.missing == 0)? 0: -1);
+	}
+}
+
+#ifdef BLOB_MDS_CHECK
+/*
+ * Used to check the values returned by get_mds_blob in cases involving
+ * a transfer protocol where a commit is transferred and processed
+ * before all of the objects associated with it are accessible.
+ */
+
+struct mds_check {
+  struct mds_check *next;
+  unsigned char sha1[20];
+  uint32_t blobcrc;
+} *mds_check_list = NULL;
+
+void push_mds_check(unsigned char *sha1, uint32_t crc) {
+	struct mds_check *ptr = (struct mds_check *)
+	  xmalloc(sizeof (struct mds_check));
+	hashcpy(ptr->sha1, sha1);
+	ptr->blobcrc = crc;
+	ptr->next = mds_check_list;
+	mds_check_list = ptr;
+}
+
+/*
+ * Must be called after the uploaded data is entered in the repository -
+ * otherwise we won't be able to get the CRCs needed by get_blob_mds,
+ * which will result in get_blob_mds returning -1 instead of 0.
+ */
+
+void process_mds_checks(rp_warning_f rp_warning) {
+	while (mds_check_list) {
+		uint32_t sha1_blobcrc;
+		int has_sha1_blobcrc = !get_blob_mds(mds_check_list->sha1,
+						     &sha1_blobcrc);
+		if (has_sha1_blobcrc &&
+		    mds_check_list->blobcrc != sha1_blobcrc) {
+			rp_warning("hash collision for %s",
+				   sha1_to_hex(mds_check_list->sha1));
+		}
+		mds_check_list = mds_check_list->next;
+	}
+}
+
+#endif /* BLOB_MDS_CHECK */
+
+
+
 static struct commit *check_commit(struct object *obj,
 				   const unsigned char *sha1,
 				   int quiet)
diff --git a/commit.h b/commit.h
index 009b113..65f2bd5 100644
--- a/commit.h
+++ b/commit.h
@@ -185,4 +185,12 @@ extern int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author);
 
+extern int get_blob_mds(const unsigned char *sha1, uint32_t *blobcrcp);
+
+#ifdef BLOB_MDS_CHECK
+extern void push_mds_check(unsigned char *sha1, uint32_t crc);
+typedef void (*rp_warning_f)(const char *err, ...) __attribute__((format (printf, 1, 2))
+);
+extern void process_mds_checks(rp_warning_f rp_warning);
+#endif /* BLOB_MDS_CHECK */
 #endif /* COMMIT_H */
diff --git a/environment.c b/environment.c
index 0bee6a7..8a45c82 100644
--- a/environment.c
+++ b/environment.c
@@ -9,6 +9,10 @@
  */
 #include "cache.h"
 #include "refs.h"
+#include "crcdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 char git_default_email[MAX_GITNAME];
 char git_default_name[MAX_GITNAME];
@@ -73,7 +77,10 @@ static size_t namespace_len;
 
 static const char *git_dir;
 static char *git_object_dir, *git_index_file, *git_graft_file;
-
+static char *git_object_crc_node;
+#ifdef PACKDB
+static char *git_object_packdb_node;
+#endif
 /*
  * Repository-local GIT_* environment variables
  * Remember to update local_repo_env_size in cache.h when
@@ -115,6 +122,11 @@ static char *expand_namespace(const char *raw_namespace)
 
 static void setup_git_env(void)
 {
+	static char cwdbuf[PATH_MAX];
+	int ocn_len;
+#ifdef PACKDB
+	int opn_len;
+#endif
 	git_dir = getenv(GIT_DIR_ENVIRONMENT);
 	git_dir = git_dir ? xstrdup(git_dir) : NULL;
 	if (!git_dir) {
@@ -128,6 +140,31 @@ static void setup_git_env(void)
 		git_object_dir = xmalloc(strlen(git_dir) + 9);
 		sprintf(git_object_dir, "%s/objects", git_dir);
 	}
+	ocn_len = strlen(git_object_dir) + 8 + strlen(getcwd(cwdbuf, PATH_MAX));
+	git_object_crc_node = xmalloc(ocn_len);
+	memset(git_object_crc_node, 0, ocn_len);
+	sprintf(git_object_crc_node, "%s/crcs", git_object_dir);
+	if (git_object_crc_node[0] != '/') {
+		int ocn_offset = (git_object_crc_node[0] == '.' &&
+				  git_object_crc_node[1] == '/')? 2:0;
+		memset(git_object_crc_node, 0, ocn_len);
+		sprintf(git_object_crc_node, "%s/%s/crcs",
+			getcwd(cwdbuf, PATH_MAX), git_object_dir + ocn_offset);
+	}
+#ifdef PACKDB
+	opn_len = strlen(git_object_dir)
+		+ 10 + strlen(getcwd(cwdbuf, PATH_MAX));
+	git_object_packdb_node = xmalloc(opn_len);
+	memset(git_object_packdb_node, 0, opn_len);
+	sprintf(git_object_packdb_node, "%s/packdb", git_object_dir);
+	if (git_object_packdb_node[0] != '/') {
+		int opn_offset = (git_object_crc_node[0] == '.' &&
+				  git_object_crc_node[1] == '/')? 2:0;
+		memset(git_object_packdb_node, 0, opn_len);
+		sprintf(git_object_packdb_node, "%s/%s/packdb",
+			getcwd(cwdbuf, PATH_MAX), git_object_dir + opn_offset);
+	}
+#endif
 	git_index_file = getenv(INDEX_ENVIRONMENT);
 	if (!git_index_file) {
 		git_index_file = xmalloc(strlen(git_dir) + 7);
@@ -140,6 +177,10 @@ static void setup_git_env(void)
 		read_replace_refs = 0;
 	namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
 	namespace_len = strlen(namespace);
+	crcdb_init();
+#ifdef PACKDB
+	packdb_init();
+#endif
 }
 
 int is_bare_repository(void)
@@ -207,6 +248,20 @@ char *get_object_directory(void)
 	return git_object_dir;
 }
 
+char *get_object_crc_node(void) {
+	if (!git_object_crc_node)
+		setup_git_env();
+	return git_object_crc_node;
+}
+
+#ifdef PACKDB
+char *get_object_packdb_node(void) {
+	if (!git_object_packdb_node)
+		setup_git_env();
+	return git_object_packdb_node;
+}
+#endif
+
 int odb_mkstemp(char *template, size_t limit, const char *pattern)
 {
 	int fd;
diff --git a/fast-import.c b/fast-import.c
index 8d8ea3c..62a675c 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -165,6 +165,11 @@ Format of STDIN stream:
 #include "exec_cmd.h"
 #include "dir.h"
 
+#ifdef PACKDB
+#include "packdb.h"
+#endif
+
+
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
 #define DEPTH_BITS 13
@@ -558,6 +563,8 @@ static struct object_entry *new_object(unsigned char *sha1)
 
 	e = blocks->next_free++;
 	hashcpy(e->idx.sha1, sha1);
+	e->idx.has_objcrc32 = 0;
+	e->idx.objcrc32 = 0;
 	return e;
 }
 
@@ -904,9 +911,34 @@ static const char *create_index(void)
 	return tmpfile;
 }
 
-static char *keep_pack(const char *curr_index_name)
+static const char *create_mds(void)
+{
+	const char *tmpfile;
+	struct pack_idx_entry **mds, **c, **last;
+	struct object_entry *e;
+	struct object_entry_pool *o;
+
+	/* Build the table of object IDs. */
+	mds = xmalloc(object_count * sizeof(*mds));
+	c = mds;
+	for (o = blocks; o; o = o->next_pool)
+		for (e = o->next_free; e-- != o->entries;)
+			if (pack_id == e->pack_id)
+				*c++ = &e->idx;
+	last = mds + object_count;
+	if (c != last)
+		die("internal consistency error creating the mds file");
+
+	tmpfile = write_mds_file(NULL, mds, object_count, &pack_idx_opts, pack_data->sha1);
+	free(mds);
+	return tmpfile;
+}
+
+
+static char *keep_pack(const char *curr_index_name, const char *curr_mds_name)
 {
 	static char name[PATH_MAX];
+	static char rname[PATH_MAX];
 	static const char *keep_msg = "fast-import";
 	int keep_fd;
 
@@ -927,6 +959,13 @@ static char *keep_pack(const char *curr_index_name)
 	if (move_temp_to_file(curr_index_name, name))
 		die("cannot store index file");
 	free((void *)curr_index_name);
+
+	snprintf(rname, sizeof(rname), "%s/pack/pack-%s.mds",
+		 get_object_directory(), sha1_to_hex(pack_data->sha1));
+	if (move_temp_to_file(curr_mds_name, rname))
+		die("cannot store index file");
+	free((void *)curr_mds_name);
+
 	return name;
 }
 
@@ -951,6 +990,7 @@ static void end_packfile(void)
 	if (object_count) {
 		unsigned char cur_pack_sha1[20];
 		char *idx_name;
+		const char *n1, *n2;
 		int i;
 		struct branch *b;
 		struct tag *t;
@@ -961,7 +1001,9 @@ static void end_packfile(void)
 				    pack_data->pack_name, object_count,
 				    cur_pack_sha1, pack_size);
 		close(pack_data->pack_fd);
-		idx_name = keep_pack(create_index());
+		n1 = create_index();
+		n2 = create_mds();
+		idx_name = keep_pack(n1, n2);
 
 		/* Register the packfile with core git's machinery. */
 		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
@@ -1021,6 +1063,7 @@ static int store_object(
 	unsigned long hdrlen, deltalen;
 	git_SHA_CTX c;
 	git_zstream s;
+	uint32_t objcrc;
 
 	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
 		(unsigned long)dat->len) + 1;
@@ -1028,10 +1071,27 @@ static int store_object(
 	git_SHA1_Update(&c, hdr, hdrlen);
 	git_SHA1_Update(&c, dat->buf, dat->len);
 	git_SHA1_Final(sha1, &c);
+	objcrc = crc32(0, NULL, 0);
+	objcrc = htonl(crc32(objcrc, (unsigned char *)(dat->buf), dat->len));
+
+	if (has_sha1_file(sha1)) {
+		uint32_t oldcrc;
+		if (has_sha1_file_crc(sha1, &oldcrc)) {
+			if (objcrc != oldcrc) {
+				die("hash collision on %s [fast-import]",
+				    sha1_to_hex(sha1));
+			}
+		}
+	}
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
 	e = insert_object(sha1);
+	e->idx.has_objcrc32 = 1;
+	e->idx.objcrc32 = objcrc;
+#ifdef PACKDB
+	packdb_process(sha1, objcrc);
+#endif
 	if (mark)
 		insert_mark(mark, e);
 	if (e->idx.offset) {
@@ -1163,6 +1223,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	unsigned char *out_buf = xmalloc(out_sz);
 	struct object_entry *e;
 	unsigned char sha1[20];
+	uint32_t objcrc;
 	unsigned long hdrlen;
 	off_t offset;
 	git_SHA_CTX c;
@@ -1186,6 +1247,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 		die("impossibly large object header");
 
 	git_SHA1_Init(&c);
+	objcrc = crc32(0, NULL, 0);
 	git_SHA1_Update(&c, out_buf, hdrlen);
 
 	crc32_begin(pack_file);
@@ -1208,6 +1270,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 				die("EOF in data (%" PRIuMAX " bytes remaining)", len);
 
 			git_SHA1_Update(&c, in_buf, n);
+			objcrc = crc32(objcrc, in_buf, n);
 			s.next_in = in_buf;
 			s.avail_in = n;
 			len -= n;
@@ -1234,11 +1297,14 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	}
 	git_deflate_end(&s);
 	git_SHA1_Final(sha1, &c);
+	objcrc = htonl(objcrc);
 
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
 	e = insert_object(sha1);
+	e->idx.has_objcrc32 = 1;
+	e->idx.objcrc32 = objcrc;
 
 	if (mark)
 		insert_mark(mark, e);
@@ -1837,6 +1903,8 @@ static void read_marks(void)
 			if (type < 0)
 				die("object not found: %s", sha1_to_hex(sha1));
 			e = insert_object(sha1);
+			e->idx.has_objcrc32 =
+				has_sha1_file_crc(sha1, &e->idx.objcrc32);
 			e->type = type;
 			e->pack_id = MAX_PACK_ID;
 			e->idx.offset = 1; /* just not zero! */
@@ -2883,6 +2951,8 @@ static struct object_entry *dereference(struct object_entry *oe,
 			die("object not found: %s", sha1_to_hex(sha1));
 		/* cache it! */
 		oe = insert_object(sha1);
+		oe->idx.has_objcrc32 =
+			has_sha1_file_crc(sha1, &oe->idx.objcrc32);
 		oe->type = type;
 		oe->pack_id = MAX_PACK_ID;
 		oe->idx.offset = 1;
diff --git a/git-repack.sh b/git-repack.sh
index 624feec..7602853 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -91,6 +91,7 @@ if [ -z "$names" ]; then
 	say Nothing new to pack.
 fi
 
+
 # Ok we have prepared all new packfiles.
 
 # First see if there are packs of the same name and if so
@@ -100,7 +101,7 @@ rollback=
 failed=
 for name in $names
 do
-	for sfx in pack idx
+	for sfx in pack idx mds
 	do
 		file=pack-$name.$sfx
 		test -f "$PACKDIR/$file" || continue
@@ -148,15 +149,22 @@ do
 	fullbases="$fullbases pack-$name"
 	chmod a-w "$PACKTMP-$name.pack"
 	chmod a-w "$PACKTMP-$name.idx"
+	(chmod a-w "$PACKTMP-$name.mds" 2>/dev/null || exit 0 )
 	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
 	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
 	exit
+	if test -f "$PACKTMP-$name.mds"
+	then
+		mv -f "$PACKTMP-$name.mds"  "$PACKDIR/pack-$name.mds" \
+		    2>/dev/null || exit
+	fi
 done
 
 # Remove the "old-" files
 for name in $names
 do
 	rm -f "$PACKDIR/old-pack-$name.idx"
+	rm -f "$PACKDIR/old-pack-$name.mds"
 	rm -f "$PACKDIR/old-pack-$name.pack"
 done
 
@@ -172,7 +180,7 @@ then
 		  do
 			case " $fullbases " in
 			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+			*)	rm -f "$e.pack" "$e.idx" "$e.mds" "$e.keep" ;;
 			esac
 		  done
 		)
diff --git a/git.c b/git.c
index 8e34903..c6924f0 100644
--- a/git.c
+++ b/git.c
@@ -4,7 +4,10 @@
 #include "help.h"
 #include "quote.h"
 #include "run-command.h"
-
+#include "crcdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 const char git_usage_string[] =
 	"git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]\n"
 	"           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]\n"
@@ -278,7 +281,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	int status, help;
 	struct stat st;
 	const char *prefix;
-
+	static int crcdb_need_atexit = 1;
+
+	if (crcdb_need_atexit) {
+#ifdef PACKDB
+		atexit(packdb_finish);
+#endif
+		atexit(crcdb_finish);
+		crcdb_need_atexit = 0;
+	}
 	prefix = NULL;
 	help = argc == 2 && !strcmp(argv[1], "-h");
 	if (!help) {
diff --git a/hex.c b/hex.c
index 9ebc050..73c88fa 100644
--- a/hex.c
+++ b/hex.c
@@ -56,10 +56,50 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 	return 0;
 }
 
+static int get_crc_hex(const char *hex, unsigned char *crc) {
+	int i;
+	for (i = 0; i < 4; i++) {
+		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+		if (val & ~0xff)
+			return -1;
+		*crc++ = val;
+		hex += 2;
+	}
+	return 0;
+}
+
+int get_sha1_hex_crc(const char *hex, unsigned char *sha1, int *hascrc,
+		     uint32_t *crc, int *hasblobmds, uint32_t *blobmds)
+{
+	int result = get_sha1_hex(hex, sha1);
+	if (result) return result;
+	if (hex[20] == '-') {
+		unsigned char *ptr = (unsigned char *)crc;
+		if (!get_crc_hex(hex+21, ptr))
+			return -1;
+		*hascrc = 1;
+		if (hex[25] == '-') {
+			ptr = (unsigned char *)blobmds;
+			if (!get_crc_hex(hex+26, ptr)) {
+				return -1;
+			}
+			*hasblobmds = 1;
+		} else {
+			*hasblobmds = 0;
+		}
+	} else {
+		*hascrc = 0;
+		*hasblobmds = 0;
+		*crc = 0;
+		*blobmds = 0;
+	}
+	return 0;
+}
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
 	static int bufno;
-	static char hexbuffer[4][50];
+	static char hexbuffer[4][59];
 	static const char hex[] = "0123456789abcdef";
 	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
 	int i;
@@ -73,3 +113,19 @@ char *sha1_to_hex(const unsigned char *sha1)
 
 	return buffer;
 }
+
+char *sha1_to_hex_crc(const unsigned char *sha1, const uint32_t objcrc32)
+{
+	char *result = sha1_to_hex(sha1);
+	sprintf(result+40, "-%8.8x", ntohl(objcrc32));
+	return result;
+}
+
+char *sha1_to_hex_crc2(const unsigned char *sha1, const uint32_t objcrc32,
+		      const uint32_t blobcrc32)
+{
+	char *result = sha1_to_hex(sha1);
+	sprintf(result+40, "-%8.8x-%8.8x", ntohl(objcrc32),
+		ntohl(blobcrc32));
+	return result;
+}
diff --git a/http.c b/http.c
index e6c7597..a92fd9f 100644
--- a/http.c
+++ b/http.c
@@ -1072,8 +1072,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	struct packed_git **lst;
 	struct packed_git *p = preq->target;
 	char *tmp_idx;
+	char *tmp_mds;
 	struct child_process ip;
-	const char *ip_argv[8];
+	const char *ip_argv[10];
 
 	close_pack_index(p);
 
@@ -1087,14 +1088,20 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	*lst = (*lst)->next;
 
 	tmp_idx = xstrdup(preq->tmpfile);
+	tmp_mds = xstrdup(preq->tmpfile);
 	strcpy(tmp_idx + strlen(tmp_idx) - strlen(".pack.temp"),
 	       ".idx.temp");
+	strcpy(tmp_mds + strlen(tmp_mds) - strlen(".pack.temp"),
+	       ".mds.temp");
+
 
 	ip_argv[0] = "index-pack";
 	ip_argv[1] = "-o";
 	ip_argv[2] = tmp_idx;
-	ip_argv[3] = preq->tmpfile;
-	ip_argv[4] = NULL;
+	ip_argv[3] = "-m";
+	ip_argv[4] = tmp_mds;
+	ip_argv[5] = preq->tmpfile;
+	ip_argv[6] = NULL;
 
 	memset(&ip, 0, sizeof(ip));
 	ip.argv = ip_argv;
@@ -1105,20 +1112,24 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	if (run_command(&ip)) {
 		unlink(preq->tmpfile);
 		unlink(tmp_idx);
+		unlink(tmp_mds);
 		free(tmp_idx);
+		free(tmp_mds);
 		return -1;
 	}
 
 	unlink(sha1_pack_index_name(p->sha1));
 
 	if (move_temp_to_file(preq->tmpfile, sha1_pack_name(p->sha1))
-	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))) {
+	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))
+	 || move_temp_to_file(tmp_mds, sha1_pack_mds_name(p->sha1))) {
 		free(tmp_idx);
 		return -1;
 	}
 
 	install_packed_git(p);
 	free(tmp_idx);
+	free(tmp_mds);
 	return 0;
 }
 
diff --git a/pack-write.c b/pack-write.c
index 9cd3bfb..1c8fb72 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -178,6 +178,101 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+
+const char *write_mds_file(const char *crc_name,
+			   struct pack_idx_entry **objects,
+			   int nr,
+			   const struct pack_idx_option *opts,
+			   unsigned char *sha1)
+{
+	static unsigned char buffer[20];
+	unsigned char *base = buffer;
+	int i, j, fd;
+	struct sha1file *f;
+
+	if (nr) {
+		/*
+		 * Sort just in case objects not already sorted.
+		 */
+		qsort(objects, nr, sizeof(objects[0]), sha1_compare);
+	}
+
+	if (opts->flags & WRITE_IDX_VERIFY) {
+		assert(crc_name);
+		f = sha1fd_check(crc_name);
+		if (f == NULL) {
+			/*
+			 * For backwards-compatability, assume a missing
+			 * mds file is OK.
+			 */
+			return crc_name;
+		}
+	} else {
+	  if (!crc_name) {
+	    static char tmpfile[PATH_MAX];
+	    fd = odb_mkstemp(tmpfile, sizeof(tmpfile),
+			     "pack/tmp_mds_XXXXXX");
+	    crc_name = xstrdup(tmpfile);
+	  } else {
+	    unlink(crc_name);
+	    fd = open(crc_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+	  }
+	  if (fd < 0)
+	    die_errno("unable to create '%s'", crc_name);
+	  f = sha1fd(fd, crc_name);
+	}
+
+	*(base++) = 'P';
+	*(base++) = 'K';
+	*(base++) = 'M';
+	*(base++) = 'D';
+	*(base++) = 'S';
+	*(base++) = 0;
+	*(base++) = 1; /* version number */
+	*(base++) = 1; /* wsize */
+	sha1write(f, buffer, base - buffer);
+	base = buffer;
+
+	for (i = 0; i < nr; i += 4) {
+		int lim = ((nr-i) > 3)? 4: nr-i;
+		int has[4];
+		uint32_t crc[4];
+		for (j = 0; j < lim; j++) {
+			if (objects[i+j]->has_objcrc32) {
+				has[j] = 1;
+				crc[j] = objects[i+j]->objcrc32;
+			} else {
+				has[j] =
+				  (has_sha1_file_crc(objects[i + j]->sha1,
+						     &crc[j]) == 1);
+			}
+		}
+		for (j = 0; j < 4; j++) {
+			if (j < lim) {
+				*(base)++ = has[j];
+			} else {
+				has[j] = 0;
+				crc[j] = 0;
+				*(base++) = 0;
+			}
+		}
+		for (j = 0; j < 4; j += 1) {
+		  if (j < lim) {
+			*((uint32_t *)base) = has[j]? crc[j]: 0;
+		  } else {
+			*((uint32_t *)base) = 0;
+		  }
+		  base += 4;
+		}
+		sha1write(f, buffer, base - buffer);
+		base = buffer;
+	}
+	sha1write(f, sha1, 20);
+	sha1close(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
+			    ? CSUM_CLOSE : CSUM_FSYNC));
+	return crc_name;
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
diff --git a/pack.h b/pack.h
index 722a54e..62f5d41 100644
--- a/pack.h
+++ b/pack.h
@@ -68,9 +68,12 @@ struct pack_idx_entry {
 	unsigned char sha1[20];
 	uint32_t crc32;
 	off_t offset;
+	int has_objcrc32;
+	uint32_t objcrc32;
 };
 
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
+extern const char *write_mds_file(const char *mds_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *);
diff --git a/sha1_file.c b/sha1_file.c
index 6dcae38..da51b40 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -18,6 +18,10 @@
 #include "refs.h"
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
+#include "crcdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -40,6 +44,7 @@ const unsigned char null_sha1[20];
  */
 static struct cached_object {
 	unsigned char sha1[20];
+	uint32_t objcrc32;
 	enum object_type type;
 	void *buf;
 	unsigned long size;
@@ -48,6 +53,7 @@ static int cached_object_nr, cached_object_alloc;
 
 static struct cached_object empty_tree = {
 	EMPTY_TREE_SHA1_BIN_LITERAL,
+	0u,
 	OBJ_TREE,
 	"",
 	0
@@ -222,11 +228,18 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 	return sha1_get_pack_name(sha1, &name, &base, "idx");
 }
 
+char *sha1_pack_mds_name(const unsigned char *sha1)
+{
+	static char *name, *base;
+
+	return sha1_get_pack_name(sha1, &name, &base, "mds");
+}
+
+
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
 static void read_info_alternates(const char * alternates, int depth);
-static int git_open_noatime(const char *name);
 
 /*
  * Prepare alternate object database registry.
@@ -415,6 +428,7 @@ void prepare_alt_odb(void)
 	link_alt_odb_entries(alt, alt + strlen(alt), PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
+	crcdb_init_alts();
 }
 
 static int has_loose_object_local(const unsigned char *sha1)
@@ -441,6 +455,52 @@ static int has_loose_object(const unsigned char *sha1)
 	       has_loose_object_nonlocal(sha1);
 }
 
+static int has_loose_object_local_crc(const unsigned char *sha1,
+				      uint32_t *objcrc32p)
+{
+	int status;
+	crcdb_open(NULL);
+	status = crcdb_lookup(NULL, sha1, objcrc32p) > 0;
+	crcdb_close(NULL);
+	return status;
+}
+
+int has_loose_object_nonlocal_crc(const unsigned char *sha1,
+				  uint32_t *objcrc32p)
+{
+	struct alternate_object_database *alt;
+	if (objcrc32p == NULL) return 0;
+	/* memset(buffer, 0, PATH_MAX); */
+	prepare_alt_odb();
+	for (alt = alt_odb_list; alt; alt = alt->next) {
+		fill_sha1_path(alt->name, sha1);
+		if (!access(alt->base, F_OK)) {
+			uint32_t xcrc;
+			/* Use the crc corresponding to the hash */
+			crcdb_t dbf;
+			int status;
+			dbf = crcdb_open_alt(alt);
+			status = crcdb_lookup(dbf, sha1,
+					      (objcrc32p? objcrc32p: &xcrc));
+			crcdb_close(dbf);
+			switch (status) {
+			case 0: return 0;
+			case 1: return 1;
+			case -1:
+			default:
+				return 0;
+			}
+		}
+	}
+	return 0;
+}
+
+static int has_loose_object_crc(const unsigned char *sha1, uint32_t *objcrc32p)
+{
+	return has_loose_object_local_crc(sha1, objcrc32p) ||
+	       has_loose_object_nonlocal_crc(sha1, objcrc32p);
+}
+
 static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
@@ -574,6 +634,87 @@ static int check_packed_git_idx(const char *path,  struct packed_git *p)
 	return 0;
 }
 
+size_t required_git_packed_mds_size(const char *path, void *data,
+				    uint32_t nobjects,
+				    size_t actual_size) {
+	unsigned char *base;
+	int wsize, version;
+	size_t required_size;
+	if (actual_size < 8) {
+		error("mds/crc file %s is too small", path);
+		return 0;
+	}
+
+	base = data;
+	if ((*(base++) != 'P')
+	    || (*(base++) != 'K')
+	    || (*(base++) != 'M')
+	    || (*(base++) != 'D')
+	    || (*(base++) != 'S')
+	    || (*(base++) != 0)) {
+		error("mds/crc file %s corrupted (bad header)",
+			     path);
+		return 0;
+
+	}
+	if ((version = *(base++)) != 1) {
+		error("mds/crc file %s uses an unrecognized version %d",
+		      path, version);
+		return 0;
+	}
+	wsize = (*(base++)) * 4;
+	if (wsize != 4) {
+		/* other values not defined currently. */
+		error("mds/crc file %s corrupted (bad wsize field)",
+			     path);
+		return 0;
+	}
+	required_size = (size_t)8 +
+	  ((size_t)((nobjects)/4 + (nobjects % 4 != 0))
+	   * (size_t)(4 * (1 + wsize))) + (size_t)(20 * 2);
+	if (required_size != actual_size) {
+		error("mds/crc file %s not the right size: %ld != %ld",
+		      path, (long)actual_size, (long)required_size);
+		return 0;
+	}
+	return required_size;
+}
+
+static int check_packed_git_mds(const char *path, struct packed_git *p)
+{
+	void *mds_map;
+	size_t mds_size, required_size;
+	unsigned char *base;
+	int fd = git_open_noatime(path);
+	int version;
+	struct stat st;
+	if (fd < 0)
+		return -1;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+	mds_size = xsize_t(st.st_size);
+	if (mds_size < 8 + 20 + 20) {
+		close(fd);
+		return error("mds/crc file %s is too small", path);
+	}
+	mds_map = xmmap(NULL, mds_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	base = mds_map;
+	required_size = required_git_packed_mds_size(path, mds_map,
+						     p->num_objects,
+						     mds_size);
+	if (required_size == 0) {
+		munmap(mds_map, mds_size);
+		return -1;
+	}
+	p->mds_data = mds_map;
+	p->mds_size = mds_size;
+	p->mds_version = version;
+	return 0;
+}
+
 int open_pack_index(struct packed_git *p)
 {
 	char *idx_name;
@@ -589,6 +730,20 @@ int open_pack_index(struct packed_git *p)
 	return ret;
 }
 
+int open_pack_mds(struct packed_git *p) {
+	char *crc_name;
+	int ret;
+
+	if (p->mds_data)
+		return 0;
+
+	crc_name = xstrdup(p->pack_name);
+	strcpy(crc_name + strlen(crc_name) - strlen(".pack"), ".mds");
+	ret = check_packed_git_mds(crc_name, p);
+	free(crc_name);
+	return ret;
+}
+
 static void scan_windows(struct packed_git *p,
 	struct packed_git **lru_p,
 	struct pack_window **lru_w,
@@ -690,6 +845,15 @@ void close_pack_index(struct packed_git *p)
 	if (p->index_data) {
 		munmap((void *)p->index_data, p->index_size);
 		p->index_data = NULL;
+		p->index_size = 0;
+	}
+}
+
+void close_pack_mds(struct packed_git *p) {
+	if (p->mds_data) {
+		munmap((void *)p->mds_data, p->mds_size);
+		p->mds_data = NULL;
+		p->mds_size = 0;
 	}
 }
 
@@ -717,6 +881,7 @@ void free_pack_by_name(const char *pack_name)
 				pack_open_fds--;
 			}
 			close_pack_index(p);
+			close_pack_mds(p);
 			free(p->bad_object_sha1);
 			*pp = p->next;
 			free(p);
@@ -740,6 +905,10 @@ static int open_packed_git_1(struct packed_git *p)
 
 	if (!p->index_data && open_pack_index(p))
 		return error("packfile %s index unavailable", p->pack_name);
+	/*
+	 * Assume an mds file might not be available - backwards compatibility
+	 */
+	if (!p->mds_data) open_pack_mds(p);
 
 	if (!pack_max_fds) {
 		struct rlimit lim;
@@ -1141,14 +1310,23 @@ static const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
 	return NULL;
 }
 
-int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long size, const char *type)
+int check_sha1_signature_extended(const unsigned char *sha1,
+				  uint32_t *objcrc32p,
+				  void *map, unsigned long size,
+				  const char *type)
 {
 	unsigned char real_sha1[20];
-	hash_sha1_file(map, size, type, real_sha1);
-	return hashcmp(sha1, real_sha1) ? -1 : 0;
+	uint32_t realcrc;
+	hash_sha1_file_extended(map, size, type, real_sha1,
+				((objcrc32p == NULL)? NULL: &realcrc));
+	int ret = hashcmp(sha1, real_sha1) ? -1 : 0;
+	if (objcrc32p && ret == 0) {
+		ret = ((*objcrc32p) - realcrc)? -1 : 0;
+	}
+	return ret;
 }
 
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
 {
 	static int sha1_file_open_flag = O_NOATIME;
 
@@ -1924,15 +2102,46 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
 	}
 }
 
-off_t find_pack_entry_one(const unsigned char *sha1,
-				  struct packed_git *p)
+int nth_packed_object_objcrc32(const struct packed_git *p, uint32_t n,
+			       uint32_t *objcrc32p)
+{
+	int r;
+	unsigned char *base = (unsigned char *)(p->mds_data);
+	int wsize; /*size in bytes per CRC field, stored as 32-bit words */
+
+	if (base == NULL) return 0;
+
+	base += 7;
+	wsize = (*(base++)) * 4;
+	if (wsize != 4) {
+		/* other values not defined currently. */
+		return -1;
+	}
+	base += (n / 4) * (uint32_t)(4 * (1 + wsize));
+	r = n % 4;
+	if (base[r] == 0) return 0;
+	base += 4;
+	base += wsize * r;
+	*objcrc32p = *(uint32_t *) base;
+	return 1;
+}
+
+
+
+off_t find_pack_entry_one_extended(const unsigned char *sha1,
+				   struct packed_git *p,
+				   int *has_objcrc32p, uint32_t *objcrc32p)
 {
 	const uint32_t *level1_ofs = p->index_data;
 	const unsigned char *index = p->index_data;
+	const unsigned char *mds = p->mds_data;
 	unsigned hi, lo, stride;
 	static int use_lookup = -1;
 	static int debug_lookup = -1;
 
+	if (has_objcrc32p) *has_objcrc32p = 0;
+	if (objcrc32p) *objcrc32p = 0;
+
 	if (debug_lookup < 0)
 		debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
 
@@ -1942,6 +2151,11 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		level1_ofs = p->index_data;
 		index = p->index_data;
 	}
+
+	if (!mds) {
+		open_pack_mds(p);
+	}
+
 	if (p->index_version > 1) {
 		level1_ofs += 2;
 		index += 8;
@@ -1977,8 +2191,14 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		if (debug_lookup)
 			printf("lo %u hi %u rg %u mi %u\n",
 			       lo, hi, hi - lo, mi);
-		if (!cmp)
+		if (!cmp) {
+			if (has_objcrc32p && objcrc32p)
+				*(has_objcrc32p) =
+				  (nth_packed_object_objcrc32(p,
+							      mi,
+							      objcrc32p) == 1);
 			return nth_packed_object_offset(p, mi);
+		}
 		if (cmp > 0)
 			hi = mi;
 		else
@@ -2027,7 +2247,9 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
 					goto next;
 		}
 
-		offset = find_pack_entry_one(sha1, p);
+		offset = find_pack_entry_one_extended(sha1, p,
+						      &(e->has_objcrc32),
+						      &(e->objcrc32));
 		if (offset) {
 			/*
 			 * We are about to tell the caller where they can
@@ -2173,14 +2395,32 @@ static void *read_packed_sha1(const unsigned char *sha1,
 	return data;
 }
 
-int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
-		      unsigned char *sha1)
+int pretend_sha1_file_extended(void *buf, unsigned long len,
+			       enum object_type type,
+			       unsigned char *sha1, uint32_t *objcrc32p)
 {
-	struct cached_object *co;
+	struct cached_object *co = NULL;
+	uint32_t crc32;
+	int has_crc32 = 0;
 
-	hash_sha1_file(buf, len, typename(type), sha1);
-	if (has_sha1_file(sha1) || find_cached_object(sha1))
+	hash_sha1_file_extended(buf, len, typename(type), sha1, &crc32);
+	if (has_sha1_file(sha1) || (co = find_cached_object(sha1))) {
+		uint32_t oldcrc32;
+		if (!has_sha1_file_crc(sha1, &oldcrc32)) {
+			if (co != NULL) {
+				oldcrc32 = co->objcrc32;
+				has_crc32 = 1;
+			}
+		} else {
+			has_crc32 = 1;
+		}
+		if (has_crc32 && oldcrc32 != crc32) {
+			  die("SHA1 COLLISION FOUND FOR %s "
+			      "(dummy commit when running blame?)",
+			      sha1_to_hex(sha1));
+		}
 		return 0;
+	}
 	if (cached_object_alloc <= cached_object_nr) {
 		cached_object_alloc = alloc_nr(cached_object_alloc);
 		cached_objects = xrealloc(cached_objects,
@@ -2191,8 +2431,10 @@ int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
 	co->size = len;
 	co->type = type;
 	co->buf = xmalloc(len);
+	co->objcrc32 = crc32;
 	memcpy(co->buf, buf, len);
 	hashcpy(co->sha1, sha1);
+	if (objcrc32p) *objcrc32p = crc32;
 	return 0;
 }
 
@@ -2314,8 +2556,9 @@ void *read_object_with_reference(const unsigned char *sha1,
 }
 
 static void write_sha1_file_prepare(const void *buf, unsigned long len,
-                                    const char *type, unsigned char *sha1,
-                                    char *hdr, int *hdrlen)
+				    const char *type, unsigned char *sha1,
+				    uint32_t * objcrc32p,
+				    char *hdr, int *hdrlen)
 {
 	git_SHA_CTX c;
 
@@ -2327,6 +2570,10 @@ static void write_sha1_file_prepare(const void *buf, unsigned long len,
 	git_SHA1_Update(&c, hdr, *hdrlen);
 	git_SHA1_Update(&c, buf, len);
 	git_SHA1_Final(sha1, &c);
+	if (objcrc32p) {
+		*objcrc32p = crc32(0, NULL, 0);
+		*objcrc32p = htonl(crc32(*objcrc32p, buf, len));
+	}
 }
 
 /*
@@ -2382,12 +2629,13 @@ static int write_buffer(int fd, const void *buf, size_t len)
 	return 0;
 }
 
-int hash_sha1_file(const void *buf, unsigned long len, const char *type,
-                   unsigned char *sha1)
+int hash_sha1_file_extended(const void *buf, unsigned long len,
+			    const char *type,
+			    unsigned char *sha1, uint32_t *objcrc32p)
 {
 	char hdr[32];
 	int hdrlen;
-	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	write_sha1_file_prepare(buf, len, type, sha1, objcrc32p, hdr, &hdrlen);
 	return 0;
 }
 
@@ -2441,10 +2689,13 @@ static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
 	return fd;
 }
 
-static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
+
+static int write_loose_object(const unsigned char *sha1, uint32_t *objcrc32p,
+			      char *hdr, int hdrlen,
 			      const void *buf, unsigned long len, time_t mtime)
 {
 	int fd, ret;
+	uint32_t crc;
 	unsigned char compressed[4096];
 	git_zstream stream;
 	git_SHA_CTX c;
@@ -2467,7 +2718,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	stream.next_out = compressed;
 	stream.avail_out = sizeof(compressed);
 	git_SHA1_Init(&c);
-
+	crc = crc32(0, NULL, 0);
 	/* First header.. */
 	stream.next_in = (unsigned char *)hdr;
 	stream.avail_in = hdrlen;
@@ -2482,11 +2733,13 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 		unsigned char *in0 = stream.next_in;
 		ret = git_deflate(&stream, Z_FINISH);
 		git_SHA1_Update(&c, in0, stream.next_in - in0);
+		crc = crc32(crc, in0, stream.next_in - in0);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
 	} while (ret == Z_OK);
+	crc = htonl(crc);
 
 	if (ret != Z_STREAM_END)
 		die("unable to deflate new object %s (%d)", sha1_to_hex(sha1), ret);
@@ -2496,9 +2749,10 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	git_SHA1_Final(parano_sha1, &c);
 	if (hashcmp(sha1, parano_sha1) != 0)
 		die("confused by unstable object source data for %s", sha1_to_hex(sha1));
-
+	if (objcrc32p && ((*objcrc32p) != crc)) {
+		die("confused by unstable object source data (crc mismatch) for %s", sha1_to_hex(sha1));
+	}
 	close_sha1_file(fd);
-
 	if (mtime) {
 		struct utimbuf utb;
 		utb.actime = mtime;
@@ -2508,24 +2762,41 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 				tmpfile, strerror(errno));
 	}
 
-	return move_temp_to_file(tmpfile, filename);
+	ret = move_temp_to_file(tmpfile, filename);
+	if (ret == 0) {
+		crcdb_open(NULL);
+		crcdb_process((crcdb_t)NULL, sha1, crc);
+		crcdb_close(NULL);
+	}
+	return ret;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
+int write_sha1_file_extended(const void *buf, unsigned long len,
+			     const char *type, unsigned char *returnsha1,
+			     uint32_t *objcrc32p)
 {
 	unsigned char sha1[20];
 	char hdr[32];
 	int hdrlen;
+	uint32_t newcrc;
 
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	write_sha1_file_prepare(buf, len, type, sha1, &newcrc, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (has_sha1_file(sha1))
+	if (objcrc32p) *objcrc32p = newcrc;
+	if (has_sha1_file(sha1)) {
+		uint32_t oldcrc;
+		if (has_sha1_file_crc(sha1, &oldcrc)) {
+			if (newcrc != oldcrc) {
+				die("hash collision");
+			}
+		}
 		return 0;
-	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
+	}
+	return write_loose_object(sha1, &newcrc, hdr, hdrlen, buf, len, 0);
 }
 
 int force_object_loose(const unsigned char *sha1, time_t mtime)
@@ -2536,6 +2807,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 	char hdr[32];
 	int hdrlen;
 	int ret;
+	uint32_t * const objcrc32p = NULL;
 
 	if (has_loose_object(sha1))
 		return 0;
@@ -2543,7 +2815,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 	if (!buf)
 		return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
 	hdrlen = sprintf(hdr, "%s %lu", typename(type), len) + 1;
-	ret = write_loose_object(sha1, hdr, hdrlen, buf, len, mtime);
+	ret = write_loose_object(sha1, objcrc32p, hdr, hdrlen, buf, len, mtime);
 	free(buf);
 
 	return ret;
@@ -2572,6 +2844,86 @@ int has_sha1_file(const unsigned char *sha1)
 	return has_loose_object(sha1);
 }
 
+int has_sha1_file_crc(const unsigned char *sha1, uint32_t *objcrc32p)
+{
+	struct pack_entry e;
+
+
+	/*
+	 * builtin/send-pack.c uses a null SHA1 (all bytes zero) to
+	 * indicate that a SHA-1 hash does not exist.  We explicitly
+	 * return 0 for this case, for correct behavior even if we
+	 * somehow get that value into the database.
+	 */
+	if (!hashcmp(sha1, null_sha1)) return 0;
+
+	if (find_pack_entry(sha1, &e)) {
+		if (e.has_objcrc32) {
+			if (objcrc32p) *objcrc32p = e.objcrc32;
+			return 1;
+		} else {
+#ifdef PACKDB
+			if (e.p && e.p->pack_local) {
+				/*
+				 * We have a local pack file, but could not
+				 * find the CRC, so we first check if the
+				 * CRC is still stored for loose objects.
+				 * Then we try packdb (separate database for
+				 * packed objects) and if it is not there, we
+				 * compute it from scratch and add it to
+				 * packdb.
+				 */
+				if (has_loose_object_local_crc(sha1,
+							       objcrc32p)) {
+					return 1;
+				} else {
+					int status ;
+					packdb_open();
+					status = (packdb_lookup(sha1,
+								objcrc32p)
+						  == 1);
+					if (status == 0) {
+						unsigned long len;
+						enum object_type type;
+						uint32_t crc;
+						crc = crc32(0, NULL, 0);
+						void *buf = read_sha1_file
+							(sha1, &type, &len);
+						crc = htonl(crc32(crc,
+								  buf, len));
+						switch(packdb_process
+						       (sha1, crc)) {
+						case 0:
+							if (objcrc32p)
+								*objcrc32p
+									= crc;
+							status = 1;
+							break;
+						case 1:
+							error("packdb insert"
+							      " botched");
+							status = 0;
+							break;
+						case -1:
+							error("packdb failed");
+							status = 0;
+							break;
+						}
+					}
+					packdb_close();
+					return status;
+				}
+			} else {
+				return 0;
+			}
+#else
+			return has_loose_object_local_crc(sha1, objcrc32p);
+#endif
+		}
+	}
+	return has_loose_object_crc(sha1, objcrc32p);
+}
+
 static void check_tree(const void *buf, size_t size)
 {
 	struct tree_desc desc;
@@ -2600,7 +2952,8 @@ static void check_tag(const void *buf, size_t size)
 		die("corrupt tag");
 }
 
-static int index_mem(unsigned char *sha1, void *buf, size_t size,
+static int index_mem(unsigned char *sha1, uint32_t *objcrc32p,
+		     void *buf, size_t size,
 		     enum object_type type,
 		     const char *path, unsigned flags)
 {
@@ -2631,22 +2984,26 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	}
 
 	if (write_object)
-		ret = write_sha1_file(buf, size, typename(type), sha1);
+		ret = write_sha1_file_extended(buf, size, typename(type), sha1,
+					       objcrc32p);
 	else
-		ret = hash_sha1_file(buf, size, typename(type), sha1);
+		ret = hash_sha1_file_extended(buf, size, typename(type), sha1,
+				     objcrc32p);
 	if (re_allocated)
 		free(buf);
 	return ret;
 }
 
-static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
+static int index_pipe(unsigned char *sha1, uint32_t *objcrc32p,
+		      int fd, enum object_type type,
 		      const char *path, unsigned flags)
 {
 	struct strbuf sbuf = STRBUF_INIT;
 	int ret;
 
 	if (strbuf_read(&sbuf, fd, 4096) >= 0)
-		ret = index_mem(sha1, sbuf.buf, sbuf.len, type,	path, flags);
+		ret = index_mem(sha1, objcrc32p, sbuf.buf, sbuf.len, type,
+				path, flags);
 	else
 		ret = -1;
 	strbuf_release(&sbuf);
@@ -2655,24 +3012,26 @@ static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
 
 #define SMALL_FILE_SIZE (32*1024)
 
-static int index_core(unsigned char *sha1, int fd, size_t size,
+static int index_core(unsigned char *sha1, uint32_t *objcrc32p,
+		      int fd, size_t size,
 		      enum object_type type, const char *path,
 		      unsigned flags)
 {
 	int ret;
 
 	if (!size) {
-		ret = index_mem(sha1, NULL, size, type, path, flags);
+		ret = index_mem(sha1, objcrc32p, NULL, size, type, path, flags);
 	} else if (size <= SMALL_FILE_SIZE) {
 		char *buf = xmalloc(size);
 		if (size == read_in_full(fd, buf, size))
-			ret = index_mem(sha1, buf, size, type, path, flags);
+			ret = index_mem(sha1, objcrc32p,
+					buf, size, type, path, flags);
 		else
 			ret = error("short read %s", strerror(errno));
 		free(buf);
 	} else {
 		void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-		ret = index_mem(sha1, buf, size, type, path, flags);
+		ret = index_mem(sha1, objcrc32p, buf, size, type, path, flags);
 		munmap(buf, size);
 	}
 	return ret;
@@ -2692,7 +3051,8 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
  * avoid mmaping it in core is to deal with large binary blobs, and
  * by definition they do _not_ want to get any conversion.
  */
-static int index_stream(unsigned char *sha1, int fd, size_t size,
+static int index_stream(unsigned char *sha1, uint32_t *ojbcrc32p,
+			int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
@@ -2757,23 +3117,25 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 	return 0;
 }
 
-int index_fd(unsigned char *sha1, int fd, struct stat *st,
-	     enum object_type type, const char *path, unsigned flags)
+int index_fd_extended(unsigned char *sha1, uint32_t *objcrc32p,
+		      int fd, struct stat *st,
+		      enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
 	size_t size = xsize_t(st->st_size);
 
 	if (!S_ISREG(st->st_mode))
-		ret = index_pipe(sha1, fd, type, path, flags);
+		ret = index_pipe(sha1, objcrc32p, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB)
-		ret = index_core(sha1, fd, size, type, path, flags);
+		ret = index_core(sha1, objcrc32p, fd, size, type, path, flags);
 	else
-		ret = index_stream(sha1, fd, size, type, path, flags);
+		ret = index_stream(sha1, objcrc32p,
+				   fd, size, type, path, flags);
 	close(fd);
 	return ret;
 }
 
-int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags)
+int index_path_extended(unsigned char *sha1, uint32_t *objcrc32p, const char *path, struct stat *st, unsigned flags)
 {
 	int fd;
 	struct strbuf sb = STRBUF_INIT;
@@ -2784,7 +3146,8 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 		if (fd < 0)
 			return error("open(\"%s\"): %s", path,
 				     strerror(errno));
-		if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+		if (index_fd_extended(sha1, objcrc32p, fd, st,
+				      OBJ_BLOB, path, flags) < 0)
 			return error("%s: failed to insert into database",
 				     path);
 		break;
@@ -2795,8 +3158,10 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 			             errstr);
 		}
 		if (!(flags & HASH_WRITE_OBJECT))
-			hash_sha1_file(sb.buf, sb.len, blob_type, sha1);
-		else if (write_sha1_file(sb.buf, sb.len, blob_type, sha1))
+			hash_sha1_file_extended(sb.buf, sb.len, blob_type, sha1,
+				       objcrc32p);
+		else if (write_sha1_file_extended(sb.buf, sb.len, blob_type,
+						  sha1, objcrc32p))
 			return error("%s: failed to insert into database",
 				     path);
 		strbuf_release(&sb);
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index f4e8f43..28c2ba2 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -34,17 +34,18 @@ fi
 # git init has been done in an empty repository.
 # make sure it is empty.
 
-find .git/objects -type f -print >should-be-empty
+find .git/objects -type f -a  ! -name crcs -a ! -name packdb -print >should-be-empty
 test_expect_success \
     '.git/objects should be empty after git init in an empty repo.' \
     'cmp -s /dev/null should-be-empty'
 
-# also it should have 2 subdirectories; no fan-out anymore, pack, and info.
-# 3 is counting "objects" itself
-find .git/objects -type d -print >full-of-directories
+# also it should have 3 subdirectories;
+# no fan-out anymore, pack, and info and crcs.
+# 4 (listed by find) is the result of counting "objects" as well.
+find .git/objects \( -type d -o -name crcs  \) -print >full-of-directories
 test_expect_success \
-    '.git/objects should have 3 subdirectories.' \
-    'test $(wc -l < full-of-directories) = 3'
+    '.git/objects should have 3 subdirectories or files.' \
+    'test $(wc -l < full-of-directories) = 4'
 
 ################################################################
 # Test harness
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 602806d..880b425 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -54,7 +54,7 @@ cd "$TRASH/.git2"
 
 test_expect_success \
     'check unpack without delta' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f  -print) | grep -v crcs |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -84,7 +84,7 @@ unset GIT_OBJECT_DIRECTORY
 cd "$TRASH/.git2"
 test_expect_success \
     'check unpack with REF_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f -print) | grep -v crcs |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -114,7 +114,7 @@ unset GIT_OBJECT_DIRECTORY
 cd "$TRASH/.git2"
 test_expect_success \
     'check unpack with OFS_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f -print) | grep -v crcs |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -211,6 +211,17 @@ test_expect_success \
 			test-3-${packname_3}.idx'
 
 test_expect_success \
+    'verify pack -v -M' \
+    'test -z "`git verify-pack -v -M test-1-${packname_1}.idx \
+			test-2-${packname_2}.idx \
+			test-3-${packname_3}.idx | grep \<no\ md\>`" &&
+     test 0 != `git verify-pack -v -M test-1-${packname_1}.idx | grep md= | wc -l` &&
+     test -z "`git verify-pack -v -M test-1-${packname_1}.idx | grep "should be"`" &&
+     (x=`git verify-pack -v -M test-1-${packname_1}.idx | wc -l`
+     y=`git verify-pack -v -M test-1-${packname_1}.idx |grep -v \<no\ md\> | wc -l`
+     test $x = $y)'
+
+test_expect_success \
     'verify-pack catches mismatched .idx and .pack files' \
     'cat test-1-${packname_1}.idx >test-3.idx &&
      cat test-2-${packname_2}.pack >test-3.pack &&
diff --git a/t/t5301-sliding-window.sh b/t/t5301-sliding-window.sh
index 2fc5af6..ec0d72f 100755
--- a/t/t5301-sliding-window.sh
+++ b/t/t5301-sliding-window.sh
@@ -22,13 +22,19 @@ test_expect_success \
      git repack -a -d &&
      test "`git count-objects`" = "0 objects, 0 kilobytes" &&
      pack1=`ls .git/objects/pack/*.pack` &&
-     test -f "$pack1"'
+     test -f "$pack1"  &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'verify-pack -v, defaults' \
     'git verify-pack -v "$pack1"'
 
 test_expect_success \
+    'verify-pack -v -M, defaults' \
+    'git verify-pack -v -M "$pack1" | grep "<no md>" > tmp
+     test -z "`cat tmp`"'
+
+test_expect_success \
     'verify-pack -v, packedGitWindowSize == 1 page' \
     'git config core.packedGitWindowSize 512 &&
      git verify-pack -v "$pack1"'
@@ -49,12 +55,14 @@ test_expect_success \
      test "`git count-objects`" = "0 objects, 0 kilobytes" &&
      pack2=`ls .git/objects/pack/*.pack` &&
      test -f "$pack2" &&
-     test "$pack1" \!= "$pack2"'
+     test "$pack1" \!= "$pack2" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'verify-pack -v, defaults' \
     'git config --unset core.packedGitWindowSize &&
      git config --unset core.packedGitLimit &&
-     git verify-pack -v "$pack2"'
+     git verify-pack -v "$pack2" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_done
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index f8fa924..da10200 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -37,12 +37,14 @@ test_expect_success \
 test_expect_success \
     'pack-objects with index version 1' \
     'pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
-     git verify-pack -v "test-1-${pack1}.pack"'
+     git verify-pack -v "test-1-${pack1}.pack" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'pack-objects with index version 2' \
     'pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
-     git verify-pack -v "test-2-${pack2}.pack"'
+     git verify-pack -v "test-2-${pack2}.pack" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'both packs should be identical' \
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index d645328..86075a7 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -37,7 +37,8 @@ test_expect_success 'prune stale packs' '
 	git prune --expire 1.day &&
 	test -f $orig_pack &&
 	test -f .git/objects/tmp_2.pack &&
-	! test -f .git/objects/tmp_1.pack
+	! test -f .git/objects/tmp_1.pack  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -50,7 +51,8 @@ test_expect_success 'prune --expire' '
 	test-chmtime =-86500 $BLOB_FILE &&
 	git prune --expire 1.day &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	! test -f $BLOB_FILE  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -64,7 +66,8 @@ test_expect_success 'gc: implicit prune --expire' '
 	test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
 	git gc &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	! test -f $BLOB_FILE  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -78,8 +81,8 @@ test_expect_success 'gc: refuse to start with invalid gc.pruneExpire' '
 test_expect_success 'gc: start with ok gc.pruneExpire' '
 
 	git config gc.pruneExpire 2.days.ago &&
-	git gc
-
+	git gc &&
+	test -z "`git count-objects -v -M | grep MD`"
 '
 
 test_expect_success 'prune: prune nonsense parameters' '
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index bafcca7..1274c79 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -53,8 +53,8 @@ pull_to_client () {
 			git symbolic-ref HEAD refs/heads/`echo $heads \
 				| sed -e "s/^\(.\).*$/\1/"` &&
 
-			git fsck --full &&
-
+			git fsck --full  &&
+			test -z "`git count-objects -v -M | grep MD`" &&
 			mv .git/objects/pack/pack-* . &&
 			p=`ls -1 pack-*.pack` &&
 			git unpack-objects <$p &&
@@ -142,7 +142,8 @@ test_expect_success 'fsck in shallow repo' '
 test_expect_success 'simple fetch in shallow repo' '
 	(
 		cd shallow &&
-		git fetch
+		git fetch &&
+		test -z "`git count-objects -v -M | grep MD`"
 	)
 '
 
@@ -245,7 +246,8 @@ test_expect_success 'clone shallow object count' '
 		cd shallow &&
 		git count-objects -v
 	) > count.shallow &&
-	grep "^count: 52" count.shallow
+	grep "^count: 52" count.shallow  &&
+	test -z "`git count-objects -v -M | grep MD`"
 '
 
 test_done
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index e0af4c4..3ce6027 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -14,6 +14,12 @@ test_bundle_object_count () {
 	test "$2" = $(grep '^[0-9a-f]\{40\} ' verify.out | wc -l)
 }
 
+test_bundle_mds_count () {
+	git verify-pack -v -M "$1" >verify.out &&
+	test "$2" = $(grep '^[0-9a-f]\{40\} ' verify.out | grep -v "<no md>" | wc -l)
+}
+
+
 test_expect_success setup '
 	echo >file original &&
 	git add file &&
@@ -215,7 +221,8 @@ test_expect_success 'bundle 1 has only 3 files ' '
 		cat
 	) <bundle1 >bundle.pack &&
 	git index-pack bundle.pack &&
-	test_bundle_object_count bundle.pack 3
+	test_bundle_object_count bundle.pack 3 &&
+	test_bundle_mds_count bundle.pack 3
 '
 
 test_expect_success 'unbundle 2' '
@@ -238,7 +245,8 @@ test_expect_success 'bundle does not prerequisite objects' '
 		cat
 	) <bundle3 >bundle.pack &&
 	git index-pack bundle.pack &&
-	test_bundle_object_count bundle.pack 3
+	test_bundle_object_count bundle.pack 3 &&
+	test_bundle_mds_count bundle.pack 3
 '
 
 test_expect_success 'bundle should be able to create a full history' '
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 438aaf6..63b4a13 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -109,6 +109,12 @@ test_expect_success \
 	'A: verify pack' \
 	'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
 
+test_expect_success \
+	'A: verify pack -v -M --- all objects have CRCs' \
+	'for p in .git/objects/pack/*.pack;
+	do git verify-pack -v -M $p | grep "<no md>" > tmp;
+	   test -z "`cat tmp`" || exit; done'
+
 cat >expect <<EOF
 author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -1504,7 +1510,7 @@ INPUT_END
 test_expect_success \
 	'O: blank lines not necessary after other commands' \
 	'git fast-import <input &&
-	 test 8 = `find .git/objects/pack -type f | wc -l` &&
+	 test 8 = `find .git/objects/pack -type f | grep -v .mds | wc -l` &&
 	 test `git rev-parse refs/tags/O3-2nd` = `git rev-parse O3^` &&
 	 git log --reverse --pretty=oneline O3 | sed s/^.*z// >actual &&
 	 test_cmp expect actual'
diff --git a/upload-pack.c b/upload-pack.c
index 470cffd..9721074 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -320,11 +320,32 @@ static int got_sha1(char *hex, unsigned char *sha1)
 {
 	struct object *o;
 	int we_knew_they_have = 0;
+	int has_sha1_objcrc, has_objcrc32, has_sha1_blobcrc, has_blobcrc;
+	uint32_t sha1_objcrc, objcrc32, sha1_blobcrc, blobcrc;
 
-	if (get_sha1_hex(hex, sha1))
+	if (get_sha1_hex_crc(hex, sha1, &has_sha1_objcrc, &sha1_objcrc,
+			     &has_sha1_blobcrc, &sha1_blobcrc))
 		die("git upload-pack: expected SHA1 object, got '%s'", hex);
 	if (!has_sha1_file(sha1))
 		return -1;
+	has_sha1_objcrc = has_sha1_file_crc(sha1, &objcrc32);
+	if (has_sha1_objcrc && has_objcrc32 && objcrc32 != sha1_objcrc) {
+		die("git upload-pack: SHA1 collision on MD for %s", hex);
+	}
+	has_blobcrc = !get_blob_mds(sha1, &blobcrc);
+
+	if (has_sha1_blobcrc) {
+		if (has_blobcrc) {
+			if (sha1_blobcrc != blobcrc) {
+				die("git upload-pack: SHA1 collision "
+				    "on blob-MD for %s", hex);
+			}
+		} else {
+#ifdef BLOB_MDS_CHECK
+			push_mds_check(sha1, sha1_blobcrc);
+#endif
+		}
+	}
 
 	o = lookup_object(sha1);
 	if (!(o && o->parsed))
@@ -719,7 +740,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed mds-check";
 	struct object *o = parse_object(sha1);
 	const char *refname_nons = strip_namespace(refname);
 
@@ -775,6 +796,9 @@ static void upload_pack(void)
 	if (want_obj.nr) {
 		get_common_commits();
 		create_pack_file();
+#ifdef BLOB_MDS_CHECK
+		process_mds_checks(die);
+#endif
 	}
 }
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] Implement fast hash-collision detection
From: Jeff King @ 2011-11-30  6:25 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, gitster, pclouds, spearce, torvalds
In-Reply-To: <1322603788.1728.190.camel@yos>

On Tue, Nov 29, 2011 at 01:56:28PM -0800, Bill Zaumen wrote:

> The additional CRC (easily changed to whatever message digest one might
> prefer) makes a malicious attack far more difficult: the modified file
> has to have both the same SHA-1 hash (including the Git header) and 
> the same CRC (not including the Git header).

Only if the attack actually involves creating a collision on both. But I
think the important attacks bypass your CRC anyway. Consider this attack
scenario:

  1. Linus signs a tag (or a commit) and pushes it to kernel.org.

  2. kernel.org gets hacked, and the attacker replaces an object with
     an evil colliding version[1].

  3. I clone from kernel.org, and run "git tag --verify". Git says it's
     OK, because the signature checks out, but I have a bogus object.

How does your CRC help? If I understand your scheme correctly,
kernel.org will have told me the CRC of all of the objects during the
clone. But that isn't part of what Linus signed, so the attacker in step
2 could just as easily have overwritten kernel.org's crc file, and the
signature will remain valid.

[1] This is an over-simplification, of course. Because the only even
    remotely feasible attacks on sha1 are birthday attacks, not pre-image
    attacks, there is a step 0 in which the attacker generates a
    colliding pair, convinces Linus to commit it, and then waits.

    Which is probably really hard, but for the purposes of this
    discussion, we assume the attacker is capable of inserting a
    colliding object maliciously into a repo you will fetch from.
    Otherwise, the integrity of sha1 isn't an issue at all.

> An efficient algorithm to do both simultaneously does not yet exist.
> So, if we could generate a SHA-1 collision in one second, it would
> presumably take billions of seconds (many decades of continuous
> computation) to generate a SHA-1 hash with the same CRC, and well
> before a year has elapsed, the original object should have been in all
> the repositories, preventing a forged object from being inserted. Of
> course, eventually you might need a real message digest.

This is wrong, for two reasons.

  1. The method for generating an object that collides in both sha-1 and
     CRC is not necessarily to generate a colliding sha-1 and then do a
     pre-image attack on the CRC. It is to do a birthday attack on the
     sha-1 and the CRC together. Which halves the bit-strength of the
     CRC to 16 bits (just as we can generally find collisions in 160-bit
     sha1s in 2^80). 16 bits isn't a lot to add when you are trying to
     fix a broken cryptosystem (it's not broken yet, obviously, but when
     it does get broken, will it be because computing reaches the 2^57
     or so that sha1 is broken at, or will it be because a new weakness
     is found that drops sha1's bit-strength to something much lower?).

     This assumes that you can combine the two in a birthday attack.
     Certainly this analysis works against brute-force 2^80 sha1
     collision attacks. But I haven't actually read the details of the
     sha1 attacks, so maybe some of the tweaking they do to get those
     results makes it harder. On the other hand, attacking CRC is far
     from hard, so I certainly wouldn't stake money that sha1 reseachers
     couldn't tweak their attacks in a way that also allows finding CRC
     collisions. You say that an algorithm to do both simultaneously
     does not yet exist. But is that because it's hard, or simply
     because nobody has bothered trying?

     Anyway, all of that is just reiterating that CRC should not be used
     as a security function. It can easily be replaced in your scheme by
     sha-256, which does have the desired properties.

  2. Your attack seems to be "find the sha-1 collision, publish one of
     your colliding objects (i.e., the innocent-looking half), then try
     to break the CRC". And then you claim that by the time you find the
     CRC, everybody will already have the object.

     But wouldn't a smarter attack be to first find the collision, including
     the CRC, and only _then_ start the attack? Then nobody will have
     the object.

     Moreover, it's not true that after a year everyone will have the
     object. People still run "git clone" against kernel.org. Those
     repos do not have the object.

> The weakness of a CRC as an integrity check is not an issue since it
> is never used alone: it's use is more analogous to the few extra bits
> added to a data stream when error-detecting codes are used.  I used a
> CRC in the initial implementation rather than a message digest because
> it is faster, and because the initial goal was to get things to work
> correctly.  In any case, the patch does not eliminate any code in
> which Git already does a byte-by-byte comparison.  In cases where Git
> currently assumes that two objects are the same because the SHA-1
> hashes are the same, the patch compares CRCs as an additional test.

Right. I don't claim that your scheme makes git any weaker. I just claim
that it fails to solve the problems people are actually concerned about,
and it adds a lot of complexity while doing so.

> Regarding your [Jeff's] second concern, "how does this alternative
> digest have any authority?" there are two things to keep in mind.
> First, it is a supplement to the existing digest.

Right, but we are assuming that sha1 is broken. That's the whole
security problem. So the existing digest is not worth much.

> Second, any value of the CRC that is stored permanently (baring bugs,
> in my implementation, of course) is computed locally - when a loose
> object is created or when a pack file's index is created.  At no point
> is a CRC that was obtained from another repository trusted. While the
> patch modifies Git so that it can send CRCs when using the git
> protocol, these CRCs are never stored, but are instead used only for
> cross checks.  If one side or the other "lies", you get an error.

But if I don't already have the object, then I have nothing to compare
against. So when I get it from kernel.org, I have to simply accept that
the object I'm getting is good, and write it into my object db.

> BTW, regarding your [Jeff's] discussion about putting an additional
> header in commit messages - I tried that.  The existing versions of
> Git didn't like it: barring a bug in my test code, it seems that Git
> expects headers in commit messages to be in a particular order and
> treats deviations from that to be an error.

Yes, the header has to go at the end of the existing headers. But I
don't see any reason that would be a problem for the scheme I described.

> I even tried appending blank lines at the end of a commit, with spaces
> and tabs encoding an additional CRC, and that didn't work either - at
> least it never got through all the test programs, failing in places
> like the tests involving notes.

Yes, git will helpfully trim whitespace in commit messages. With the
current code, you can hide arbitrary bytes in a commit message after a
NUL, but don't do that. It's not guaranteed to stay that way, and the
appropriate place to add new information is in a header.

> In any case, you'd have to phase in such a change gradually, first
> putting in the code to read the new header if it is there, and
> subsequently (after ample time so that everyone is running a
> sufficiently new version) enabling the code to create the new header.

Current git should ignore headers that it doesn't understand. I haven't
tested this, but Junio recently has been experimenting with
gpg-signature lines in commits, and I'm pretty sure he checked that
older gits properly ignore them.

-Peff

^ permalink raw reply

* [PATCH 2/3] Add documentation for fast hash collision detection
From: Bill Zaumen @ 2011-11-30  6:12 UTC (permalink / raw)
  To: git; +Cc: gitster

The documentation added is technical documentation describing
how fast hash collision detection operates and changes to
several git commands (a few new command-line arguments, mostly).

Note: the change to the implementation is in the child of
this commit.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 Documentation/git-count-objects.txt          |   12 +-
 Documentation/git-index-pack.txt             |   16 +-
 Documentation/git-verify-pack.txt            |   23 ++
 Documentation/technical/collision-detect.txt |  385 ++++++++++++++++++++++++++
 Documentation/technical/pack-format.txt      |   40 +++
 5 files changed, 471 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/collision-detect.txt

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 23c80ce..4cdbaf5 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -8,7 +8,7 @@ git-count-objects - Count unpacked number of objects and their disk consumption
 SYNOPSIS
 --------
 [verse]
-'git count-objects' [-v]
+'git count-objects' [-v] [-M]
 
 DESCRIPTION
 -----------
@@ -25,6 +25,16 @@ OPTIONS
 	objects, number of packs, disk space consumed by those packs,
 	and number of objects that can be removed by running
 	`git prune-packed`.
+-M::
+--count-md::
+	Report the number of loose objects with no stored message digests.
+	With the -v option, the number of missing "mds" files (these
+	contain the message digests for the SHA1 hashes in the corresponding
+	"idx" files) is reported, along with a count of the number of
+	mds files whose size is wrong (e.g., an index was created but the
+	existing MDS file was not updated) and a count of the number of
+	objects in pack files that do not have a stored message digest.
+	Values that are zero are not shown.
 
 GIT
 ---
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..3285fae 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -11,14 +11,14 @@ SYNOPSIS
 [verse]
 'git index-pack' [-v] [-o <index-file>] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
-                 [<pack-file>]
+		 [-m <mds-file>] [<pack-file>]
 

 DESCRIPTION
 -----------
-Reads a packed archive (.pack) from the specified file, and
-builds a pack index file (.idx) for it.  The packed archive
-together with the pack index can then be placed in the
+Reads a packed archive (.pack) from the specified file, and builds a
+pack index file (.idx) and a pack mds file (.mds) for it.  The packed
+archive together with the pack index can then be placed in the
 objects/pack/ directory of a git repository.
 

@@ -35,6 +35,14 @@ OPTIONS
 	fails if the name of packed archive does not end
 	with .pack).
 
+-m <mds-file>::
+	Write the generated pack mds file into the specified.
+	file Without this option, the name of the pack mds
+	file is constructed from the name of packed archive
+	file by replacing .pack with .idx (and the program
+	fails if the name of packed archive does not end
+	with .pack).
+
 --stdin::
 	When this flag is provided, the pack is read from stdin
 	instead and a copy is then written to <pack-file>. If
diff --git a/Documentation/git-verify-pack.txt b/Documentation/git-verify-pack.txt
index cd23076..e81c514 100644
--- a/Documentation/git-verify-pack.txt
+++ b/Documentation/git-verify-pack.txt
@@ -33,6 +33,15 @@ OPTIONS
 	Do not verify the pack contents; only show the histogram of delta
 	chain length.  With `--verbose`, list of objects is also shown.
 
+-M::
+--show-mds::
+	Show the message digests along with the 40-character object names
+	(SHA1 value in hexidecimal). Ignored if --stat-only is set. If
+	--verbose is not set, only the table indexed by object names is
+	shown, although the files will be verified.  The message digests
+	printed are the actual ones - if the MDS file does not contain these,
+	the verification will fail.
+
 \--::
 	Do not interpret any more arguments as options.
 
@@ -48,6 +57,20 @@ for objects that are not deltified in the pack, and
 
 for objects that are deltified.
 
+When the -M option is used, the offset-in-pack field is followed by an
+entry giving the message digest.  The format used is:
+
+      md=0xHEX_VALUE
+
+when a message digest exists, and
+
+     <no md>
+
+when a message digest does not exist.  These entries precede the depth
+entry for deltified objects.  A non-existent message digest will be shown
+only if the MDS file is missing - while the MDS-file format allows missing
+entries, the file will not be considered valid.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/collision-detect.txt b/Documentation/technical/collision-detect.txt
new file mode 100644
index 0000000..0d33da8
--- /dev/null
+++ b/Documentation/technical/collision-detect.txt
@@ -0,0 +1,385 @@
+Fast Hash-Collision Detection
+=============================
+
+Initially Git used a SHA-1 hash as an object ID under the assumption
+that a hash collision would never occur in practice. While an
+accidental SHA-1 collision is extremely unlikely, it is possible,
+although very expensive, to generate multiple files with the same
+SHA-1 value in under 2^57 operations.  With computer performance
+increasing significantly from one year to the next, Git's assumptions
+about SHA-1 will eventually not hold in the case of a malicious
+attempt to damage a project.  One should note that just because the
+probability of a SHA-1 collision occurring accidentally is extremely
+low does not mean a priori that SHA-1 provides an adequate safety
+margin for preventing a malicious attempt to damage repositories and a
+discussion below outlines some of the issues regarding this
+possibility.
+
+While one could modify Git to use SHA-224, SHA-256, SHA-384, or
+SHA-512 instead of SHA-1, the change would have to support the
+original format as well (in order to deal with existing Git
+repositories). While one could convert an existing repository to use
+the new hash function, this would require rewriting every object,
+including trees and commits.  The outcome would be problematic given
+the existence of email and documentation that might name commits by
+their SHA-1 hashes. One should note that Git performs a byte-by-byte
+check for hash collisions when a pack file is indexed.  Unfortunately,
+during fetch or pull operations, Git tries to avoid copying objects
+when a peer already has a copy, and this is determined solely on the
+basis of SHA-1 hashes.
+
+The following describes a modification to Git's initial design that is
+(a) relatively easy to implement, (b) is compatible with and can
+interoperate with older versions of Git (both the program and the
+repositories) (c) has a small computational overhead, and (d)
+increases security substantially, with a goal of detecting hash
+collisions early and automatically.  Because the implementation is
+relatively simple and the overhead very low, it makes sense to
+incorporate this change (or some alternative) before the security
+issue becomes a serious problem.
+
+Although Git generally uses that assumption that there will never be a
+hash collision using SHA-1 in practice, under some circumstances, Git
+will detect collisions via a byte-by-byte comparison as objects are
+added to the repository or as pack files are indexed.  This test is
+performed when an index is built (via the Git pack-index command), but
+a byte-by-byte comparison was deemed too computationally expensive to
+use in all circumstances: with pack files in particular, simply
+extracting an object can require not only decompressing it, but
+handling a series of delta encodings.
+
+Collision detection has been extended by computing a message digest or
+CRC of the object's contents (i.e., excluding the Git header). These
+message digests are stored separately from Git objects and are used
+for an independent collision test - looking up the message digests or
+CRCs using the SHA-1 IDs as a key can be done quickly, and comparing
+them is fast as well (a single unsigned-integer comparison for a
+32-bit CRC).  Assuming statistical independence in the CRC case, the
+changes of an undetected SHA-1 collision, should one occur, is 1 in
+2^32.  This extension is computationally cheap (timing the Git test
+suite (run via 'make test') showed only a small increase in running
+time and the extension is backwards compatible with existing Git
+repositories - if a CRC is not available for a SHA-1 value, the
+implementation reverts to its former behavior and simply compares
+SHA-1 values.  The CRC can, of course, be easily replaced with a
+SHA-256 or SHA-512 digest to reduce the chances of an undetected SHA-1
+collision to nearly zero.  For that reason, in the following we will
+use the terms MD (Message Digest) and CRC interchangeably.
+
+The implementation creates a directory in .git/objects named "crcs",
+which contains sub-directories and file names identical to the
+sub-directories in objects used to store loose objects: a two
+character directory name, with a 38-character file name, the
+concatenation of which gives the SHA-1 hash for the object.  the files
+in sub-directories of "crcs", however, simply contain 32-bit CRCs
+stored in network byte order.  In addition, for each pack file
+(.../objects/pack/FILE.pack), there is a corresponding file named
+.../objects/pack/FILE.mds in addition to .../objects/pack/FILE.idx.
+The MDS file contains the CRCs, stored in the same order as the SHA-1
+hashes in .../objects/pack/FILE.idx.  The format of the MDS file is
+described in pack-format.txt.
+
+Thus, the directory structure (only part of it is shown) is as
+follows:
+
+ .git---.
+	|
+	|-objects-.
+	|	  |--XX--.
+	.	  |	 |--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+	.	  |	 .
+	.	  |	 .
+		  |	 .
+		  .
+		  .
+		  .
+		  |-crcs-.
+		  |	 |--XX--.
+		  |	 |	|--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+		  |	 .	.
+		  |	 .	.
+		  |	 .	.
+		  |
+		  |-pack-.
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.pack
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.idx
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.mds
+		  |	 .
+		  |	 .
+		  |	 .
+		  |
+		  `-info-.
+			 .
+			 .
+
+The mds files are relatively short - an average of 5 bytes per CRC
+plus some fixed overhead due to a header and trailer, with the CRCs
+listed in the same order as the SHA-1 values in the matching idx file
+(a function named nth_packed_object_objcrc32 has the same signature as
+the previously-defined function nth_packed_object_offset, so the
+procedure to look up the MD/CRC value from a pack file is the same).
+
+For fetch and push operations, the commands fetch-pack, send-pack,
+receive-pack, and upload-pack were modified so that various object IDs
+can have any one of the following formats, with each number
+represented in hexadecimal:
+
+		SHA1
+		SHA1-MD
+		SHA1-MD-BlobMD
+
+where SHA1 is the SHA-1 hash of a commit, MD is the 32-bit CRC of the
+commit (uncompressed, not including the Git object header), and BlobMD
+is a CRC of the CRCs for each blob found by traversing the commit's
+tree using a depth-first search, with blobs processed in the order
+they appear in the trees (this is a relatively fast operation because
+the CRCs of each blob are stored in the repository).
+
+Both receive-pack and upload-pack send a capability named "mds-check"
+to allow the two longer object IDs.  When the CRCs are available, the
+longer formats are used, but are generated only by fetch-pack and
+send-pack: because of backwards-compatibility constraints,
+receive-pack and upload-pack cannot determine the capabilities of
+fetch-pack and send-pack when connected to a remote repository).
+While the message digests associated with each object are computed
+once and stored, the "BlobMD" ones are computed each time - the
+"BlobMD" ones are only used by the Git commands fetch-pack,
+upload-pack, send-pack, and receive-pack.  In each case, a
+hash-collision check is performed only if the message digest used is
+available.  The collision checks during a fetch, push, or pull command
+are done by receive-pack and upload pack because send-pack and
+fetch-pack do not receive their peers' MD and BlobMD values.
+
+
+Implementation Details
+----------------------
+
+Functions to manipulate the message digest/CRC database are declared
+in the file crcdb.h.  The implementation as described above is in the
+file objd-crcdb.c: it is thus easy to change the implementation of how
+these objects are stored with minimal impact on the rest of the source
+code.
+
+In pack-write.c, there is a new function named write_mds_file with the
+same function signature as write_idx_file.  Both are called in pairs
+(write_idx_file first) so that the idx file and mds file for the
+corresponding pack file will always be created.
+
+In commit.c, there is a new function that recursively traverses the
+tree associated with a commit and finds the "blob" entries and looks
+up those entries' message digests in order to compute a message digest
+of these message digests (which is faster than computing a message
+digest of all the bytes in the blobs associated with a commit).
+
+Various function names signatures in sha1_file.c were changed to take
+two additional arguments, the first a pointer to an int used as a flag
+to indicate whether a MD/CRC exists, and the second a pointer to a
+uint32_t containing the MD/CRC (a CRC currently).  For backwards
+compatibility with previously existing functions, those functions had
+there names changed by adding "_extended" to them, with macros in
+cache.h defined so that existing code that does not need to obtain a
+MD/CRC would not be changed. There are a few additional functions
+added to sha1_file.c such as one to determine if there is an MD/CRC
+for a given SHA-1 value. Many changes in the rest of Git that result
+from this simply change the arguments to these functions.  As a
+convention, most such arguments use names like objcrc32, objcrc32p,
+has_objcrc32 and has_objcrc32p in order to make it easy to find areas
+of the code implementing hash-collision detection using the git-grep
+command.
+
+A few data structures (notably struct pack_idx_entry and struct
+packed_git) contain fields used to store has_objcrc32 and objcrc32
+values or data associated with MDS files.  These are used while
+building new MDS files.
+
+Some of the Git commands (count-objects, index-pack, and verify-pack)
+have additional command-line options related to the MD/CRCs and mds
+files. This makes it possible to explicitly name an mds file being
+created and to request that various listings show both the MD/CRC
+values in addition to SHA-1 hashes (the MD/CRC values are not listed
+by default in case user-defined scripts assume the current behavior).
+
+For C files, changes were made to the following files (compared to
+commit 017d1e13) for the initial collision-detection implementation:
+
+       * builtin/count-objects.c
+       * builtin/fetch-pack.c
+       * builtin/index-pack.c
+       * builtin/init-db.c
+       * builtin/pack-objects.c
+       * builtin/pack-redundant.c
+       * builtin/prune-packed.c
+       * builtin/prune.c
+       * builtin/receive-pack.c
+       * builtin/send-pack.c
+       * builtin/verify-pack.c
+       * commit.c
+       * environment.c
+       * fast-import.c
+       * gdbm-packdb.c (new file)
+       * git.c
+       * hex.c
+       * http.c
+       * objd-crcdb.c (new file)
+       * pack-write.c
+       * sha1_file.c
+       * upload-pack.c
+
+The other files had changes that reflected changes to function
+signatures.
+
+The header files that were modified are
+
+    * cache.h
+    * commit.h
+    * crcdb.h (new file)
+    * pack.h
+    * packdb.h (new file)
+
+where the changes are mostly new function declarations, a few macros
+for backwards-compatibility, and a few additional fields in some
+data structures.
+
+Minor changes were made to the test suite: t0000-basic.sh,
+t5300-pack-object.sh, t5304-prune.sh, t5500-fetch-pack.sh, and
+t5510-fetch.sh.
+
+The packdb functions are conditionally compiled and by default are not
+used.  When used, these use GDBM to store CRCs for SHA-1 hashes in
+cases in which the hash was not available - in this case the hash will
+be recomputed and stored for future use.  Testing indicates that
+packdb is not needed. It may be worth turning on during debugging to
+verify if a problem is discovered involving a missing MD/CRC. (As an
+aside, the packdb code is based on a test to see if GDBM would be
+efficient enough to store the MD/CRC values in general, thus avoiding
+the need to create "mds" files and reducing the number of files in the
+"crcs" directory, but it turned out that performance was not
+acceptable.)
+
+Security-Issue Details
+----------------------
+
+Without hash-collision detection, Git has a higher risk of data
+corruption due to the obvious hash-collision vulnerability, so the
+issue is really whether a usable vulnerability exists. Recent research
+has shown that SHA-1 collisions can be found in 2^63 operations or
+less.  While one result claimed 2^53 operations, the paper claiming
+that value was withdrawn from publication due to an error in the
+estimate. Another result claimed a complexity of between 2^51 and 2^57
+operations, and still another claimed a complexity of 2^57.5 SHA-1
+computations. A summary is available at
+<http://hackipedia.org/Checksums/SHA/html/SHA-1.htm#SHA-1>. Given the
+number of recent attacks, possibly by governments or large-scale
+criminal enterprises
+(<http://www.csmonitor.com/World/terrorism-security/2011/0906/Iranian-government-may-be-behind-hack-of-Dutch-security-firm>,
+<http://en.wikipedia.org/wiki/Operation_Aurora>,
+<http://en.wikipedia.org/wiki/Botnet#Historical_list_of_botnets>),
+which include botnets with an estimated 30 million computers, there is
+reason for some concern: while generating a SHA-1 collision for
+purposes of damaging a Git repository is extremely expensive
+computationally, it is possibly within reach of very well funded
+organizations. 2^32 operations, even if the operations are as
+expensive as computing a SHA-1 hash of a modest source-code file, can
+be performed in a reasonably short period of time on the type of
+hardware widely used in desktop or laptop computers at present. With
+sufficient parallelism, 30 million personal computers sufficient for
+playing the latest video games could perform 2^56 operations in a
+reasonable time.
+
+The security implications depend on how Git is used.  In the simplest
+case in which a single, shared repository is used by a number of
+developers, with source code only shared though this repository, the
+problems are minimal - Git will not allow one to insert an object
+whose SHA-1 hash matches that of an existing object.  Since an
+attacker would not know the SHA-1 hash until the correct object is
+already in the shared repository, all an attacker would succeed in
+doing is to create some confusion in his/her private repository and
+working copy (but note that some of the assumptions break down if
+developers email source files between themselves rather than transferring
+everything via a Git repository).
+developers, with source code only shared though this repository, the
+problems are minimal - Git will not allow one to insert an object
+whose SHA-1 hash matches that of an existing object.  Since an
+attacker would not know the SHA-1 hash until the correct object is
+already in the shared repository, all an attacker would succeed in
+doing is to create some confusion in his/her private repository and
+working copy (but note that some of the assumptions break down if
+developers email source files between themselves rather than transferring
+everything via a Git repository).
+
+In other cases, however, there could be an issue if a SHA-1 collision
+can be created quickly enough. As an example, suppose one is using the
+"Integration-manager workflow" described in "Pro Git" (see
+<http://progit.org/book/ch5-1.html>) with a "blessed repository" BR
+and two public developer repositories DPR1 and DPR2.  Suppose a
+developer puts a legitimate change to the code into DPR2. Another
+developer with read access to DPR2 and write access to DPR1
+immediately fetches this commit, and replaces one file with a modified
+version of the same size that has the same SHA-1 value, and then puts
+the commit into DPR1 (the change may be a trivial one designed to
+introduce an obscure buffer overflow error that can be exploited and
+may not stand out in a code review).  At this point DPR1 and DPR2 have
+identical commits (i.e., the same commit object) but with one file
+modified.  No local test on either DPR1 or DPR2 will uncover a
+discrepancy based on SHA-1 values, object contents, and digital
+signatures for commits.  Further assume that both developers send an
+email to the "integration manager" notifying him/her of the changes.
+
+Now suppose the above changes occurred late on a Friday afternoon and
+over a weekend.  On the next Monday morning, the "integration manager"
+reads the emails and pulls changes from DPR1 and then DPR2.  Once a
+'fetch' from DPR1 is complete, a subsequent fetch from DBR2 will not
+transfer any data related to the commit, as the integration manager's
+repository already has a commit with the matching SHA-1 value.  A
+modified copy will have been introduced into the system, and the
+developers using DPR2 may never notice the difference - as they will
+pull from DPR2 more often than BR, they will most likely have the
+correct files and the commit in their local repositories and the
+transfer protocol will avoid sending the commit if it is already
+available.  Furthermore, if the modified file introduces a hard-to-find
+buffer overflow testing may not uncover the problem.
+
+If the file in question is modified again, and a thin pack is used in
+a fetch so that the change is delta-encoded, the SHA-1 hashes may
+differ after de-deltafication, but such a change might not be
+introduced before a release.  While obtaining a different SHA-1
+hash than expected would be detected, the error would be a corrupted
+repository - a missing SHA-1 value in the tree associated with a commit.
+It would take some effort to figure out what went wrong - what
+happens depends on the state of both the client and server when a
+git-fetch operation runs.
+
+As a justification for the scenarios just described, one can use MD5
+as a model: http://www.mscs.dal.ca/~selinger/md5collision/ gives the
+following two 128-byte sequences (expressed in hexadecimal) as ones
+with the same MD5 hash:
+
+d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
+55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
+d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
+e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
+
+and
+
+d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
+55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
+d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
+e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
+
+These can be used for test purposes.  If you change the first 16 bytes
+of both sequences to the same new value, before the point where the
+files differ, you will get different MD5 hash values.  If you append
+the same text to both files, the MD5 values of the modified files are
+equal (and, of course, different from the previous value) but
+fortunately Git includes the length of an object's contents in the
+object's header, and the SHA-1 hash includes the header).  In most
+cases, whever a file is modified during software development, the
+file's length will change.  This causes a change early enough in the
+object so that applying the same patch to two different files that
+have the same SHA-1 value will typically result in two files with
+different SHA-1 values.  So, the result of a hash collision when
+multiple remote repositories are used would initially be different
+versions of the same file for the same commit in different
+repositories, possibly followed by some of the repositories being
+corrupted as one of these files is modified, depending on the state
+of each server and client when a fetch or pull operation is run.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1803e64..4dfaf92 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -158,3 +158,43 @@ Pack file entry: <+
     corresponding packfile.
 
     20-byte SHA1-checksum of all of the above.
+
+
+= pack-*.mds files contain message digests for objects (CRCs
+  initially, with other options possibly added at a later date such as
+  SHA-256).  The digests are stored in the same order as the sha-1 values
+  in the matching idx file.  These files have the following format:
+
+  - A 6-byte magic number consisting the the characters "PKMDS" followed
+    by a NULL character (0).
+
+  - A one-byte version number (= 1)
+
+  - A one-byte field-length value for message digest fields, in units of
+    4-byte words, with the legal value being 1. The length of the message
+    digest fields in bytes is denoted as wsize below).
+
+  - A set of blocks, each of which contains 4 entries encoded as follows:
+
+      * four one-byte fields, one per entry, for which a zero value
+	indicates that a matching entry does not exist and for which a
+	value of 1 indicates that the field contains a 32-bit CRC stored
+	in network byte order.
+
+      * 4 wsize-byte fields, one per entry, each containing a CRC
+	(by conventionwhich should be 0 if the CRC does not exist).
+	For each field, the data it contains should start at the first
+	byte, padded with NULL characters if the field is longer than
+	the digest it stores.
+
+    For the set of all blocks, the nth one-byte field and the nth 4-byte
+    field store the values for the nth entry in the file. The format
+    ensures that each message digest starts on a 32-bit boundary,
+    allowing 32-bit integer operations to be used in copying or
+    comparing values.
+
+  - A 20 byte SHA-1 hash of the SHA-1 hashes naming the objects whose
+    message digests are being stored, in the same order as they
+    appear in the corresponding idx file.
+
+  - A 20 byte SHA-1 hash of all of the above.
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/3] Add CRCDB and PACKDB modules for fast collision detection
From: Bill Zaumen @ 2011-11-30  5:59 UTC (permalink / raw)
  To: git; +Cc: gitster

The CRCDB module maintains a persistent mapping from SHA-1 hashes
to CRCs or message digests for Git objects. The current implementation
uses one file per CRC.  Documentation is in the header file crcdb.h
and there is a preprocessor directive CRCDB that should be set to 0
or 1, with the current choice being 0.

The PACKDB module (normally not turned on but can be conditionally
compiled) can be turned on for debugging/testing. This module
allows a CRC for an object to always be found, computing it from
scratch and storing it in a GDBM database.  It is intended for
use while building index files.  Testing seems to show that it is
not necessary as the needed information is always there.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 crcdb.h       |  191 +++++++++++++++++++++++++++++++++
 gdbm-packdb.c |  247 +++++++++++++++++++++++++++++++++++++++++++
 objd-crcdb.c  |  324 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 packdb.h      |  107 +++++++++++++++++++
 4 files changed, 869 insertions(+), 0 deletions(-)
 create mode 100644 crcdb.h
 create mode 100644 gdbm-packdb.c
 create mode 100644 objd-crcdb.c
 create mode 100644 packdb.h

diff --git a/crcdb.h b/crcdb.h
new file mode 100644
index 0000000..6eabb4f
--- /dev/null
+++ b/crcdb.h
@@ -0,0 +1,191 @@
+#ifndef CRCDB_H
+#define CRCDB_H
+
+/**
+ * CRC Database Support.
+ *
+ * This module uses GDBM to maintain a database mapping SHA-1 object keys
+ * to a 32-bit CRC for purposes of detecting hash collisions.  The CRCs
+ * are stored in the database in network byte order (i.e., as big-endian
+ * 32-bit unsigned integers).  The functions allow for initialization,
+ * queries, adding new entries (with a collision check), and managing
+ * access to alternate databases.
+ *
+ * The preprocessor symbol CRCDB determines the implementation of the
+ * module.
+ * Values:
+ *   0, 1 - implement using directories and files - the first byte of a
+ *       SHA1 hash determines a subdirectory of ../objects/crcs, and
+ *       the remaining bytes determine the file name, with the names
+ *       consisting of the hexadecimal representation of each byte's
+n *       value. The files then contain 32-bit CRCs stored in network
+ *       byte order.  A large number of 4-byte files is a poor use of
+ *       disk space, but may be useful for testing.  A value of 1 implies
+ *       that packdb will also be used.
+ */
+
+#include<stdint.h>
+
+#include "cache.h"
+
+#if (CRCDB == 0) || (CRCDB == 1)
+/**
+ * Opaque data type - because the typedef is for a pointer, we
+ * don't need the structure defined in files that use the pointer.
+ * We do need it defined somewhere, in this case in the file
+ * objd-crcdb.c, which is the only place the fields are used.
+ */
+typedef struct objd_crcdb *crcdb_t;
+#endif
+
+/**
+ *  Initialize the database.
+ *  This opens a database file in the objects directory named crcs,
+ *  used to store CRCS of objects (uncompressed, excluding the header)
+ *  for hash-collision detection.
+ */
+extern void crcdb_init(void);
+
+/**
+ * Check if the database has been initialized.
+ * Returns:
+ *   1 if crcdb_init has been called; false otherwise.
+ */
+extern int crcdb_initialized(void);
+
+/**
+ * Initializes alternative databases by adding them to a table with
+ * these databases closed.
+ */
+extern void crcdb_init_alts();
+
+
+/**
+ * Open a database file.
+ *
+ * The default database can be read or written. alternate database
+ * files are read-only databases.  Multiple calls without intervening
+ * calls to crcdb_close for a given argument will result in the same
+ * object being returned each successive time.  The pathname must match
+ * one stored by a call to crcdb_init_alts.
+ *
+ * Arguments:
+ *    pathname - the pathname of the file; NULL for the default db;
+ *
+ * Returns:
+ *    the database (NULL indicates the default)
+ */
+extern crcdb_t crcdb_open(char *pathname);
+
+/**
+ * Open a database file given an alterate object database pointer.
+ *
+ * The default database can be read or written. alternate database
+ * files are read-only databases.  Multiple calls without intervening
+ * calls to crcdb_close for a given argument will result in the same
+ * object being returned each successive time The argument must match
+ * an alternate object database pointer stored by a precding call to
+ * crcdb_init_alts.
+ *
+ * Arguments:
+ *    alt - an alternate object database pointer (which provides the
+ *          pathname).
+ *
+ * Returns:
+ *    the database (NULL indicates the default)
+ */
+extern crcdb_t crcdb_open_alt(struct alternate_object_database *alt);
+
+/**
+ * Lookup a CRC from a database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *  objcrc32p - a pointer to a uint32_t to store the returned value when
+ *              an entry in the database exists.
+ *
+ * Returns:
+ *   0 if no entry, 1 if there is an existing entry.
+ */
+extern int crcdb_lookup(crcdb_t dbf, const unsigned char *sha1,
+			uint32_t *objcrc32p);
+
+/**
+ * Remove a CRC from a database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *
+ * Returns:
+ *   0 on success; -1 if the entry did not exist or if an entry
+ *   could not be deleted
+ */
+extern int crcdb_remove(crcdb_t dbf, const unsigned char *sha1);
+
+/**
+ * Process a CRC for a SHA-1 key.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *   objcrc32 - the crc to store.
+ *
+ * Returns:
+ *   0 if this is a new entry; 1 if it is an existing entry, -1 if
+ *   an entry cannot be added ot the database.
+ *
+ * Errors:
+ *   Will call 'die' and exit if there is a hash collision. Will call
+ *   'error' if the value cannot be entered.
+ */
+extern int crcdb_process(crcdb_t dbf, const unsigned char *sha1,
+			 uint32_t objcrc32);
+
+/**
+ * Reorganize a CRC database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ * Returns:
+ *   0 on success; -1 on failure
+ */
+extern int crcdb_reorganize(crcdb_t dbf);
+
+
+/**
+ * Close a  database file.
+ *
+ * If the same database was opened multiple times, a reference count is
+ * decremented and the the database will not be closed until the count
+ * reaches zero.  Calls to crcdb_open or crcdb_open_alt must be balanced
+ * by calls to crcdb_close or crcdb_close_alt.
+ *
+ * Arguments:
+ *        dbf - the CRC database.
+ */
+extern void crcdb_close(crcdb_t dbf);
+
+/**
+ * Close a database file given an alternate object database pointer.
+ *
+ * If the same database was opened multiple times, a reference count is
+ * decremented and the the database will not be closed until the count
+ * reaches zero.  Calls to crcdb_open or crcdb_open_alt must be balanced
+ * by calls to crcdb_close or crcdb_close_alt.
+ *
+ * Arguments:
+ *       alt - a pointer ot an alternate object database
+ */
+extern void crcdb_close_alt(struct alternate_object_database *alt);
+
+/**
+ * Shutdown the database files.
+ * This will shut down the default database and the cached alternative
+ * databases.  All others should be closed by calling crcb_alt_close
+ * explicitly
+ */
+extern void crcdb_finish(void);
+
+#endif /*CRCDB_H */
diff --git a/gdbm-packdb.c b/gdbm-packdb.c
new file mode 100644
index 0000000..0115f87
--- /dev/null
+++ b/gdbm-packdb.c
@@ -0,0 +1,247 @@
+#include<sys/types.h>
+#include<sys/stat.h>
+#include <sys/param.h>
+#include<stdio.h>
+#include<string.h>
+#include<malloc.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <time.h>
+#include <pthread.h>
+#include <errno.h>
+#include <gdbm.h>
+
+#include "cache.h"
+#include "packdb.h"
+#include "crcdb.h"
+
+static void nsleep() {
+#if _POSIX_C_SOURCE >= 199309L
+	struct timespec ts;
+	ts.tv_sec = 0;
+	ts.tv_nsec = 100000;
+	nanosleep(&ts, NULL);
+#else
+	sleep(1);
+#endif
+}
+
+
+static int initialized = 0;
+
+static GDBM_FILE dbf = NULL;
+char *dbf_name;
+static int dbf_depth = 0;
+
+pthread_mutex_t gdbm_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void packdb_close_nolock(void);
+
+void packdb_init(void) {
+	char *last;
+	pthread_mutex_lock(&gdbm_mutex);
+	if (initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return;
+	}
+	dbf_name = get_object_packdb_node();
+	last = rindex(dbf_name, '/');
+	*last = 0;
+	if (!access(dbf_name, R_OK|W_OK|X_OK)) {
+		initialized = 1;
+	}
+	*last = '/';
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+int packdb_initialized(void) {
+  return initialized;
+}
+
+static void packdb_open_nolock(void) {
+	if (dbf_depth == 0) {
+	AGAIN_W:
+		dbf = gdbm_open(dbf_name, 0, GDBM_WRCREAT, PERM_GROUP, NULL);
+		if (dbf == NULL && gdbm_errno == GDBM_CANT_BE_WRITER) {
+			nsleep();
+			goto AGAIN_W;
+		}
+	}
+	dbf_depth++;
+}
+
+void packdb_open(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	packdb_open_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+
+int packdb_lookup(const unsigned char *sha1, uint32_t *objcrc32p) {
+	datum key;
+	datum ovalue;
+	uint32_t oldcrc;
+	pthread_mutex_lock(&gdbm_mutex);
+
+	if (!initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+
+	packdb_open_nolock();
+	if (dbf == NULL) {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	ovalue = gdbm_fetch(dbf, key);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+
+	if (ovalue.dptr == NULL) return 0;
+	oldcrc = *(uint32_t *)(ovalue.dptr);
+	free(ovalue.dptr);
+	if (objcrc32p) *objcrc32p = (oldcrc);
+	return 1;
+}
+
+int packdb_remove(const unsigned char *sha1) {
+	datum key;
+	int result;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized)  || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+	packdb_open_nolock();
+	result = gdbm_delete(dbf, key);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+	return result;
+}
+
+
+int packdb_process(const unsigned char *sha1, uint32_t objcrc32) {
+	datum key;
+	datum nvalue;
+	datum ovalue;
+	uint32_t newcrc = (objcrc32);
+	uint32_t oldcrc;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized) || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+
+	nvalue.dptr = (char *)&newcrc;
+	nvalue.dsize = sizeof(uint32_t);
+
+	packdb_open_nolock();
+	ovalue = gdbm_fetch(dbf, key);
+	if (dbf == dbf && ovalue.dptr == NULL) {
+		int status;
+		status = gdbm_store(dbf, key, nvalue, GDBM_INSERT);
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		switch (status) {
+		case 0:
+			return 0;
+		case -1:
+		  error("could not enter crc into database - key = %s",
+		      sha1_to_hex(sha1));
+		      return -1;
+		case 1:
+			return 1;
+		}
+		return -1;	/* should not occur */
+	} else if (ovalue.dptr == NULL) {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		return 0;
+	} else {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+
+		oldcrc = *(uint32_t *)ovalue.dptr;
+		free(ovalue.dptr);
+		/*
+		 * Both oldcrc and newcrc are in network byte order.
+		 */
+		if (oldcrc != newcrc) {
+			die("SHA1  COLLISION WHEN INSERTING OBJECT %s",
+			    sha1_to_hex(sha1));
+			return -1;
+		}
+		return 1;
+	}
+}
+
+int packdb_store(unsigned char *sha1) {
+	int status;
+	uint32_t objcrc32;
+	status = crcdb_lookup(NULL, sha1, &objcrc32);
+	if (status == 1) {
+		return packdb_process(sha1, objcrc32);
+	} else if (status == 0) {
+	  return packdb_lookup(sha1, &objcrc32)? 1: -1;
+	} else {
+	  return -1;
+	}
+}
+
+int packdb_reorganize() {
+	int status;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized)  || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	packdb_open_nolock();
+	status = gdbm_reorganize(dbf);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+	return status;
+}
+
+
+static void packdb_close_nolock(void) {
+	  if (!initialized) {
+		return;
+	  }
+	  dbf_depth--;
+	  if (dbf_depth == 0 && dbf != NULL) {
+		gdbm_close(dbf);
+		dbf = NULL;
+	  }
+	  if (dbf_depth < 0) {
+		die("packdb dbf_depth %d < 0", dbf_depth);
+	  }
+	  return;
+}
+
+void packdb_close(void) {
+	  pthread_mutex_lock(&gdbm_mutex);
+	  packdb_close_nolock();
+	  pthread_mutex_unlock(&gdbm_mutex);
+}
+
+void packdb_finish(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	if (!initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return;
+	}
+	if (dbf != NULL) gdbm_close(dbf);
+	dbf = NULL;
+	dbf_depth = 0;
+	initialized = 0;
+	pthread_mutex_unlock(&gdbm_mutex);
+}
diff --git a/objd-crcdb.c b/objd-crcdb.c
new file mode 100644
index 0000000..2bf6fd9
--- /dev/null
+++ b/objd-crcdb.c
@@ -0,0 +1,324 @@
+#include<sys/types.h>
+#include "cache.h"
+#include "crcdb.h"
+
+struct objd_crcdb {
+  char *root;
+};
+
+static struct objd_crcdb db;
+
+static crcdb_t no_dbf = (crcdb_t) 4;
+
+static crcdb_t dbf = NULL;
+
+#define ALT_DBF_LIMIT  512
+
+
+struct alt_map {
+	struct objd_crcdb db;
+	struct alternate_object_database *alt;
+	struct alt_map *refer;
+};
+
+struct alt_map alt_map[ALT_DBF_LIMIT];
+static int alt_in_use = 0;
+static int initialized = 0;
+
+
+void crcdb_init(void) {
+	if (initialized) {
+		return;
+	}
+	dbf = &db;
+	db.root = get_object_crc_node();
+	initialized = 1;
+}
+
+int crcdb_initialized(void) {
+	return initialized;
+}
+
+static int setup_alt(struct alternate_object_database *alt, void *param) {
+	static char buffer[PATH_MAX];
+	int i;
+	int lim = alt->name - alt->base;
+	memcpy(buffer, alt->base, lim);
+	memcpy(buffer, alt->base, lim);
+	memcpy(buffer+lim, "crcs", 4);
+	buffer[lim+4] = 0;
+	for (i = 0; i < alt_in_use; i++) {
+		if (alt_map[i].alt == alt) {
+			/* don't put in the same entry twice */
+			return 0;
+		}
+		if (strcmp(buffer, alt_map[i].db.root) == 0) {
+			break;
+		}
+	}
+	alt_map[alt_in_use].db.root = xstrdup(buffer);
+	alt_map[alt_in_use].alt = alt;
+	if (i < alt_in_use) {
+		alt_map[alt_in_use].refer = alt_map + i;
+	} else {
+		alt_map[alt_in_use].refer = NULL;
+	}
+	alt_in_use++;
+	return 0;
+}
+
+static int alt_initialized = 0;
+
+void crcdb_init_alts(void){
+	if (alt_initialized) return;
+	foreach_alt_odb(setup_alt, NULL);
+	alt_initialized = 1;
+}
+
+
+crcdb_t crcdb_open(char *name) {
+	int i;
+	if (name == NULL) return NULL;
+	for (i = 0; i < alt_in_use; i++) {
+		if (strcmp(alt_map[i].db.root, name) == 0) {
+			if (alt_map[i].refer) {
+				i = (alt_map[i].refer - alt_map);
+			}
+			return (crcdb_t)&(alt_map[i].db);
+		}
+	}
+	return no_dbf;
+}
+
+crcdb_t crcdb_open_alt(struct alternate_object_database *alt) {
+	int i;
+	for (i = 0; i < alt_in_use; i++) {
+		if (alt_map[i].alt == alt) {
+			return (crcdb_t)&(alt_map[i].db);
+		}
+	}
+	return no_dbf;
+
+}
+/* copied from sha1_file.c */
+static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
+{
+	int i;
+	for (i = 0; i < 20; i++) {
+		static char hex[] = "0123456789abcdef";
+		unsigned int val = sha1[i];
+		char *pos = pathbuf + i*2 + (i > 0);
+		*pos++ = hex[val >> 4];
+		*pos = hex[val & 0xf];
+	}
+}
+
+/*
+ * Warning: returns a static buffer so be careful about threading.
+ */
+static char *crc32_file_name(const char *path, const unsigned char *sha1)
+{
+	static char buf[PATH_MAX];
+	const char *objcrcdir;
+	int len;
+
+	objcrcdir = path;
+	len = strlen(objcrcdir);
+
+	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
+	if (len + 43 > PATH_MAX)
+		die("insanely long object crc directory %s", objcrcdir);
+	memcpy(buf, objcrcdir, len);
+	buf[len] = '/';
+	buf[len+3] = '/';
+	buf[len+42] = '\0';
+	fill_sha1_path(buf + len + 1, sha1);
+	return buf;
+}
+
+static int crcdb_lookup_aux(char *path, uint32_t *objcrc32p)
+{
+	if (!access(path, F_OK)) {
+		if (objcrc32p) {
+			int fd = open(path, O_RDONLY);
+			if (fd < 0) {
+				return 0;
+			}
+			if(read_in_full(fd, objcrc32p, sizeof(uint32_t))
+			   != sizeof (uint32_t)) {
+				close(fd);
+				return 0;
+			}
+			close(fd);
+			*objcrc32p = (*objcrc32p);
+		}
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+
+int crcdb_lookup(crcdb_t gdbf, const unsigned char *sha1, uint32_t *objcrc32p) {
+	char *path;
+
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+	if (gdbf == NULL) gdbf = dbf;
+
+	path = crc32_file_name(gdbf->root, sha1);
+	return crcdb_lookup_aux(path, objcrc32p);
+}
+
+int crcdb_remove(crcdb_t gdbf, const unsigned char *sha1) {
+	char *path;
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+
+	if (gdbf == NULL) {
+		gdbf = dbf;
+	} else {
+		return -1;
+	}
+	path = crc32_file_name(gdbf->root, sha1);
+	return unlink(path);
+}
+
+/* copied from sha1_file.c */
+/* Size of directory component, including the ending '/' */
+static inline int directory_size(const char *filename)
+{
+	const char *s = strrchr(filename, '/');
+	if (!s)
+		return 0;
+	return s - filename + 1;
+}
+
+
+/* copied from sha1_file.c */
+static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
+{
+	int fd, dirlen = directory_size(filename);
+
+	if (dirlen + 20 > bufsiz) {
+		errno = ENAMETOOLONG;
+		return -1;
+	}
+	memcpy(buffer, filename, dirlen);
+	strcpy(buffer + dirlen, "tmp_obj_XXXXXX");
+	fd = git_mkstemp_mode(buffer, 0444);
+	if (fd < 0 && dirlen && errno == ENOENT) {
+		/* Make sure the directory exists */
+		memcpy(buffer, filename, dirlen);
+		buffer[dirlen-1] = 0;
+		if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
+			return -1;
+
+		/* Try again */
+		strcpy(buffer + dirlen - 1, "/tmp_obj_XXXXXX");
+		fd = git_mkstemp_mode(buffer, 0444);
+	}
+	return fd;
+}
+
+/* copied from sha1_file.c */
+static int write_buffer(int fd, const void *buf, size_t len)
+{
+	if (write_in_full(fd, buf, len) < 0)
+		return error("file write error (%s)", strerror(errno));
+	return 0;
+}
+
+/* copied from sha1_file.c */
+/* Finalize a file on disk, and close it. */
+static void close_sha1_file(int fd)
+{
+	if (fsync_object_files)
+		fsync_or_die(fd, "sha1 file");
+	if (close(fd) != 0)
+		die_errno("error when closing sha1 file");
+}
+
+
+int crcdb_process(crcdb_t gdbf, const unsigned char *sha1, uint32_t objcrc32) {
+	uint32_t oldcrc;
+	int has_oldcrc = 0;
+	char *path;
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+	if (gdbf == NULL) gdbf = dbf;
+	path = crc32_file_name(gdbf->root, sha1);
+	has_oldcrc = crcdb_lookup_aux(path, &oldcrc);
+	if (gdbf == dbf && !has_oldcrc) {
+		uint32_t crc;
+		static char ctmpfile[PATH_MAX];
+		int fdc = create_tmpfile(ctmpfile, sizeof(ctmpfile), path);
+		if (fdc < 0) {
+		  return -1;
+		}
+		crc = (objcrc32);
+		if (fdc >= 0 && write_buffer(fdc, &crc, sizeof (crc)) < 0) {
+			close_sha1_file(fdc);
+			return -1;
+		}
+		if (fdc >= 0) {
+			close_sha1_file(fdc);
+			return (move_temp_to_file(ctmpfile, path) == 0)?
+				0: -1;
+		}
+		return -1;
+	} else if (has_oldcrc) {
+		if (oldcrc != objcrc32) {
+			die("SHA1 COLLISION WHEN INSERTING OBJECT %s",
+			    sha1_to_hex(sha1));
+			return -1;
+		}
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+
+void crcdb_close(crcdb_t gdbf) {
+	return;
+}
+
+void crcdb_close_alt(struct alternate_object_database *alt) {
+	return;
+}
+
+
+
+int crcdb_reorganize(crcdb_t gdbf) {
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+	if (gdbf == NULL) {
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+
+
+void crcdb_finish(void) {
+	int i;
+	if (!initialized) {
+		return;
+	}
+	dbf->root = NULL;
+
+	for (i = 0; i < alt_in_use; i++) {
+		free(alt_map[i].db.root);
+		alt_map[i].db.root = NULL;
+	}
+	memset(alt_map, 0, sizeof(struct alt_map) *alt_in_use);
+	alt_in_use = 0;
+	initialized = 0;
+	alt_initialized = 0;
+}
diff --git a/packdb.h b/packdb.h
new file mode 100644
index 0000000..c4320ac
--- /dev/null
+++ b/packdb.h
@@ -0,0 +1,107 @@
+#ifndef PACKDB_H
+#define PACKDB_H
+
+#include<stdint.h>
+
+/**
+ *  Initialize the database.
+ *  This opens a database file in the objects directory named crcs,
+ *  used to store CRCS of objects (uncompressed, excluding the header)
+ *  for hash-collision detection.
+ */
+extern void packdb_init(void);
+
+/**
+ * Check if the database has been initialized.
+ * Returns:
+ *   1 if packdb_init has been called; false otherwise.
+ */
+extern int packdb_initialized(void);
+
+/**
+ * Open the persistent database to store a copy of obj CRCs in pack index files.
+ * Nested calls are allowed, but must be balanced by calls to packdb_close.
+ * For nested calls, subsequent ones merely increment a reference count.
+ *
+ * This is used to create space-efficient storage of object CRCs that
+ * are not associated with loose objects (e.g., because they are in pack
+ * files).  Intended for use when building pack files.
+ *
+ * Note:
+ *   Interacting with another process that calls this function on the
+ *   same repository may lead to deadlock unless packdb_close is
+ *   called before that interaction.
+ */
+extern void packdb_open(void);
+
+/**
+ * Store a crc in the persistent database for creating pack index files.
+ *
+ * Arguments:
+ *   sha1 - the key for the entry (a 20-byte sha1 hash)
+ *   crc - the crc to store (the crc of an object's data)
+ * Returns:
+ *   0 if we added a new entry, 1 if the entry already exists, -1 on error
+ */
+extern int packdb_process(const unsigned char *sha1, uint32_t objcrc32);
+
+/**
+ * Lookup a CRC from a database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *  objcrc32p - a pointer to a uint32_t to store the returned value when
+ *              an entry in the database exists.
+ * Returns:
+ *   0 if no entry, 1 if there is an existing entry.
+ */
+extern int packdb_lookup(const unsigned char *sha1, uint32_t *objcrc32p);
+
+/**
+ * Moves a crc into the persistent database for creating pack index files.
+ * This will delete the entry from the 'loose-object' crc database.
+ *
+ * Arguments:
+ *   sha1 - the key for the entry (a 20-byte sha1 hash)
+ * Returns:
+ *   0 if we stored an entry in the crcdb database, 1 if the entry already
+ *     existed in the packdb database, -1 on error or if there was no entry
+ *     to store.
+ */
+extern int packdb_store(unsigned char *sha1);
+
+
+/**
+ * Remove a CRC from a database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *
+ * Returns:
+ *   0 on success; -1 if the entry did not exist or if an entry
+ *   could not be deleted
+ */
+extern int packdb_remove(const unsigned char *sha1);
+
+
+/**
+ * Reorganize the database.
+ * Returns:
+ *   0 on success; -1 on failure
+ */
+extern int packdb_reorganize(void);
+
+/**
+ * Close the database file.
+ */
+extern void packdb_close(void);
+
+/**
+ * Close the database if opened and uninitialize the module.
+ * This is intended to be called when the module is no longer needed.
+ */
+extern void packdb_finish(void);
+
+#endif
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] Documentation update for 'git branch --list'
From: Vincent van Ravesteijn @ 2011-11-30  5:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, git
In-Reply-To: <7vobw4doey.fsf@alter.siamese.dyndns.org>

Op 22-11-2011 19:04, Junio C Hamano schreef:
> Vincent van Ravesteijn<vfr@lyx.org>  writes:
>
>> Op 21-11-2011 18:37, Junio C Hamano schreef:
>> ...
>>> It is natural to expect "git branch --merged pu vr/\*" to list branches
>>> that are contained in 'pu' whose names match the given pattern, but it
>>> seems to try creating a branch called "vr/*" and fails, for example.
>> If this is what you naturally would expect, I would expect the
>> following "git branch vr/*" to work as well.
>> What would you say if we try to interpret the argument as a pattern
>> when the argument is not a valid ref name?
> We don't, as that is inviting mistakes. "git branch vr/*" if you have a
> vr/ directory in your working tree may create vr/a branch from where the
> tip of vr/b points at by mistake.
>
> The "--merged" option is an explicit clue that the user is not interested
> in creating new branch, and the string being a pattern is additional clue.
> The "--list" option was recently added for the explicit purpose of giving
> such a clue as safety measure.

Well, that was the answer that I foresaw.

I will compose a patch implementing an at least consistent behaviour.

Vincent

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: René Scharfe @ 2011-11-30  5:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20111130032028.GA24704@sigill.intra.peff.net>

Am 30.11.2011 04:20, schrieb Jeff King:
> On Wed, Nov 30, 2011 at 12:26:20AM +0100, René Scharfe wrote:
> 
>>>>> +static int is_rfc3986_reserved(char ch)
>>>>> +{
>>>>> +	switch (ch) {
>>>>> +	case '!': case '*': case '\'': case '(': case ')': case ';':
>>>>> +	case ':': case '@': case '&': case '=': case '+': case '$':
>>>>> +	case ',': case '/': case '?': case '#': case '[': case ']':
>>>>> +		return 1;
>>>>> +	}
>> [...]
>> Sorry for my bikeshedding, but I'd paint it like this:
>>
>> 	return !!strchr("!*'();:@&=+$,/?#[]", ch);
> 
> I was always under the impression that computed jumps via "switch" would
> out-perform even an optimized strchr. Of course, I never tested. And I
> doubt performance is even relevant here, and I admit I don't care overly
> much. I find them both equally readable.
> 
> I'm going to leave it as-is unless somebody else wants to say "I
> strongly prefer version X".

Sure, the second one is significantly slower than the first one.  I just
prefer it based one its looks in case performance doesn't matter, but
that's probably just me being (sometimes too) fond of terseness. :)

René

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: Junio C Hamano @ 2011-11-30  5:40 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, git
In-Reply-To: <20111130032028.GA24704@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Nov 30, 2011 at 12:26:20AM +0100, René Scharfe wrote:
>
>> >>> +static int is_rfc3986_reserved(char ch)
>> >>> +{
>> >>> +	switch (ch) {
>> >>> +	case '!': case '*': case '\'': case '(': case ')': case ';':
>> >>> +	case ':': case '@': case '&': case '=': case '+': case '$':
>> >>> +	case ',': case '/': case '?': case '#': case '[': case ']':
>> >>> +		return 1;
>> >>> +	}
>> [...]
>> Sorry for my bikeshedding, but I'd paint it like this:
>> 
>> 	return !!strchr("!*'();:@&=+$,/?#[]", ch);
>
> I was always under the impression that computed jumps via "switch" would
> out-perform even an optimized strchr. Of course, I never tested. And I
> doubt performance is even relevant here, and I admit I don't care overly
> much. I find them both equally readable.
>
> I'm going to leave it as-is unless somebody else wants to say "I
> strongly prefer version X".

I find the switch/case one much easier to read and count, especially since
all the choices are essentially line-noise characters.

Just make sure you indent it correctly ;-)

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Bill Zaumen @ 2011-11-30  4:01 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, gitster, pclouds, peff, torvalds
In-Reply-To: <CAJo=hJtFT55Ucyij9esr3Hd9yJ6XCxatK7vjPOLMKow57HqBoQ@mail.gmail.com>

Note: for some reason my email is not showing up on the mailing list.
I'm trying a different email address - previously my 'From' field
contained a subaddress "+git" but gmail won't put that in the 'Sender'
field, so possibly the email is being filtered for that reason.

On Tue, 2011-11-29 at 09:08 -0800, Shawn Pearce wrote:

> I don't think you understand how these thin packs are processed.

I think the confusion was due to me being a bit too terse.  The
documentation clearly states that thin packs allow deltas to be
sent when the delta is based on an object that the server and client
both have in common, given the commits each already has.  If there is
one server and one client, there isn't an issue.  The case I meant is
the one in which a user does a fetch from one server, gets a forged
blob, and then fetches from another server with the original blob, and
with additional commits along the same branch. If a server bases the
delta off of the original blob, and the client applies the delta to the
forged blob, the client will most likely end up with a blob with a
different SHA-1 hash than the one expected.  Since an object in a tree
is then missing (no object with the expected SHA-1 hash), the repository
is corrupted.

The "first to arrive wins" policy isn't sufficient in one specific case:
multiple remote repositories where new commits are added asynchronously,
with the repositories out of sync possibly for days at a time (e.g.,
over a 3-day weekend).  In this case, the first to arrive at one
repository may not be the first to arrive at another, so what happens at
a particular client in the presence of hash collisions is dependent on
the sequence of remotes from which updates were fetched.  The risk
occurs in the window where the repositories are out of sync.

Regarding the kernel.org problem that you used as a separate example,
while it was fortunately possible to rebuild things (and git provided
significant advantages), earlier detection of the problem might have
reduced the time for which kernel.org was down.  Early detection of
errors in general is a good practice if it can be done at a reasonable
cost.

> Trust. Review. Verify.

While good advice in principle, you should keep in mind that there are
a lot of people out there working at various companies who are not as
capable as you are.  Some of them are overworked and make mistakes
because they've been working 16 hour days for weeks trying to meet a
deadline. Given that, extra checks to catch problems early
are probably a good idea if they don't impact performance significantly.

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: Jeff King @ 2011-11-30  3:20 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git
In-Reply-To: <4ED56A1C.9050800@lsrfire.ath.cx>

On Wed, Nov 30, 2011 at 12:26:20AM +0100, René Scharfe wrote:

> >>> +static int is_rfc3986_reserved(char ch)
> >>> +{
> >>> +	switch (ch) {
> >>> +	case '!': case '*': case '\'': case '(': case ')': case ';':
> >>> +	case ':': case '@': case '&': case '=': case '+': case '$':
> >>> +	case ',': case '/': case '?': case '#': case '[': case ']':
> >>> +		return 1;
> >>> +	}
> [...]
> Sorry for my bikeshedding, but I'd paint it like this:
> 
> 	return !!strchr("!*'();:@&=+$,/?#[]", ch);

I was always under the impression that computed jumps via "switch" would
out-perform even an optimized strchr. Of course, I never tested. And I
doubt performance is even relevant here, and I admit I don't care overly
much. I find them both equally readable.

I'm going to leave it as-is unless somebody else wants to say "I
strongly prefer version X".

-Peff

^ permalink raw reply

* Auto update submodules after merge and reset
From: Max Krasnyansky @ 2011-11-30  0:55 UTC (permalink / raw)
  To: git

Does anyone have a pointer to a thread/discussion that explains why git 
submodules are not auto
updated when the superproject is updated (merge, reset, etc) by default?

Assuming a simple and default setup where submodule update policy is set 
to "checkout".
It seems that the default and sane behavior should be to update 
(checkout) corresponding submodule
commit to track the superproject.
I can't seem to find convincing explanation why it's not the case :). 
Having to manually update
submodules after pull or reset has been error prone and confusing for 
the devs I work with.

I'm thinking about adding a config option that would enable automatic 
submodule update but wanted
to see if there is some fundamental reason why it would not be accepted.

Thanx
Max

^ permalink raw reply

* Re: [PATCH] git-gui: Set both 16x16 and 32x32 icons on X to pacify Xming.
From: Samuel Bronson @ 2011-11-30  0:02 UTC (permalink / raw)
  To: git; +Cc: Pat Thoyts, Shawn O. Pearce, Samuel Bronson
In-Reply-To: <1321640015-6663-1-git-send-email-naesten@gmail.com>

On Fri, Nov 18, 2011 at 1:13 PM, Samuel Bronson <naesten@gmail.com> wrote:
> It would be better if the 32x32 icon was equivalent to the one used on
> Windows (in git-gui.ico), but I'm not sure how that would best be done,
> so I copied this code from gitk instead.
> ---
>  git-gui.sh |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/git-gui.sh b/git-gui.sh
> index c190cbe..9d01039 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -729,7 +729,12 @@ if {[is_Windows]} {
>                gitlogo put gray26  -to  5 15 11 16
>                gitlogo redither
>
> -               wm iconphoto . -default gitlogo
> +               # TODO: should use something equivalent to the 32x32 image in
> +               # the .ico file
> +               image create photo gitlogo32    -width 32 -height 32
> +               gitlogo32 copy gitlogo -zoom 2 2
> +
> +               wm iconphoto . -default gitlogo gitlogo32
>        }
>  }

Hmm. Nothing seems to have happened with this patch yet. Any
suggestions on how to bring it to the attention of the git-gui people?

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: René Scharfe @ 2011-11-29 23:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20111129211950.GD1793@sigill.intra.peff.net>

Am 29.11.2011 22:19, schrieb Jeff King:
> On Tue, Nov 29, 2011 at 10:19:00AM -0800, Junio C Hamano wrote:
> 
>>> +static int is_rfc3986_reserved(char ch)
>>> +{
>>> +	switch (ch) {
>>> +	case '!': case '*': case '\'': case '(': case ')': case ';':
>>> +	case ':': case '@': case '&': case '=': case '+': case '$':
>>> +	case ',': case '/': case '?': case '#': case '[': case ']':
>>> +		return 1;
>>> +	}
>>> +	return 0;
>>> +}
>>
>> Part of me wonders if we still have extra bits in sane_ctype[] array but
>> that one is cumbersome to update, and the above should be easier to read
>> and maintain.
> 
> We have 2 bits left. I did consider it, but it just seemed excessively
> cumbersome for something that really doesn't need to be that fast (if it
> is indeed any faster than this case statement).

Sorry for my bikeshedding, but I'd paint it like this:

	return !!strchr("!*'();:@&=+$,/?#[]", ch);

René

^ permalink raw reply

* Re: [PATCH] gitweb: Call to_utf8() on input string in chop_and_escape_str()
From: Jürgen Kreileder @ 2011-11-29 22:14 UTC (permalink / raw)
  To: Jakub Narebski, git
In-Reply-To: <201111292250.04800.jnareb@gmail.com>

On Tue, Nov 29, 2011 at 22:50, Jakub Narebski <jnareb@gmail.com> wrote:
> Jürgen Kreileder wrote:
>
>> a) To fix the comparison with the chopped string
>> b) To give the title attribute correct encoding
>>
>> Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
>> ---
>>  gitweb/gitweb.perl |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 4f0c3bd..4237ea6 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -1695,11 +1695,11 @@ sub chop_and_escape_str {
>>       my ($str) = @_;
>
> Why not simply
>
>        my $str = to_utf8(shift);
>

Good question.  Because I thought it broke something when I tested it.
I can't reproduce that now, though.  Might have been something unrelated.
So:

-- >8 --
Subject: [PATCH] gitweb: Call to_utf8() on input string in
 chop_and_escape_str()

a) To fix the comparison with the chopped string
b) To give the title attribute correct encoding

Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
---
 gitweb/gitweb.perl |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index bfada0e..036ae46 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1699,6 +1699,7 @@ sub chop_and_escape_str {
 	my ($str) = @_;

 	my $chopped = chop_str(@_);
+	$str = to_utf8($str);
 	if ($chopped eq $str) {
 		return esc_html($chopped);
 	} else {
-- 
1.7.5.4

^ permalink raw reply related

* Re: Re: [RFC/PATCH] add update to branch support for "floating submodules"
From: Heiko Voigt @ 2011-11-29 22:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr51htbsy.fsf@alter.siamese.dyndns.org>

Hi,

On Wed, Nov 09, 2011 at 10:01:33AM -0800, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
> > This is almost ready but I would like to know what users of the
> > "floating submodule" think about this.
> 
> Thanks for working on this.
> 
> I do like to hear from potential users as well, because the general
> impression we got was that floating submodules is not a real need of
> anybody, but it is merely an inertia of people who (perhaps mistakenly)
> thought svn externals that are not anchored to a particular revision is a
> feature when it is just a limitation in reality. During the GitTogether'11
> we learned that Android that uses floating model does not really have to.

Since we did not get any reply from potential floating submodule users I
do not mind to drop this patch for now. It is archived in the mailing list
and it should be easy to revive once there is real world need for it.

Once we have the "exact" model support for checkout and friends this
might be a handy tool to update submodules before releases and such. But
currently I would like to focus on the "exact" front first.

Cheers Heiko

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Jeff King @ 2011-11-29 22:05 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Bill Zaumen, git, gitster, pclouds, torvalds
In-Reply-To: <CAJo=hJtFT55Ucyij9esr3Hd9yJ6XCxatK7vjPOLMKow57HqBoQ@mail.gmail.com>

On Tue, Nov 29, 2011 at 09:08:27AM -0800, Shawn O. Pearce wrote:

> As Peff pointed out elsewhere in this thread, the odds of a SHA-1
> collision in a project are low, on the order of 1/(2^80).

Minor nit: it's actually way less than that. You have to do on the order
of 2^80 operations to get a 50% chance of a collision. But that's not
the probability for a collision given a particular number of
operations[1].

The probability for a SHA-1 collision on 10 million hashes (where
linux-2.6 will be in a decade or two) is about 1/(2^115).

That doesn't change the validity of any of your points, of course. 1 in
2^80 and 1 in 2^115 are both in the range of "impossibly small enough
not to care about".

To continue our astronomy analogies, NASA estimates[2] the impact
probability of most tracked asteroids in the 10^6 range (around 2^20).
So getting a collision in linux-2.6 in the next decade has roughly the
same odds as the Earth being hit by 5 or 6 large asteroids.

-Peff

[1] http://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem

[2] http://neo.jpl.nasa.gov/risk/

^ permalink raw reply

* Re: Re: Git Submodule Problem - Bug?
From: Heiko Voigt @ 2011-11-29 22:03 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Manuel Koller, Fredrik Gustafsson, Thomas Rast, git
In-Reply-To: <4ED5196B.5030200@web.de>

Hi,

On Tue, Nov 29, 2011 at 06:42:03PM +0100, Jens Lehmann wrote:
> Am 29.11.2011 11:41, schrieb Fredrik Gustafsson:
> > On Tue, Nov 29, 2011 at 11:25:41AM +0100, Thomas Rast wrote:
> >> So maybe the right questions to ask would be: what's the *official*
> >> way of removing a submodule completely?  Do we support overwriting
> >> submodules in the way Manuel wanted to?  Why not? :-)
> > 
> > I suggest that we add a command for that;
> > git submodule remove <submodule>
> 
> Hmm, to me it looks like the problem is in "git submodule add". It
> doesn't check if the submodule repo it finds in .git/modules matches
> the one the user wants to create. So we end up reviving the first
> submodule although the user wants to use a completely different repo.
> 
> One solution could be to only let "git submodule update" revive
> submodules from .git/modules and make "git submodule add" error out
> if it finds the git directory of a submodule with the same name in
> .git/modules. But currently there is no way to tell "git submodule add"
> to use a different submodule name (it always uses the path as a name),
> so we might have to add an option to do that and tell the user in the
> error message how he can add a different submodule under the same
> path.

I think this is the way to go. We teached submodule add to revive a
local submodule. Further thinking about it this is probably not what the
users wants in most cases. For update its the right thing but for add we
should probably tell the user that there is already a local submodule in
the way and give him the option to take it or that he should remove it.

> Another solution could be that "git submodule add" detects that a
> submodule with the name "sub" did exist and chooses a different name
> (say "sub2") for the the new one. Then the user wouldn't have to
> cope with the problem himself.

In my opinion this is too much automatism. We could prompt for a new
name to support the user but I do not think this mechanism should be
automatic.

How about this:

The user issues 'git submodule add foo' and we discover that there is
already a local clone under the name foo. Git then asks something like
this

	Error when adding: There is already a local submodule under the
	name 'foo'.

	You can either rename the submodule to be added to a different
	name or manually remove the local clone underneath
	.git/modules/foo. If you want to remove the local clone please
	quit now.

	We strongly suggest that you give each submodule a unique name.
	Note: This name is independent from the path it is bound to.

	What do you want me to do ([r]ename it, [Q]uit) ?

When the user chooses 'rename' git will prompt for a new name.

If we are going to support the remove use case with add we additionally
need some logic to deal with it during update (which is not supported
yet AFAIK). But we probably need this support anyway since between
removal and adding a new submodule under the same can be a long time.
If users switch between such ancient history and the new history we
would have the same conflict.

We could of course just error out and tell the user that he has to give
the submodule an uniqe name. If the user does not do so leave it to him
to deal with the situation manually.

What do you think?

Cheers Heiko

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Bill Zaumen @ 2011-11-29 21:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, pclouds, spearce, torvalds
In-Reply-To: <20111129090733.GA22046@sigill.intra.peff.net>

Thanks for mentioning the 100K limit, which I didn't know about.  Will
have to try to see how to split it into two patches.

The intent is to increase the cost of a malicious attack, which requires
generating two different files with the same SHA-1 value, detect such an
attack early, and to slow such an attack down - because of Git's rule
that the first object with a SHA-1 value is the one the repository has,
if it takes longer to generate a collision than the time it takes to get
the original object into all repositories (which is done manually by
multiple individuals), the forged file will never appear in any
"official" repository.

The additional CRC (easily changed to whatever message digest one might
prefer) makes a malicious attack far more difficult: the modified file
has to have both the same SHA-1 hash (including the Git header) and 
the same CRC (not including the Git header).  An efficient algorithm to
do both simultaneously does not yet exist.  So, if we could generate a
SHA-1 collision in one second, it would presumably take billions of
seconds (many decades of continuous computation) to generate a SHA-1
hash with the same CRC, and well before a year has elapsed, the original
object should have been in all the repositories, preventing a forged
object from being inserted. Of course, eventually you might need a
real message digest.

The weakness of a CRC as an integrity check is not an issue since it is
never used alone: it's use is more analogous to the few extra bits added
to a data stream when error-detecting codes are used.  I used a CRC in
the initial implementation rather than a message digest because it is
faster, and because the initial goal was to get things to work
correctly.  In any case, the patch does not eliminate any code in which
Git already does a byte-by-byte comparison.  In cases where Git
currently assumes that two objects are the same because the SHA-1 hashes
are the same, the patch compares CRCs as an additional test.

Regarding your [Jeff's] second concern, "how does this alternative
digest have any authority?" there are two things to keep in mind. First,
it is a supplement to the existing digest.  Second, any value of the CRC
that is stored permanently (baring bugs, in my implementation, of
course) is computed locally - when a loose object is created or when a
pack file's index is created.  At no point is a CRC that was obtained
from another repository trusted. While the patch modifies Git so that it
can send CRCs when using the git protocol, these CRCs are never stored,
but are instead used only for cross checks.  If one side or the other
"lies", you get an error.  

To give a concrete example, during a fetch, the git protocol currently
sends "have" messages that contain the SHA-1 hashes of commits.  The
extension allows two CRCs to be sent along with each hash.  If these do
not match the local values (tested only if the local values exist),
something is wrong and you get an error report that the server sends to
the client, but the server never uses these CRCs for any other purpose
and the server never sends its CRCs to the client because of
backwards-compatibility issues. For objects that are transferred, you
end up with a pack file, with index-pack called to build the index (and
with the patch, the corresponding MDS file), but index-pack already does
a byte-by-byte comparison to detect collisions - the comparison is much
faster than the SHA-1 computation index-pack has to do anyway.

Where this helps is when one is using multiple repositories. If you
fetch a commit from repository B, which we'll assume has a forged blob
(different content, but the original SHA-1 hash), and then run fetch
using repository A, which has has the same commit with the original
blob, the forged blob will not be transferred from Server A and the
client will not be notified that there is an inconsistency - the
protocol is "smart" enough to know that the client already has the
commit and assumes there is nothing to do regarding it.

BTW, regarding your [Jeff's] discussion about putting an additional
header in commit messages - I tried that.  The existing versions of
Git didn't like it: barring a bug in my test code, it seems that Git
expects headers in commit messages to be in a particular order and
treats deviations from that to be an error.  I even tried appending
blank lines at the end of a commit, with spaces and tabs encoding an
additional CRC, and that didn't work either - at least it never got
through all the test programs, failing in places like the tests
involving notes. In any case, you'd have to phase in such a change
gradually, first putting in the code to read the new header if it is
there, and subsequently (after ample time so that everyone is running
a sufficiently new version) enabling the code to create the new
header.

Also, regarding "At that point, I really wonder if a flag day to switch
to a new repository format is all that bad," if that turns out to be
the decision, I'd recommend doing it sooner rather than later. The
reason is cost, which grows with the number of git users and the
number and size of Git repositories.

Bill

^ permalink raw reply

* Re: [PATCH] gitweb: Call to_utf8() on input string in chop_and_escape_str()
From: Jakub Narebski @ 2011-11-29 21:50 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git
In-Reply-To: <CAKD0Uuy8y7Dc6gfvYVe-FJ=Reiu0M3wOY4r4VVPtEYmahZcdwA@mail.gmail.com>

Jürgen Kreileder wrote:

> a) To fix the comparison with the chopped string
> b) To give the title attribute correct encoding
> 
> Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..4237ea6 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1695,11 +1695,11 @@ sub chop_and_escape_str {
>  	my ($str) = @_;

Why not simply

  	my $str = to_utf8(shift);
 
>  	my $chopped = chop_str(@_);
> -	if ($chopped eq $str) {
> +	if ($chopped eq to_utf8($str)) {
>  		return esc_html($chopped);
>  	} else {
>  		$str =~ s/[[:cntrl:]]/?/g;
> -		return $cgi->span({-title=>$str}, esc_html($chopped));
> +		return $cgi->span({-title => to_utf8($str)}, esc_html($chopped));
>  	}
>  }
> 
> -- 
> 1.7.5.4
> 

-- 
Jakub Narebski
Poland

^ permalink raw reply

* [PATCH] gitweb: esc_html() site name for title in OPML
From: Jürgen Kreileder @ 2011-11-29 21:45 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

This escapes the site name in OPML.  Also fixes encoding issues
because esc_html() uses to_utf().

Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
---
 gitweb/gitweb.perl |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4f0c3bd..df747c1 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -7699,11 +7699,12 @@ sub git_opml {
 		-charset => 'utf-8',
 		-content_disposition => 'inline; filename="opml.xml"');

+	my $title = esc_html($site_name);
 	print <<XML;
 <?xml version="1.0" encoding="utf-8"?>
 <opml version="1.0">
 <head>
-  <title>$site_name OPML Export</title>
+  <title>$title OPML Export</title>
 </head>
 <body>
 <outline text="git RSS feeds">
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH] gitweb: Call to_utf8() on input string in chop_and_escape_str()
From: Jürgen Kreileder @ 2011-11-29 21:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

a) To fix the comparison with the chopped string
b) To give the title attribute correct encoding

Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
---
 gitweb/gitweb.perl |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4f0c3bd..4237ea6 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1695,11 +1695,11 @@ sub chop_and_escape_str {
 	my ($str) = @_;

 	my $chopped = chop_str(@_);
-	if ($chopped eq $str) {
+	if ($chopped eq to_utf8($str)) {
 		return esc_html($chopped);
 	} else {
 		$str =~ s/[[:cntrl:]]/?/g;
-		return $cgi->span({-title=>$str}, esc_html($chopped));
+		return $cgi->span({-title => to_utf8($str)}, esc_html($chopped));
 	}
 }

-- 
1.7.5.4

^ permalink raw reply related

* Re: [PATCH 12/13] credentials: add "store" helper
From: Jeff King @ 2011-11-29 21:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vsjl6ssf9.fsf@alter.siamese.dyndns.org>

On Tue, Nov 29, 2011 at 10:19:06AM -0800, Junio C Hamano wrote:

> > +	while (strbuf_getline(&line, fh, '\n') != EOF) {
> > +		credential_from_url(&entry, line.buf);
> > +		if (entry.username && entry.password &&
> > +		    credential_match(c, &entry)) {
> 
> This looks curious; isn't checking .username and .password part of the
> responsibility of credential_match()? And even if entry lacks password
> (which won't happen in the context of this program, given the
> implementation of store_credential() below) shouldn't it still be
> considered a match?

credential_match will check .username, if the pattern mentions it. It
will never check .password. My intent here was to enforce well-formed
entries in the credential file. So you could add:

  http://example.com/

to the credential file, but it's just meaningless noise. It
doesn't actually tell us a username or password.

The helper won't add such an entry itself, but given the simplicity of
the format, I wanted to leave the door open for curious hackers to
populate it manually if they choose.

I think you're right that:

  http://user@example.com/

is potentially meaningful, and this would skip that. OTOH, you would be
much better served to just do:

  git config credential.http://example.com.username user

So I consider it a slight abuse of this helper in the first place.

> > +static void rewrite_credential_file(const char *fn, struct credential *c,
> > +				    struct strbuf *extra)
> > +{
> > +	umask(077);
> 
> Curious placement of umask(). I would expect a function that has its own
> call to umask() restore it before it returns, and a stand-alone program
> whose sole purpose is to work with a private file, setting a tight umask
> upfront at the beginning of main() may be easier to understand.

I think that is largely a holdover from the original implementation,
which set the umask and did other black magic before calling
git_config_set. I agree it would make more sense at the beginning of the
program. Will change.

> > +	if (hold_lock_file_for_update(&credential_lock, fn, 0) < 0)
> > +		die_errno("unable to get credential storage lock");
> > +	parse_credential_file(fn, c, NULL, print_line);
> > +	if (extra)
> > +		print_line(extra);
> 
> An entry for a newly updated password comes at the end of the file,
> instead of replacing an entry already in the file in-place? Given that
> parse_credential_file() when processing a look-up request (which is the
> majority of the case) stops upon finding a match, it might make more sense
> to have the new one (which may be expected to be used often) at the
> beginning instead, no?

Yeah. It's a linear search. Your worst-case is always going to be O(n),
but I just assumed n would remain relatively small and we wouldn't care
(if it isn't, the right solution is probably a smarter data structure).

But your optimization is trivial to implement, so it's probably worth
doing.

> > +	if (commit_lock_file(&credential_lock) < 0)
> > +		die_errno("unable to commit credential store");
> > +}
> > +
> > +static void store_credential(const char *fn, struct credential *c)
> > +{
> > +	struct strbuf buf = STRBUF_INIT;
> > +
> > +	if (!c->protocol || !(c->host || c->path) ||
> > +	    !c->username || !c->password)
> > +		return;
> [...]
> > +static void remove_credential(const char *fn, struct credential *c)
> > +{
> > +	if (!c->protocol || !(c->host || c->path))
> > +		return;
> 
> The choice of the fields looks rather arbitrary. I cannot say "remove all
> the credentials whose username is 'gitster' at 'github.com' no matter what
> protocol is used", but I can say "remove all credentials under any name
> for any host as long as the transfer goes over 'https' and accesses a
> repository at 'if/xyzzy' path", it seems.

It is kind of arbitrary. The storage format is URLs, which is why
store_credential is a little pedantic. We can't store something that
doesn't have a protocol part, as that is a required part of the URL
(actually, in URL-speak this is the "scheme"; I wonder if we should use
the same term here).

I was thinking we need a protocol for the same reason in
remove_credential, but I think you are right. We never actually convert
it to a URL, so in theory you could do:

  git credential-store erase <<\EOF
  username=gitster
  host=github.com
  EOF

Again, not an operation that git will ever perform, but I guess
something that people might want to do (I had always assumed the
"$EDITOR ~/.git-credentials" was going to be the preferred way of doing
such operations :) ).

I don't think there's any harm in loosening that condition.

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox