Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 4/4] Suppress "statement not reached" warnings under Sun Studio
From: Junio C Hamano @ 2011-12-21 18:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Elijah Newren, Jason Evans, David Barr
In-Reply-To: <1324430302-22441-5-git-send-email-avarab@gmail.com>

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/read-cache.c b/read-cache.c
> index a51bba1..0a4e895 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -758,7 +758,13 @@ int verify_path(const char *path)
>  		return 0;
>  
>  	goto inside;
> +#ifdef __sun
> +#	pragma error_messages (off, E_STATEMENT_NOT_REACHED)
> +#endif
>  	for (;;) {
> +#ifdef __sun
> +#	pragma error_messages (on, E_STATEMENT_NOT_REACHED)
> +#endif
>  		if (!c)
>  			return 1;

Patches 1-3 makes sense, but this one is too ugly to live.

Wouldn't something like this be equivalent and have the same effect
without sacrificing the readablity?

diff --git a/read-cache.c b/read-cache.c
index a51bba1..73af797 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -757,12 +757,12 @@ int verify_path(const char *path)
 	if (has_dos_drive_prefix(path))
 		return 0;
 
-	goto inside;
+	/* we are at the beginning of a path component */
+	c = '/';
 	for (;;) {
 		if (!c)
 			return 1;
 		if (is_dir_sep(c)) {
-inside:
 			c = *path++;
 			if ((c == '.' && !verify_dotfile(path)) ||
 			    is_dir_sep(c) || c == '\0')

^ permalink raw reply related

* [PATCH] clone: don't say <branch> when we mean <remote>
From: Carlos Martín Nieto @ 2011-12-21 18:14 UTC (permalink / raw)
  To: git

Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
---

The manpage says <name> which might actually be a better word to use
everywhere, but having <branch> instead of <remote> can only lead to
confusion.

Looking through blame, the second line survived a typo fix and was
introduced in 2008 when clone was made a builtin. The script used to
say <name>. So it's clearly nothing urgent, but it bugged me, so I'm
sending a patch.

 builtin/clone.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index efe8b6c..e85ee69 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -84,8 +84,8 @@ static struct option builtin_clone_options[] = {
 		   "directory from which templates will be used"),
 	OPT_CALLBACK(0 , "reference", &option_reference, "repo",
 		     "reference repository", &opt_parse_reference),
-	OPT_STRING('o', "origin", &option_origin, "branch",
-		   "use <branch> instead of 'origin' to track upstream"),
+	OPT_STRING('o', "origin", &option_origin, "remote",
+		   "use <remote> instead of 'origin' to track upstream"),
 	OPT_STRING('b', "branch", &option_branch, "branch",
 		   "checkout <branch> instead of the remote's HEAD"),
 	OPT_STRING('u', "upload-pack", &option_upload_pack, "path",
-- 
1.7.8.352.g876a6f

^ permalink raw reply related

* Re: git 1.7.7.5
From: Thomas Jarosch @ 2011-12-21 17:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, git, schacon
In-Reply-To: <7viplckt2m.fsf@alter.siamese.dyndns.org>

On Tuesday, 20. December 2011 01:03:29 Junio C Hamano wrote:
> but anyway I've uploaded both 1.7.7.5 and 1.7.6.5 tarballs.

Thanks!

May be Scott Chacon can provide you commit access to git-scm.com ;)
If you are interested in this of course.

For example wikipedia lists git-scm.com as website for git.
(http://en.wikipedia.org/wiki/Git_%28software%29)

Cheers,
Thomas

^ permalink raw reply

* Re: Re* How to generate pull-request with info of signed tag
From: Aneesh Kumar K.V @ 2011-12-21 16:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7vzkemh0de.fsf@alter.siamese.dyndns.org>

On Tue, 20 Dec 2011 23:03:57 -0800, Junio C Hamano <gitster@pobox.com> wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
> > Also can we make .git/config remote stanza to have something like below
> >
> >
> >      fetch = +refs/tags/*:refs/tags/abc/*
> >
> > so that one can do
> >
> >    git fetch t-remote tag-name
> >
> > and that get stored to abc/tag-name 
> 
> You can do whatever you want to your own config file without asking anybody.
> 
> Having said that, the point of the recent change to allow you to pull this
> way (notice the lack of "tag")
> 
>     $ git pull $url $signed_tag_name
> 
> is so that you do not have to contaminate your own ref namespace with tags
> that are used to leave audit trails in the history graph.
> 

With an entry like below

[remote "github"]
        fetch = +refs/tags/*:refs/tags/origin/*
        url = git://github.com/kvaneesh/QEMU.git

when i do git fetch github for-anthony i get the below error

[master@QEMU]$ git fetch github for-anthony
From git://github.com/kvaneesh/QEMU
 * tag               for-anthony -> FETCH_HEAD
[master@QEMU]$ less .git/config 

Also trying to do

[master@QEMU]$ git fetch git://github.com/kvaneesh/QEMU.git  for-anthony:aneesh/for-anthony
error: Trying to write non-commit object 12916047784615b7d8b879d9d39be6c1559e1b1b to branch refs/heads/aneesh/for-anthony
From git://github.com/kvaneesh/QEMU
 ! [new branch]      for-anthony -> aneesh/for-anthony  (unable to update local ref)
 * [new tag]         for-anthony -> for-anthony


I understand that replacing the above with below works. But we should
not be required to specify refs/tags there right ?

[master@QEMU]$ git fetch git://github.com/kvaneesh/QEMU.git  refs/tags/for-anthony:refs/tags/aneesh/for-anthony

-aneesh

^ permalink raw reply

* Re: [PATCH] git-commit: add option --date-now
From: Matthieu Moy @ 2011-12-21 16:24 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: Michael Schubert, git
In-Reply-To: <20111221153837.GC2160@beez.lab.cmartin.tk>

Carlos Martín Nieto <cmn@elego.de> writes:

> I was surpised when I tried 'git commit --amend --date=now' that git
> didn't understand 'now' as a date, which seems like a more obvious
> place to fix it.

+1

I really don't think Git wants yet-another-option for each use-case we
find, and accepting "now" as a date (either by hardcoding "now" as an
accepted value, or by running approxidate on the argument of --date).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply

* Re: [PATCH] Use Python's "print" as a function, not as a keyword
From: Sverre Rabbelier @ 2011-12-21 16:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Sebastian Morr, git
In-Reply-To: <CACBZZX7PVyCFfHTJN_QZfyt5wAcr4UAiJSmo54PSi=8pgv3sYA@mail.gmail.com>

On Tue, Dec 20, 2011 at 20:48, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> I'm running Debian unstable and it has Python 2.7. Most people are
> still using Python 2.x as their default system Python since 3.x breaks
> backwards compatibility for common constructs like print.
>
> Does this only break Python 2.6, or all 2.x versions of Python?
>
> What's our currently supported Python version for the Python code in
> Git? It's 5.8.0 for Perl, do we have any particular aim for a
> supported Python version?

Python 2.4.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply

* [PATCH] bash completion: use read -r everywhere
From: Thomas Rast @ 2011-12-21 15:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kevin Ballard

POSIX specifies

  The read utility shall read a single line from standard input.
  By default, unless the -r option is specified, backslash ('\')
  shall act as an escape character...

Our omission of -r breaks the loop reading refnames from
git-for-each-ref in __git_refs() if there are refnames such as
"foo'bar", in which case for-each-ref helpfully quotes them as in

  $ git update-ref "refs/remotes/test/foo'bar" HEAD
  $ git for-each-ref --shell --format="ref=%(refname:short)" "refs/remotes"
  ref='test/foo'\''bar'

Interpolating the \' here will read "ref='test/foo'''bar'" instead,
and eval then chokes on the unbalanced quotes.

However, since none of the read loops _want_ to have backslashes
interpolated, it's much safer to use read -r everywhere.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 contrib/completion/git-completion.bash |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 78257ae..e7a39ef 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -111,7 +111,7 @@ __git_ps1_show_upstream ()
 
 	# get some config options from git-config
 	local output="$(git config -z --get-regexp '^(svn-remote\..*\.url|bash\.showupstream)$' 2>/dev/null | tr '\0\n' '\n ')"
-	while read key value; do
+	while read -r key value; do
 		case "$key" in
 		bash.showupstream)
 			GIT_PS1_SHOWUPSTREAM="$value"
@@ -589,7 +589,7 @@ __git_refs ()
 			local ref entry
 			git --git-dir="$dir" for-each-ref --shell --format="ref=%(refname:short)" \
 				"refs/remotes/" | \
-			while read entry; do
+			while read -r entry; do
 				eval "$entry"
 				ref="${ref#*/}"
 				if [[ "$ref" == "$cur"* ]]; then
@@ -602,7 +602,7 @@ __git_refs ()
 	case "$cur" in
 	refs|refs/*)
 		git ls-remote "$dir" "$cur*" 2>/dev/null | \
-		while read hash i; do
+		while read -r hash i; do
 			case "$i" in
 			*^{}) ;;
 			*) echo "$i" ;;
@@ -611,7 +611,7 @@ __git_refs ()
 		;;
 	*)
 		git ls-remote "$dir" HEAD ORIG_HEAD 'refs/tags/*' 'refs/heads/*' 'refs/remotes/*' 2>/dev/null | \
-		while read hash i; do
+		while read -r hash i; do
 			case "$i" in
 			*^{}) ;;
 			refs/*) echo "${i#refs/*/}" ;;
@@ -636,7 +636,7 @@ __git_refs_remotes ()
 {
 	local i hash
 	git ls-remote "$1" 'refs/heads/*' 2>/dev/null | \
-	while read hash i; do
+	while read -r hash i; do
 		echo "$i:refs/remotes/$1/${i#refs/heads/}"
 	done
 }
@@ -1863,7 +1863,7 @@ __git_config_get_set_variables ()
 	done
 
 	git --git-dir="$(__gitdir)" config $config_file --list 2>/dev/null |
-	while read line
+	while read -r line
 	do
 		case "$line" in
 		*.*=*)
-- 
1.7.8.484.gdad4270

^ permalink raw reply related

* Re: [PATCH] git-commit: add option --date-now
From: Carlos Martín Nieto @ 2011-12-21 15:38 UTC (permalink / raw)
  To: Michael Schubert; +Cc: git
In-Reply-To: <4EF1F3AB.5080607@elegosoft.com>

[-- Attachment #1: Type: text/plain, Size: 4129 bytes --]

On Wed, Dec 21, 2011 at 03:56:43PM +0100, Michael Schubert wrote:
> Currently, Git doesn't provide an easy way to use the current date when
> amending a commit or reusing an existing commmit with -C/-c. Therefore,
> add --date-now.

The option --reset-author also resets the date. So 'git commit
--ammend --reset-author' does what 'git commit --amend --date-now'
would do in most cases. I was surpised when I tried 'git commit
--amend --date=now' that git didn't understand 'now' as a date, which
seems like a more obvious place to fix it.

> 
> Signed-off-by: Michael Schubert <mschub@elegosoft.com>
> ---
>  Documentation/git-commit.txt |    7 +++++--
>  builtin/commit.c             |    9 ++++++++-
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
> index 5cc84a1..b7c6f0d 100644
> --- a/Documentation/git-commit.txt
> +++ b/Documentation/git-commit.txt
> @@ -12,8 +12,8 @@ SYNOPSIS
>  	   [--dry-run] [(-c | -C | --fixup | --squash) <commit>]
>  	   [-F <file> | -m <msg>] [--reset-author] [--allow-empty]
>  	   [--allow-empty-message] [--no-verify] [-e] [--author=<author>]
> -	   [--date=<date>] [--cleanup=<mode>] [--status | --no-status]
> -	   [-i | -o] [--] [<file>...]
> +	   [--date=<date> | --date-now] [--cleanup=<mode>]
> +	   [--status | --no-status] [-i | -o] [--] [<file>...]
>  
>  DESCRIPTION
>  -----------
> @@ -126,6 +126,9 @@ OPTIONS
>  --date=<date>::
>  	Override the author date used in the commit.
>  
> +--date-now
> +	Override the author date used in the commit with the current local time.
> +
>  -m <msg>::
>  --message=<msg>::
>  	Use the given <msg> as the commit message.
> diff --git a/builtin/commit.c b/builtin/commit.c
> index be1ab2e..28fdf1a 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -82,6 +82,7 @@ static const char *author_message, *author_message_buffer;
>  static char *edit_message, *use_message;
>  static char *fixup_message, *squash_message;
>  static int all, also, interactive, patch_interactive, only, amend, signoff;
> +static int date_now;
>  static int edit_flag = -1; /* unspecified */
>  static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship;
>  static int no_post_rewrite, allow_empty_message;
> @@ -134,6 +135,7 @@ static struct option builtin_commit_options[] = {
>  	OPT_FILENAME('F', "file", &logfile, "read message from file"),
>  	OPT_STRING(0, "author", &force_author, "author", "override author for commit"),
>  	OPT_STRING(0, "date", &force_date, "date", "override date for commit"),
> +	OPT_BOOLEAN(0, "date-now", &date_now, "override date for commit with current local time"),
>  	OPT_CALLBACK('m', "message", &message, "message", "commit message", opt_parse_m),
>  	OPT_STRING('c', "reedit-message", &edit_message, "commit", "reuse and edit message from specified commit"),
>  	OPT_STRING('C', "reuse-message", &use_message, "commit", "reuse message from specified commit"),
> @@ -557,7 +559,9 @@ static void determine_author_info(struct strbuf *author_ident)
>  					(lb - strlen(" ") -
>  					 (a + strlen("\nauthor "))));
>  		email = xmemdupz(lb + strlen("<"), rb - (lb + strlen("<")));
> -		date = xmemdupz(rb + strlen("> "), eol - (rb + strlen("> ")));
> +		if (!date_now)
> +			date = xmemdupz(rb + strlen("> "),
> +					eol - (rb + strlen("> ")));
>  	}
>  
>  	if (force_author) {
> @@ -1018,6 +1022,9 @@ static int parse_and_validate_options(int argc, const char *argv[],
>  	if (force_author && renew_authorship)
>  		die(_("Using both --reset-author and --author does not make sense"));
>  
> +	if (force_date && date_now)
> +		die(_("Using both --date and --date-now does not make sense"));
> +
>  	if (logfile || message.len || use_message || fixup_message)
>  		use_editor = 0;
>  	if (0 <= edit_flag)
> -- 
> 1.7.8.521.g64725
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [PATCH] git-commit: add option --date-now
From: Michael Schubert @ 2011-12-21 14:56 UTC (permalink / raw)
  To: git

Currently, Git doesn't provide an easy way to use the current date when
amending a commit or reusing an existing commmit with -C/-c. Therefore,
add --date-now.

Signed-off-by: Michael Schubert <mschub@elegosoft.com>
---
 Documentation/git-commit.txt |    7 +++++--
 builtin/commit.c             |    9 ++++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index 5cc84a1..b7c6f0d 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -12,8 +12,8 @@ SYNOPSIS
 	   [--dry-run] [(-c | -C | --fixup | --squash) <commit>]
 	   [-F <file> | -m <msg>] [--reset-author] [--allow-empty]
 	   [--allow-empty-message] [--no-verify] [-e] [--author=<author>]
-	   [--date=<date>] [--cleanup=<mode>] [--status | --no-status]
-	   [-i | -o] [--] [<file>...]
+	   [--date=<date> | --date-now] [--cleanup=<mode>]
+	   [--status | --no-status] [-i | -o] [--] [<file>...]
 
 DESCRIPTION
 -----------
@@ -126,6 +126,9 @@ OPTIONS
 --date=<date>::
 	Override the author date used in the commit.
 
+--date-now
+	Override the author date used in the commit with the current local time.
+
 -m <msg>::
 --message=<msg>::
 	Use the given <msg> as the commit message.
diff --git a/builtin/commit.c b/builtin/commit.c
index be1ab2e..28fdf1a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -82,6 +82,7 @@ static const char *author_message, *author_message_buffer;
 static char *edit_message, *use_message;
 static char *fixup_message, *squash_message;
 static int all, also, interactive, patch_interactive, only, amend, signoff;
+static int date_now;
 static int edit_flag = -1; /* unspecified */
 static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship;
 static int no_post_rewrite, allow_empty_message;
@@ -134,6 +135,7 @@ static struct option builtin_commit_options[] = {
 	OPT_FILENAME('F', "file", &logfile, "read message from file"),
 	OPT_STRING(0, "author", &force_author, "author", "override author for commit"),
 	OPT_STRING(0, "date", &force_date, "date", "override date for commit"),
+	OPT_BOOLEAN(0, "date-now", &date_now, "override date for commit with current local time"),
 	OPT_CALLBACK('m', "message", &message, "message", "commit message", opt_parse_m),
 	OPT_STRING('c', "reedit-message", &edit_message, "commit", "reuse and edit message from specified commit"),
 	OPT_STRING('C', "reuse-message", &use_message, "commit", "reuse message from specified commit"),
@@ -557,7 +559,9 @@ static void determine_author_info(struct strbuf *author_ident)
 					(lb - strlen(" ") -
 					 (a + strlen("\nauthor "))));
 		email = xmemdupz(lb + strlen("<"), rb - (lb + strlen("<")));
-		date = xmemdupz(rb + strlen("> "), eol - (rb + strlen("> ")));
+		if (!date_now)
+			date = xmemdupz(rb + strlen("> "),
+					eol - (rb + strlen("> ")));
 	}
 
 	if (force_author) {
@@ -1018,6 +1022,9 @@ static int parse_and_validate_options(int argc, const char *argv[],
 	if (force_author && renew_authorship)
 		die(_("Using both --reset-author and --author does not make sense"));
 
+	if (force_date && date_now)
+		die(_("Using both --date and --date-now does not make sense"));
+
 	if (logfile || message.len || use_message || fixup_message)
 		use_editor = 0;
 	if (0 <= edit_flag)
-- 
1.7.8.521.g64725

^ permalink raw reply related

* [PATCH] builtin/commit: add missing '/' in help message
From: Michael Schubert @ 2011-12-21 14:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Signed-off-by: Michael Schubert <mschub@elegosoft.com>
---
 builtin/commit.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 626036a..be1ab2e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -139,7 +139,7 @@ static struct option builtin_commit_options[] = {
 	OPT_STRING('C', "reuse-message", &use_message, "commit", "reuse message from specified commit"),
 	OPT_STRING(0, "fixup", &fixup_message, "commit", "use autosquash formatted message to fixup specified commit"),
 	OPT_STRING(0, "squash", &squash_message, "commit", "use autosquash formatted message to squash specified commit"),
-	OPT_BOOLEAN(0, "reset-author", &renew_authorship, "the commit is authored by me now (used with -C-c/--amend)"),
+	OPT_BOOLEAN(0, "reset-author", &renew_authorship, "the commit is authored by me now (used with -C/-c/--amend)"),
 	OPT_BOOLEAN('s', "signoff", &signoff, "add Signed-off-by:"),
 	OPT_FILENAME('t', "template", &template_file, "use specified template file"),
 	OPT_BOOL('e', "edit", &edit_flag, "force edit of commit"),
-- 
1.7.8.521.g64725

^ permalink raw reply related

* Re: [PATCH] Specify a precision for the length of a subject string
From: Nathan Panike @ 2011-12-21 14:53 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Nathan W. Panike, git
In-Reply-To: <m2liq6go7y.fsf@igel.home>

On Wed, Dec 21, 2011 at 12:26:25PM +0100, Andreas Schwab wrote:
> "Nathan W. Panike" <nathan.panike@gmail.com> writes:
> 
> > $ git log --pretty='%h %30s' d165204 -1
> 
> In C's formatted output this syntax denotes a minimum field width, not a
> precision, so it will probably be surprising to many people.

C semantics are already broken because (from git-log(1))

"If you add a - (minus sign) after % of a placeholder, line-feeds that
immediately precede the expansion are deleted if and only if the placeholder
expands to an empty string."

rather than indicating justification of the field.
> 
> Andreas.
> 
> -- 
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."

^ permalink raw reply

* Re: [PATCH] Specify a precision for the length of a subject string
From: Nathan Panike @ 2011-12-21 14:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Nathan W. Panike, git
In-Reply-To: <20111221043843.GA20714@sigill.intra.peff.net>

On Tue, Dec 20, 2011 at 11:38:43PM -0500, Jeff King wrote:
> On Tue, Dec 20, 2011 at 04:07:54PM -0600, Nathan W. Panike wrote:
> 
> > We can specify the precision of a subject string, so that length the subjects
> > viewed by the user do not grow beyond a bound set by the user, in a pretty
> > formatted string
> > 
> > This makes it possible to do, e.g., 
> > 
> > $ git log --pretty='%h %s' d165204 -1
> > d165204 git-p4: fix skipSubmitEdit regression
> > 
> > With this patch, the user can do
> > 
> > $ git log --pretty='%h %30s' d165204 -1
> > d165204 git-p4: fix skipSubmitEdit reg
> 
> Hmm. I think the idea of limiting is OK (though personally, I would just
> pipe through a filter that truncates long lines). But I'm a bit negative
> on adding a tweak like this that only affects the subject. Is there a
> reason I couldn't do %30gs, or %30f, or even some other placeholder?

The ones that make sense to limit are all those that depend on the subject, as the
above; it does not make sense to limit other fields that don't depend on the
subject, as they are fixed width, or have small variance. And it does not make
sense to me to limit the length of the body.

> 
> Also, we already have %w to handle wrapping. Could this be handled in a
> similar way (perhaps it could even be considered a special form of
> wrapping)?

I'll look at the wrapping code and see. Thanks for the idea.
> 
> -Peff

^ permalink raw reply

* Re: Escape character for .gitconfig
From: demerphq @ 2011-12-21 13:59 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Erik Blake, Git Mailing List
In-Reply-To: <alpine.DEB.2.00.1112211352580.17957@ds9.cixit.se>

On 21 December 2011 13:54, Peter Krefting <peter@softwolves.pp.se> wrote:
> Erik Blake:
>
>
>> As you can see, I'm running git on a Win7 64 machine. Is there any way to
>> escape the brackets? Or do I need to reinstall notepad++ on a different
>> path?
>
>
> Just use the 8.3 path instead, using either "C:/Progra~1" or "C:/Progra~2"
> (depending on how the system got installed). You can mix 8.3 and long paths
> in the same command (so keeping the "Notepad++" component is fine).

Or use a junction to make an alias of the name without strange chars....

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply

* Re: Rewriting history and public-private-ish branches
From: Peter Krefting @ 2011-12-21 12:58 UTC (permalink / raw)
  To: Jay Levitt; +Cc: git@vger.kernel.org
In-Reply-To: <4EF08086.6080606@gmail.com>

Jay Levitt:

> As long as I'm the only one who's seen this "published" history, am I doing 
> anything bad?

I do things like that too, and as long as you know what you are doing, it 
usually works fine.

> Do I leave any residue behind in the repo?

As long as you run "git gc" on the repos regularly, it shouldn't really 
matter much. Your abandoned changes will be available through the reflog 
until that expires, and when that has happened "git gc" should remove them 
from the repositories altogether.

-- 
\\// Peter - http://www.softwolves.pp.se/

^ permalink raw reply

* Re: Escape character for .gitconfig
From: Peter Krefting @ 2011-12-21 12:54 UTC (permalink / raw)
  To: Erik Blake; +Cc: Git Mailing List
In-Reply-To: <4EEC6A9D.1060005@icefield.yk.ca>

Erik Blake:

> As you can see, I'm running git on a Win7 64 machine. Is there any way to 
> escape the brackets? Or do I need to reinstall notepad++ on a different path?

Just use the 8.3 path instead, using either "C:/Progra~1" or "C:/Progra~2" 
(depending on how the system got installed). You can mix 8.3 and long paths 
in the same command (so keeping the "Notepad++" component is fine).

-- 
\\// Peter - http://www.softwolves.pp.se/

^ permalink raw reply

* [PATCH] builtin/log: remove redundant initialization
From: Michael Schubert @ 2011-12-21 12:05 UTC (permalink / raw)
  To: git

"abbrev" and "commit_format" in struct rev_info get initialized in
init_revisions - no need to reinit in cmd_log_init_defaults.

Signed-off-by: Michael Schubert <mschub@elegosoft.com>
---
 builtin/log.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 89d0cc0..7d1f6f8 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -73,8 +73,6 @@ static int decorate_callback(const struct option *opt, const char *arg, int unse
 
 static void cmd_log_init_defaults(struct rev_info *rev)
 {
-	rev->abbrev = DEFAULT_ABBREV;
-	rev->commit_format = CMIT_FMT_DEFAULT;
 	if (fmt_pretty)
 		get_commit_format(fmt_pretty, rev);
 	rev->verbose_header = 1;
-- 
1.7.8.400.g03f4

^ permalink raw reply related

* Re: [PATCH] Specify a precision for the length of a subject string
From: Andreas Schwab @ 2011-12-21 11:26 UTC (permalink / raw)
  To: Nathan W. Panike; +Cc: git
In-Reply-To: <20111220220754.GC21353@llunet.cs.wisc.edu>

"Nathan W. Panike" <nathan.panike@gmail.com> writes:

> $ git log --pretty='%h %30s' d165204 -1

In C's formatted output this syntax denotes a minimum field width, not a
precision, so it will probably be surprising to many people.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH] Use Python's "print" as a function, not as a keyword
From: Frans Klaver @ 2011-12-21  8:43 UTC (permalink / raw)
  To: Sebastian Morr; +Cc: git, srabbelier, Ævar Arnfjörð Bjarmason
In-Reply-To: <20111221021930.GA31364@thinkpad>

On Wed, Dec 21, 2011 at 3:19 AM, Sebastian Morr <sebastian@morr.cc> wrote:

> This has changed from Version 2.6 to Version 3.0. Replace all occurrences of
>
>    print ...
>
> with
>
>    print(...)
>
> and all occurrences of
>
>    print >> output, ...
>
> with
>
>    output.write(... + "\n")

While it's good to look forward, you shouldn't forget about testing on
python 2.6. Lots of people still stick to that and maybe even to
earlier versions.


>  if len(argv) < 2:
> -       print 'Usage:', argv[0], '<zipfile>...'
> +       print('Usage:', argv[0], '<zipfile>...')
>        exit(1)
>

Here I would use the % notation:
print('Usage: %s <zipfile>...' % argv[0])

Python 2.x would print a tuple:

>>> argv = ('import-zips.py',)
>>> print("Usage:", argv[0], '<zipfile>...')
('Usage:', 'import-zips.py', '<zipfile>...')

You could probably get away with

print('Usage: ' + argv[0] + ' <zipfile>...')

But that could probably become a readability issue. I would even
wonder if it's considered pythonic.

It happens a few times more:

>  if verbose:
> -    print 'tip is', tip
> +    print('tip is', tip)

> @@ -176,27 +176,27 @@ for cset in range(int(tip) + 1):
>     os.write(fdcomment, csetcomment)
>     os.close(fdcomment)
>
> -    print '-----------------------------------------'
> -    print 'cset:', cset
> -    print 'branch:', hgbranch[str(cset)]
> -    print 'user:', user
> -    print 'date:', date
> -    print 'comment:', csetcomment
> +    print('-----------------------------------------')
> +    print('cset:', cset)
> +    print('branch:', hgbranch[str(cset)])
> +    print('user:', user)
> +    print('date:', date)
> +    print('comment:', csetcomment)
>     if parent:
> -       print 'parent:', parent
> +       print('parent:', parent)
>     if mparent:
> -        print 'mparent:', mparent
> +        print('mparent:', mparent)
>     if tag:
> -        print 'tag:', tag
> -    print '-----------------------------------------'
> +        print('tag:', tag)
> +    print('-----------------------------------------')


>
>     # checkout the parent if necessary
>     if cset != 0:
>         if hgbranch[str(cset)] == "branch-" + str(cset):
> -            print 'creating new branch', hgbranch[str(cset)]
> +            print('creating new branch', hgbranch[str(cset)])
>             os.system('git checkout -b %s %s' % (hgbranch[str(cset)], hgvers[parent]))
>         else:
> -            print 'checking out branch', hgbranch[str(cset)]
> +            print('checking out branch', hgbranch[str(cset)])
>             os.system('git checkout %s' % hgbranch[str(cset)])
>
>     # merge
> @@ -205,7 +205,7 @@ for cset in range(int(tip) + 1):
>             otherbranch = hgbranch[mparent]
>         else:
>             otherbranch = hgbranch[parent]
> -        print 'merging', otherbranch, 'into', hgbranch[str(cset)]
> +        print('merging', otherbranch, 'into', hgbranch[str(cset)])
>         os.system(getgitenv(user, date) + 'git merge --no-commit -s ours "" %s %s' % (hgbranch[str(cset)], otherbranch))
>
>     # remove everything except .git and .hg directories
> @@ -229,12 +229,12 @@ for cset in range(int(tip) + 1):
>
>     # delete branch if not used anymore...
>     if mparent and len(hgchildren[str(cset)]):
> -        print "Deleting unused branch:", otherbranch
> +        print("Deleting unused branch:", otherbranch)
>         os.system('git branch -d %s' % otherbranch)
>
>     # retrieve and record the version
>     vvv = os.popen('git show --quiet --pretty=format:%H').read()
> -    print 'record', cset, '->', vvv
> +    print('record', cset, '->', vvv)
>     hgvers[str(cset)] = vvv
>
>  if hgnewcsets >= opt_nrepack and opt_nrepack != -1:
> @@ -243,7 +243,7 @@ if hgnewcsets >= opt_nrepack and opt_nrepack != -1:
>  # write the state for incrementals
>  if state:
>     if verbose:
> -        print 'Writing state'
> +        print('Writing state')
>     f = open(state, 'w')
>     pickle.dump(hgvers, f)
>
> diff --git a/contrib/p4import/git-p4import.py b/contrib/p4import/git-p4import.py
> index b6e534b..144fafc 100644
> --- a/contrib/p4import/git-p4import.py
> +++ b/contrib/p4import/git-p4import.py
> @@ -26,11 +26,11 @@ if s != default_int_handler:
>  def die(msg, *args):
>     for a in args:
>         msg = "%s %s" % (msg, a)
> -    print "git-p4import fatal error:", msg
> +    print("git-p4import fatal error:", msg)
>     sys.exit(1)
>

I think that's it for the print(...,...) stuff. I might have misssed
one or two though.


> diff --git a/git-remote-testgit.py b/git-remote-testgit.py
> index 3dc4851..9803214 100644
> --- a/git-remote-testgit.py
> +++ b/git-remote-testgit.py
> @@ -81,9 +81,9 @@ def do_capabilities(repo, args):
>     """Prints the supported capabilities.
>     """
>
> -    print "import"
> -    print "export"
> -    print "refspec refs/heads/*:%s*" % repo.prefix
> +    print("import")
> +    print("export")
> +    print("refspec refs/heads/*:%s*" % repo.prefix)
>
>     dirname = repo.get_base_path(repo.gitdir)
>
> @@ -92,11 +92,11 @@ def do_capabilities(repo, args):
>
>     path = os.path.join(dirname, 'testgit.marks')
>
> -    print "*export-marks %s" % path
> +    print("*export-marks %s" % path)
>     if os.path.exists(path):
> -        print "*import-marks %s" % path
> +        print("*import-marks %s" % path)
>
> -    print # end capabilities
> +    print() # end capabilities

print("") here. 2.x:

>>> print()
()



>
>
>  def do_list(repo, args):
> @@ -109,16 +109,16 @@ def do_list(repo, args):
>
>     for ref in repo.revs:
>         debug("? refs/heads/%s", ref)
> -        print "? refs/heads/%s" % ref
> +        print("? refs/heads/%s" % ref)
>
>     if repo.head:
>         debug("@refs/heads/%s HEAD" % repo.head)
> -        print "@refs/heads/%s HEAD" % repo.head
> +        print("@refs/heads/%s HEAD" % repo.head)
>     else:
>         debug("@refs/heads/master HEAD")
> -        print "@refs/heads/master HEAD"
> +        print("@refs/heads/master HEAD")
>
> -    print # end list
> +    print() # end list

print("")

Lots more to do, I'm afraid.

Cheers,
Frans

^ permalink raw reply

* Patches for message-digest support.
From: Bill Zaumen @ 2011-12-21  7:51 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

I just sent a series of 6 patches, roughly similar to the ones I sent a
few weeks ago, but allowing a choice of message digests in addition to
a CRC (kept for testing purposes) - SHA-1, SHA-256, and SHA-512 with
more added easily.  The current default is SHA-256.  The use of SHA-1
for git object IDs is unchanged. Unlike the object-ID digest, the
additional digests do not include the Git object-header.  I also changed
a number of function names, using "digest" or "mdigest" in them.
Searching for the string "digest" is a good way of finding things.
Finally, I added a header to commit messages (conditionally compiled so
this can be turned off) that contains a digest of digests plus some
other fields.  I also broke it up into a series of smaller patches.

Just as a summary:

The first patch contains several new files.  It uses a data structure
for message digests that keeps the bytes of a digest aligned on
32 or 64 bit boundaries to allow fast comparisons. The digests are
stored long with a one-byte code indicating the digest type. The code
handles storing and looking up the digests, including support for
alternate object databases.

The second patch modifies some of the existing git files (the major
changes are in sha1_file.c and pack-write.c) for storing message digests
when an object or a pack index file is created.

The third patch modifies the files in the builtin directory that contain
the implementation of git commands for packing and pruning objects, and
for verifying pack files and counting objects.  The code does some
checks for hash collisions by comparing the digests.  At this point,
each git object will have a digest that can be looked up given the
object's ID.  This mapping is maintained as pack files are
created.

The fourth patch adds a digest header to commit messages.  This header
contains a digest of the digests for the commit's parents and for each
object in the commit's tree, and of the other fields in the commit.
The digest header, like the rest of the commit, is used in computing the
commit's object ID and matching digest.  A function verify_commit
checks the digest header by recomputing it and can be used as desired
for authentication or other purposes.

The fifth patch transfers the message digest corresponding to a SHA-1
ID during fetch or push operations to allow early detection of
collisions.  This is a fast test - a simple lookup - and can be turned
off by removing the "mds-check" capability.

The sixth patch contains documentation.

Bill

^ permalink raw reply

* [PATCH 6/6] Provide documentation for git message digest extensions
From: Bill Zaumen @ 2011-12-21  7:13 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

The documentation includes API documentation for commands
and technical documentation for the message-digest-related
changes.  The technical documentation is in

* Documentation/technical/collision-detect.txt
* Documentation/technical/pack-format.txt

The modified commands are documented in

* Documentation/git-count-objects.txt
* Documentation/git-index-pack.txt
* Documentation/git-verify-pack.txt

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 Documentation/git-count-objects.txt          |   12 +-
 Documentation/git-index-pack.txt             |   17 +-
 Documentation/git-verify-pack.txt            |   27 ++
 Documentation/technical/collision-detect.txt |  342 ++++++++++++++++++++++++++
 Documentation/technical/pack-format.txt      |   47 ++++
 5 files changed, 439 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/technical/collision-detect.txt

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 23c80ce..4cdbaf5 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -8,7 +8,7 @@ git-count-objects - Count unpacked number of objects and their disk consumption
 SYNOPSIS
 --------
 [verse]
-'git count-objects' [-v]
+'git count-objects' [-v] [-M]
 
 DESCRIPTION
 -----------
@@ -25,6 +25,16 @@ OPTIONS
 	objects, number of packs, disk space consumed by those packs,
 	and number of objects that can be removed by running
 	`git prune-packed`.
+-M::
+--count-md::
+	Report the number of loose objects with no stored message digests.
+	With the -v option, the number of missing "mds" files (these
+	contain the message digests for the SHA1 hashes in the corresponding
+	"idx" files) is reported, along with a count of the number of
+	mds files whose size is wrong (e.g., an index was created but the
+	existing MDS file was not updated) and a count of the number of
+	objects in pack files that do not have a stored message digest.
+	Values that are zero are not shown.
 
 GIT
 ---
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..c9389f4 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -11,14 +11,14 @@ SYNOPSIS
 [verse]
 'git index-pack' [-v] [-o <index-file>] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
-                 [<pack-file>]
+		 [-m <mds-file>] [<pack-file>]
 

 DESCRIPTION
 -----------
-Reads a packed archive (.pack) from the specified file, and
-builds a pack index file (.idx) for it.  The packed archive
-together with the pack index can then be placed in the
+Reads a packed archive (.pack) from the specified file, and builds a
+pack index file (.idx) and a pack mds file (.mds) for it.  The packed
+archive together with the pack index can then be placed in the
 objects/pack/ directory of a git repository.
 

@@ -35,6 +35,14 @@ OPTIONS
 	fails if the name of packed archive does not end
 	with .pack).
 
+-m <mds-file>::
+	Write the generated pack mds file into the specified.
+	file Without this option, the name of the pack mds
+	file is constructed from the name of packed archive
+	file by replacing .pack with .idx (and the program
+	fails if the name of packed archive does not end
+	with .pack).
+
 --stdin::
 	When this flag is provided, the pack is read from stdin
 	instead and a copy is then written to <pack-file>. If
@@ -74,7 +82,6 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
-
 Note
 ----
 
diff --git a/Documentation/git-verify-pack.txt b/Documentation/git-verify-pack.txt
index cd23076..f69ed3f 100644
--- a/Documentation/git-verify-pack.txt
+++ b/Documentation/git-verify-pack.txt
@@ -33,6 +33,19 @@ OPTIONS
 	Do not verify the pack contents; only show the histogram of delta
 	chain length.  With `--verbose`, list of objects is also shown.
 
+-M::
+--show-mds::
+	Show the message digests along with the 40-character object names
+	(SHA1 value in hexidecimal). Ignored if --stat-only is set. If
+	--verbose is not set, only the table indexed by object names is
+	shown, although the files will be verified.  The message digests
+	printed are the actual ones - if the MDS file does not contain these,
+	the verification will fail.  The message digests will be
+	prefaced with a two-byte code indicating the type of digest.
+	The values (n hexadecimal) are 01 for a CRC, 05 for SHA-1, 08
+	for SHA-256, and 10 for SHA-512.  If the digest stored does
+	not match the actual digest, the actual one is printed as well.
+
 \--::
 	Do not interpret any more arguments as options.
 
@@ -48,6 +61,20 @@ for objects that are not deltified in the pack, and
 
 for objects that are deltified.
 
+When the -M option is used, the offset-in-pack field is followed by an
+entry giving the message digest.  The format used is:
+
+      md=0xHEX_VALUE
+
+when a message digest exists, and
+
+     <no md>
+
+when a message digest does not exist.  These entries precede the depth
+entry for deltified objects.  A non-existent message digest will be shown
+only if the MDS file is missing - while the MDS-file format allows missing
+entries, the file will not be considered valid.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/collision-detect.txt b/Documentation/technical/collision-detect.txt
new file mode 100644
index 0000000..d5a4364
--- /dev/null
+++ b/Documentation/technical/collision-detect.txt
@@ -0,0 +1,342 @@
+Hash-Collision Detection using Message Digests
+=============================
+
+Initially Git used a SHA-1 hash as an object ID under the assumption
+that a hash collision would never occur in practice. While an
+accidental SHA-1 collision is extremely unlikely, it is possible,
+although very expensive, to generate multiple files with the same
+SHA-1 value in under 2^57 operations.  With computer performance
+increasing significantly from one year to the next, Git's assumptions
+about SHA-1 will eventually not hold in the case of a malicious
+attempt to damage a project.  One should note that just because the
+probability of a SHA-1 collision occurring accidentally is extremely
+low does not mean a priori that SHA-1 provides an adequate safety
+margin for preventing a malicious attempt to damage repositories and a
+discussion below outlines some of the issues regarding this
+possibility.
+
+While one could modify Git to use SHA-224, SHA-256, SHA-384, or
+SHA-512 instead of SHA-1, the change would have to support the
+original format as well (in order to deal with existing Git
+repositories). While one could convert an existing repository to use
+the new hash function, this would require rewriting every object,
+including trees and commits.  The outcome would be problematic given
+the existence of email and documentation that might name commits by
+their SHA-1 hashes. One should note that Git performs a byte-by-byte
+check for hash collisions when a pack file is indexed.  Unfortunately,
+during fetch or pull operations, Git tries to avoid copying objects
+when a peer already has a copy, and this is determined solely on the
+basis of SHA-1 hashes.
+
+The following describes a modification to Git's initial design that is
+(a) relatively easy to implement, (b) is compatible with and can
+interoperate with older versions of Git (both the program and the
+repositories) (c) has a small computational overhead, and (d)
+increases security substantially, with a goal of detecting hash
+collisions early and automatically.  Because the implementation is
+relatively simple and the overhead very low, it makes sense to
+incorporate this change (or some alternative) before the security
+issue becomes a serious problem.
+
+Although Git generally uses that assumption that there will never be a
+hash collision using SHA-1 in practice, under some circumstances, Git
+will detect collisions via a byte-by-byte comparison as objects are
+added to the repository or as pack files are indexed.  This test is
+performed when an index is built (via the Git pack-index command), but
+a byte-by-byte comparison was deemed too computationally expensive to
+use in all circumstances: with pack files in particular, simply
+extracting an object can require not only decompressing it, but
+handling a series of delta encodings.
+
+Collision detection has been extended by computing a message digest of
+the object's contents (i.e., excluding the Git header). These message
+digests are stored separately from Git objects and are used for an
+independent collision test - looking up the message digests using the
+SHA-1 IDs as a key can be done quickly, and comparing them is fast as
+well (the digests are aligned to allow 32-bit or 64-bit integer
+comparisions).  This extension is computationally cheap (timing the
+Git test suite (run via 'make test') showed only a small increase in
+running time and the extension is backwards compatible with existing
+Git repositories - if a MD is not available for a SHA-1 value, the
+implementation reverts to its former behavior and simply compares
+SHA-1 values.  The implementation allows message digests to be easily
+added.
+
+The implementation creates a directory in .git/objects named "mdsd",
+which contains sub-directories and file names identical to the
+sub-directories in objects used to store loose objects: a two
+character directory name, with a 38-character file name, the
+concatenation of which gives the SHA-1 hash for the object.  the files
+in sub-directories of "mdsd", however, simply contain a one-byte code
+indicating the type of message digest, followed by the digest in its
+binary representation as a sequence of bytes.  In addition, for each
+pack file (.../objects/pack/FILE.pack), there is a corresponding file
+named .../objects/pack/FILE.mds in addition to
+.../objects/pack/FILE.idx.  The MDS file contains the MDs, stored in
+the same order as the SHA-1 hashes in .../objects/pack/FILE.idx.  The
+format of the MDS file is described in pack-format.txt.
+
+Thus, the directory structure (only part of it is shown) is as
+follows:
+
+ .git---.
+	|
+	|-objects-.
+	|	  |--XX--.
+	.	  |	 |--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+	.	  |	 .
+	.	  |	 .
+		  |	 .
+		  .
+		  .
+		  .
+		  |-mdsd-.
+		  |	 |--XX--.
+		  |	 |	|--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+		  |	 .	.
+		  |	 .	.
+		  |	 .	.
+		  |
+		  |-pack-.
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.pack
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.idx
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.mds
+		  |	 .
+		  |	 .
+		  |	 .
+		  |
+		  `-info-.
+			 .
+			 .
+
+The mds files are relatively short - an average of N+1 bytes per MD
+of length N (provided N is a multiple of 4)
+plus some fixed overhead due to a header and trailer, with the MDs
+listed in the same order as the SHA-1 values in the matching idx file
+(a function named nth_packed_object_digest has the same signature as
+the previously-defined function nth_packed_object_offset, so the
+procedure to look up the MD value from a pack file is the same).
+
+For fetch and push operations, the commands fetch-pack, send-pack,
+receive-pack, and upload-pack were modified so that various object IDs
+can have any one of the following formats, with each number
+represented in hexadecimal:
+
+		SHA1
+		SHA1-MD
+
+where SHA1 is the SHA-1 hash of a commit and MD the message digest
+(prefixed by a code indicating the type of digest) of the commit
+(uncompressed, not including the Git object header).
+
+Both receive-pack and upload-pack send a capability named "mds-check"
+to allow the two longer object IDs.  When the MDs are available, the
+longer formats are used, but are generated only by fetch-pack and
+send-pack: because of backwards-compatibility constraints,
+receive-pack and upload-pack cannot determine the capabilities of
+fetch-pack and send-pack when connected to a remote repository).  The
+collision checks during a fetch, push, or pull command are done by
+receive-pack and upload pack because send-pack and fetch-pack do not
+receive their peers' MD values - send-pack and fetch-pack cannot determine
+their peers' capabilities given the current design.
+
+Changes to the Commit Format
+----------------------------
+
+A new header is available, positioned as the last header. It's name is
+"digest" and its value is a hex representation of a message digest in
+which the first two bytes name the algorithm used to generate the
+digest and the remaining bytes are the digest itself.  This digest is
+a digest of other digests or SHA-1 values.  It starts by including the
+digests of each object in the commit's tree with the exception of a
+commit from a submodule - in that case, the SHA-1 value itself is
+used.  The tree is searched depth first, including the tree's root,
+which provides the first digest.  This is followed by the digest for
+each parent commit in the order listed in the commit.  Finally, the
+digest covers the author, committer, encoding, and the commit message,
+all excluding the terminating newline character in the header.
+
+Note that the digest header complicates any attempt at generate a hash
+collision for a commit message - if you change a field, you also have
+to change the digest in a specific way.
+
+Configuration
+-------------
+
+There are several variables in the Makefile:
+
+      * MDSDB should have the value 0. If an alternative implementation
+	is defined, other values will be available.
+
+      * MDIGEST_DEFAULT should be defined to explicitly set the
+	default message digest.
+
+      * PACKDB should be defined to turn on the packdb functions.
+
+      * PACKDB_TEST should be defined only for debugging.  It uses some
+	packdb functions where not necessary to verify that these work
+	properly.  Turning this test on decreases performance.
+
+      * COMMIT_DIGEST should be defined to add a digest header to
+	commit messages as described above
+
+      * COMMIT_DIGEST_TEST should be defined to force commit_tree to
+	compute the digest even when COMMIT_DIGEST is undefined, in
+	which case the digest will be computed but not included in a
+	commit.
+
+Key Functions
+-------------
+
+For general use, the functions documented in mdigest.h can extract
+data from message digests and compare them.  The function verify_commit
+will test a commit to make sure that its digest field matches the
+repository.  The function has_sha1_file_digest allows one to determine
+if a digest exists and obtain a copy of that digest.
+
+To find functions that are used in the implementation (e.g., if changes
+to the pack-file format become necessary), search for the type
+mdigest_t or variables width digest in their names (e.g., digestp).
+
+
+Implementation Details
+----------------------
+
+Functions to manipulate the message digest/MD database are declared
+in the file mdsdb.h.  The implementation as described above is in the
+file objd-mdsdb.c: it is thus easy to change the implementation of how
+these objects are stored with minimal impact on the rest of the source
+code.  The message digest structure and functions to manipulate it are
+declared in mdigest.h, with the implementation in mdigest.c.
+
+In pack-write.c, there is a new function named write_mds_file with the
+same function signature as write_idx_file.  Both are called in pairs
+(write_idx_file first) so that the idx file and mds file for the
+corresponding pack file will always be created.
+
+In commit.c, there is a new function that recursively traverses the
+tree associated with a commit and finds the "blob" entries and looks
+up those entries' message digests in order to compute a message digest
+of these message digests (which is faster than computing a message
+digest of all the bytes in the blobs associated with a commit).
+
+Various function names signatures in sha1_file.c were changed to take
+two additional arguments, the first a pointer to an int used as a flag
+to indicate whether a MD exists, and the second a pointer to a
+uint32_t containing the MD.  For backwards compatibility with
+previously existing functions, those functions had there names changed
+by adding "_extended" to them, with macros in cache.h defined so that
+existing code that does not need to obtain a MD would not be
+changed. There are a few additional functions added to sha1_file.c
+such as one to determine if there is an MD for a given SHA-1
+value. Many changes in the rest of Git that result from this simply
+change the arguments to these functions.  As a convention, most such
+arguments use names like objcrc32, objcrc32p, has_objcrc32 and
+has_objcrc32p in order to make it easy to find areas of the code
+implementing hash-collision detection using the git-grep command.
+
+A few data structures (notably struct pack_idx_entry and struct
+packed_git) contain fields used to store has_objcrc32 and objcrc32
+values or data associated with MDS files.  These are used while
+building new MDS files.
+
+Some of the Git commands (count-objects, index-pack, and verify-pack)
+have additional command-line options related to the MDs and mds
+files. This makes it possible to explicitly name an mds file being
+created and to request that various listings show both the MD
+values in addition to SHA-1 hashes (the MD values are not listed
+by default in case user-defined scripts assume the current behavior).
+
+For C files, changes were made to the following files (compared to
+commit f56564968 - v1.7.8-rc4) for the initial collision-detection
+implementation:
+
+       * builtin/count-objects.c
+       * builtin/fetch-pack.c
+       * builtin/index-pack.c
+       * builtin/init-db.c
+       * builtin/pack-objects.c
+       * builtin/pack-redundant.c
+       * builtin/prune-packed.c
+       * builtin/prune.c
+       * builtin/receive-pack.c
+       * builtin/send-pack.c
+       * builtin/verify-pack.c
+       * commit.c
+       * environment.c
+       * fast-import.c
+       * gdbm-packdb.c (new file)
+       * git.c
+       * hex.c
+       * http.c
+       * mdigest.c (new file)
+       * objd-mdsdb.c (new file)
+       * pack-write.c
+       * sha1_file.c
+       * upload-pack.c
+
+The other files had changes that reflected changes to function
+signatures.
+
+The header files that were modified are
+
+    * cache.h
+    * commit.h
+    * mdigest.h (new file)
+    * mdsdb.h (new file)
+    * pack.h
+    * packdb.h (new file)
+
+where the changes are mostly new function declarations, a few macros
+for backwards-compatibility, and a few additional fields in some
+data structures.
+
+Minor changes were made to the test suite: t0000-basic.sh,
+t5300-pack-object.sh, t5304-prune.sh, t5500-fetch-pack.sh, and
+t5510-fetch.sh.
+
+The packdb functions are conditionally compiled and by default are not
+used.  When used, these use GDBM to store MDs for SHA-1 hashes in
+cases in which the hash was not available - in this case the hash will
+be recomputed and stored for future use.  Testing indicates that
+packdb is not needed. It may be worth turning on during debugging to
+verify if a problem is discovered involving a missing MD. (As an
+aside, the packdb code is based on a test to see if GDBM would be
+efficient enough to store the MD values in general, thus avoiding
+the need to create "mds" files and reducing the number of files in the
+"mdsd" directory, but it turned out that performance was not
+acceptable.)
+
+Security-Issue Details
+----------------------
+
+Without hash-collision detection, Git has a possible risk of data
+corruption due to the obvious hash-collision vulnerability, so the
+issue is really whether a usable vulnerability exists. In the case
+of a single shared repository (one common repository shared by
+multiple developers), this risk is mitigated by Git's rule that
+an existing entry is never overridden. The situation is more complex
+when multiple repositories are used, as a race condition may exist.
+Also, the risk depends on whether or not developers exchange source
+files by some out-of-band mechanism (e.g., email) - when programming
+in Java, for example, it is customary to use the javadocs program to
+create API documentation from stylized comments in source files. A
+programmer might send a peer a copy of a source file and to review
+and correct the java-doc comments, providing an opporunity for the
+second developer to insert a modified file into the respository. The
+first developer would presumably check that the source code has not
+changed (an automated test for this might use the java compiler's
+"-g:none" option so that line-number data does not appear in the object
+file), and review the comments, but would not care about formatting
+(location of line breaks, etc., as the javadocs program generates HTML).
+
+In any event, recent research has shown that SHA-1 collisions can be
+found in 2^63 operations or less.  While one result claimed 2^53
+operations, the paper claiming that value was withdrawn from
+publication due to an error in the estimate. Another result claimed a
+complexity of between 2^51 and 2^57 operations, and still another
+claimed a complexity of 2^57.5 SHA-1 computations. A summary is
+available at
+<http://hackipedia.org/Checksums/SHA/html/SHA-1.htm#SHA-1>.
+This is within or close to the number of computations that can be
+managed by a well-funded organization.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1803e64..a2aad5f 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -158,3 +158,50 @@ Pack file entry: <+
     corresponding packfile.
 
     20-byte SHA1-checksum of all of the above.
+
+
+= pack-*.mds files contain message digests for objects.  The digests
+  are stored in the same order as the sha-1 values in the matching idx
+  file.  These files have the following format:
+
+  - A 6-byte magic number consisting the the characters "PKMDS" followed
+    by a NULL character (0).
+
+  - A one-byte version number (= 1)
+
+  - A one-byte field-length value for message digest fields, in units
+    of 4-byte words.  (The length of the message digest fields in bytes
+    is denoted as wbsize below)
+
+  - A set of blocks, each of which contains 4 entries encoded as follows:
+
+      * four one-byte fields (wcode fields), one per entry, for which
+	a zero value indicates that a matching entry does not exist
+	and a non-zero value indicates the type of message digest
+	encoded as follows:
+
+	   + 1 for a CRC  (used as a trivial case for performance testing)
+
+	   + 5 for SHA-1
+
+	   + 8 for SHA-256
+
+	   + 16 for SHA-512
+
+      * 4 wbsize-byte fields, each containing a message digest (by
+	convention which must be all NULL characters if the MD does
+	not exist).  For each field, the data it contains should start
+	at the first byte, padded with NULL characters if the field is
+	longer than the digest it stores.
+
+    For the set of all blocks, the nth one-byte field and the nth
+    wcode field store the values for the nth entry in the
+    file. The format ensures that each message digest starts on a
+    32-bit boundary, allowing 32-bit integer operations to be used in
+    copying or comparing values.
+
+  - A 20 byte SHA-1 hash of the SHA-1 hashes naming the objects whose
+    message digests are being stored, in the same order as they
+    appear in the corresponding idx file.
+
+  - A 20 byte SHA-1 hash of all of the above.
-- 
1.7.1

^ permalink raw reply related

* [PATCH 5/6] Add MD support for fetch, pull, and push.
From: Bill Zaumen @ 2011-12-21  7:12 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

A new capability, "mds-check" is defined. When present, a client will
(when possible and useful) send the server a SHA-1 value and a message
digest, separated by a '-'.  This is used to detect hash collisions,
with a goal of finding problems early if a malicious attempt is made
to forge commits with different commits with the same SHA-1 value in
different repositories.  It is a simple, fast test - a look-up and a
comparison.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 builtin/fetch-pack.c   |   29 +++++++++++-
 builtin/receive-pack.c |  117 +++++++++++++++++++++++++++++++++++++++++++-----
 builtin/send-pack.c    |   26 ++++++++++-
 http.c                 |   19 ++++++--
 t/t5500-fetch-pack.sh  |   10 +++--
 t/t5510-fetch.sh       |   12 ++++-
 upload-pack.c          |   11 ++++-
 7 files changed, 196 insertions(+), 28 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 6207ecd..2f5b7ef 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -18,6 +18,8 @@ static int prefer_ofs_delta = 1;
 static int no_done;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
+static int mds_check = 0;
+
 static struct fetch_pack_args args = {
 	/* .uploadpack = */ "git-upload-pack",
 };
@@ -390,9 +392,25 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	flushes = 0;
 	retval = -1;
 	while ((sha1 = get_rev())) {
-		packet_buf_write(&req_buf, "have %s\n", sha1_to_hex(sha1));
-		if (args.verbose)
-			fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		if (mds_check) {
+			mdigest_t digest;
+			if (has_sha1_file_digest(sha1, &digest)) {
+				packet_buf_write(&req_buf, "have %s\n",
+						 sha1_to_hex_digest(sha1,
+								    &digest));
+
+			} else {
+				packet_buf_write(&req_buf, "have %s\n",
+						 sha1_to_hex(sha1));
+			}
+			if (args.verbose)
+				fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		} else {
+			packet_buf_write(&req_buf, "have %s\n",
+					 sha1_to_hex(sha1));
+			if (args.verbose)
+				fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		}
 		in_vain++;
 		if (flush_at <= ++count) {
 			int ack;
@@ -807,6 +825,11 @@ static struct ref *do_fetch_pack(int fd[2],
 			fprintf(stderr, "Server supports ofs-delta\n");
 	} else
 		prefer_ofs_delta = 0;
+	if (server_supports("mds-check")) {
+		if (args.verbose)
+			fprintf(stderr, "Server supports mds-check\n");
+		mds_check = 1;
+	}
 	if (everything_local(&ref, nr_match, match)) {
 		packet_flush(fd[1]);
 		goto all_done;
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d2dcb7e..b9d1c1f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -122,7 +122,8 @@ static int show_ref(const char *path, const unsigned char *sha1, int flag, void
 	else
 		packet_write(1, "%s %s%c%s%s\n",
 			     sha1_to_hex(sha1), path, 0,
-			     " report-status delete-refs side-band-64k",
+			     " report-status delete-refs side-band-64k"
+			     " mds-check",
 			     prefer_ofs_delta ? " ofs-delta" : "");
 	sent_capabilities = 1;
 	return 0;
@@ -709,37 +710,131 @@ static struct command *read_head_info(void)
 	struct command *commands = NULL;
 	struct command **p = &commands;
 	for (;;) {
-		static char line[1000];
+		static char line[1500];
 		unsigned char old_sha1[20], new_sha1[20];
 		struct command *cmd;
 		char *refname;
 		int len, reflen;
+		int has_old_sha1_digest = 0;
+		int has_new_sha1_digest = 0;
+		mdigest_t old_sha1_digest;
+		mdigest_t new_sha1_digest;
+		mdigest_t digest;
+		int old_hashlen = 40;
+		int new_hashlen = 40;
+		int hashlen = 81; /* includes blank between two hashes */
+		int digest_field_len = 0;
+
+		mdigest_clear(&old_sha1_digest);
+		mdigest_clear(&new_sha1_digest);
 
 		len = packet_read_line(0, line, sizeof(line));
 		if (!len)
 			break;
 		if (line[len-1] == '\n')
 			line[--len] = 0;
-		if (len < 83 ||
-		    line[40] != ' ' ||
-		    line[81] != ' ' ||
-		    get_sha1_hex(line, old_sha1) ||
-		    get_sha1_hex(line + 41, new_sha1))
+		if (len > (old_hashlen + 1) && line[old_hashlen] == '-') {
+			digest_field_len = 1
+				+ get_hex_field_size(line+old_hashlen+1);
+		}
+		if (len > (old_hashlen + digest_field_len) &&
+		    line[old_hashlen] == '-') {
+			old_hashlen += digest_field_len;
+			hashlen += digest_field_len;
+			digest_field_len = 0;
+			if (len > (old_hashlen + 1)
+			    && line[old_hashlen] == '-') {
+				digest_field_len = 1 +
+					get_hex_field_size(line+old_hashlen+1);
+			}
+			if (len > (old_hashlen + digest_field_len + 1) &&
+			    line[old_hashlen] == '-') {
+				old_hashlen += digest_field_len;
+				hashlen += digest_field_len;
+			}
+		}
+		if (line[old_hashlen] != ' ') {
+			die("protocol error: expected old/new/ref, got '%s'",
+			    line);
+		}
+		digest_field_len = 0;
+		if (len > hashlen + 1 && line[hashlen] == '-') {
+			digest_field_len = 1 +
+			  get_hex_field_size(line+hashlen+1);
+		}
+		if (len > (hashlen + digest_field_len + 1) &&
+		    line[hashlen] == '-') {
+			new_hashlen += digest_field_len;
+			hashlen += digest_field_len;
+			digest_field_len = 0;
+			if (len > (hashlen + 1)
+			    && line[hashlen] == '-') {
+				digest_field_len = 1 +
+					get_hex_field_size(line+hashlen+1);
+			}
+			if (len > (hashlen + digest_field_len + 1) &&
+			    line[hashlen] == '-') {
+				new_hashlen += digest_field_len;
+				hashlen += digest_field_len;
+			}
+		}
+		if (line[hashlen] != ' ') {
 			die("protocol error: expected old/new/ref, got '%s'",
 			    line);
+		}
+
+		if (len < hashlen + 1 ||
+		    line[old_hashlen] != ' ' ||
+		    line[hashlen] != ' ' ||
+		    get_sha1_hex_digest(line, old_sha1,
+				     &has_old_sha1_digest, &old_sha1_digest) ||
+		    get_sha1_hex_digest(line + old_hashlen + 1, new_sha1,
+					&has_new_sha1_digest,
+					&new_sha1_digest)) {
+			die("protocol error: expected old/new/ref, got '%s'",
+			    line);
+		}
+
+		if (has_old_sha1_digest &&
+		    has_sha1_file_digest(old_sha1, &digest)) {
+			if (mdigest_tst(&old_sha1_digest, &digest)) {
+				die("hash collision for %s",
+				    sha1_to_hex(old_sha1));
+			}
+		}
+
+
+		if (has_new_sha1_digest &&
+		    has_sha1_file_digest(new_sha1, &digest)) {
+			if (mdigest_tst(&new_sha1_digest, &digest)) {
+				die("hash collision for %s",
+				    sha1_to_hex(new_sha1));
+			}
+		}
 
-		refname = line + 82;
+		refname = line + hashlen + 1;
 		reflen = strlen(refname);
-		if (reflen + 82 < len) {
+		if (reflen + hashlen + 1 < len) {
 			if (strstr(refname + reflen + 1, "report-status"))
 				report_status = 1;
 			if (strstr(refname + reflen + 1, "side-band-64k"))
 				use_sideband = LARGE_PACKET_MAX;
 		}
-		cmd = xcalloc(1, sizeof(struct command) + len - 80);
+		/*
+		 * Without the additional digests,
+		 *   old_hashlen + new_hashlen = 80
+		 *   hashlen = 81,
+		 *   hashlen + 1 = 82
+		 * which puts the same numeric values into the last argument
+		 * of xcalloc, and the second & third argument of memcpy
+		 * that were used in commit
+		 * fc14b89a7e6899b8ac3b5751ec2d8c98203ea4c2.
+		 */
+		cmd = xcalloc(1, sizeof(struct command) + len
+			      - (old_hashlen + new_hashlen));
 		hashcpy(cmd->old_sha1, old_sha1);
 		hashcpy(cmd->new_sha1, new_sha1);
-		memcpy(cmd->ref_name, line + 82, len - 81);
+		memcpy(cmd->ref_name, line + hashlen + 1, len - (hashlen));
 		*p = cmd;
 		p = &cmd->next;
 	}
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index cd1115f..1eb9704 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -250,6 +250,7 @@ int send_pack(struct send_pack_args *args,
 	int allow_deleting_refs = 0;
 	int status_report = 0;
 	int use_sideband = 0;
+	int mds_check = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -263,6 +264,8 @@ int send_pack(struct send_pack_args *args,
 		args->use_ofs_delta = 1;
 	if (server_supports("side-band-64k"))
 		use_sideband = 1;
+	if (server_supports("mds-check"))
+		mds_check = 1;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
@@ -298,8 +301,27 @@ int send_pack(struct send_pack_args *args,
 		if (args->dry_run) {
 			ref->status = REF_STATUS_OK;
 		} else {
-			char *old_hex = sha1_to_hex(ref->old_sha1);
-			char *new_hex = sha1_to_hex(ref->new_sha1);
+			char *old_hex, *new_hex;
+			if (mds_check) {
+				mdigest_t digest;
+				if (has_sha1_file_digest(ref->old_sha1,
+						      &digest)) {
+					old_hex = sha1_to_hex_digest
+						(ref->old_sha1, &digest);
+				} else {
+					old_hex = sha1_to_hex(ref->old_sha1);
+				}
+				if (has_sha1_file_digest(ref->new_sha1,
+						      &digest)) {
+					new_hex = sha1_to_hex_digest
+						(ref->new_sha1, &digest);
+				} else {
+					new_hex = sha1_to_hex(ref->new_sha1);
+				}
+			} else {
+				old_hex = sha1_to_hex(ref->old_sha1);
+				new_hex = sha1_to_hex(ref->new_sha1);
+			}
 
 			if (!cmds_sent && (status_report || use_sideband)) {
 				packet_buf_write(&req_buf, "%s %s %s%c%s%s",
diff --git a/http.c b/http.c
index 0ffd79c..e4e3ec7 100644
--- a/http.c
+++ b/http.c
@@ -1014,8 +1014,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	struct packed_git **lst;
 	struct packed_git *p = preq->target;
 	char *tmp_idx;
+	char *tmp_mds;
 	struct child_process ip;
-	const char *ip_argv[8];
+	const char *ip_argv[10];
 
 	close_pack_index(p);
 
@@ -1028,14 +1029,20 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	*lst = (*lst)->next;
 
 	tmp_idx = xstrdup(preq->tmpfile);
+	tmp_mds = xstrdup(preq->tmpfile);
 	strcpy(tmp_idx + strlen(tmp_idx) - strlen(".pack.temp"),
 	       ".idx.temp");
+	strcpy(tmp_mds + strlen(tmp_mds) - strlen(".pack.temp"),
+	       ".mds.temp");
+
 
 	ip_argv[0] = "index-pack";
 	ip_argv[1] = "-o";
 	ip_argv[2] = tmp_idx;
-	ip_argv[3] = preq->tmpfile;
-	ip_argv[4] = NULL;
+	ip_argv[3] = "-m";
+	ip_argv[4] = tmp_mds;
+	ip_argv[5] = preq->tmpfile;
+	ip_argv[6] = NULL;
 
 	memset(&ip, 0, sizeof(ip));
 	ip.argv = ip_argv;
@@ -1046,20 +1053,24 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	if (run_command(&ip)) {
 		unlink(preq->tmpfile);
 		unlink(tmp_idx);
+		unlink(tmp_mds);
 		free(tmp_idx);
+		free(tmp_mds);
 		return -1;
 	}
 
 	unlink(sha1_pack_index_name(p->sha1));
 
 	if (move_temp_to_file(preq->tmpfile, sha1_pack_name(p->sha1))
-	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))) {
+	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))
+	 || move_temp_to_file(tmp_mds, sha1_pack_mds_name(p->sha1))) {
 		free(tmp_idx);
 		return -1;
 	}
 
 	install_packed_git(p);
 	free(tmp_idx);
+	free(tmp_mds);
 	return 0;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 9bf69e9..b6632d2 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -53,8 +53,8 @@ pull_to_client () {
 			git symbolic-ref HEAD refs/heads/`echo $heads \
 				| sed -e "s/^\(.\).*$/\1/"` &&
 
-			git fsck --full &&
-
+			git fsck --full  &&
+			test -z "`git count-objects -v -M | grep MD`" &&
 			mv .git/objects/pack/pack-* . &&
 			p=`ls -1 pack-*.pack` &&
 			git unpack-objects <$p &&
@@ -142,7 +142,8 @@ test_expect_success 'fsck in shallow repo' '
 test_expect_success 'simple fetch in shallow repo' '
 	(
 		cd shallow &&
-		git fetch
+		git fetch &&
+		test -z "`git count-objects -v -M | grep MD`"
 	)
 '
 
@@ -245,7 +246,8 @@ test_expect_success 'clone shallow object count' '
 		cd shallow &&
 		git count-objects -v
 	) > count.shallow &&
-	grep "^count: 52" count.shallow
+	grep "^count: 52" count.shallow  &&
+	test -z "`git count-objects -v -M | grep MD`"
 '
 
 test_done
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index e88dbd5..5e3b8c6 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -14,6 +14,12 @@ test_bundle_object_count () {
 	test "$2" = $(grep '^[0-9a-f]\{40\} ' verify.out | wc -l)
 }
 
+test_bundle_mds_count () {
+	git verify-pack -v -M "$1" >verify.out &&
+	test "$2" = $(grep '^[0-9a-f]\{40\} ' verify.out | grep -v "<no md>" | wc -l)
+}
+
+
 test_expect_success setup '
 	echo >file original &&
 	git add file &&
@@ -214,7 +220,8 @@ test_expect_success 'bundle 1 has only 3 files ' '
 		cat
 	) <bundle1 >bundle.pack &&
 	git index-pack bundle.pack &&
-	test_bundle_object_count bundle.pack 3
+	test_bundle_object_count bundle.pack 3 &&
+	test_bundle_mds_count bundle.pack 3
 '
 
 test_expect_success 'unbundle 2' '
@@ -237,7 +244,8 @@ test_expect_success 'bundle does not prerequisite objects' '
 		cat
 	) <bundle3 >bundle.pack &&
 	git index-pack bundle.pack &&
-	test_bundle_object_count bundle.pack 3
+	test_bundle_object_count bundle.pack 3 &&
+	test_bundle_mds_count bundle.pack 3
 '
 
 test_expect_success 'bundle should be able to create a full history' '
diff --git a/upload-pack.c b/upload-pack.c
index 6f36f62..1e77826 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -320,11 +320,18 @@ static int got_sha1(char *hex, unsigned char *sha1)
 {
 	struct object *o;
 	int we_knew_they_have = 0;
+	int has_sha1_digest, has_digest;
+	mdigest_t sha1_digest, digest;
 
-	if (get_sha1_hex(hex, sha1))
+	if (get_sha1_hex_digest(hex, sha1, &has_sha1_digest, &sha1_digest))
 		die("git upload-pack: expected SHA1 object, got '%s'", hex);
 	if (!has_sha1_file(sha1))
 		return -1;
+	has_digest = has_sha1_file_digest(sha1, &digest);
+	if (has_sha1_digest && has_digest
+	    && mdigest_tst(&digest, &sha1_digest)) {
+		die("git upload-pack: SHA1 collision on MD for %s", hex);
+	}
 
 	o = lookup_object(sha1);
 	if (!(o && o->parsed))
@@ -719,7 +726,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed" " mds-check";
 	struct object *o = parse_object(sha1);
 	const char *refname_nons = strip_namespace(refname);
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH 4/6] Add digests to commit objects.
From: Bill Zaumen @ 2011-12-21  7:12 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

When COMMIT_DIGEST is defined in the Makefile, a new
header is added to commits. The header is named 'digest'
and is a digest of the digests associated with the
commit's tree (computed recursively) and parents, and
of the other fields. This digest is included in the
SHA-1 hash computation and the commit's digest.

A function named verify_commit allows the digest to
be recomputed and checked.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 commit.c |  436 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 commit.h |   11 ++
 2 files changed, 443 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index b781274..ac0d492 100644
--- a/commit.c
+++ b/commit.c
@@ -6,11 +6,395 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
 
+struct commit_mds_context {
+	unsigned long missing;
+	mdigest_context_t context;
+#ifdef PACKDB
+	int packdb_opened;
+#endif /* PACKDB */
+};
+
+#if defined(COMMIT_DIGEST)||defined(PACKDB_TEST)||defined(COMMIT_DIGEST_TEST)
+static int get_objects_mds_f(const unsigned char *sha1,
+			  const char *basebuf, int baselen,
+			  const char *path, unsigned int mode, int stage,
+			  void *context)
+{
+	struct commit_mds_context *c = (struct commit_mds_context *)context;
+	mdigest_t digest;
+	unsigned long size;
+	int type;
+
+	if (S_ISGITLINK(mode)) {
+		/*
+		 * Submodule entry - SHA-1 of a commit in the submodule
+		 * with no entry in our repository.
+		 */
+		mdigest_Update(&c->context, sha1, 20);
+		return -1;
+	}
+	if (!has_sha1_file(sha1)) {
+		c->missing++;
+		return -1;
+	}
+	if (has_sha1_file_digest(sha1, &digest)) {
+		int wcode = get_mdigest_wcode(&digest);
+		unsigned char ucwcode = (unsigned char) (wcode & 0xff);
+		mdigest_Update(&c->context, &ucwcode, 1);
+		mdigest_Update(&c->context, get_mdigest_buffer(&digest),
+			       get_mdigest_len(&digest));
+#if defined(PACKDB) && defined (PACKDB_TEST)
+		{
+		  static int firsttime = 1;
+		  mdigest_t xdigest;
+		  if (!c->packdb_opened) {
+		    packdb_open();
+		    c->packdb_opened = 1;
+		    firsttime = 0;
+		  }
+		  if (firsttime) {
+		    if (!packdb_lookup(sha1, &xdigest)) {
+		      packdb_process(sha1, &digest);
+		      packdb_lookup(sha1, &xdigest);
+		      if (mdigest_tst(&digest, &xdigest)) {
+			die("digest for %s failed with packdb\n",
+			    sha1_to_hex(sha1));
+		      }
+		    }
+		  }
+		}
+#endif
+	} else {
+		enum object_type xtype;
+		unsigned long xsize;
+		mdigest_t xdigest;
+		unsigned char xsha1[20];
+		int wcode;
+		unsigned char ucwcode;
+#ifdef PACKDB
+		if (!c->packdb_opened) {
+			packdb_open();
+			c->packdb_opened = 1;
+		}
+		if (!packdb_lookup(sha1, &xdigest)) {
+#endif /* PACKDB */
+			void *data = read_sha1_file(sha1, &xtype, &xsize);
+			hash_sha1_file_extended(data, xsize,
+						typename(xtype),
+						xsha1,
+						&xdigest);
+			free(data);
+#ifdef PACKDB
+			packdb_process(sha1, &xdigest);
+		  }
+#endif /* PACKDB */
+		wcode = get_mdigest_wcode(&xdigest);
+		ucwcode = (unsigned char) (wcode & 0xff);
+		mdigest_Update(&c->context, &ucwcode, 1);
+		mdigest_Update(&c->context, get_mdigest_buffer(&xdigest),
+			       get_mdigest_len(&xdigest));
+	}
+	type = sha1_object_info(sha1, &size);
+	switch(type) {
+	case OBJ_TREE:
+	  return (S_ISDIR(mode))? READ_TREE_RECURSIVE: 0;
+	case OBJ_BLOB:
+		return 0;
+	default:
+		if (type <= OBJ_NONE) {
+		  c->missing++;
+		}
+		return 0;
+	}
+}
+
+/*
+ * Works with a tree or a commit sha1 - recursively traverses the trees
+ * and computes the CRC of each blob's CRC. With a commit sha1, the
+ * caller must provide the 'parents' list - we sometimes call this
+ * function with the commit-object's tree before the sha1 value is
+ * computed.
+ */
+static int get_objects_mds(const unsigned char *sha1,
+			   struct commit_list *parents,
+			   const char *author, size_t author_len,
+			   const char *committer, size_t committer_len,
+			   const char *encoding, size_t encoding_len,
+			   struct commit_extra_header *extra,
+			   const char *msg, size_t msg_len,
+			   mdigest_t *digestp)
+{
+	struct commit_mds_context context;
+	struct tree *tree = parse_tree_indirect(sha1);
+	struct pathspec ps;
+	mdigest_t xdigest;
+	struct commit_extra_header *extra_head = extra;
+	mdigest_Init(&context.context, MDIGEST_DEFAULT);
+	context.missing = 0;
+#ifdef PACKDB
+	context.packdb_opened = 0;
+#endif /* PACKDB */
+
+	if (tree == NULL) {
+		return -1;
+	} else {
+		init_pathspec(&ps, NULL);
+		parse_tree(tree);
+		if (has_sha1_file_digest(tree->object.sha1, &xdigest)) {
+			int wcode = get_mdigest_wcode(&xdigest);
+			unsigned char ucwcode =
+				(unsigned char) (wcode & 0xff);
+			mdigest_Update(&context.context, &ucwcode, 1);
+			mdigest_Update(&context.context,
+				       get_mdigest_buffer(&xdigest),
+				       get_mdigest_len(&xdigest));
+		} else {
+			enum object_type xtype;
+			unsigned long xsize;
+			mdigest_t xdigest;
+			unsigned char xsha1[20];
+			int wcode;
+			unsigned char ucwcode;
+#ifdef PACKDB
+			if (!context.packdb_opened) {
+				packdb_open();
+				context.packdb_opened = 1;
+			}
+			if (!packdb_lookup(sha1, &xdigest)) {
+#endif /* PACKDB */
+
+				void *data = read_sha1_file(sha1, &xtype,
+							    &xsize);
+				hash_sha1_file_extended(data, xsize,
+							typename(xtype),
+							xsha1,
+							&xdigest);
+				free(data);
+#ifdef PACKDB
+				packdb_process(sha1, &xdigest);
+			}
+#endif /* PACKDB */
+			wcode = get_mdigest_wcode(&xdigest);
+			ucwcode = (unsigned char)(wcode & 0xff);
+			mdigest_Update(&context.context, &ucwcode, 1);
+			mdigest_Update(&context.context,
+				       get_mdigest_buffer(&xdigest),
+				       get_mdigest_len(&xdigest));
+		}
+		read_tree_recursive(tree, "", 0, 0, &ps, get_objects_mds_f,
+				    &context);
+		while (parents) {
+			/*
+			 * Include the message digests of the parent commits.
+			 */
+			struct commit_list *next = parents->next;
+			if (!has_sha1_file(parents->item->object.sha1)) {
+				return -1;
+			} else if (has_sha1_file_digest
+				   (parents->item->object.sha1, &xdigest)) {
+				int wcode = get_mdigest_wcode(&xdigest);
+				unsigned char ucwcode =
+					(unsigned char) (wcode & 0xff);
+				mdigest_Update(&context.context, &ucwcode, 1);
+				mdigest_Update(&context.context,
+					       get_mdigest_buffer(&xdigest),
+					       get_mdigest_len(&xdigest));
+			} else {
+				enum object_type xtype;
+				unsigned long xsize;
+				mdigest_t xdigest;
+				unsigned char xsha1[20];
+				int wcode;
+				unsigned char ucwcode;
+#ifdef PACKDB
+				if (!context.packdb_opened) {
+					packdb_open();
+					context.packdb_opened = 1;
+				}
+				if (!packdb_lookup(sha1, &xdigest)) {
+#endif /* PACKDB */
+
+					void *data = read_sha1_file(sha1,
+								    &xtype,
+								    &xsize);
+					hash_sha1_file_extended(data, xsize,
+								typename(xtype),
+								xsha1,
+								&xdigest);
+					free(data);
+#ifdef PACKDB
+					packdb_process(sha1, &xdigest);
+				}
+#endif /* PACKDB */
+				wcode = get_mdigest_wcode(&xdigest);
+				ucwcode = (unsigned char)(wcode & 0xff);
+				mdigest_Update(&context.context, &ucwcode, 1);
+				mdigest_Update(&context.context,
+					       get_mdigest_buffer(&xdigest),
+					       get_mdigest_len(&xdigest));
+			}
+			parents = next;
+		}
+#ifdef PACKDB
+		if (context.packdb_opened) packdb_close();
+#endif /* PACKDB */
+		if (msg && author && committer && digestp) {
+			mdigest_Update(&context.context, author, author_len);
+			mdigest_Update(&context.context, committer,
+				       committer_len);
+			if (encoding) mdigest_Update(&context.context, encoding,
+						     encoding_len);
+			while (extra) {
+			  mdigest_Update(&context.context, extra->key,
+					 strlen(extra->key));
+			  mdigest_Update(&context.context, " ", 1);
+			  mdigest_Update(&context.context, extra->value,
+					 extra->len);
+			  mdigest_Update(&context.context, "\n", 1);
+			  extra = extra->next;
+			}
+			free_commit_extra_headers(extra_head);
+			mdigest_Update(&context.context, msg, msg_len);
+		}
+		if (digestp) mdigest_Final(digestp, &context.context);
+		return ((context.missing == 0)? 0: -1);
+	}
+}
+#endif /* defined(COMMIT_DIGEST)||defined(PACKDB_TEST)||defined(COMMIT_DIGEST_TEST) */
+
+int verify_commit(struct commit *commit) {
+#ifdef COMMIT_DIGEST
+	if (save_commit_buffer) {
+		mdigest_t edigest;
+		mdigest_t digest;
+		const char *author = NULL;
+		size_t author_len = 0;
+		const char *committer = NULL;
+		size_t committer_len = 0;
+		const char *encoding = NULL;
+		size_t encoding_len = 0;
+		const char *msg = NULL;
+		size_t msg_len = 0;
+		const char *mdstring = NULL;
+		size_t mdstring_len = 0;
+		const char *bufptr;
+		const char *tail;
+		struct commit_extra_header *extra = NULL;
+
+		parse_commit(commit);
+		extra = read_commit_extra_header_lines(commit->buffer,
+						       commit->buffer_len);
+		bufptr = commit->buffer;
+		tail = bufptr + commit->buffer_len;
+		while (*bufptr != '\n' && bufptr < tail) {
+			if (*bufptr == 'a') {
+				if ((bufptr + 7) < tail) {
+					if (memcmp(bufptr, "author ", 7) == 0) {
+						bufptr += 7;
+						author = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						author_len = bufptr - author;
+					} else while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else if (*bufptr == 'c') {
+				if ((bufptr + 10) < tail) {
+					if (memcmp(bufptr,
+						   "committer ", 10) == 0) {
+						bufptr += 10;
+						committer = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						committer_len =
+							bufptr - committer;
+					} else  while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else if(*bufptr == 'e') {
+				if ((bufptr + 9) < tail) {
+					if (memcmp(bufptr,
+						   "encoding ", 9) == 0) {
+						bufptr += 9;
+						encoding = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						encoding_len =
+							bufptr - encoding;
+					} else  while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else if (*bufptr == 'd') {
+				if ((bufptr + 7) < tail) {
+					if (memcmp(bufptr, "digest ", 7) == 0) {
+						bufptr += 7;
+						mdstring = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						mdstring_len =
+							bufptr - mdstring;
+					} else  while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else {
+				while (*bufptr != '\n' && bufptr < tail)
+					bufptr++;
+				if (bufptr < tail) bufptr++;
+			}
+		}
+		if (*bufptr == '\n' && bufptr < tail) {
+			bufptr++;
+			msg = bufptr;
+			msg_len = tail - bufptr;
+		}
+		if (mdstring &&
+		    get_mdigest_from_external_hex(&edigest, mdstring) < 0) {
+			return -1;
+		}
+		if (author && committer && msg) {
+			if (mdstring == NULL) return 0;
+			if (get_objects_mds(commit->object.sha1,
+					    commit->parents,
+					    author, author_len,
+					    committer, committer_len,
+					    encoding, encoding_len,
+					    extra,
+					    msg, msg_len,
+					    &digest)) {
+				return -1;
+			}
+			return mdigest_tst(&edigest, &digest);
+		} else {
+			return -1;
+		}
+	} else {
+	  return 0;
+	}
+#else /* COMMIT_DIGEST */
+	return 0;
+#endif /* COMMIT_DIGEST */
+}
+
 static struct commit *check_commit(struct object *obj,
 				   const unsigned char *sha1,
 				   int quiet)
@@ -325,6 +709,9 @@ int parse_commit(struct commit *item)
 	ret = parse_commit_buffer(item, buffer, size);
 	if (save_commit_buffer && !ret) {
 		item->buffer = buffer;
+#ifdef COMMIT_DIGEST
+		item->buffer_len = (size_t) size;
+#endif
 		return 0;
 	}
 	free(buffer);
@@ -916,6 +1303,9 @@ static inline int standard_header_field(const char *field, size_t len)
 {
 	return ((len == 4 && !memcmp(field, "tree ", 5)) ||
 		(len == 6 && !memcmp(field, "parent ", 7)) ||
+#ifdef COMMIT_DIGEST
+		(len == 6 && !memcmp(field, "digest ", 7)) ||
+#endif
 		(len == 6 && !memcmp(field, "author ", 7)) ||
 		(len == 9 && !memcmp(field, "committer ", 10)) ||
 		(len == 8 && !memcmp(field, "encoding ", 9)));
@@ -998,12 +1388,43 @@ int commit_tree_extended(const char *msg, unsigned char *tree,
 	int result;
 	int encoding_is_utf8;
 	struct strbuf buffer;
+	static char committer[1000];
+	const char *encoding = NULL;
+#if defined(COMMIT_DIGEST) || defined (COMMIT_DIGEST_TEST)
+	mdigest_t digest;
+#endif /* defined(COMMIT_DIGEST) || defined(PACKDB_TEST) */
+	/*
+	 * git_committer_info returns a static buffer of size 1000, so
+	 * we have to copy it - assume git_committer_info does necessary
+	 * buffer-overflow tests.
+	 */
+	strcpy (committer, git_committer_info(IDENT_ERROR_ON_NO_NAME));
 
 	assert_sha1_type(tree, OBJ_TREE);
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
+	if (!encoding_is_utf8)
+		encoding = git_commit_encoding;
 
+	/* Person/date information setup*/
+	if (!author)
+		author = git_author_info(IDENT_ERROR_ON_NO_NAME);
+#if defined(COMMIT_DIGEST) || defined(PACKDB_TEST) || defined(COMMIT_DIGEST_TEST)
+	/*
+	 * Have all the pieces so compute the message digest. We do it here
+	 * because the list 'parents' will be destroyed by the following loop.
+	 */
+	if (get_objects_mds(tree, parents,
+			    author, (author? strlen(author): 0),
+			    committer, strlen(committer),
+			    encoding, (encoding? strlen(encoding): 0),
+			    extra,
+			    msg, (msg? strlen(msg): 0),
+			    &digest)) {
+		die("could not compute message digest for commit");
+	}
+#endif /* defined(COMMIT_DIGEST) || defined(PACKDB_TEST)  || defined(COMMIT_DIGEST_TEST)*/
 	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
 	strbuf_addf(&buffer, "tree %s\n", sha1_to_hex(tree));
 
@@ -1023,17 +1444,19 @@ int commit_tree_extended(const char *msg, unsigned char *tree,
 	}
 
 	/* Person/date information */
-	if (!author)
-		author = git_author_info(IDENT_ERROR_ON_NO_NAME);
 	strbuf_addf(&buffer, "author %s\n", author);
-	strbuf_addf(&buffer, "committer %s\n", git_committer_info(IDENT_ERROR_ON_NO_NAME));
+	strbuf_addf(&buffer, "committer %s\n", committer);
+
 	if (!encoding_is_utf8)
 		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
-
 	while (extra) {
 		add_extra_header(&buffer, extra);
 		extra = extra->next;
 	}
+#ifdef COMMIT_DIGEST
+	strbuf_addf(&buffer, "digest %s\n", mdigest_to_external_hex(&digest));
+#endif /* COMMIT_DIGEST */
+
 	strbuf_addch(&buffer, '\n');
 
 	/* And add the comment */
@@ -1045,6 +1468,11 @@ int commit_tree_extended(const char *msg, unsigned char *tree,
 
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
+#if defined(COMMIT_DIGEST) && defined (COMMIT_DIGEST_TEST)
+	if (verify_commit(lookup_commit(ret))) {
+	    die("commit verification failed for %s\n", sha1_to_hex(ret));
+	  }
+#endif
 	return result;
 }
 
diff --git a/commit.h b/commit.h
index 3745f12..7a91519 100644
--- a/commit.h
+++ b/commit.h
@@ -5,6 +5,7 @@
 #include "tree.h"
 #include "strbuf.h"
 #include "decorate.h"
+#include "mdigest.h"
 
 struct commit_list {
 	struct commit *item;
@@ -19,6 +20,9 @@ struct commit {
 	struct commit_list *parents;
 	struct tree *tree;
 	char *buffer;
+#ifdef COMMIT_DIGEST
+	size_t buffer_len;
+#endif
 };
 
 extern int save_commit_buffer;
@@ -218,4 +222,11 @@ struct merge_remote_desc {
  */
 struct commit *get_merge_parent(const char *name);
 
+/*
+ * Returns 0 if OK or if save_commit_buffer == 0 or if COMMIT_DIGEST was
+ * not defined during compilation; non-zero otherwise.  If a commit does
+ * not have a digest field, 0 is returned.
+ */
+extern int verify_commit(struct commit *commit);
+
 #endif /* COMMIT_H */
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/6] Add MD support for packfiles, fast-import, and pruning.
From: Bill Zaumen @ 2011-12-21  7:11 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

The utilities for creating and querying pack files,
for the fast-import of files, and for pruning a
git repository were modified to support message digests
(either in individual files or in 'mds' files that
parallel pack index files).

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 builtin/count-objects.c   |   92 +++++++++++++++++++++++++++++-
 builtin/index-pack.c      |  139 +++++++++++++++++++++++++++++++++++++++++----
 builtin/pack-objects.c    |   81 ++++++++++++++++++++++++++-
 builtin/pack-redundant.c  |   14 ++++-
 builtin/prune-packed.c    |   21 ++++++-
 builtin/prune.c           |    1 +
 builtin/verify-pack.c     |   14 ++++-
 fast-import.c             |   77 ++++++++++++++++++++++++-
 git-repack.sh             |   12 +++-
 t/t5300-pack-object.sh    |   17 +++++-
 t/t5301-sliding-window.sh |   14 ++++-
 t/t5302-pack-index.sh     |    6 +-
 t/t5304-prune.sh          |   13 +++--
 t/t9300-fast-import.sh    |    8 ++-
 14 files changed, 467 insertions(+), 42 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index c37cb98..47135e1 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -8,6 +8,12 @@
 #include "dir.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "mdsdb.h"
+
+int mdsmode = 0;
+unsigned long has_loose_mds = 0;
+unsigned long loose_mds_missing = 0;
+
 
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
@@ -53,20 +59,37 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 			continue;
 		}
 		(*loose)++;
-		if (!verbose)
+		if (!verbose) {
+			if (mdsmode) {
+				if (get_sha1_hex(hex, sha1)) {
+					die("internal error");
+				} else if (mdsdb_lookup(NULL, sha1, NULL) > 0) {
+					has_loose_mds++;
+				} else {
+					loose_mds_missing++;
+				}
+			}
 			continue;
+		}
 		memcpy(hex, path+len, 2);
 		memcpy(hex+2, ent->d_name, 38);
 		hex[40] = 0;
 		if (get_sha1_hex(hex, sha1))
 			die("internal error");
+		if (mdsmode) {
+			if (mdsdb_lookup(NULL, sha1, NULL) > 0) {
+				has_loose_mds++;
+			} else {
+				loose_mds_missing++;
+			}
+		}
 		if (has_sha1_pack(sha1))
 			(*packed_loose)++;
 	}
 }
 
 static char const * const count_objects_usage[] = {
-	"git count-objects [-v]",
+	"git count-objects [-v] [-M]",
 	NULL
 };
 
@@ -80,6 +103,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	off_t loose_size = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, "be verbose"),
+		OPT_BOOLEAN('M', "count-md", &mdsmode,
+			    "count MDs (Message Digests)"),
 		OPT_END(),
 	};
 
@@ -90,6 +115,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	memcpy(path, objdir, len);
 	if (len && objdir[len-1] != '/')
 		path[len++] = '/';
+	mdsdb_open(NULL);
 	for (i = 0; i < 256; i++) {
 		DIR *d;
 		sprintf(path + len, "%02x", i);
@@ -100,10 +126,16 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 			      &loose, &loose_size, &packed_loose, &garbage);
 		closedir(d);
 	}
+	mdsdb_close(NULL);
 	if (verbose) {
 		struct packed_git *p;
 		unsigned long num_pack = 0;
 		off_t size_pack = 0;
+		unsigned long mds_mismatched = 0;
+		unsigned long missing_mdsfile_count = 0;
+		unsigned long mds_count = 0;
+		int wsize = 0;
+		mdigest_t digest;
 		if (!packed_git)
 			prepare_packed_git();
 		for (p = packed_git; p; p = p->next) {
@@ -114,6 +146,40 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 			packed += p->num_objects;
 			size_pack += p->pack_size + p->index_size;
 			num_pack++;
+			if (!mdsmode)
+				continue;
+			if (open_pack_mds(p)) {
+				missing_mdsfile_count++;
+				continue;
+			}
+			/*
+			 * Assume mds version 1 for now. We check that
+			 * the mds file has the right size and record if it
+			 * doesn't.  If it is the right size, we go through
+			 * all the entries and count the number of sha1 hashes
+			 * for which there is a recorded CRC.  We do not
+			 * check if the CRC is the right one for the
+			 * corresponding object: run git pack-verify to do
+			 * that.
+			 */
+			if (p->mds_size > 7) {
+			  wsize = ((unsigned char *)(p->mds_data))[7] * 4;
+			}
+			if (p->mds_size == (size_t)8 +
+			    (((size_t)
+			      ((p->num_objects)/4 + (p->num_objects % 4 != 0))
+			      * (size_t)4 * (size_t)(1 + wsize)) +
+			     (size_t)(20 * 2))) {
+				for (i = 0; i < p->num_objects; i++) {
+					mds_count +=
+					  (nth_packed_object_mdigest(p,
+								      i,
+								      &digest)
+					   == 1);
+				}
+			} else {
+			  mds_mismatched++;
+			}
 		}
 		printf("count: %lu\n", loose);
 		printf("size: %lu\n", (unsigned long) (loose_size / 1024));
@@ -122,9 +188,31 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		printf("size-pack: %lu\n", (unsigned long) (size_pack / 1024));
 		printf("prune-packable: %lu\n", packed_loose);
 		printf("garbage: %lu\n", garbage);
+		if (mdsmode) {
+			if (missing_mdsfile_count) {
+				printf("missing MD (Message Digest) "
+				       "files: %lu\n",
+					missing_mdsfile_count);
+			}
+			if (mds_mismatched)
+				printf("MD (Message Digest) files with"
+				       " wrong size: %lu "
+				       "(file extension = .mds)\n",
+					mds_mismatched);
+			if (packed != mds_count) {
+				printf("missing MD (Message Digest)"
+				       " count: %lu\n",
+				       packed - mds_count);
+			}
+		}
 	}
 	else
 		printf("%lu objects, %lu kilobytes\n",
 		       loose, (unsigned long) (loose_size / 1024));
+	if (mdsmode && loose_mds_missing) {
+		assert(loose == (loose_mds_missing + has_loose_mds));
+		printf("%lu loose objects with no MD (Message Digest)\n",
+		       loose_mds_missing);
+	}
 	return 0;
 }
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 98025da..127f879 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1,3 +1,4 @@
+#include <unistd.h>
 #include "builtin.h"
 #include "delta.h"
 #include "pack.h"
@@ -23,6 +24,14 @@ struct object_entry {
 	int base_object_no;
 };
 
+static int sha1_compare(const void *_a, const void *_b)
+{
+	struct object_entry *a = (struct object_entry *)_a;
+	struct object_entry *b = (struct object_entry *)_b;
+	return hashcmp(a->idx.sha1, b->idx.sha1);
+}
+
+
 union delta_base {
 	unsigned char sha1[20];
 	off_t offset;
@@ -447,9 +456,10 @@ static void find_delta_children(const union delta_base *base,
 }
 
 static void sha1_object(const void *data, unsigned long size,
-			enum object_type type, unsigned char *sha1)
+			enum object_type type, unsigned char *sha1,
+			mdigest_t *digestp)
 {
-	hash_sha1_file(data, size, typename(type), sha1);
+	hash_sha1_file_extended(data, size, typename(type), sha1, digestp);
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
@@ -549,7 +559,8 @@ static void resolve_delta(struct object_entry *delta_obj,
 	if (!result->data)
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
-		    delta_obj->idx.sha1);
+		    delta_obj->idx.sha1, &(delta_obj->idx.digest));
+	delta_obj->idx.has_digest = 1;
 	nr_resolved_deltas++;
 }
 
@@ -643,8 +654,12 @@ static void parse_pack_objects(unsigned char *sha1)
 			nr_deltas++;
 			delta->obj_no = i;
 			delta++;
-		} else
-			sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+		} else {
+		  sha1_object(data, obj->size, obj->type, obj->idx.sha1,
+			      &(obj->idx.digest));
+		  obj->idx.has_digest = 1;
+		}
+
 		free(data);
 		display_progress(progress, i+1);
 	}
@@ -804,6 +819,7 @@ static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved)
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
+		  const char *final_mds_name, const char *curr_mds_name,
 		  const char *keep_name, const char *keep_msg,
 		  unsigned char *sha1)
 {
@@ -866,6 +882,18 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	} else
 		chmod(final_index_name, 0444);
 
+	if (final_mds_name != curr_mds_name) {
+		if (!final_mds_name) {
+			snprintf(name, sizeof(name), "%s/pack/pack-%s.mds",
+				 get_object_directory(), sha1_to_hex(sha1));
+			final_mds_name = name;
+		}
+		if (move_temp_to_file(curr_mds_name, final_mds_name))
+			die("cannot store mds file");
+	} else
+		chmod(final_mds_name, 0444);
+
+
 	if (!from_stdin) {
 		printf("%s\n", sha1_to_hex(sha1));
 	} else {
@@ -972,18 +1000,46 @@ static void read_idx_option(struct pack_idx_option *opts, const char *pack_name)
 	free(p);
 }
 
-static void show_pack_info(int stat_only)
+static void show_pack_info(int stat, int stat_only, int show_mds,
+			   int mds_file_exists, const char *path)
 {
 	int i, baseobjects = nr_objects - nr_deltas;
 	unsigned long *chain_histogram = NULL;
+	void *data = NULL;
+	size_t mds_size = 0;
+	struct packed_git pg;
+
+	if (mds_file_exists) {
+		int fd = git_open_noatime(path);
+		size_t required_size = 0;
+		struct stat st;
+		if (fd >= 0) {
+			if (fstat(fd, &st)) {
+				close(fd);
+			} else {
+				mds_size = xsize_t(st.st_size);
+				data = xmmap(NULL, mds_size,
+						PROT_READ, MAP_PRIVATE, fd, 0);
+				close(fd);
+				required_size = required_git_packed_mds_size
+					(path, data, nr_objects, mds_size);
+				if (required_size == 0) {
+					munmap(data, mds_size);
+					data = NULL;
+				}
+			}
+		}
+		if (data == NULL) mds_file_exists = 0;
+		pg.mds_data = data;
+	}
 
-	if (deepest_delta)
+	if (stat && deepest_delta)
 		chain_histogram = xcalloc(deepest_delta, sizeof(unsigned long));
 
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
 
-		if (is_delta_type(obj->type))
+		if (chain_histogram && is_delta_type(obj->type))
 			chain_histogram[obj->delta_depth - 1]++;
 		if (stat_only)
 			continue;
@@ -992,12 +1048,41 @@ static void show_pack_info(int stat_only)
 		       typename(obj->real_type), obj->size,
 		       (unsigned long)(obj[1].idx.offset - obj->idx.offset),
 		       (uintmax_t)obj->idx.offset);
+		if (show_mds) {
+			if (mds_file_exists) {
+				mdigest_t digest;
+				int has_digest = nth_packed_object_mdigest
+					(&pg, i, &digest);
+				if (has_digest) {
+					printf(" md=%s",
+					       mdigest_to_external_hex
+					       (&digest));
+					if (obj->idx.has_digest) {
+						if (mdigest_tst
+						    (&digest,
+						     &(obj->idx.digest))) {
+							printf
+							(" (should be %s) ",
+							 mdigest_to_external_hex
+							 (&(obj->idx.digest)));
+						}
+					}
+				} else {
+					printf(" <no md>      ");
+				}
+			} else {
+				printf(" <no md>      ");
+			}
+		}
 		if (is_delta_type(obj->type)) {
 			struct object_entry *bobj = &objects[obj->base_object_no];
 			printf(" %u %s", obj->delta_depth, sha1_to_hex(bobj->idx.sha1));
 		}
 		putchar('\n');
 	}
+	if (data) munmap(data, mds_size);
+	if (!stat)
+		return;
 
 	if (baseobjects)
 		printf("non delta: %d object%s\n",
@@ -1015,10 +1100,12 @@ static void show_pack_info(int stat_only)
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, stat = 0;
-	const char *curr_pack, *curr_index;
-	const char *index_name = NULL, *pack_name = NULL;
+	int show_mds = 0;
+	const char *curr_pack, *curr_index, *curr_mds;
+	const char *index_name = NULL, *pack_name = NULL, *mds_name = NULL;;
 	const char *keep_name = NULL, *keep_msg = NULL;
 	char *index_name_buf = NULL, *keep_name_buf = NULL;
+	char *mds_name_buf = NULL;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
@@ -1052,6 +1139,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				verify = 1;
 				stat = 1;
 				stat_only = 1;
+			} else if (!strcmp(arg, "-M") ||
+				   !strcmp(arg, "--show-mds")) {
+				verify = 1;
+				show_mds = 1;
 			} else if (!strcmp(arg, "--keep")) {
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
@@ -1075,6 +1166,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				if (index_name || (i+1) >= argc)
 					usage(index_pack_usage);
 				index_name = argv[++i];
+			} else if (!strcmp(arg, "-m")) {
+				if (mds_name || (i+1) >= argc)
+					usage(index_pack_usage);
+				mds_name = argv[++i];
 			} else if (!prefixcmp(arg, "--index-version=")) {
 				char *c;
 				opts.version = strtoul(arg + 16, &c, 10);
@@ -1108,6 +1203,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		strcpy(index_name_buf + len - 5, ".idx");
 		index_name = index_name_buf;
 	}
+	if (!mds_name && pack_name) {
+		int len = strlen(pack_name);
+		if (!has_extension(pack_name, ".pack"))
+			die("packfile name '%s' does not end with '.pack'",
+			    pack_name);
+		mds_name_buf = xmalloc(len);
+		memcpy(mds_name_buf, pack_name, len - 5);
+		strcpy(mds_name_buf + len - 5, ".mds");
+		mds_name = mds_name_buf;
+	}
 	if (keep_msg && !keep_name && pack_name) {
 		int len = strlen(pack_name);
 		if (!has_extension(pack_name, ".pack"))
@@ -1170,24 +1275,34 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		check_objects();
 
-	if (stat)
-		show_pack_info(stat_only);
+	if (stat || show_mds) {
+		int mds_file_exists = !access(mds_name, R_OK);
+		if (mds_file_exists && show_mds) {
+		  qsort (objects, nr_objects, sizeof (struct object_entry),
+			 sha1_compare);
+		}
+		show_pack_info(stat, stat_only, show_mds, mds_file_exists,
+			       mds_name);
+	}
 
 	idx_objects = xmalloc((nr_objects) * sizeof(struct pack_idx_entry *));
 	for (i = 0; i < nr_objects; i++)
 		idx_objects[i] = &objects[i].idx;
 	curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_sha1);
+	curr_mds = write_mds_file(mds_name, idx_objects, nr_objects, &opts,pack_sha1);
 	free(idx_objects);
 
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
+		      mds_name, curr_mds,
 		      keep_name, keep_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
 	free(objects);
 	free(index_name_buf);
+	free(mds_name_buf);
 	free(keep_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 96c1680..ccfe824 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -17,6 +17,7 @@
 #include "progress.h"
 #include "refs.h"
 #include "thread-utils.h"
+#include "mdsdb.h"
 
 static const char pack_usage[] =
   "git pack-objects [ -q | --progress | --all-progress ]\n"
@@ -560,6 +561,8 @@ static struct object_entry **compute_write_order(void)
 		objects[i].filled = 0;
 		objects[i].delta_child = NULL;
 		objects[i].delta_sibling = NULL;
+		objects[i].idx.has_digest = 0;
+		mdigest_clear(&objects[i].idx.digest);
 	}
 
 	/*
@@ -684,8 +687,28 @@ static void write_pack_file(void)
 
 		if (!pack_to_stdout) {
 			struct stat st;
+#if 1
 			char tmpname[PATH_MAX];
 
+#else
+			const char *idx_tmp_name;
+			const char *mds_tmp_name;
+			char tmpname[PATH_MAX];
+
+			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
+						      &pack_idx_opts, sha1);
+			mds_tmp_name = write_mds_file(NULL, written_list, nr_written,
+						      &pack_idx_opts, sha1);
+
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
+				 base_name, sha1_to_hex(sha1));
+			free_pack_by_name(tmpname);
+			if (adjust_shared_perm(pack_tmp_name))
+				die_errno("unable to make temporary pack file readable");
+			if (rename(pack_tmp_name, tmpname))
+				die_errno("unable to rename temporary pack file");
+#endif
+
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -707,6 +730,7 @@ static void write_pack_file(void)
 						tmpname, strerror(errno));
 			}
 
+#if 1
 			/* Enough space for "-<sha-1>.pack"? */
 			if (sizeof(tmpname) <= strlen(base_name) + 50)
 				die("pack base name '%s' too long", base_name);
@@ -714,6 +738,25 @@ static void write_pack_file(void)
 			finish_tmp_packfile(tmpname, pack_tmp_name,
 					    written_list, nr_written,
 					    &pack_idx_opts, sha1);
+#else
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.idx",
+				 base_name, sha1_to_hex(sha1));
+			if (adjust_shared_perm(idx_tmp_name))
+				die_errno("unable to make temporary index file readable");
+			if (rename(idx_tmp_name, tmpname))
+				die_errno("unable to rename temporary index file");
+
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.mds",
+				 base_name, sha1_to_hex(sha1));
+			if (adjust_shared_perm(mds_tmp_name))
+				die_errno("unable to make temporary mds file readable");
+			if (rename(mds_tmp_name, tmpname))
+				die_errno("unable to rename temporary mds file");
+
+
+			free((void *) idx_tmp_name);
+			free((void *) mds_tmp_name);
+#endif
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
@@ -830,6 +873,8 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	off_t found_offset = 0;
 	int ix;
 	unsigned hash = name_hash(name);
+	int hasdigest;
+	mdigest_t digest;
 
 	ix = nr_objects ? locate_object_entry_hash(sha1) : -1;
 	if (ix >= 0) {
@@ -846,7 +891,11 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 		return 0;
 
 	for (p = packed_git; p; p = p->next) {
-		off_t offset = find_pack_entry_one(sha1, p);
+		off_t offset;
+		hasdigest = 0;
+		mdigest_clear(&digest);
+		offset = find_pack_entry_one_extended(sha1, p,
+						   &hasdigest, &digest);
 		if (offset) {
 			if (!found_pack) {
 				if (!is_pack_valid(p)) {
@@ -874,7 +923,37 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 
 	entry = objects + nr_objects++;
 	memset(entry, 0, sizeof(*entry));
+	if (hasdigest == 0) {
+		/*
+		 * We pick up MDs for local objects (we already checked the
+		 * pack files).  If that doesn't work, we compute it from
+		 * scratch (which should occur rarely if at all).
+		 */
+		mdsdb_open(NULL);
+		switch (mdsdb_lookup(NULL, sha1, &digest)) {
+		case 1:
+			hasdigest = 1;
+			break;
+		default:
+			hasdigest = 0;
+		}
+		mdsdb_close(NULL);
+		if (!hasdigest) {
+		  enum object_type type;
+		  unsigned long size;
+		  unsigned char sbuf[20];
+		  void *buf = read_sha1_file(sha1, &type, &size);
+		  if (buf) {
+			const char *stype = typename(type);
+			hash_sha1_file_extended(buf, size, stype, sbuf,
+						&digest);
+			hasdigest = 1;
+		  }
+		}
+	}
 	hashcpy(entry->idx.sha1, sha1);
+	entry->idx.has_digest = hasdigest;
+	entry->idx.digest = digest;
 	entry->hash = hash;
 	if (type)
 		entry->type = type;
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index f5c6afc..c09397c 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -6,6 +6,7 @@
 *
 */
 
+#include <unistd.h>
 #include "builtin.h"
 
 #define BLKSIZE 512
@@ -682,9 +683,16 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 	}
 	pl = red = pack_list_difference(local_packs, min);
 	while (pl) {
-		printf("%s\n%s\n",
-		       sha1_pack_index_name(pl->pack->sha1),
-		       pl->pack->pack_name);
+		char *mdsfile = sha1_pack_mds_name(pl->pack->sha1);
+		if (!access(mdsfile, F_OK)) {
+		  printf("%s\n%s\n%s\n", mdsfile,
+			       sha1_pack_index_name(pl->pack->sha1),
+			       pl->pack->pack_name);
+		} else {
+			printf("%s\n%s\n",
+			       sha1_pack_index_name(pl->pack->sha1),
+			       pl->pack->pack_name);
+		}
 		pl = pl->next;
 	}
 	if (verbose)
diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index f9463de..ec5dfe8 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -43,8 +43,11 @@ void prune_packed_objects(int opts)
 {
 	int i;
 	static char pathname[PATH_MAX];
+	static char mds_pathname[PATH_MAX];
 	const char *dir = get_object_directory();
+	const char *mdsdir = get_object_mds_directory();
 	int len = strlen(dir);
+	int mdslen = strlen(mdsdir);
 
 	if (opts == VERBOSE)
 		progress = start_progress_delay("Removing duplicate objects",
@@ -55,16 +58,26 @@ void prune_packed_objects(int opts)
 	memcpy(pathname, dir, len);
 	if (len && pathname[len-1] != '/')
 		pathname[len++] = '/';
+	memcpy(mds_pathname, mdsdir, mdslen);
+	if (mdslen && mds_pathname[mdslen-1] != '/')
+		mds_pathname[mdslen++] = '/';
 	for (i = 0; i < 256; i++) {
 		DIR *d;
+		DIR *mds_d;
 
 		display_progress(progress, i + 1);
 		sprintf(pathname + len, "%02x/", i);
 		d = opendir(pathname);
-		if (!d)
-			continue;
-		prune_dir(i, d, pathname, len + 3, opts);
-		closedir(d);
+		sprintf(mds_pathname + len, "%02x/", i);
+		mds_d = opendir(mds_pathname);
+		if (d) {
+			prune_dir(i, d, pathname, len + 3, opts);
+			closedir(d);
+		}
+		if (mds_d) {
+			prune_dir(i, mds_d, mds_pathname, mdslen + 3, opts);
+			closedir(mds_d);
+		}
 	}
 	stop_progress(&progress);
 }
diff --git a/builtin/prune.c b/builtin/prune.c
index 58d7cb8..25dde51 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -165,6 +165,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	mark_reachable_objects(&revs, 1, progress);
 	stop_progress(&progress);
 	prune_object_dir(get_object_directory());
+	prune_object_dir(get_object_mds_directory());
 
 	prune_packed_objects(show_only);
 	remove_temporary_files(get_object_directory());
diff --git a/builtin/verify-pack.c b/builtin/verify-pack.c
index e841b4a..b94a11e 100644
--- a/builtin/verify-pack.c
+++ b/builtin/verify-pack.c
@@ -5,14 +5,16 @@
 
 #define VERIFY_PACK_VERBOSE 01
 #define VERIFY_PACK_STAT_ONLY 02
+#define SHOW_MDS 04
 
 static int verify_one_pack(const char *path, unsigned int flags)
 {
 	struct child_process index_pack;
-	const char *argv[] = {"index-pack", NULL, NULL, NULL };
+	const char *argv[] = {"index-pack", NULL, NULL, NULL, NULL };
 	struct strbuf arg = STRBUF_INIT;
 	int verbose = flags & VERIFY_PACK_VERBOSE;
 	int stat_only = flags & VERIFY_PACK_STAT_ONLY;
+	int show_mds = ((flags & SHOW_MDS) != 0)  && !stat_only;
 	int err;
 
 	if (stat_only)
@@ -22,6 +24,8 @@ static int verify_one_pack(const char *path, unsigned int flags)
 	else
 		argv[1] = "--verify";
 
+	if (show_mds) argv[2] = "-M";
+
 	/*
 	 * In addition to "foo.pack" we accept "foo.idx" and "foo";
 	 * normalize these forms to "foo.pack" for "index-pack --verify".
@@ -31,7 +35,7 @@ static int verify_one_pack(const char *path, unsigned int flags)
 		strbuf_splice(&arg, arg.len - 3, 3, "pack", 4);
 	else if (!has_extension(arg.buf, ".pack"))
 		strbuf_add(&arg, ".pack", 5);
-	argv[2] = arg.buf;
+	argv[2 + show_mds] = arg.buf;
 
 	memset(&index_pack, 0, sizeof(index_pack));
 	index_pack.argv = argv;
@@ -46,6 +50,10 @@ static int verify_one_pack(const char *path, unsigned int flags)
 			if (!stat_only)
 				printf("%s: ok\n", arg.buf);
 		}
+	} else if (show_mds) {
+		printf("%s: listed (%s)\n-----------------\n", arg.buf,
+		       (err? "bad": "ok"));
+
 	}
 	strbuf_release(&arg);
 
@@ -67,6 +75,8 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
 			VERIFY_PACK_VERBOSE),
 		OPT_BIT('s', "stat-only", &flags, "show statistics only",
 			VERIFY_PACK_STAT_ONLY),
+		OPT_BIT('M', "show-mds", &flags, "show message digests / CRCs",
+			SHOW_MDS),
 		OPT_END()
 	};
 
diff --git a/fast-import.c b/fast-import.c
index 4b9c4b7..f672a76 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -165,6 +165,11 @@ Format of STDIN stream:
 #include "exec_cmd.h"
 #include "dir.h"
 
+#ifdef PACKDB
+#include "packdb.h"
+#endif
+
+
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
 #define DEPTH_BITS 13
@@ -558,6 +563,8 @@ static struct object_entry *new_object(unsigned char *sha1)
 
 	e = blocks->next_free++;
 	hashcpy(e->idx.sha1, sha1);
+	e->idx.has_digest = 0;
+	mdigest_clear(&e->idx.digest);
 	return e;
 }
 
@@ -904,9 +911,34 @@ static const char *create_index(void)
 	return tmpfile;
 }
 
-static char *keep_pack(const char *curr_index_name)
+static const char *create_mds(void)
+{
+	const char *tmpfile;
+	struct pack_idx_entry **mds, **c, **last;
+	struct object_entry *e;
+	struct object_entry_pool *o;
+
+	/* Build the table of object IDs. */
+	mds = xmalloc(object_count * sizeof(*mds));
+	c = mds;
+	for (o = blocks; o; o = o->next_pool)
+		for (e = o->next_free; e-- != o->entries;)
+			if (pack_id == e->pack_id)
+				*c++ = &e->idx;
+	last = mds + object_count;
+	if (c != last)
+		die("internal consistency error creating the mds file");
+
+	tmpfile = write_mds_file(NULL, mds, object_count, &pack_idx_opts, pack_data->sha1);
+	free(mds);
+	return tmpfile;
+}
+
+
+static char *keep_pack(const char *curr_index_name, const char *curr_mds_name)
 {
 	static char name[PATH_MAX];
+	static char rname[PATH_MAX];
 	static const char *keep_msg = "fast-import";
 	int keep_fd;
 
@@ -927,6 +959,13 @@ static char *keep_pack(const char *curr_index_name)
 	if (move_temp_to_file(curr_index_name, name))
 		die("cannot store index file");
 	free((void *)curr_index_name);
+
+	snprintf(rname, sizeof(rname), "%s/pack/pack-%s.mds",
+		 get_object_directory(), sha1_to_hex(pack_data->sha1));
+	if (move_temp_to_file(curr_mds_name, rname))
+		die("cannot store index file");
+	free((void *)curr_mds_name);
+
 	return name;
 }
 
@@ -951,6 +990,7 @@ static void end_packfile(void)
 	if (object_count) {
 		unsigned char cur_pack_sha1[20];
 		char *idx_name;
+		const char *n1, *n2;
 		int i;
 		struct branch *b;
 		struct tag *t;
@@ -961,7 +1001,9 @@ static void end_packfile(void)
 				    pack_data->pack_name, object_count,
 				    cur_pack_sha1, pack_size);
 		close(pack_data->pack_fd);
-		idx_name = keep_pack(create_index());
+		n1 = create_index();
+		n2 = create_mds();
+		idx_name = keep_pack(n1, n2);
 
 		/* Register the packfile with core git's machinery. */
 		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
@@ -1021,6 +1063,8 @@ static int store_object(
 	unsigned long hdrlen, deltalen;
 	git_SHA_CTX c;
 	git_zstream s;
+	mdigest_t digest;
+	mdigest_context_t mdc;
 
 	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
 		(unsigned long)dat->len) + 1;
@@ -1028,10 +1072,28 @@ static int store_object(
 	git_SHA1_Update(&c, hdr, hdrlen);
 	git_SHA1_Update(&c, dat->buf, dat->len);
 	git_SHA1_Final(sha1, &c);
+	mdigest_Init(&mdc, MDIGEST_DEFAULT);
+	mdigest_Update(&mdc, (unsigned char *)(dat->buf), dat->len);
+	mdigest_Final(&digest, &mdc);
+
+	if (has_sha1_file(sha1)) {
+		mdigest_t old_digest;
+		if (has_sha1_file_digest(sha1, &old_digest)) {
+		  if (mdigest_tst(&digest,&old_digest)) {
+				die("hash collision on %s [fast-import]",
+				    sha1_to_hex(sha1));
+			}
+		}
+	}
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
 	e = insert_object(sha1);
+	e->idx.has_digest = 1;
+	e->idx.digest = digest;
+#ifdef PACKDB
+	packdb_process(sha1, &digest);
+#endif
 	if (mark)
 		insert_mark(mark, e);
 	if (e->idx.offset) {
@@ -1157,6 +1219,8 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	unsigned char *out_buf = xmalloc(out_sz);
 	struct object_entry *e;
 	unsigned char sha1[20];
+	mdigest_t digest;
+	mdigest_context_t mdc;
 	unsigned long hdrlen;
 	off_t offset;
 	git_SHA_CTX c;
@@ -1177,6 +1241,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 		die("impossibly large object header");
 
 	git_SHA1_Init(&c);
+	mdigest_Init(&mdc, MDIGEST_DEFAULT);
 	git_SHA1_Update(&c, out_buf, hdrlen);
 
 	crc32_begin(pack_file);
@@ -1199,6 +1264,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 				die("EOF in data (%" PRIuMAX " bytes remaining)", len);
 
 			git_SHA1_Update(&c, in_buf, n);
+			mdigest_Update(&mdc, in_buf, n);
 			s.next_in = in_buf;
 			s.avail_in = n;
 			len -= n;
@@ -1225,11 +1291,14 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	}
 	git_deflate_end(&s);
 	git_SHA1_Final(sha1, &c);
+	mdigest_Final(&digest, &mdc);
 
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
 	e = insert_object(sha1);
+	e->idx.has_digest = 1;
+	e->idx.digest =  digest;
 
 	if (mark)
 		insert_mark(mark, e);
@@ -1828,6 +1897,8 @@ static void read_marks(void)
 			if (type < 0)
 				die("object not found: %s", sha1_to_hex(sha1));
 			e = insert_object(sha1);
+			e->idx.has_digest =
+				has_sha1_file_digest(sha1, &e->idx.digest);
 			e->type = type;
 			e->pack_id = MAX_PACK_ID;
 			e->idx.offset = 1; /* just not zero! */
@@ -2896,6 +2967,8 @@ static struct object_entry *dereference(struct object_entry *oe,
 			die("object not found: %s", sha1_to_hex(sha1));
 		/* cache it! */
 		oe = insert_object(sha1);
+		oe->idx.has_digest =
+			has_sha1_file_digest(sha1, &oe->idx.digest);
 		oe->type = type;
 		oe->pack_id = MAX_PACK_ID;
 		oe->idx.offset = 1;
diff --git a/git-repack.sh b/git-repack.sh
index 624feec..7602853 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -91,6 +91,7 @@ if [ -z "$names" ]; then
 	say Nothing new to pack.
 fi
 
+
 # Ok we have prepared all new packfiles.
 
 # First see if there are packs of the same name and if so
@@ -100,7 +101,7 @@ rollback=
 failed=
 for name in $names
 do
-	for sfx in pack idx
+	for sfx in pack idx mds
 	do
 		file=pack-$name.$sfx
 		test -f "$PACKDIR/$file" || continue
@@ -148,15 +149,22 @@ do
 	fullbases="$fullbases pack-$name"
 	chmod a-w "$PACKTMP-$name.pack"
 	chmod a-w "$PACKTMP-$name.idx"
+	(chmod a-w "$PACKTMP-$name.mds" 2>/dev/null || exit 0 )
 	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
 	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
 	exit
+	if test -f "$PACKTMP-$name.mds"
+	then
+		mv -f "$PACKTMP-$name.mds"  "$PACKDIR/pack-$name.mds" \
+		    2>/dev/null || exit
+	fi
 done
 
 # Remove the "old-" files
 for name in $names
 do
 	rm -f "$PACKDIR/old-pack-$name.idx"
+	rm -f "$PACKDIR/old-pack-$name.mds"
 	rm -f "$PACKDIR/old-pack-$name.pack"
 done
 
@@ -172,7 +180,7 @@ then
 		  do
 			case " $fullbases " in
 			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+			*)	rm -f "$e.pack" "$e.idx" "$e.mds" "$e.keep" ;;
 			esac
 		  done
 		)
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 602806d..1b72d46 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -54,7 +54,7 @@ cd "$TRASH/.git2"
 
 test_expect_success \
     'check unpack without delta' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f  -print) | grep -v mdsd | grep -v packdb |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -84,7 +84,7 @@ unset GIT_OBJECT_DIRECTORY
 cd "$TRASH/.git2"
 test_expect_success \
     'check unpack with REF_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f -print) | grep -v mdsd | grep -v packdb |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -114,7 +114,7 @@ unset GIT_OBJECT_DIRECTORY
 cd "$TRASH/.git2"
 test_expect_success \
     'check unpack with OFS_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f -print) | grep -v mdsd | grep -v packdb |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -211,6 +211,17 @@ test_expect_success \
 			test-3-${packname_3}.idx'
 
 test_expect_success \
+    'verify pack -v -M' \
+    'test -z "`git verify-pack -v -M test-1-${packname_1}.idx \
+			test-2-${packname_2}.idx \
+			test-3-${packname_3}.idx | grep \<no\ md\>`" &&
+     test 0 != `git verify-pack -v -M test-1-${packname_1}.idx | grep md= | wc -l` &&
+     test -z "`git verify-pack -v -M test-1-${packname_1}.idx | grep "should be"`" &&
+     (x=`git verify-pack -v -M test-1-${packname_1}.idx | wc -l`
+     y=`git verify-pack -v -M test-1-${packname_1}.idx |grep -v \<no\ md\> | wc -l`
+     test $x = $y)'
+
+test_expect_success \
     'verify-pack catches mismatched .idx and .pack files' \
     'cat test-1-${packname_1}.idx >test-3.idx &&
      cat test-2-${packname_2}.pack >test-3.pack &&
diff --git a/t/t5301-sliding-window.sh b/t/t5301-sliding-window.sh
index 2fc5af6..ec0d72f 100755
--- a/t/t5301-sliding-window.sh
+++ b/t/t5301-sliding-window.sh
@@ -22,13 +22,19 @@ test_expect_success \
      git repack -a -d &&
      test "`git count-objects`" = "0 objects, 0 kilobytes" &&
      pack1=`ls .git/objects/pack/*.pack` &&
-     test -f "$pack1"'
+     test -f "$pack1"  &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'verify-pack -v, defaults' \
     'git verify-pack -v "$pack1"'
 
 test_expect_success \
+    'verify-pack -v -M, defaults' \
+    'git verify-pack -v -M "$pack1" | grep "<no md>" > tmp
+     test -z "`cat tmp`"'
+
+test_expect_success \
     'verify-pack -v, packedGitWindowSize == 1 page' \
     'git config core.packedGitWindowSize 512 &&
      git verify-pack -v "$pack1"'
@@ -49,12 +55,14 @@ test_expect_success \
      test "`git count-objects`" = "0 objects, 0 kilobytes" &&
      pack2=`ls .git/objects/pack/*.pack` &&
      test -f "$pack2" &&
-     test "$pack1" \!= "$pack2"'
+     test "$pack1" \!= "$pack2" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'verify-pack -v, defaults' \
     'git config --unset core.packedGitWindowSize &&
      git config --unset core.packedGitLimit &&
-     git verify-pack -v "$pack2"'
+     git verify-pack -v "$pack2" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_done
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index f8fa924..da10200 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -37,12 +37,14 @@ test_expect_success \
 test_expect_success \
     'pack-objects with index version 1' \
     'pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
-     git verify-pack -v "test-1-${pack1}.pack"'
+     git verify-pack -v "test-1-${pack1}.pack" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'pack-objects with index version 2' \
     'pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
-     git verify-pack -v "test-2-${pack2}.pack"'
+     git verify-pack -v "test-2-${pack2}.pack" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'both packs should be identical' \
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index d645328..86075a7 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -37,7 +37,8 @@ test_expect_success 'prune stale packs' '
 	git prune --expire 1.day &&
 	test -f $orig_pack &&
 	test -f .git/objects/tmp_2.pack &&
-	! test -f .git/objects/tmp_1.pack
+	! test -f .git/objects/tmp_1.pack  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -50,7 +51,8 @@ test_expect_success 'prune --expire' '
 	test-chmtime =-86500 $BLOB_FILE &&
 	git prune --expire 1.day &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	! test -f $BLOB_FILE  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -64,7 +66,8 @@ test_expect_success 'gc: implicit prune --expire' '
 	test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
 	git gc &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	! test -f $BLOB_FILE  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -78,8 +81,8 @@ test_expect_success 'gc: refuse to start with invalid gc.pruneExpire' '
 test_expect_success 'gc: start with ok gc.pruneExpire' '
 
 	git config gc.pruneExpire 2.days.ago &&
-	git gc
-
+	git gc &&
+	test -z "`git count-objects -v -M | grep MD`"
 '
 
 test_expect_success 'prune: prune nonsense parameters' '
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 438aaf6..63b4a13 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -109,6 +109,12 @@ test_expect_success \
 	'A: verify pack' \
 	'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
 
+test_expect_success \
+	'A: verify pack -v -M --- all objects have CRCs' \
+	'for p in .git/objects/pack/*.pack;
+	do git verify-pack -v -M $p | grep "<no md>" > tmp;
+	   test -z "`cat tmp`" || exit; done'
+
 cat >expect <<EOF
 author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -1504,7 +1510,7 @@ INPUT_END
 test_expect_success \
 	'O: blank lines not necessary after other commands' \
 	'git fast-import <input &&
-	 test 8 = `find .git/objects/pack -type f | wc -l` &&
+	 test 8 = `find .git/objects/pack -type f | grep -v .mds | wc -l` &&
 	 test `git rev-parse refs/tags/O3-2nd` = `git rev-parse O3^` &&
 	 git log --reverse --pretty=oneline O3 | sed s/^.*z// >actual &&
 	 test_cmp expect actual'
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/6] Add caching of message digests for objects.
From: Bill Zaumen @ 2011-12-21  7:10 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

Message digests are created when git objects are created.
The digests are stored in either their own files or in
an "mds" file that goes with a pack file's index file.
Most of the changes are in sha1_file.c, with a function
to create an mds file in pack-write.c. Macros in cache.h
allow the previous function calls to be used - some now
take a pointer to a digest as an argument. Hex.c was
modified to print message digests in hexadecimal, and
a test script was modified to account for a new directory
in objects.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 Makefile          |  121 ++++++++++++
 builtin/init-db.c |   17 ++
 cache.h           |   72 ++++++-
 environment.c     |   57 ++++++
 git.c             |   14 ++-
 hex.c             |  106 ++++++++++-
 pack-write.c      |  120 ++++++++++++
 pack.h            |    3 +
 sha1_file.c       |  560 +++++++++++++++++++++++++++++++++++++++++++++++------
 t/t0000-basic.sh  |   13 +-
 10 files changed, 1012 insertions(+), 71 deletions(-)

diff --git a/Makefile b/Makefile
index 9470a10..759df5c 100644
--- a/Makefile
+++ b/Makefile
@@ -278,6 +278,92 @@ all::
 # dependency rules.
 #
 # Define NATIVE_CRLF if your platform uses CRLF for line endings.
+#
+#
+# Set MDSDB to indicate the database type for the DB mapping SHA1
+# values to the MDs (Message Digests) of the objects git stores.
+# Valid values are:
+#
+#   0 for storing each local-object MD in its own file.
+#
+# [more to be added as needed - a legal value is mandatory].
+#
+# Note: the values for MDSDB are determined by preprocessor directives
+# defined in mdsdb.h This constant must be defined so that necessary
+# files are compiled.
+#
+MDSDB = 0
+
+#
+# Define MDIGEST_DEFAULT to set the default type of MD for authentication and
+# hash-collision detection.  Legal values
+# are:
+#      MDIGEST_CRC - use a CRC (used only as a minimal digest for performance
+#                    testing).
+#
+#     MDIGEST_SHA1 - use a SHA-1 digest.
+#
+#   MDIGEST_SHA256 - use a SHA-256 digest.
+#
+#   MDIGEST_SHA512 - use a SHA-512 digest.
+#
+# (additional ones may be added as needed.)
+#
+# Note: the message digests computed are for uncompressed objects, not
+# including the Git object-header.  If not set, a default defined in the
+# file mdigest.h will be used.
+#
+MDIGEST_DEFAULT = MDIGEST_SHA256
+
+#
+# Define PACKDB to use a GDBM-like database for storing message
+# digests compactly when those digests are not available using the
+# normal mechanisms.  As an example, if an alternate object database
+# is used and if it was created using an older version of git, message
+# digests may not be available, and git by design cannot modify an
+# alternate object database, so the message digests cannot be added to
+# it.  If PACKDB is not defined, at certain points (e.g., during a
+# commit, the digest for an object in an alternate object database
+# will be calculated each time.  When PACKDB is defined, the object's
+# digest is calculated once and stored in the packdb database.  GDBM
+# is too slow for use in general, but it is adequate for handling
+# unusual cases.
+#
+# Valid values are:
+#
+#                0 - use GDBM to implement the database.
+#    [not defined] - do nothing.
+#
+# [more can be added as needed].
+#
+PACKDB =
+
+# Define PACKDB_TEST in order to turn on an inefficient
+# test for PACKDB functions.  This code will add an entry to the packdb
+# database during commits when such an entry is not necessary and then
+# will read it back to make sure the data was added correctly. The option
+# has no effect if PACKDB is not defined.
+#
+# NOTE: this option should not be used in a released version of Git.
+#
+PACKDB_TEST =
+
+# Define COMMIT_DIGEST to include a 'digest' header in a commit. The header
+# will contain a 2-character code indicating the digest type, followed
+# immediately by the digest.  We are delaying turning this on by default
+# until the test scripts are updated, as the test scripts include explicit
+# file lengths and SHA-1 values.
+
+COMMIT_DIGEST =
+
+# Define COMMIT_DIGEST_TEST to force get_objects_mds to be called even if
+# COMMIT_DIGEST is not defined (in which case the digest header will not
+# appear in the commit object created).
+#
+# NOTE: this option should not be used in a released version of Git.
+#
+
+COMMIT_DIGEST_TEST =
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -536,7 +622,9 @@ LIB_H += blob.h
 LIB_H += builtin.h
 LIB_H += bulk-checkin.h
 LIB_H += cache.h
+LIB_H += mdigest.h
 LIB_H += cache-tree.h
+LIB_H += mdsdb.h
 LIB_H += color.h
 LIB_H += commit.h
 LIB_H += compat/bswap.h
@@ -711,6 +799,7 @@ LIB_OBJS += sequencer.o
 LIB_OBJS += sha1-array.o
 LIB_OBJS += sha1-lookup.o
 LIB_OBJS += sha1_file.o
+LIB_OBJS += mdigest.o
 LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
@@ -836,6 +925,7 @@ BUILTIN_OBJS += builtin/write-tree.o
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
 EXTLIBS =
 
+
 #
 # Platform specific tweaks
 #
@@ -1721,6 +1811,37 @@ ifeq ($(PYTHON_PATH),)
 NO_PYTHON=NoThanks
 endif
 
+ifdef MDSDB
+BASIC_CFLAGS += -DBLOB_MDS_CHECK
+endif
+
+ifdef COMMIT_DIGEST
+BASIC_CFLAGS += -DCOMMIT_DIGEST
+endif
+
+ifdef COMMIT_DIGEST_TEST
+BASIC_CFLAGS += -DCOMMIT_DIGEST_TEST
+endif
+
+ifdef MDIGEST_DEFAULT
+BASIC_CFLAGS += -DMDIGEST_DEFAULT=$(MDIGEST_DEFAULT)
+endif
+
+ifeq ($(MDSDB), 0)
+BASIC_CFLAGS += -DMDSDB=$(MDSDB)
+LIB_OBJS += objd-mdsdb.o
+endif
+
+ifeq ($(PACKDB), 0)
+BASIC_CFLAGS += -DPACKDB
+LIB_OBJS += gdbm-packdb.o
+EXTLIBS += -lgdbm
+endif
+
+ifdef PACKDB_TEST
+BASIC_CFLAGS += -DPACKDB_TEST
+endif
+
 QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
 QUIET_SUBDIR1  =
 
diff --git a/builtin/init-db.c b/builtin/init-db.c
index d07554c..6d5ec0f 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -7,6 +7,10 @@
 #include "builtin.h"
 #include "exec_cmd.h"
 #include "parse-options.h"
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 #ifndef DEFAULT_GIT_TEMPLATE_DIR
 #define DEFAULT_GIT_TEMPLATE_DIR "/usr/share/git-core/templates"
@@ -309,6 +313,19 @@ static void create_object_directory(void)
 	strcpy(path+len, "/info");
 	safe_create_dir(path, 1);
 
+#if (MDSDB == 0)
+	strcpy(path+len, "/mdsd");
+	safe_create_dir(path, 1);
+#endif
+	/*
+	 * In case the call in environent.c failed to initialize
+	 * (missing directory?) or somehow wasn't called at all.
+	 */
+	mdsdb_init();
+	mdigest_init();
+#ifdef PACKDB
+	packdb_init();
+#endif
 	free(path);
 }
 
diff --git a/cache.h b/cache.h
index 7d93df6..17e3dd4 100644
--- a/cache.h
+++ b/cache.h
@@ -16,6 +16,7 @@
 #define git_SHA1_Final	SHA1_Final
 #endif
 
+#include "mdigest.h"
 #include <zlib.h>
 typedef struct git_zstream {
 	z_stream z;
@@ -433,6 +434,10 @@ extern int is_inside_work_tree(void);
 extern int have_git_dir(void);
 extern const char *get_git_dir(void);
 extern char *get_object_directory(void);
+extern char *get_object_mds_directory(void);
+#ifdef PACKDB
+extern char *get_object_packdb_node(void);
+#endif
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
 extern int set_git_dir(const char *path);
@@ -541,8 +546,15 @@ extern int ce_path_match(const struct cache_entry *ce, const struct pathspec *pa
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
-extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
-extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
+
+#define index_fd(sha1,fd,st,type,path,flags)			\
+	index_fd_extended((sha1), NULL, (fd), (st), (type), (path), (flags))
+extern int index_fd_extended(unsigned char *sha1, mdigest_t *mdigestp, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
+
+#define index_path(sha1, path, st, flags) \
+	index_path_extended((sha1), NULL, (path), (st), (flags))
+extern int index_path_extended(unsigned char *sha1, mdigest_t *mdigestp, const char *path
+, struct stat *st, unsigned flags);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
 #define REFRESH_REALLY		0x0001	/* ignore_valid */
@@ -670,6 +682,7 @@ extern char *git_path_submodule(const char *path, const char *fmt, ...)
 extern char *sha1_file_name(const unsigned char *sha1);
 extern char *sha1_pack_name(const unsigned char *sha1);
 extern char *sha1_pack_index_name(const unsigned char *sha1);
+extern char *sha1_pack_mds_name(const unsigned char *sha1);
 extern const char *find_unique_abbrev(const unsigned char *sha1, int);
 extern const unsigned char null_sha1[20];
 
@@ -769,9 +782,18 @@ static inline const unsigned char *lookup_replace_object(const unsigned char *sh
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern int sha1_object_info(const unsigned char *, unsigned long *);
-extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
-extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
-extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
+
+#define hash_sha1_file(buf,len,type,sha1) \
+	hash_sha1_file_extended((buf), (len), (type), (sha1), NULL)
+extern int hash_sha1_file_extended(const void *buf, unsigned long len, const char *type, unsigned char *sha1, mdigest_t *mdigestp);
+
+#define write_sha1_file(buf,len,type,return_sha1) \
+	write_sha1_file_extended((buf), (len), (type), (return_sha1), NULL)
+extern int write_sha1_file_extended(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1, mdigest_t *mdigestp);
+
+#define pretend_sha1_file(buf,len,type,sha1) \
+	pretend_sha1_file_extended((buf), (len), (type), (sha1), NULL)
+extern int pretend_sha1_file_extended(void *, unsigned long, enum object_type, unsigned char *, mdigest_t *mdigestp);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
@@ -783,13 +805,18 @@ extern int do_check_packed_object_crc;
 /* for development: log offset of pack access */
 extern const char *log_pack_access;
 
-extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
+#define check_sha1_signature(sha1,buf,size,type) \
+	check_sha1_signature_extended((sha1), NULL, (buf), (size), (type))
+extern int check_sha1_signature_extended(const unsigned char *sha1, mdigest_t *mdigestp, void *buf, unsigned long size, const char *type);
 
 extern int move_temp_to_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
+extern int has_sha1_file_digest(const unsigned char *sha1, mdigest_t *mdigestp);
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
+extern int has_loose_object_nonlocal_digest(const unsigned char *sha1,
+					    mdigest_t *mdigestp);
 
 extern int has_pack_index(const unsigned char *sha1);
 
@@ -831,8 +858,19 @@ static inline int get_sha1_with_context(const char *str, unsigned char *sha1, st
  * null-terminated string.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
-
+extern int get_sha1_hex_digest(const char *hex, unsigned char *sha1,
+			       int *has_digest, mdigest_t *digestp);
+/*
+ * get_mdigest_from_external_hex assumes hex is terminated by something that is
+ * not alphanumeric, so the string does not have to be null terminated.
+ */
+extern int get_mdigest_from_external_hex(mdigest_t *digestp, const char *hex);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern char *sha1_to_hex_digest(const unsigned char *sha1,
+				const mdigest_t *digestp); /* static buffer result! */
+
+extern int get_hex_field_size(char *hex);
+
 extern int read_ref_full(const char *filename, unsigned char *sha1,
 			 int reading, int *flags);
 extern int read_ref(const char *filename, unsigned char *sha1);
@@ -978,10 +1016,13 @@ extern struct packed_git {
 	off_t pack_size;
 	const void *index_data;
 	size_t index_size;
+	const void *mds_data;
+	size_t mds_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
 	unsigned char *bad_object_sha1;
 	int index_version;
+	int mds_version;
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
@@ -996,6 +1037,8 @@ struct pack_entry {
 	off_t offset;
 	unsigned char sha1[20];
 	struct packed_git *p;
+	int has_mdigest;
+	mdigest_t mdigest;
 };
 
 struct ref {
@@ -1050,6 +1093,11 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 
 extern void pack_report(void);
 extern int open_pack_index(struct packed_git *);
+extern int open_pack_mds(struct packed_git *p);
+extern int git_open_noatime(const char *name);
+extern size_t required_git_packed_mds_size(const char *path,
+					   void *data, uint32_t nobjects,
+					   size_t actual_size);
 extern void close_pack_index(struct packed_git *);
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
@@ -1058,8 +1106,16 @@ extern void free_pack_by_name(const char *);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *, int, int);
 extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
+extern int nth_packed_object_mdigest(const struct packed_git *p, uint32_t n,
+				      mdigest_t *mdigestp);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
-extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
+
+#define find_pack_entry_one(sha1,p) find_pack_entry_one_extended((sha1),(p), NULL, NULL)
+extern off_t find_pack_entry_one_extended(const unsigned char *,
+					  struct packed_git *,
+					  int *has_mdigestp,
+					  mdigest_t *mdigestp);
+
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
diff --git a/environment.c b/environment.c
index c93b8f4..5ced2e2 100644
--- a/environment.c
+++ b/environment.c
@@ -10,6 +10,10 @@
 #include "cache.h"
 #include "refs.h"
 #include "fmt-merge-msg.h"
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 char git_default_email[MAX_GITNAME];
 char git_default_name[MAX_GITNAME];
@@ -77,6 +81,10 @@ static size_t namespace_len;
 static const char *git_dir;
 static char *git_object_dir, *git_index_file, *git_graft_file;
 
+static char *git_object_mds_dir;
+#ifdef PACKDB
+static char *git_object_packdb_node;
+#endif
 /*
  * Repository-local GIT_* environment variables
  * Remember to update local_repo_env_size in cache.h when
@@ -118,6 +126,11 @@ static char *expand_namespace(const char *raw_namespace)
 
 static void setup_git_env(void)
 {
+	static char cwdbuf[PATH_MAX];
+	int ocn_len;
+#ifdef PACKDB
+	int opn_len;
+#endif
 	git_dir = getenv(GIT_DIR_ENVIRONMENT);
 	git_dir = git_dir ? xstrdup(git_dir) : NULL;
 	if (!git_dir) {
@@ -131,6 +144,31 @@ static void setup_git_env(void)
 		git_object_dir = xmalloc(strlen(git_dir) + 9);
 		sprintf(git_object_dir, "%s/objects", git_dir);
 	}
+	ocn_len = strlen(git_object_dir) + 8 + strlen(getcwd(cwdbuf, PATH_MAX));
+	git_object_mds_dir = xmalloc(ocn_len);
+	memset(git_object_mds_dir, 0, ocn_len);
+	sprintf(git_object_mds_dir, "%s/mdsd", git_object_dir);
+	if (git_object_mds_dir[0] != '/') {
+		int ocn_offset = (git_object_mds_dir[0] == '.' &&
+				  git_object_mds_dir[1] == '/')? 2:0;
+		memset(git_object_mds_dir, 0, ocn_len);
+		sprintf(git_object_mds_dir, "%s/%s/mdsd",
+			getcwd(cwdbuf, PATH_MAX), git_object_dir + ocn_offset);
+	}
+#ifdef PACKDB
+	opn_len = strlen(git_object_dir)
+		+ 10 + strlen(getcwd(cwdbuf, PATH_MAX));
+	git_object_packdb_node = xmalloc(opn_len);
+	memset(git_object_packdb_node, 0, opn_len);
+	sprintf(git_object_packdb_node, "%s/packdb", git_object_dir);
+	if (git_object_packdb_node[0] != '/') {
+		int opn_offset = (git_object_mds_dir[0] == '.' &&
+				  git_object_mds_dir[1] == '/')? 2:0;
+		memset(git_object_packdb_node, 0, opn_len);
+		sprintf(git_object_packdb_node, "%s/%s/packdb",
+			getcwd(cwdbuf, PATH_MAX), git_object_dir + opn_offset);
+	}
+#endif
 	git_index_file = getenv(INDEX_ENVIRONMENT);
 	if (!git_index_file) {
 		git_index_file = xmalloc(strlen(git_dir) + 7);
@@ -143,6 +181,11 @@ static void setup_git_env(void)
 		read_replace_refs = 0;
 	namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
 	namespace_len = strlen(namespace);
+	mdsdb_init();
+	mdigest_init();
+#ifdef PACKDB
+	packdb_init();
+#endif
 }
 
 int is_bare_repository(void)
@@ -210,6 +253,20 @@ char *get_object_directory(void)
 	return git_object_dir;
 }
 
+char *get_object_mds_directory(void) {
+	if (!git_object_mds_dir)
+		setup_git_env();
+	return git_object_mds_dir;
+}
+
+#ifdef PACKDB
+char *get_object_packdb_node(void) {
+	if (!git_object_packdb_node)
+		setup_git_env();
+	return git_object_packdb_node;
+}
+#endif
+
 int odb_mkstemp(char *template, size_t limit, const char *pattern)
 {
 	int fd;
diff --git a/git.c b/git.c
index fb9029c..f43328f 100644
--- a/git.c
+++ b/git.c
@@ -4,7 +4,10 @@
 #include "help.h"
 #include "quote.h"
 #include "run-command.h"
-
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 const char git_usage_string[] =
 	"git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]\n"
 	"           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]\n"
@@ -279,6 +282,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	struct stat st;
 	const char *prefix;
 
+	static int mdsdb_need_atexit = 1;
+
+	if (mdsdb_need_atexit) {
+#ifdef PACKDB
+		atexit(packdb_finish);
+#endif
+		atexit(mdsdb_finish);
+		mdsdb_need_atexit = 0;
+	}
 	prefix = NULL;
 	help = argc == 2 && !strcmp(argv[1], "-h");
 	if (!help) {
diff --git a/hex.c b/hex.c
index 9ebc050..46c8b8b 100644
--- a/hex.c
+++ b/hex.c
@@ -1,3 +1,4 @@
+#include <ctype.h>
 #include "cache.h"
 
 const signed char hexval_table[256] = {
@@ -56,10 +57,55 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 	return 0;
 }
 
+int get_mdigest_from_external_hex(mdigest_t *digestp, const char *hex)
+{
+	int max = 0;
+	int wcode = 0, blen;
+	const char *ptr = hex;
+	char ch1, ch2;
+	unsigned char *out = digestp->buffer.buffer;
+
+	if (isalnum((ch1 = *(ptr++))) && isalnum((ch2 = *(ptr++)))) {
+		unsigned int val = (hexval(ch1) << 4) | hexval(ch2);
+		if (val & ~0xff)
+			return -1;
+		wcode = (int) val;
+	}
+	blen = get_mdigest_required_len(wcode);
+
+	while (isalnum((ch1 = *(ptr++))) && isalnum((ch2 = *(ptr++)))) {
+		unsigned int val = (hexval(ch1) << 4) | hexval(ch2);
+		if (val & ~0xff)
+			return -1;
+		*(out++) = val;
+		max += 2;
+	}
+	if (max != 2 * blen) return -1;
+	mdigest_load(digestp, wcode, NULL);
+	return max + 2;		/* add the 2 chars for wcode */
+}
+
+int get_sha1_hex_digest(const char *hex, unsigned char *sha1,
+			int *has_digest, mdigest_t *digestp)
+{
+	int result = get_sha1_hex(hex, sha1);
+	if (result) return result;
+	if (hex[40] == '-') {
+		int cnt = get_mdigest_from_external_hex(digestp, hex+41 );
+		*has_digest = (cnt > 0);
+		if (!*has_digest) return -1;
+	} else {
+		*has_digest = 0;
+		mdigest_clear(digestp);
+	}
+	return 0;
+}
+
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
 	static int bufno;
-	static char hexbuffer[4][50];
+	static char hexbuffer[4][50 + 2 + (MAX_DIGEST_LENGTH * 4)];
 	static const char hex[] = "0123456789abcdef";
 	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
 	int i;
@@ -73,3 +119,61 @@ char *sha1_to_hex(const unsigned char *sha1)
 
 	return buffer;
 }
+
+char *mdigest_to_hex(const mdigest_t *digestp) {
+	static int bufno;
+	static char hexbuffer[4][(MAX_DIGEST_LENGTH *2) + 1];
+	static const char hex[] = "0123456789abcdef";
+	const unsigned char *inbuf = get_mdigest_buffer(digestp);
+	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
+	int i;
+	int len = get_mdigest_len(digestp);
+
+	for (i = 0; i < len; i++) {
+		unsigned int val = *inbuf++;
+		*buf++ = hex[val >> 4];
+		*buf++ = hex[val & 0xf];
+	}
+	*buf = '\0';
+
+	return buffer;
+
+}
+
+char *mdigest_to_external_hex(const mdigest_t *digestp) {
+	static int bufno;
+	static char hexbuffer[4][((MAX_DIGEST_LENGTH + 1) * 2) + 1];
+	static const char hex[] = "0123456789abcdef";
+	const unsigned char *inbuf = get_mdigest_buffer(digestp);
+	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
+	int i;
+	int len = get_mdigest_len(digestp);
+	int wcode = get_mdigest_wcode(digestp);
+	unsigned int wval = wcode & 0xff;
+	*buf++ = hex[wval >> 4];
+	*buf++ = hex[wval & 0xf];
+	for (i = 0; i < len; i++) {
+		unsigned int val = *inbuf++;
+		*buf++ = hex[val >> 4];
+		*buf++ = hex[val & 0xf];
+	}
+	*buf = '\0';
+
+	return buffer;
+
+}
+
+char *sha1_to_hex_digest(const unsigned char *sha1, const mdigest_t *digestp)
+{
+	char *result = sha1_to_hex(sha1);
+	sprintf(result+40, "-%s", mdigest_to_external_hex(digestp));
+	return result;
+}
+
+int get_hex_field_size(char *hex) {
+	int tmp;
+	if (!isalnum(hex[0]) || !isalnum(hex[1])) return -1;
+	unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+	tmp = get_mdigest_required_len((int) (val & 0xff));
+	return (tmp < 0)? tmp: 2 * (tmp + 1);
+}
diff --git a/pack-write.c b/pack-write.c
index de2bd01..fe461a5 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -194,6 +194,117 @@ off_t write_pack_header(struct sha1file *f, uint32_t nr_entries)
 	return sizeof(hdr);
 }
 
+const char *write_mds_file(const char *crc_name,
+			   struct pack_idx_entry **objects,
+			   int nr,
+			   const struct pack_idx_option *opts,
+			   unsigned char *sha1)
+{
+	static unsigned char buffer[4 + 4 * MAX_DIGEST_LENGTH];
+	unsigned char *base = buffer;
+	int i, j, fd;
+	struct sha1file *f;
+	int wsize = get_mdigest_wsize_by_type(MDIGEST_DEFAULT);
+	int wbsize = wsize * 4;
+
+	for (i = 0; i < nr; i += 4) {
+		if (objects[i]->has_digest) {
+			int ws = get_mdigest_wsize(&(objects[i]->digest));
+			if (wsize < ws) wsize = ws;
+		}
+	}
+	wbsize = wsize * 4;
+	if (nr) {
+		/*
+		 * Sort just in case objects not already sorted.
+		 */
+		qsort(objects, nr, sizeof(objects[0]), sha1_compare);
+	}
+
+	if (opts->flags & WRITE_IDX_VERIFY) {
+		assert(crc_name);
+		f = sha1fd_check(crc_name);
+		if (f == NULL) {
+			/*
+			 * For backwards-compatability, assume a missing
+			 * mds file is OK.
+			 */
+			return crc_name;
+		}
+	} else {
+		if (!crc_name) {
+			static char tmpfile[PATH_MAX];
+			fd = odb_mkstemp(tmpfile, sizeof(tmpfile),
+					 "pack/tmp_mds_XXXXXX");
+			crc_name = xstrdup(tmpfile);
+		} else {
+			unlink(crc_name);
+			fd = open(crc_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+		}
+		if (fd < 0)
+			die_errno("unable to create '%s'", crc_name);
+		f = sha1fd(fd, crc_name);
+	}
+
+	*(base++) = 'P';
+	*(base++) = 'K';
+	*(base++) = 'M';
+	*(base++) = 'D';
+	*(base++) = 'S';
+	*(base++) = 0;
+	*(base++) = 1; /* version number */
+	*(base++) = (unsigned char) wsize; /* wcode */
+	sha1write(f, buffer, base - buffer);
+	base = buffer;
+
+	for (i = 0; i < nr; i += 4) {
+		int lim = ((nr-i) > 3)? 4: nr-i;
+		int has[4];
+		mdigest_t crc[4];
+		for (j = 0; j < lim; j++) {
+			if (objects[i+j]->has_digest) {
+				has[j] = get_mdigest_wcode
+					(&(objects[i+j]->digest));
+				crc[j] = objects[i+j]->digest;
+			} else {
+				has[j] =
+				  (has_sha1_file_digest(objects[i + j]->sha1,
+							&crc[j]) == 1);
+				if (has[j]) {
+					has[j] = get_mdigest_wcode(&crc[j]);
+				}
+			}
+		}
+		for (j = 0; j < 4; j++) {
+			if (j < lim) {
+				*(base)++ = has[j];
+			} else {
+				has[j] = 0;
+				mdigest_clear(&crc[j]);
+				*(base++) = 0;
+			}
+		}
+		for (j = 0; j < 4; j += 1) {
+			if (j < lim) {
+				if (has[j])
+					mdigest_to_buffer(base, &crc[j],
+							  wbsize);
+				else
+					memset(base, 0, wbsize);
+			} else {
+				memset(base, 0, wbsize);
+			}
+			base += wbsize;
+		}
+		sha1write(f, buffer, base - buffer);
+		base = buffer;
+	}
+	sha1write(f, sha1, 20);
+	sha1close(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
+			    ? CSUM_CLOSE : CSUM_FSYNC));
+	return crc_name;
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
@@ -351,6 +462,7 @@ void finish_tmp_packfile(char *name_buffer,
 			 unsigned char sha1[])
 {
 	const char *idx_tmp_name;
+	const char *mds_tmp_name;
 	char *end_of_name_prefix = strrchr(name_buffer, 0);
 
 	if (adjust_shared_perm(pack_tmp_name))
@@ -358,8 +470,12 @@ void finish_tmp_packfile(char *name_buffer,
 
 	idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
 				      pack_idx_opts, sha1);
+	mds_tmp_name = write_mds_file(NULL, written_list, nr_written,
+				      pack_idx_opts, sha1);
 	if (adjust_shared_perm(idx_tmp_name))
 		die_errno("unable to make temporary index file readable");
+	if (adjust_shared_perm(mds_tmp_name))
+		die_errno("unable to make temporary index file readable");
 
 	sprintf(end_of_name_prefix, "%s.pack", sha1_to_hex(sha1));
 	free_pack_by_name(name_buffer);
@@ -370,6 +486,10 @@ void finish_tmp_packfile(char *name_buffer,
 	sprintf(end_of_name_prefix, "%s.idx", sha1_to_hex(sha1));
 	if (rename(idx_tmp_name, name_buffer))
 		die_errno("unable to rename temporary index file");
+	sprintf(end_of_name_prefix, "%s.mds", sha1_to_hex(sha1));
+	if (rename(mds_tmp_name, name_buffer))
+		die_errno("unable to rename temporary mds file");
 
 	free((void *)idx_tmp_name);
+	free((void *)mds_tmp_name);
 }
diff --git a/pack.h b/pack.h
index aa6ee7d..759d2f4 100644
--- a/pack.h
+++ b/pack.h
@@ -70,6 +70,8 @@ struct pack_idx_entry {
 	unsigned char sha1[20];
 	uint32_t crc32;
 	off_t offset;
+	int has_digest;
+	mdigest_t digest;
 };
 

@@ -77,6 +79,7 @@ struct progress;
 typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
+extern const char *write_mds_file(const char *mds_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, uint32_t);
diff --git a/sha1_file.c b/sha1_file.c
index f291f3f..e176d53 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,10 @@
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -41,6 +45,7 @@ const unsigned char null_sha1[20];
  */
 static struct cached_object {
 	unsigned char sha1[20];
+	unsigned char md_as_array[sizeof (mdigest_t)];
 	enum object_type type;
 	void *buf;
 	unsigned long size;
@@ -49,6 +54,7 @@ static int cached_object_nr, cached_object_alloc;
 
 static struct cached_object empty_tree = {
 	EMPTY_TREE_SHA1_BIN_LITERAL,
+	{0,},
 	OBJ_TREE,
 	"",
 	0
@@ -223,11 +229,18 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 	return sha1_get_pack_name(sha1, &name, &base, "idx");
 }
 
+char *sha1_pack_mds_name(const unsigned char *sha1)
+{
+	static char *name, *base;
+
+	return sha1_get_pack_name(sha1, &name, &base, "mds");
+}
+
+
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
 static void read_info_alternates(const char * alternates, int depth);
-static int git_open_noatime(const char *name);
 
 /*
  * Prepare alternate object database registry.
@@ -416,6 +429,7 @@ void prepare_alt_odb(void)
 	link_alt_odb_entries(alt, alt + strlen(alt), PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
+	mdsdb_init_alts();
 }
 
 static int has_loose_object_local(const unsigned char *sha1)
@@ -442,6 +456,53 @@ static int has_loose_object(const unsigned char *sha1)
 	       has_loose_object_nonlocal(sha1);
 }
 
+static int has_loose_object_local_digest(const unsigned char *sha1,
+				      mdigest_t *digestp)
+{
+	int status;
+	mdsdb_open(NULL);
+	status = mdsdb_lookup(NULL, sha1, digestp) > 0;
+	mdsdb_close(NULL);
+	return status;
+}
+
+int has_loose_object_nonlocal_digest(const unsigned char *sha1,
+				  mdigest_t *digestp)
+{
+	struct alternate_object_database *alt;
+
+	if (digestp == NULL) return 0;
+	prepare_alt_odb();
+	for (alt = alt_odb_list; alt; alt = alt->next) {
+		fill_sha1_path(alt->name, sha1);
+		if (!access(alt->base, F_OK)) {
+			mdigest_t xdigest;
+			/* Use the crc corresponding to the hash */
+			mdsdb_t dbf;
+			int status;
+			dbf = mdsdb_open_alt(alt);
+			status = mdsdb_lookup(dbf, sha1,
+					      (digestp? digestp: &xdigest));
+			mdsdb_close(dbf);
+			switch (status) {
+			case 0: return 0;
+			case 1: return 1;
+			case -1:
+			default:
+				return 0;
+			}
+		}
+	}
+	return 0;
+}
+
+static int has_loose_object_digest(const unsigned char *sha1,
+				   mdigest_t *digestp)
+{
+	return has_loose_object_local_digest(sha1, digestp) ||
+	       has_loose_object_nonlocal_digest(sha1, digestp);
+}
+
 static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
@@ -575,6 +636,87 @@ static int check_packed_git_idx(const char *path,  struct packed_git *p)
 	return 0;
 }
 
+size_t required_git_packed_mds_size(const char *path, void *data,
+				    uint32_t nobjects,
+				    size_t actual_size) {
+	unsigned char *base;
+	int wsize, version;
+	size_t required_size;
+	if (actual_size < 8) {
+		error("mds file %s is too small", path);
+		return 0;
+	}
+
+	base = data;
+	if ((*(base++) != 'P')
+	    || (*(base++) != 'K')
+	    || (*(base++) != 'M')
+	    || (*(base++) != 'D')
+	    || (*(base++) != 'S')
+	    || (*(base++) != 0)) {
+		error("mds file %s corrupted (bad header)",
+			     path);
+		return 0;
+
+	}
+	if ((version = *(base++)) != 1) {
+		error("mds file %s uses an unrecognized version %d",
+		      path, version);
+		return 0;
+	}
+	wsize = (*(base++)) * 4;
+	if (wsize == 0) {
+		/* must be positive and a multiple of 4 */
+		error("mds file %s corrupted (bad wsize field)",
+			     path);
+		return 0;
+	}
+	required_size = (size_t)8 +
+	  ((size_t)((nobjects)/4 + (nobjects % 4 != 0))
+	   * (size_t)(4 * (1 + wsize))) + (size_t)(20 * 2);
+	if (required_size != actual_size) {
+		error("mds file %s not the right size: %ld != %ld",
+		      path, (long)actual_size, (long)required_size);
+		return 0;
+	}
+	return required_size;
+}
+
+static int check_packed_git_mds(const char *path, struct packed_git *p)
+{
+	void *mds_map;
+	size_t mds_size, required_size;
+	unsigned char *base;
+	int fd = git_open_noatime(path);
+	int version;
+	struct stat st;
+	if (fd < 0)
+		return -1;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+	mds_size = xsize_t(st.st_size);
+	if (mds_size < 8 + 20 + 20) {
+		close(fd);
+		return error("mds file %s is too small", path);
+	}
+	mds_map = xmmap(NULL, mds_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	base = mds_map;
+	required_size = required_git_packed_mds_size(path, mds_map,
+						     p->num_objects,
+						     mds_size);
+	if (required_size == 0) {
+		munmap(mds_map, mds_size);
+		return -1;
+	}
+	p->mds_data = mds_map;
+	p->mds_size = mds_size;
+	p->mds_version = version;
+	return 0;
+}
+
 int open_pack_index(struct packed_git *p)
 {
 	char *idx_name;
@@ -590,6 +732,20 @@ int open_pack_index(struct packed_git *p)
 	return ret;
 }
 
+int open_pack_mds(struct packed_git *p) {
+	char *mds_name;
+	int ret;
+
+	if (p->mds_data)
+		return 0;
+
+	mds_name = xstrdup(p->pack_name);
+	strcpy(mds_name + strlen(mds_name) - strlen(".pack"), ".mds");
+	ret = check_packed_git_mds(mds_name, p);
+	free(mds_name);
+	return ret;
+}
+
 static void scan_windows(struct packed_git *p,
 	struct packed_git **lru_p,
 	struct pack_window **lru_w,
@@ -691,6 +847,15 @@ void close_pack_index(struct packed_git *p)
 	if (p->index_data) {
 		munmap((void *)p->index_data, p->index_size);
 		p->index_data = NULL;
+		p->index_size = 0;
+	}
+}
+
+void close_pack_mds(struct packed_git *p) {
+	if (p->mds_data) {
+		munmap((void *)p->mds_data, p->mds_size);
+		p->mds_data = NULL;
+		p->mds_size = 0;
 	}
 }
 
@@ -718,6 +883,7 @@ void free_pack_by_name(const char *pack_name)
 				pack_open_fds--;
 			}
 			close_pack_index(p);
+			close_pack_mds(p);
 			free(p->bad_object_sha1);
 			*pp = p->next;
 			free(p);
@@ -741,6 +907,10 @@ static int open_packed_git_1(struct packed_git *p)
 
 	if (!p->index_data && open_pack_index(p))
 		return error("packfile %s index unavailable", p->pack_name);
+	/*
+	 * Assume an mds file might not be available - backwards compatibility
+	 */
+	if (!p->mds_data) open_pack_mds(p);
 
 	if (!pack_max_fds) {
 		struct rlimit lim;
@@ -1142,14 +1312,23 @@ static const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
 	return NULL;
 }
 
-int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long size, const char *type)
+int check_sha1_signature_extended(const unsigned char *sha1,
+				  mdigest_t *digestp,
+				  void *map, unsigned long size,
+				  const char *type)
 {
 	unsigned char real_sha1[20];
-	hash_sha1_file(map, size, type, real_sha1);
-	return hashcmp(sha1, real_sha1) ? -1 : 0;
+	mdigest_t rdigest;
+	hash_sha1_file_extended(map, size, type, real_sha1,
+				((digestp == NULL)? NULL: &rdigest));
+	int ret = hashcmp(sha1, real_sha1) ? -1 : 0;
+	if (digestp && ret == 0) {
+		ret = mdigest_tst(digestp, &rdigest);
+	}
+	return ret;
 }
 
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
 {
 	static int sha1_file_open_flag = O_NOATIME;
 
@@ -1926,15 +2105,48 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
 	}
 }
 
-off_t find_pack_entry_one(const unsigned char *sha1,
-				  struct packed_git *p)
+int nth_packed_object_mdigest(const struct packed_git *p, uint32_t n,
+			       mdigest_t *digestp)
+{
+	int r;
+	unsigned char *base = (unsigned char *)(p->mds_data);
+	int wsize; /*size in bytes per MDS field, stored as 32-bit words */
+	int wcode;
+
+	if (base == NULL) return 0;
+
+	base += 7;
+	wsize = (*(base++)) * 4;
+	if (wsize == 0) {
+		/* must be positive to store a digest */
+		return -1;
+	}
+	base += (n / 4) * (uint32_t)(4 * (1 + wsize));
+	r = n % 4;
+	wcode = base[r];
+	if (wcode == 0) return 0;
+	base += 4;
+	base += wsize * r;
+	mdigest_load(digestp, wcode, base);
+	return 1;
+}
+
+
+
+off_t find_pack_entry_one_extended(const unsigned char *sha1,
+				   struct packed_git *p,
+				   int *has_digestp, mdigest_t *digestp)
 {
 	const uint32_t *level1_ofs = p->index_data;
 	const unsigned char *index = p->index_data;
+	const unsigned char *mds = p->mds_data;
 	unsigned hi, lo, stride;
 	static int use_lookup = -1;
 	static int debug_lookup = -1;
 
+	if (has_digestp) *has_digestp = 0;
+	if (digestp) mdigest_clear(digestp);
+
 	if (debug_lookup < 0)
 		debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
 
@@ -1944,6 +2156,11 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		level1_ofs = p->index_data;
 		index = p->index_data;
 	}
+
+	if (!mds) {
+		open_pack_mds(p);
+	}
+
 	if (p->index_version > 1) {
 		level1_ofs += 2;
 		index += 8;
@@ -1979,8 +2196,14 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		if (debug_lookup)
 			printf("lo %u hi %u rg %u mi %u\n",
 			       lo, hi, hi - lo, mi);
-		if (!cmp)
+		if (!cmp) {
+			if (has_digestp && digestp)
+				*(has_digestp) =
+				  (nth_packed_object_mdigest(p,
+							     mi,
+							     digestp) == 1);
 			return nth_packed_object_offset(p, mi);
+		}
 		if (cmp > 0)
 			hi = mi;
 		else
@@ -2029,7 +2252,9 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
 					goto next;
 		}
 
-		offset = find_pack_entry_one(sha1, p);
+		offset = find_pack_entry_one_extended(sha1, p,
+						      &(e->has_mdigest),
+						      &(e->mdigest));
 		if (offset) {
 			/*
 			 * We are about to tell the caller where they can
@@ -2175,14 +2400,33 @@ static void *read_packed_sha1(const unsigned char *sha1,
 	return data;
 }
 
-int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
-		      unsigned char *sha1)
+int pretend_sha1_file_extended(void *buf, unsigned long len,
+			       enum object_type type,
+			       unsigned char *sha1, mdigest_t *digestp)
 {
-	struct cached_object *co;
+	struct cached_object *co = NULL;
+	mdigest_t dgst;
+	int has_dgst = 0;
 
-	hash_sha1_file(buf, len, typename(type), sha1);
-	if (has_sha1_file(sha1) || find_cached_object(sha1))
+	hash_sha1_file_extended(buf, len, typename(type), sha1, &dgst);
+	if (has_sha1_file(sha1) || (co = find_cached_object(sha1))) {
+		mdigest_t old_dgst;
+		if (!has_sha1_file_digest(sha1, &old_dgst)) {
+			if (co != NULL) {
+				memcpy(&old_dgst,co->md_as_array,
+				       sizeof (mdigest_t));
+				has_dgst = 1;
+			}
+		} else {
+			has_dgst = 1;
+		}
+		if (has_dgst && mdigest_tst(&old_dgst, &dgst)) {
+			  die("SHA1 COLLISION FOUND FOR %s "
+			      "(dummy commit when running blame?)",
+			      sha1_to_hex(sha1));
+		}
 		return 0;
+	}
 	if (cached_object_alloc <= cached_object_nr) {
 		cached_object_alloc = alloc_nr(cached_object_alloc);
 		cached_objects = xrealloc(cached_objects,
@@ -2193,8 +2437,10 @@ int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
 	co->size = len;
 	co->type = type;
 	co->buf = xmalloc(len);
+	memcpy(co->md_as_array, &dgst, sizeof (mdigest_t));
 	memcpy(co->buf, buf, len);
 	hashcpy(co->sha1, sha1);
+	if (digestp) *digestp = dgst;
 	return 0;
 }
 
@@ -2316,11 +2562,11 @@ void *read_object_with_reference(const unsigned char *sha1,
 }
 
 static void write_sha1_file_prepare(const void *buf, unsigned long len,
-                                    const char *type, unsigned char *sha1,
-                                    char *hdr, int *hdrlen)
+				    const char *type, unsigned char *sha1,
+				    mdigest_t *digestp,
+				    char *hdr, int *hdrlen)
 {
 	git_SHA_CTX c;
-
 	/* Generate the header */
 	*hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
 
@@ -2329,6 +2575,12 @@ static void write_sha1_file_prepare(const void *buf, unsigned long len,
 	git_SHA1_Update(&c, hdr, *hdrlen);
 	git_SHA1_Update(&c, buf, len);
 	git_SHA1_Final(sha1, &c);
+	if (digestp) {
+		mdigest_context_t mdc;
+		mdigest_Init(&mdc, MDIGEST_DEFAULT);
+		mdigest_Update(&mdc, buf, len);
+		mdigest_Final(digestp, &mdc);
+	}
 }
 
 /*
@@ -2384,12 +2636,13 @@ static int write_buffer(int fd, const void *buf, size_t len)
 	return 0;
 }
 
-int hash_sha1_file(const void *buf, unsigned long len, const char *type,
-                   unsigned char *sha1)
+int hash_sha1_file_extended(const void *buf, unsigned long len,
+			    const char *type,
+			    unsigned char *sha1, mdigest_t *digestp)
 {
 	char hdr[32];
 	int hdrlen;
-	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	write_sha1_file_prepare(buf, len, type, sha1, digestp, hdr, &hdrlen);
 	return 0;
 }
 
@@ -2443,10 +2696,14 @@ static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
 	return fd;
 }
 
-static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
+
+static int write_loose_object(const unsigned char *sha1, mdigest_t *digestp,
+			      char *hdr, int hdrlen,
 			      const void *buf, unsigned long len, time_t mtime)
 {
 	int fd, ret;
+	mdigest_t digest;
+	mdigest_context_t mdc;
 	unsigned char compressed[4096];
 	git_zstream stream;
 	git_SHA_CTX c;
@@ -2469,7 +2726,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	stream.next_out = compressed;
 	stream.avail_out = sizeof(compressed);
 	git_SHA1_Init(&c);
-
+	mdigest_Init(&mdc, MDIGEST_DEFAULT);
 	/* First header.. */
 	stream.next_in = (unsigned char *)hdr;
 	stream.avail_in = hdrlen;
@@ -2484,23 +2741,30 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 		unsigned char *in0 = stream.next_in;
 		ret = git_deflate(&stream, Z_FINISH);
 		git_SHA1_Update(&c, in0, stream.next_in - in0);
+		mdigest_Update(&mdc, in0, stream.next_in - in0);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
 	} while (ret == Z_OK);
+	mdigest_Final(&digest, &mdc);
 
 	if (ret != Z_STREAM_END)
-		die("unable to deflate new object %s (%d)", sha1_to_hex(sha1), ret);
+		die("unable to deflate new object %s (%d)",
+		    sha1_to_hex(sha1), ret);
 	ret = git_deflate_end_gently(&stream);
 	if (ret != Z_OK)
-		die("deflateEnd on object %s failed (%d)", sha1_to_hex(sha1), ret);
+		die("deflateEnd on object %s failed (%d)",
+		    sha1_to_hex(sha1), ret);
 	git_SHA1_Final(parano_sha1, &c);
 	if (hashcmp(sha1, parano_sha1) != 0)
-		die("confused by unstable object source data for %s", sha1_to_hex(sha1));
-
+		die("confused by unstable object source data for %s",
+		    sha1_to_hex(sha1));
+	if (digestp && mdigest_tst(digestp, &digest)) {
+		die("confused by unstable object source data "
+		    "(digest mismatch) for %s", sha1_to_hex(sha1));
+	}
 	close_sha1_file(fd);
-
 	if (mtime) {
 		struct utimbuf utb;
 		utb.actime = mtime;
@@ -2510,24 +2774,41 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 				tmpfile, strerror(errno));
 	}
 
-	return move_temp_to_file(tmpfile, filename);
+	ret = move_temp_to_file(tmpfile, filename);
+	if (ret == 0) {
+		mdsdb_open(NULL);
+		mdsdb_process((mdsdb_t)NULL, sha1, &digest);
+		mdsdb_close(NULL);
+	}
+	return ret;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
+int write_sha1_file_extended(const void *buf, unsigned long len,
+			     const char *type, unsigned char *returnsha1,
+			     mdigest_t *digestp)
 {
 	unsigned char sha1[20];
 	char hdr[32];
 	int hdrlen;
+	mdigest_t newdigest;
 
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	write_sha1_file_prepare(buf, len, type, sha1, &newdigest, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (has_sha1_file(sha1))
+	if (digestp) *digestp = newdigest;
+	if (has_sha1_file(sha1)) {
+		mdigest_t old_digest;
+		if (has_sha1_file_digest(sha1, &old_digest)) {
+			if (mdigest_tst(&newdigest, &old_digest)) {
+				die("hash collision");
+			}
+		}
 		return 0;
-	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
+	}
+	return write_loose_object(sha1, &newdigest, hdr, hdrlen, buf, len, 0);
 }
 
 int force_object_loose(const unsigned char *sha1, time_t mtime)
@@ -2538,6 +2819,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 	char hdr[32];
 	int hdrlen;
 	int ret;
+	mdigest_t * const digestp = NULL;
 
 	if (has_loose_object(sha1))
 		return 0;
@@ -2545,7 +2827,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 	if (!buf)
 		return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
 	hdrlen = sprintf(hdr, "%s %lu", typename(type), len) + 1;
-	ret = write_loose_object(sha1, hdr, hdrlen, buf, len, mtime);
+	ret = write_loose_object(sha1, digestp, hdr, hdrlen, buf, len, mtime);
 	free(buf);
 
 	return ret;
@@ -2574,6 +2856,85 @@ int has_sha1_file(const unsigned char *sha1)
 	return has_loose_object(sha1);
 }
 
+int has_sha1_file_digest(const unsigned char *sha1, mdigest_t *digestp)
+{
+	struct pack_entry e;
+	/*
+	 * builtin/send-pack.c uses a null SHA1 (all bytes zero) to
+	 * indicate that a SHA-1 hash does not exist.  We explicitly
+	 * return 0 for this case, for correct behavior even if we
+	 * somehow get that value into the database.
+	 */
+	if (!hashcmp(sha1, null_sha1)) return 0;
+	if (find_pack_entry(sha1, &e)) {
+		if (e.has_mdigest) {
+			if (digestp) *digestp = e.mdigest;
+			return 1;
+		} else {
+#ifdef PACKDB
+			if (e.p && e.p->pack_local) {
+				/*
+				 * We have a local pack file, but could not
+				 * find the CRC, so we first check if the
+				 * CRC is still stored for loose objects.
+				 * Then we try packdb (separate database for
+				 * packed objects) and if it is not there, we
+				 * compute it from scratch and add it to
+				 * packdb.
+				 */
+				if (has_loose_object_local_digest(sha1,
+							       digestp)) {
+					return 1;
+				} else {
+					int status ;
+					packdb_open();
+					status = (packdb_lookup(sha1,
+								digestp)
+						  == 1);
+					if (status == 0) {
+						unsigned long len;
+						enum object_type type;
+						mdigest_t digest;
+						mdigest_context_t mdc;
+						mdigest_Init(&mdc,
+							     MDIGEST_DEFAULT);
+						void *buf = read_sha1_file
+							(sha1, &type, &len);
+						mdigest_Update(&mdc, buf, len);
+						mdigest_Final(&digest, &mdc);
+						switch(packdb_process
+						       (sha1, &digest)) {
+						case 0:
+							if (digestp)
+								*digestp
+								 = digest;
+							status = 1;
+							break;
+						case 1:
+							error("packdb insert"
+							      " botched");
+							status = 0;
+							break;
+						case -1:
+							error("packdb failed");
+							status = 0;
+							break;
+						}
+					}
+					packdb_close();
+					return status;
+				}
+			} else {
+				return 0;
+			}
+#else
+			return has_loose_object_local_digest(sha1, digestp);
+#endif
+		}
+	}
+	return has_loose_object_digest(sha1, digestp);
+}
+
 static void check_tree(const void *buf, size_t size)
 {
 	struct tree_desc desc;
@@ -2602,7 +2963,8 @@ static void check_tag(const void *buf, size_t size)
 		die("corrupt tag");
 }
 
-static int index_mem(unsigned char *sha1, void *buf, size_t size,
+static int index_mem(unsigned char *sha1, mdigest_t *digestp,
+		     void *buf, size_t size,
 		     enum object_type type,
 		     const char *path, unsigned flags)
 {
@@ -2631,24 +2993,27 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 		if (type == OBJ_TAG)
 			check_tag(buf, size);
 	}
-
 	if (write_object)
-		ret = write_sha1_file(buf, size, typename(type), sha1);
+		ret = write_sha1_file_extended(buf, size, typename(type), sha1,
+					       digestp);
 	else
-		ret = hash_sha1_file(buf, size, typename(type), sha1);
+		ret = hash_sha1_file_extended(buf, size, typename(type), sha1,
+				     digestp);
 	if (re_allocated)
 		free(buf);
 	return ret;
 }
 
-static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
+static int index_pipe(unsigned char *sha1, mdigest_t *digestp,
+		      int fd, enum object_type type,
 		      const char *path, unsigned flags)
 {
 	struct strbuf sbuf = STRBUF_INIT;
 	int ret;
 
 	if (strbuf_read(&sbuf, fd, 4096) >= 0)
-		ret = index_mem(sha1, sbuf.buf, sbuf.len, type,	path, flags);
+		ret = index_mem(sha1, digestp, sbuf.buf, sbuf.len, type,
+				path, flags);
 	else
 		ret = -1;
 	strbuf_release(&sbuf);
@@ -2657,24 +3022,26 @@ static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
 
 #define SMALL_FILE_SIZE (32*1024)
 
-static int index_core(unsigned char *sha1, int fd, size_t size,
+static int index_core(unsigned char *sha1, mdigest_t *digestp,
+		      int fd, size_t size,
 		      enum object_type type, const char *path,
 		      unsigned flags)
 {
 	int ret;
 
 	if (!size) {
-		ret = index_mem(sha1, NULL, size, type, path, flags);
+		ret = index_mem(sha1, digestp, NULL, size, type, path, flags);
 	} else if (size <= SMALL_FILE_SIZE) {
 		char *buf = xmalloc(size);
 		if (size == read_in_full(fd, buf, size))
-			ret = index_mem(sha1, buf, size, type, path, flags);
+			ret = index_mem(sha1, digestp,
+					buf, size, type, path, flags);
 		else
 			ret = error("short read %s", strerror(errno));
 		free(buf);
 	} else {
 		void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-		ret = index_mem(sha1, buf, size, type, path, flags);
+		ret = index_mem(sha1, digestp, buf, size, type, path, flags);
 		munmap(buf, size);
 	}
 	return ret;
@@ -2692,41 +3059,122 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
  * avoid mmaping it in core is to deal with large binary blobs, and
  * by definition they do _not_ want to get any conversion.
  */
-static int index_stream(unsigned char *sha1, int fd, size_t size,
+static int index_stream(unsigned char *sha1, mdigest_t *digestp,
+			int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
-	return index_bulk_checkin(sha1, fd, size, type, path, flags);
+#if 1
+	int result = index_bulk_checkin(sha1, fd, size, type, path, flags);
+	if (digestp) {
+		if (result || !has_sha1_file_digest(sha1, digestp)) {
+			mdigest_clear(digestp);
+		}
+	}
+	return result;
+#else
+	struct child_process fast_import;
+	char export_marks[512];
+	const char *argv[] = { "fast-import", "--quiet", export_marks, NULL };
+	char tmpfile[512];
+	char fast_import_cmd[512];
+	char buf[512];
+	int len, tmpfd;
+
+	strcpy(tmpfile, git_path("hashstream_XXXXXX"));
+	tmpfd = git_mkstemp_mode(tmpfile, 0600);
+	if (tmpfd < 0)
+		die_errno("cannot create tempfile: %s", tmpfile);
+	if (close(tmpfd))
+		die_errno("cannot close tempfile: %s", tmpfile);
+	sprintf(export_marks, "--export-marks=%s", tmpfile);
+
+	memset(&fast_import, 0, sizeof(fast_import));
+	fast_import.in = -1;
+	fast_import.argv = argv;
+	fast_import.git_cmd = 1;
+	if (start_command(&fast_import))
+		die_errno("index-stream: git fast-import failed");
+
+	len = sprintf(fast_import_cmd, "blob\nmark :1\ndata %lu\n",
+		      (unsigned long) size);
+	write_or_whine(fast_import.in, fast_import_cmd, len,
+		       "index-stream: feeding fast-import");
+	while (size) {
+		char buf[10240];
+		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
+		ssize_t actual;
+
+		actual = read_in_full(fd, buf, sz);
+		if (actual < 0)
+			die_errno("index-stream: reading input");
+		if (write_in_full(fast_import.in, buf, actual) != actual)
+			die_errno("index-stream: feeding fast-import");
+		size -= actual;
+	}
+	if (close(fast_import.in))
+		die_errno("index-stream: closing fast-import");
+	if (finish_command(&fast_import))
+		die_errno("index-stream: finishing fast-import");
+
+	tmpfd = open(tmpfile, O_RDONLY);
+	if (tmpfd < 0)
+		die_errno("index-stream: cannot open fast-import mark");
+	len = read(tmpfd, buf, sizeof(buf));
+	if (len < 0)
+		die_errno("index-stream: reading fast-import mark");
+	if (close(tmpfd) < 0)
+		die_errno("index-stream: closing fast-import mark");
+	if (unlink(tmpfile))
+		die_errno("index-stream: unlinking fast-import mark");
+	if (len != 44 ||
+	    memcmp(":1 ", buf, 3) ||
+	    get_sha1_hex(buf + 3, sha1))
+		die_errno("index-stream: unexpected fast-import mark: <%s>", buf);
+	/*
+	 * since we got a sha1 value from fast-import, an mds file was
+	 * created, so we can just look up the digest.  Just in case, we
+	 * clear the digest if the lookup failed.
+	 */
+	if (digestp) {
+		if (!has_sha1_file_digest(sha1, digestp)) {
+			mdigest_clear(digestp);
+		}
+	}
+	return 0;
+#endif
 }
 
-int index_fd(unsigned char *sha1, int fd, struct stat *st,
-	     enum object_type type, const char *path, unsigned flags)
+int index_fd_extended(unsigned char *sha1, mdigest_t *digestp,
+		      int fd, struct stat *st,
+		      enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
 	size_t size = xsize_t(st->st_size);
 
 	if (!S_ISREG(st->st_mode))
-		ret = index_pipe(sha1, fd, type, path, flags);
+		ret = index_pipe(sha1, digestp, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB)
-		ret = index_core(sha1, fd, size, type, path, flags);
+		ret = index_core(sha1, digestp, fd, size, type, path, flags);
 	else
-		ret = index_stream(sha1, fd, size, type, path, flags);
+		ret = index_stream(sha1, digestp,
+				   fd, size, type, path, flags);
 	close(fd);
 	return ret;
 }
 
-int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags)
+int index_path_extended(unsigned char *sha1, mdigest_t *digestp, const char *path, struct stat *st, unsigned flags)
 {
 	int fd;
 	struct strbuf sb = STRBUF_INIT;
-
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
 		fd = open(path, O_RDONLY);
 		if (fd < 0)
 			return error("open(\"%s\"): %s", path,
 				     strerror(errno));
-		if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+		if (index_fd_extended(sha1, digestp, fd, st,
+				      OBJ_BLOB, path, flags) < 0)
 			return error("%s: failed to insert into database",
 				     path);
 		break;
@@ -2737,8 +3185,10 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 			             errstr);
 		}
 		if (!(flags & HASH_WRITE_OBJECT))
-			hash_sha1_file(sb.buf, sb.len, blob_type, sha1);
-		else if (write_sha1_file(sb.buf, sb.len, blob_type, sha1))
+			hash_sha1_file_extended(sb.buf, sb.len, blob_type, sha1,
+				       digestp);
+		else if (write_sha1_file_extended(sb.buf, sb.len, blob_type,
+						  sha1, digestp))
 			return error("%s: failed to insert into database",
 				     path);
 		strbuf_release(&sb);
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index f4e8f43..53e1b7d 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -34,17 +34,18 @@ fi
 # git init has been done in an empty repository.
 # make sure it is empty.
 
-find .git/objects -type f -print >should-be-empty
+find .git/objects -type f -a  ! -name mdsd -a ! -name packdb -print >should-be-empty
 test_expect_success \
     '.git/objects should be empty after git init in an empty repo.' \
     'cmp -s /dev/null should-be-empty'
 
-# also it should have 2 subdirectories; no fan-out anymore, pack, and info.
-# 3 is counting "objects" itself
-find .git/objects -type d -print >full-of-directories
+# also it should have 3 subdirectories;
+# no fan-out anymore, pack, and info and mdsd.
+# 4 (listed by find) is the result of counting "objects" as well.
+find .git/objects \( -type d -o -name mdsd  \) -print >full-of-directories
 test_expect_success \
-    '.git/objects should have 3 subdirectories.' \
-    'test $(wc -l < full-of-directories) = 3'
+    '.git/objects should have 3 subdirectories or files.' \
+    'test $(wc -l < full-of-directories) = 4'
 
 ################################################################
 # Test harness
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/6] Add the mdigest, mdsdb, and packdb modules.
From: Bill Zaumen @ 2011-12-21  7:08 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

* The mdigest module manipulates and creates message digests. The
files are mdigest.h and mdigest.c. These can be modified to
include additional digests.

* The mdsdb module stores message digests for loose objects. The
files are mdsdb.h and objd-mdsdb.c. The variable MDSDB in the
Makefile selects the implementation (currently only objd-mdsdb.c, which
keeps each digest in its own file).

* The packdb module stores digests in a space-efficient form. It is
intended for unusual conditions (e.g., caching a digest for an object
in an alternative object database that does not support digests). The
implementation provided uses GDBM, however it is easy to add alternative
implementation.  The choice of implementations is determined by the
value of the PACKDB variable in the Makefile (if undefined, packdb is
not used).

These modules are mostly self-contained - there is little interaction
with the rest of git beyond calling functions such as "die" for a few
error conditions.  Documentation for the functions in these modules
is in the header files.

The Makefile changes will be in a subsequent commit.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 gdbm-packdb.c |  249 +++++++++++++++++++++++++++++++++++++++++
 mdigest.c     |  221 +++++++++++++++++++++++++++++++++++++
 mdigest.h     |  334 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mdsdb.h       |  192 ++++++++++++++++++++++++++++++++
 objd-mdsdb.c  |  340 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 packdb.h      |   93 ++++++++++++++++
 6 files changed, 1429 insertions(+), 0 deletions(-)
 create mode 100644 gdbm-packdb.c
 create mode 100644 mdigest.c
 create mode 100644 mdigest.h
 create mode 100644 mdsdb.h
 create mode 100644 objd-mdsdb.c
 create mode 100644 packdb.h

diff --git a/gdbm-packdb.c b/gdbm-packdb.c
new file mode 100644
index 0000000..6443ec5
--- /dev/null
+++ b/gdbm-packdb.c
@@ -0,0 +1,249 @@
+#include<sys/types.h>
+#include<sys/stat.h>
+#include <sys/param.h>
+#include<stdio.h>
+#include<string.h>
+#include<malloc.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <time.h>
+#include <pthread.h>
+#include <errno.h>
+#include <gdbm.h>
+
+#include "cache.h"
+#include "packdb.h"
+
+static void nsleep() {
+#if _POSIX_C_SOURCE >= 199309L
+	struct timespec ts;
+	ts.tv_sec = 0;
+	ts.tv_nsec = 100000;
+	nanosleep(&ts, NULL);
+#else
+	sleep(1);
+#endif
+}
+
+static int initialized = 0;
+
+static GDBM_FILE dbf = NULL;
+char *dbf_name;
+static int dbf_depth = 0;
+
+pthread_mutex_t gdbm_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void packdb_close_nolock(void);
+
+void packdb_init(void) {
+	char *last;
+	pthread_mutex_lock(&gdbm_mutex);
+	if (initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return;
+	}
+	dbf_name = get_object_packdb_node();
+	last = rindex(dbf_name, '/');
+	*last = 0;
+	if (!access(dbf_name, R_OK|W_OK|X_OK)) {
+		initialized = 1;
+	}
+	*last = '/';
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+int packdb_initialized(void) {
+  return initialized;
+}
+
+static void packdb_open_nolock(void) {
+	if (dbf_depth == 0) {
+	AGAIN_W:
+		dbf = gdbm_open(dbf_name, 0, GDBM_WRCREAT, PERM_GROUP, NULL);
+		if (dbf == NULL && gdbm_errno == GDBM_CANT_BE_WRITER) {
+			nsleep();
+			goto AGAIN_W;
+		}
+	}
+	dbf_depth++;
+}
+
+void packdb_open(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	packdb_open_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+
+int packdb_lookup(const unsigned char *sha1, mdigest_t *digestp) {
+	datum key;
+	datum ovalue;
+	pthread_mutex_lock(&gdbm_mutex);
+
+	if (!initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+
+	packdb_open_nolock();
+	if (dbf == NULL) {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	ovalue = gdbm_fetch(dbf, key);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+
+	if (ovalue.dptr == NULL) return 0;
+	if (digestp) {
+		int len;
+		int wsize = (int) *(unsigned char *)(ovalue.dptr);
+		unsigned char *buffer = (unsigned char *)(ovalue.dptr + 1);
+		len = get_mdigest_required_len(wsize);
+		if (len + 1 > ovalue.dsize)
+			die("existing db entry for %s corrupted [1], len = %d,"
+			    " ovalue.dsize = %d",
+			    sha1_to_hex(sha1), len, ovalue.dsize);
+		mdigest_load(digestp, wsize, buffer);
+	}
+	free(ovalue.dptr);
+	/* if (digestp) *digestp = (old_digest); */
+	return 1;
+}
+
+int packdb_remove(const unsigned char *sha1) {
+	datum key;
+	int result;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized)  || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+	packdb_open_nolock();
+	result = gdbm_delete(dbf, key);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+	return result;
+}
+
+
+int packdb_process(const unsigned char *sha1, mdigest_t *digestp) {
+	datum key;
+	datum nvalue;
+	datum ovalue;
+	mdigest_t newdigest = (*digestp);
+	mdigest_t old_digest;
+	newdigest.hdr.lhdr.wcode = get_mdigest_wcode(digestp);
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized) || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+
+	nvalue.dptr = (char *)&(newdigest.hdr.lhdr.wcode);
+	nvalue.dsize = 1 + get_mdigest_len(digestp);
+
+	packdb_open_nolock();
+	ovalue = gdbm_fetch(dbf, key);
+	if (dbf == dbf && ovalue.dptr == NULL) {
+		int status;
+		status = gdbm_store(dbf, key, nvalue, GDBM_INSERT);
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		switch (status) {
+		case 0:
+			return 0;
+		case -1:
+		  error("could not enter crc into database - key = %s",
+		      sha1_to_hex(sha1));
+		      return -1;
+		case 1:
+			return 1;
+		}
+		return -1;	/* should not occur */
+	} else if (ovalue.dptr == NULL) {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		return 0;
+	} else {
+		int wcode, len;
+		unsigned char *buffer;
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		wcode = (int) *(unsigned char *)ovalue.dptr;
+		len = get_mdigest_required_len(wcode);
+		if (len + 1 > ovalue.dsize)
+			die("existing db entry for %s corrupted [2]",
+			    sha1_to_hex(sha1));
+		buffer = (unsigned char *) ovalue.dptr + 1;
+		mdigest_load(&old_digest, wcode, buffer);
+		free(ovalue.dptr);
+		/*
+		 * Both old_digest and newdigest are in network byte order.
+		 */
+		if (mdigest_tst(&old_digest, digestp)) {
+			die("SHA1  COLLISION WHEN INSERTING OBJECT %s",
+			    sha1_to_hex(sha1));
+			return -1;
+		}
+		return 1;
+	}
+}
+
+int packdb_reorganize() {
+	int status;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized)  || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	packdb_open_nolock();
+	status = gdbm_reorganize(dbf);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+	return status;
+}
+
+
+static void packdb_close_nolock(void) {
+	  if (!initialized) {
+		return;
+	  }
+	  dbf_depth--;
+	  if (dbf_depth == 0 && dbf != NULL) {
+		gdbm_close(dbf);
+		dbf = NULL;
+	  }
+	  if (dbf_depth < 0) {
+		die("packdb dbf_depth %d < 0", dbf_depth);
+	  }
+	  return;
+}
+
+void packdb_close(void) {
+	  pthread_mutex_lock(&gdbm_mutex);
+	  packdb_close_nolock();
+	  pthread_mutex_unlock(&gdbm_mutex);
+}
+
+void packdb_finish(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	if (!initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return;
+	}
+	if (dbf != NULL) gdbm_close(dbf);
+	dbf = NULL;
+	dbf_depth = 0;
+	initialized = 0;
+	pthread_mutex_unlock(&gdbm_mutex);
+}
diff --git a/mdigest.c b/mdigest.c
new file mode 100644
index 0000000..94c72a2
--- /dev/null
+++ b/mdigest.c
@@ -0,0 +1,221 @@
+#include "mdigest.h"
+
+struct mdigest_config {
+	const enum mdigest_type mdt; /* table key - must match index */
+	const char *name;		    /* print name */
+	const int wcode;		    /* wsize code for MDS files*/
+	const int wsize;			    /* word size for MDS file entry */
+	const int blen;			    /* byte length of digest (used) */
+};
+
+static const struct mdigest_config mdigest_table[] = {
+	{MDIGEST_CRC, "CRC-32", MDIGEST_CRC_WCODE, 1, 4},
+	{MDIGEST_SHA1, "SHA-1", MDIGEST_SHA1_WCODE, 5, 20},
+	{MDIGEST_SHA256, "SHA-256", MDIGEST_SHA256_WCODE, 8, 32},
+	{MDIGEST_SHA512, "SHA-512", MDIGEST_SHA512_WCODE, 16, 64},
+};
+
+/*
+ * Table indexed by wcode;
+ */
+struct mdigest_config_aux {
+	int wcode;		    /* wcode code for MDS files*/
+	int wsize;		    /* word size for MDS file entry */
+	int blen;		    /* byte length of digest (used) */
+	enum mdigest_type mdt;	    /* message digest type */
+};
+
+static struct mdigest_config_aux *mdigest_aux_table;
+
+void mdigest_init(void) {
+	int i;
+	int n = sizeof(mdigest_table)
+	  / sizeof(struct mdigest_config);
+	int m = 0;
+
+	for (i = 0; i < n; i++) {
+		if (m < mdigest_table[i].wcode)
+			m = mdigest_table[i].wcode;
+	}
+	m += 1;			/* table size is one more than the max index */
+	mdigest_aux_table = (struct mdigest_config_aux *)
+	  xcalloc(m, sizeof (struct mdigest_config_aux));
+
+	for (i = 0; i < n; i++) {
+		int wc, ws, bl;
+		enum mdigest_type mdt;
+		wc = mdigest_table[i].wcode;
+		ws = mdigest_table[i].wsize;
+		bl = mdigest_table[i].blen;
+		mdt = mdigest_table[i].mdt;
+		
+		mdigest_aux_table[wc].wcode = wc;
+		mdigest_aux_table[wc].wsize = ws;
+		mdigest_aux_table[wc].blen = bl;
+		mdigest_aux_table[wc].mdt = mdt;
+	}
+}
+
+int get_mdigest_wsize(mdigest_t *mdigestp) {
+	return mdigest_table[mdigestp->hdr.info.mdt].wsize;
+}
+
+const char *get_mdigest_name(enum mdigest_type mdt)
+{
+	return mdigest_table[mdt].name;
+}
+
+int get_mdigest_wcode(const mdigest_t *digestp) {
+	return mdigest_table[digestp->hdr.info.mdt].wcode;
+}
+
+int get_mdigest_wcode_by_type(enum mdigest_type type) {
+	return mdigest_table[type].wcode;
+}
+
+int get_mdigest_wsize_by_type(enum mdigest_type type) {
+	return mdigest_table[type].wsize;
+}
+
+int get_mdigest_required_len_by_type(enum mdigest_type type) {
+	return mdigest_table[type].blen;
+}
+
+int mdigest_to_buffer(unsigned char *buffer, mdigest_t *digestp, int blen)
+{
+	int len = blen;
+	if (len > digestp->hdr.info.len) len = digestp->hdr.info.len;
+	memcpy(buffer, digestp->buffer.buffer, len);
+	memset(buffer + len, 0, blen - len);
+	return len;
+}
+
+
+int mdigest_tst(mdigest_t *md1p, mdigest_t *md2p)
+{
+	int i, n;
+	int n32;
+	unsigned char *x, *y;
+	uint32_t *x32, *y32;
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+	int n64;
+	uint64_t *x64, *y64;
+#endif	
+	if (md1p->hdr.info.mdt - md2p->hdr.info.mdt) return -1;
+	n = md1p->hdr.info.len;
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+	n64 = n/8;
+	x64 = (md1p->buffer.buffer64);
+	y64 = (md2p->buffer.buffer64);
+	for (i = 0; i < n64; i++) {
+		if (x64[i] != y64[i]) return -1;
+	}
+	i *= 2;
+#else
+	i = 0;
+#endif	
+	n32 = n/4;
+	if (i != n32) {
+		x32 = (md1p->buffer.buffer32);
+		y32 = (md2p->buffer.buffer32);
+		while (i < n32) {
+			if (x32[i] != y32[i]) return -1;
+			i++;
+		}
+		i *= 4;
+		if (i < n) {
+			x = (md1p->buffer.buffer);
+			y = (md2p->buffer.buffer);
+			while (i < n) {
+				if (x[i] != y[i]) return -1;
+				i++;
+			}
+		}
+	}
+	return 0;
+
+}
+
+void mdigest_Init(mdigest_context_t *ctx,
+			 enum mdigest_type mdt)
+{
+	ctx->mdt = mdt;
+	switch (mdt) {
+	case MDIGEST_CRC:
+		ctx->context.crc32 = crc32(0, NULL, 0);
+		break;
+	case MDIGEST_SHA1:
+		git_SHA1_Init(&ctx->context.sha1);
+		break;
+	case MDIGEST_SHA256:
+		EVP_MD_CTX_init(&ctx->context.evp);
+		EVP_DigestInit_ex(&ctx->context.evp, EVP_sha256(), NULL);
+		break;
+	case MDIGEST_SHA512:
+		EVP_MD_CTX_init(&ctx->context.evp);
+		EVP_DigestInit_ex(&ctx->context.evp, EVP_sha512(), NULL);
+		break;
+	}
+}
+
+void mdigest_Update(mdigest_context_t *ctx,
+			   const void *dataIn,
+			   unsigned long len)
+{
+	switch (ctx->mdt) {
+	case MDIGEST_CRC:
+		ctx->context.crc32 = crc32(ctx->context.crc32, dataIn, len);
+		break;
+	case MDIGEST_SHA1:
+		git_SHA1_Update(&(ctx->context.sha1), dataIn, len);
+		break;
+	case MDIGEST_SHA256:
+	case MDIGEST_SHA512:
+		EVP_DigestUpdate(&(ctx->context.evp), dataIn, len);
+		break;
+	}
+}
+
+void mdigest_Final(mdigest_t *digest,
+			  mdigest_context_t *ctx)
+{
+	enum mdigest_type mdt = ctx->mdt;
+	digest->hdr.info.mdt = mdt;
+	digest->hdr.info.len = mdigest_table[mdt].blen;
+	switch (mdt) {
+	case MDIGEST_CRC:
+		digest->buffer.crc32 = htonl(ctx->context.crc32);
+		break;
+	case MDIGEST_SHA1:
+		git_SHA1_Final(digest->buffer.buffer, &(ctx->context.sha1));
+		break;
+	case MDIGEST_SHA256:
+	case MDIGEST_SHA512:
+		EVP_DigestFinal_ex(&(ctx->context.evp),
+				  digest->buffer.buffer, NULL);
+		break;
+	}
+}
+
+int mdigest_load(mdigest_t *digestp, int wcode, unsigned char *buffer)
+{
+	int len;
+	if (mdigest_aux_table[wcode].wcode == 0) return -1;
+	digestp->hdr.info.mdt = mdigest_aux_table[wcode].mdt;
+	len = mdigest_aux_table[wcode].blen;
+	digestp->hdr.info.len = len;
+	/*
+	 * If buffer is NULL, we assume that one has already copied
+	 * the data into the buffer so we only had to set up the
+	 * other fields in the mdigest_t structure.
+	 */
+	if (buffer != NULL)
+		memcpy(digestp->buffer.buffer, buffer, len);
+	return 0;
+}
+
+int get_mdigest_required_len(int wcode) {
+	if (mdigest_aux_table[wcode].wcode == 0) return 0;
+	return mdigest_aux_table[wcode].blen;
+}
+
diff --git a/mdigest.h b/mdigest.h
new file mode 100644
index 0000000..8a1d713
--- /dev/null
+++ b/mdigest.h
@@ -0,0 +1,334 @@
+/*
+ * This file is included in cache.h, so the following is just in
+ * case cache.h is not included at all.
+ */
+#include "cache.h"
+/*
+ * Define here because cache.h needs some of the typedefs below.
+ */
+#ifndef MDIGEST_H
+#define MDIGEST_H
+#include <stdint.h>
+
+/**
+ * Enumeration to list supported message digests.
+ */
+enum mdigest_type {
+	MDIGEST_CRC,
+	MDIGEST_SHA1,
+	MDIGEST_SHA256,
+	MDIGEST_SHA512
+};
+
+/*
+ * Constants defining wcode values.  These are used in external
+ * representations of a digest to code the digest type.  The maximum
+ * number of digests supported is 255 (0 is reserved to indicate an
+ * unknown or uninitialized digest type).
+ */
+#define MDIGEST_CRC_WCODE	1
+#define	MDIGEST_SHA1_WCODE	5
+#define	MDIGEST_SHA256_WCODE	8
+#define	MDIGEST_SHA512_WCODE	16
+
+/*
+ * Standard digest.
+ */
+#ifndef MDIGEST_DEFAULT
+#define MDIGEST_DEFAULT MDIGEST_SHA256
+#endif
+
+#define MAX_DIGEST_LENGTH 64	/* set to maximum length we'll support */
+
+/**
+ * Message digest data structure.
+ * Holds a message digest along with some additional information.
+ * The hdr.info field specifies the digest type and the digest length in bytes.
+ * The hdr.lhdr.wcode field provides a code indicating the digest type
+ * as stored externally (this is set temporarily for a couple of
+ * operations and in general should not be used.
+ * The buffer union allows the buffer to be viewed as an unsigned, 32-bit
+ * integer, an array of unsigned characters, an array of 32-bit unsigned
+ * integers, or an array of 64-bit unsigned integers (on machines that
+ * support 64-bit unsigned integer arithmetic.
+ *
+ * In nearly all cases, one should use the access functions.
+ */
+typedef struct mdigest {
+	union {
+	  	struct mdigest_info {
+			enum mdigest_type mdt;
+			int len;
+		} info;
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+		uint64_t align64; /* to allow 64-bit operations on buffers */
+#define MDIGEST_SPACER_SIZE ((sizeof (struct mdigest_info) > 8)? \
+			     (sizeof (struct mdigest_info) - 1): 7)
+#else
+#define MDIGEST_SPACER_SIZE (sizeof (struct mdigest_info) - 1)
+#endif
+	  	/*
+		 * This is used destructively when a digest is
+		 * about to be written to disk (e.g., for the
+		 * loose object digests). The address of the
+		 * wcode member will provide a buffer prefaced
+		 * by a byte containing a wcode tag to indicate
+		 * the digest type.
+		 */
+		struct {
+			unsigned char spacer[MDIGEST_SPACER_SIZE];
+			unsigned char wcode;
+		} lhdr;		/* header for loose objects */
+	} hdr;
+	union {
+		uint32_t crc32;
+		unsigned char buffer[MAX_DIGEST_LENGTH];
+		uint32_t buffer32[MAX_DIGEST_LENGTH/4];
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+		uint64_t buffer64[MAX_DIGEST_LENGTH/8];
+#endif  
+	} buffer;
+} mdigest_t;
+
+/**
+ * Get the message digest type of a message digest.
+ * Arguments:
+ *     mdp - a pointer to the message digest.
+ *
+ * Returns:
+ *     the type of the message digest.
+ *
+ * Precoditions:
+ *     the message digest must have been created (by calling the function
+ *     mdigest_Final).
+ */
+static inline enum mdigest_type get_mdigest_type(const mdigest_t *mdp) {
+	return mdp->hdr.info.mdt;
+}
+
+/**
+ * Get the length of a message digest.
+ * Arguments:
+ *     mdp - a pointer to the message digest.
+ *
+ * Returns:
+ *    the length in bytes of the message digest
+ *
+ * Precoditions:
+ *     the message digest must have been created (by calling the function
+ *     mdigest_Final).
+ */
+static inline int get_mdigest_len(const mdigest_t *mdp) {
+	return mdp->hdr.info.len;
+}
+
+
+/**
+ * Get the buffer containing a message digest.
+ * Arguments:
+ *     mdp - a pointer to the message digest.
+ *
+ * Returns:
+ *    the buffer (unsigned char array) of the message digest.
+ *
+ * Precoditions:
+ *     the message digest must have been created (by calling the function
+ *     mdigest_Final).
+ */
+static inline const unsigned char *get_mdigest_buffer(const mdigest_t *mdp) {
+  return mdp->buffer.buffer;
+}
+
+/**
+ * Test to see if two message digests are identical.
+ *
+ * Arguments:
+ *       md1p - a pointer to the first digest
+ *       md2p - a pointer to the second digest
+ * Returns:
+ *   0 if the digests match; -1 otherwise
+ */
+extern int mdigest_tst(mdigest_t *md1p, mdigest_t *md2p);
+
+/**
+ * Get the print-name of a message digest.
+ * Arguments:
+ *   mdt  the message digest's type
+ * Returns:
+ *   The name of the digest, suitable for printing or displaying.
+ */
+extern const char *get_mdigest_name(enum mdigest_type mdt);
+
+
+/**
+ * Message digest context.
+ * This is data structure maintains the state of a message-digest
+ * computation.  Each field in the union specifies the context needed
+ * for a particular digest or set of digests.
+ */
+typedef struct  mdigest_context {
+	union {
+  		uint32_t crc32;		/* minimal digest for testing. */
+  		git_SHA_CTX sha1;	/* SHA-1 (git-internal impl) */
+		EVP_MD_CTX evp;		/* For openssl EVP digest functions */
+		/* Add the others later */
+	} context;
+	enum mdigest_type mdt;
+} mdigest_context_t;
+
+/**
+ * Initialize the mdigest module.
+ */
+extern void mdigest_init(void);
+
+/*
+ *  Modeled after Git SHA-1 API, which follows that used by openssl
+ */
+
+/**
+ * Initialize a message digest context.
+ * Arguments:
+ *   ctx - a pointer to the context to initialize
+ *   mdt - the type of the message digest.
+ */
+extern void mdigest_Init(mdigest_context_t *ctx,
+				enum mdigest_type mdt);
+
+
+/**
+ * Update a message digest context
+ * Arguments:
+ *      ctx - a pointer to the context to initialize
+ *   dataIn - the data to add to the digest
+ *      len - the length of dataIn
+ * 
+ */
+extern void mdigest_Update(mdigest_context_t *ctx,
+			   const void *dataIn,
+			   unsigned long len);
+
+/**
+ * Complete and provide a message digest.
+ * Arguments:
+ *   digest - the data structure that will store the message digest.
+ *      ctx - a pointer to the context to initialize
+ */
+extern void mdigest_Final(mdigest_t *digest, mdigest_context_t *ctx);
+
+/**
+ * Initialize a digest given external data representing the digest.
+ *
+ * Arguments:
+ *    digestp - a pointer to the message digest to initialize
+ *      wcode - the code (external representation) representing the 
+ *              message type
+ *     buffer - the digest itself, as a sequence of bytes.
+ *  
+ * Returns:
+ *    0 on success, -1 on error
+ */
+extern int mdigest_load(mdigest_t *digestp, int wcode, unsigned char *buffer);
+
+/**
+ * Get the required message-digest length.
+ * Arguments:
+ *  wcode - a code (external representation) indicating the type of digest.
+ * Returns:
+ *   The message-digest length in bytes; 0 for unrecognized codes
+ */
+extern int get_mdigest_required_len(int wcode);
+
+/**
+ * Get the message-digest code for a message digest.
+ * Arguments:
+ *      digestp - a pointer to the message digest
+ * Return:
+ *    the message digest code
+ */
+extern int get_mdigest_wcode(const mdigest_t *digestp);
+
+
+/**
+ * Get the message-digest code for a message digest type.
+ * Arguments:
+ *      type - the type of a message digest
+ * Return:
+ *    the message digest code
+ */
+extern int get_mdigest_wcode_by_type(enum mdigest_type type);
+
+/**
+ * Get the word size for a digest given the digest type
+ * Arguments:
+ *   type - the type of the message digest
+ * Returns:
+ *  the size in 32-bit words of a message digest of a given type.
+ *
+ */
+extern int get_mdigest_wsize_by_type(enum mdigest_type type);
+
+/**
+ * Get the word size for a digest
+ * Arguments:
+ *   digestp - a pointer to a messag digest.
+ * Returns:
+ *  the size in 32-bit words of a message digest of a given type.
+ */
+extern int get_mdigest_wsize(mdigest_t *digestp);
+
+/**
+ * Get the required message-digest length by type.
+ * Arguments:
+ *   type - the type of the message digest
+ * Returns:
+ *   The message-digest length in bytes
+ */
+extern int get_mdigest_required_len_by_type(enum mdigest_type type);
+
+/**
+ * Copy a digest to a buffer.
+ * If the buffer is longer than required, it will be padded with null
+ * bytes.  If it is shorter than required, only the number of bytes
+ * given by blen will be copied.
+ * Arguments:
+ *    buffer - the buffer to store the digest, represented as a
+ *             sequence of bytes
+ *   digestp - the digest
+ *      blen - the length of the buffer
+ * Returns:
+ *    the number of bytes from the digest copied into the buffer
+ */
+extern int mdigest_to_buffer(unsigned char *buffer, 
+			     mdigest_t *digestp, int blen);
+
+/**
+ * Get the hexadecimal representation of a message digest.
+ * Arguments:
+ *    digestp - a pointer to the digest
+ * Returns:
+ *   a sequence of hexadecimal characters containing the digest
+ */
+extern char *mdigest_to_hex(const mdigest_t *digestp);  /* static buffer result! */
+
+/**
+ * Get the tagged hexadecimal representation of a message digest.
+ * The first two bytes represents the wcode value giving the digest
+ * type.
+ * Arguments:
+ *    digestp - a pointer to the digest
+ * Returns:
+ *   a sequence of hexadecimal characters containing the digest, prefaced
+ *   by two hexadecimal digits representing the type of digest
+ */
+extern char *mdigest_to_external_hex(const mdigest_t *digestp);  /* static buffer result! */
+
+/**
+ * Clear a message digest
+ * Arguments:
+ *   digestp - a pointer to a messag digest.
+ */
+static inline void mdigest_clear(mdigest_t *digestp) {
+	memset(digestp, 0, sizeof(mdigest_t));
+}
+
+#endif /* MDIGEST_H */
diff --git a/mdsdb.h b/mdsdb.h
new file mode 100644
index 0000000..5502985
--- /dev/null
+++ b/mdsdb.h
@@ -0,0 +1,192 @@
+#ifndef MDSDB_H
+#define MDSDB_H
+
+/**
+ * MD (Message Digest) Database Support.
+ *
+ * This module maintains a database mapping SHA-1 object keys to MDs
+ * (Message Digests) for purposes of detecting hash collisions.  The
+ * MDs are stored in the database as a sequence of bytes, prefaced
+ * by a one-byte code giving the MD type).  The functions allow for
+ * initialization, queries, adding new entries (with a collision
+ * check), and managing access to alternate databases.  The entries
+ * in the database correspond to Git loose objects
+ *
+ * The preprocessor symbol MDSDB determines the implementation of the
+ * module.
+ * Values:
+ *   0 - implement using directories and files - the first byte of a
+ *       SHA1 hash determines a subdirectory of ../objects/mdsd, and
+ *       the remaining bytes determine the file name, with the names
+ *       consisting of the hexadecimal representation of each byte's
+ *       value. The files then contain  a one byte code that determines
+*        the type of the MD, followed by the MD itself as a sequence of
+*        bytes. A value of 1 implies that packdb will also be used when
+*        creating pa
+ */
+
+#include<stdint.h>
+
+#include "cache.h"
+
+#if (MDSDB == 0)
+/**
+ * Opaque data type - because the typedef is for a pointer, we
+ * don't need the structure defined in files that use the pointer.
+ * We do need it defined somewhere, in this case in the file
+ * objd-mdsdb.c, which is the only place the fields are used.
+ */
+typedef struct objd_mdsdb *mdsdb_t;
+#endif
+
+/**
+ *  Initialize the database.
+ *  This opens a database file in the objects directory named mdsd,
+ *  used to store MDS of objects (uncompressed, excluding the header)
+ *  for hash-collision detection.
+ */
+extern void mdsdb_init(void);
+
+/**
+ * Check if the database has been initialized.
+ * Returns:
+ *   1 if mdsdb_init has been called; false otherwise.
+ */
+extern int mdsdb_initialized(void);
+
+/**
+ * Initializes alternative databases by adding them to a table with
+ * these databases closed.
+ */
+extern void mdsdb_init_alts();
+
+
+/**
+ * Open a database file.
+ *
+ * The default database can be read or written. alternate database
+ * files are read-only databases.  Multiple calls without intervening
+ * calls to mdsdb_close for a given argument will result in the same
+ * object being returned each successive time.  The pathname must match
+ * one stored by a call to mdsdb_init_alts.
+ *
+ * Arguments:
+ *    pathname - the pathname of the file; NULL for the default db;
+ *
+ * Returns:
+ *    the database (NULL indicates the default)
+ */
+extern mdsdb_t mdsdb_open(char *pathname);
+
+/**
+ * Open a database file given an alterate object database pointer.
+ *
+ * The default database can be read or written. alternate database
+ * files are read-only databases.  Multiple calls without intervening
+ * calls to mdsdb_close for a given argument will result in the same
+ * object being returned each successive time The argument must match
+ * an alternate object database pointer stored by a precding call to
+ * mdsdb_init_alts.
+ *
+ * Arguments:
+ *    alt - an alternate object database pointer (which provides the
+ *          pathname).
+ *
+ * Returns:
+ *    the database (NULL indicates the default)
+ */
+extern mdsdb_t mdsdb_open_alt(struct alternate_object_database *alt);
+
+/**
+ * Lookup a MD from a database.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *    digestp - a pointer to a uint32_t to store the returned value when
+ *              an entry in the database exists.
+ *
+ * Returns:
+ *   0 if no entry, 1 if there is an existing entry.
+ */
+extern int mdsdb_lookup(mdsdb_t dbf, const unsigned char *sha1,
+			mdigest_t *digestp);
+
+/**
+ * Remove a MD from a database.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *
+ * Returns:
+ *   0 on success; -1 if the entry did not exist or if an entry
+ *   could not be deleted
+ */
+extern int mdsdb_remove(mdsdb_t dbf, const unsigned char *sha1);
+
+/**
+ * Process a MD for a SHA-1 key.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *    digestp - the crc to store.
+ *
+ * Returns:
+ *   0 if this is a new entry; 1 if it is an existing entry, -1 if
+ *   an entry cannot be added ot the database.
+ *
+ * Errors:
+ *   Will call 'die' and exit if there is a hash collision. Will call
+ *   'error' if the value cannot be entered.
+ */
+extern int mdsdb_process(mdsdb_t dbf, const unsigned char *sha1,
+			 mdigest_t *digestp);
+
+/**
+ * Reorganize a MD database.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ * Returns:
+ *   0 on success; -1 on failure
+ */
+extern int mdsdb_reorganize(mdsdb_t dbf);
+
+
+/**
+ * Close a  database file.
+ *
+ * If the same database was opened multiple times, a reference count is
+ * decremented and the the database will not be closed until the count
+ * reaches zero.  Calls to mdsdb_open or mdsdb_open_alt must be balanced
+ * by calls to mdsdb_close or mdsdb_close_alt.
+ *
+ * Arguments:
+ *        dbf - the MD database.
+ */
+extern void mdsdb_close(mdsdb_t dbf);
+
+/**
+ * Close a database file given an alternate object database pointer.
+ *
+ * If the same database was opened multiple times, a reference count is
+ * decremented and the the database will not be closed until the count
+ * reaches zero.  Calls to mdsdb_open or mdsdb_open_alt must be balanced
+ * by calls to mdsdb_close or mdsdb_close_alt.
+ *
+ * Arguments:
+ *       alt - a pointer ot an alternate object database
+ */
+extern void mdsdb_close_alt(struct alternate_object_database *alt);
+
+/**
+ * Shutdown the database files.
+ * This will shut down the default database and the cached alternative
+ * databases.  All others should be closed by calling crcb_alt_close
+ * explicitly
+ */
+extern void mdsdb_finish(void);
+
+#endif /*MDSDB_H */
diff --git a/objd-mdsdb.c b/objd-mdsdb.c
new file mode 100644
index 0000000..268c5c4
--- /dev/null
+++ b/objd-mdsdb.c
@@ -0,0 +1,340 @@
+#include<sys/types.h>
+#include "cache.h"
+#include "mdsdb.h"
+
+struct objd_mdsdb {
+  char *root;
+};
+
+static struct objd_mdsdb db;
+
+static mdsdb_t no_dbf = (mdsdb_t) 4;
+
+static mdsdb_t dbf = NULL;
+
+#define ALT_DBF_LIMIT  512
+
+
+struct alt_map {
+	struct objd_mdsdb db;
+	struct alternate_object_database *alt;
+	struct alt_map *refer;
+};
+
+struct alt_map alt_map[ALT_DBF_LIMIT];
+static int alt_in_use = 0;
+static int initialized = 0;
+
+
+void mdsdb_init(void) {
+	if (initialized) {
+		return;
+	}
+	dbf = &db;
+	db.root = get_object_mds_directory();
+	initialized = 1;
+}
+
+int mdsdb_initialized(void) {
+	return initialized;
+}
+
+static int setup_alt(struct alternate_object_database *alt, void *param) {
+	static char buffer[PATH_MAX];
+	int i;
+	int lim = alt->name - alt->base;
+	memcpy(buffer, alt->base, lim);
+	memcpy(buffer, alt->base, lim);
+	memcpy(buffer+lim, "mdsd", 4);
+	buffer[lim+4] = 0;
+	for (i = 0; i < alt_in_use; i++) {
+		if (alt_map[i].alt == alt) {
+			/* don't put in the same entry twice */
+			return 0;
+		}
+		if (strcmp(buffer, alt_map[i].db.root) == 0) {
+			break;
+		}
+	}
+	alt_map[alt_in_use].db.root = xstrdup(buffer);
+	alt_map[alt_in_use].alt = alt;
+	if (i < alt_in_use) {
+		alt_map[alt_in_use].refer = alt_map + i;
+	} else {
+		alt_map[alt_in_use].refer = NULL;
+	}
+	alt_in_use++;
+	return 0;
+}
+
+static int alt_initialized = 0;
+
+void mdsdb_init_alts(void){
+	if (alt_initialized) return;
+	foreach_alt_odb(setup_alt, NULL);
+	alt_initialized = 1;
+}
+
+
+mdsdb_t mdsdb_open(char *name) {
+	int i;
+	if (name == NULL) return NULL;
+	for (i = 0; i < alt_in_use; i++) {
+		if (strcmp(alt_map[i].db.root, name) == 0) {
+			if (alt_map[i].refer) {
+				i = (alt_map[i].refer - alt_map);
+			}
+			return (mdsdb_t)&(alt_map[i].db);
+		}
+	}
+	return no_dbf;
+}
+
+mdsdb_t mdsdb_open_alt(struct alternate_object_database *alt) {
+	int i;
+	for (i = 0; i < alt_in_use; i++) {
+		if (alt_map[i].alt == alt) {
+			return (mdsdb_t)&(alt_map[i].db);
+		}
+	}
+	return no_dbf;
+
+}
+/* copied from sha1_file.c */
+static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
+{
+	int i;
+	for (i = 0; i < 20; i++) {
+		static char hex[] = "0123456789abcdef";
+		unsigned int val = sha1[i];
+		char *pos = pathbuf + i*2 + (i > 0);
+		*pos++ = hex[val >> 4];
+		*pos = hex[val & 0xf];
+	}
+}
+
+/*
+ * Warning: returns a static buffer so be careful about threading.
+ */
+static char *crc32_file_name(const char *path, const unsigned char *sha1)
+{
+	static char buf[PATH_MAX];
+	const char *digestdir;
+	int len;
+
+	digestdir = path;
+	len = strlen(digestdir);
+
+	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
+	if (len + 43 > PATH_MAX)
+		die("insanely long object crc directory %s", digestdir);
+	memcpy(buf, digestdir, len);
+	buf[len] = '/';
+	buf[len+3] = '/';
+	buf[len+42] = '\0';
+	fill_sha1_path(buf + len + 1, sha1);
+	return buf;
+}
+
+static int mdsdb_lookup_aux(char *path, mdigest_t *digestp)
+{
+	if (!access(path, F_OK)) {
+		if (digestp) {
+			int fd = open(path, O_RDONLY);
+			int wcode, len;
+			unsigned char wsch;
+			unsigned char buffer[MAX_DIGEST_LENGTH];
+			if (fd < 0) {
+				return 0;
+			}
+			if (read_in_full(fd, &wsch, 1) != 1) {
+				close(fd);
+				return 0;
+			}
+			wcode = wsch;
+			len = get_mdigest_required_len(wcode);
+			if(read_in_full(fd, buffer, len)
+			   != len) {
+				close(fd);
+				return 0;
+			}
+			close(fd);
+			mdigest_load(digestp, wcode, buffer);
+		}
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+
+int mdsdb_lookup(mdsdb_t gdbf, const unsigned char *sha1, mdigest_t *digestp) {
+	char *path;
+
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+	if (gdbf == NULL) gdbf = dbf;
+
+	path = crc32_file_name(gdbf->root, sha1);
+	return mdsdb_lookup_aux(path, digestp);
+}
+
+int mdsdb_remove(mdsdb_t gdbf, const unsigned char *sha1) {
+	char *path;
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+
+	if (gdbf == NULL) {
+		gdbf = dbf;
+	} else {
+		return -1;
+	}
+	path = crc32_file_name(gdbf->root, sha1);
+	return unlink(path);
+}
+
+/* copied from sha1_file.c */
+/* Size of directory component, including the ending '/' */
+static inline int directory_size(const char *filename)
+{
+	const char *s = strrchr(filename, '/');
+	if (!s)
+		return 0;
+	return s - filename + 1;
+}
+
+
+/* copied from sha1_file.c */
+static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
+{
+	int fd, dirlen = directory_size(filename);
+
+	if (dirlen + 20 > bufsiz) {
+		errno = ENAMETOOLONG;
+		return -1;
+	}
+	memcpy(buffer, filename, dirlen);
+	strcpy(buffer + dirlen, "tmp_obj_XXXXXX");
+	fd = git_mkstemp_mode(buffer, 0444);
+	if (fd < 0 && dirlen && errno == ENOENT) {
+		/* Make sure the directory exists */
+		memcpy(buffer, filename, dirlen);
+		buffer[dirlen-1] = 0;
+		if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
+			return -1;
+
+		/* Try again */
+		strcpy(buffer + dirlen - 1, "/tmp_obj_XXXXXX");
+		fd = git_mkstemp_mode(buffer, 0444);
+	}
+	return fd;
+}
+
+/* copied from sha1_file.c */
+static int write_buffer(int fd, const void *buf, size_t len)
+{
+	if (write_in_full(fd, buf, len) < 0)
+		return error("file write error (%s)", strerror(errno));
+	return 0;
+}
+
+/* copied from sha1_file.c */
+/* Finalize a file on disk, and close it. */
+static void close_sha1_file(int fd)
+{
+	if (fsync_object_files)
+		fsync_or_die(fd, "sha1 file");
+	if (close(fd) != 0)
+		die_errno("error when closing sha1 file");
+}
+
+
+int mdsdb_process(mdsdb_t gdbf, const unsigned char *sha1,
+		  mdigest_t *digestp)
+{
+	mdigest_t old_digest;
+	int has_old_digest = 0;
+	char *path;
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+	if (gdbf == NULL) gdbf = dbf;
+	path = crc32_file_name(gdbf->root, sha1);
+	has_old_digest = mdsdb_lookup_aux(path, &old_digest);
+	if (gdbf == dbf && !has_old_digest) {
+		mdigest_t crc;
+		int len, wcode;
+		static char ctmpfile[PATH_MAX];
+		int fdc = create_tmpfile(ctmpfile, sizeof(ctmpfile), path);
+		if (fdc < 0) {
+		  return -1;
+		}
+		crc = *(digestp);
+		len = get_mdigest_len(digestp);
+		wcode = get_mdigest_wcode(digestp);
+		crc.hdr.lhdr.wcode = (unsigned char)wcode;
+		if (fdc >= 0 && write_buffer(fdc, &crc.hdr.lhdr.wcode,
+					     len + 1) < 0) {
+			close_sha1_file(fdc);
+			return -1;
+		}
+		if (fdc >= 0) {
+			close_sha1_file(fdc);
+			return (move_temp_to_file(ctmpfile, path) == 0)?
+				0: -1;
+		}
+		return -1;
+	} else if (has_old_digest) {
+	  if (mdigest_tst(&old_digest, digestp)) {
+			die("SHA1 COLLISION WHEN INSERTING OBJECT %s",
+			    sha1_to_hex(sha1));
+			return -1;
+		}
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+
+void mdsdb_close(mdsdb_t gdbf) {
+	return;
+}
+
+void mdsdb_close_alt(struct alternate_object_database *alt) {
+	return;
+}
+
+
+
+int mdsdb_reorganize(mdsdb_t gdbf) {
+	if (!initialized || gdbf == no_dbf) {
+	  return -1;
+	}
+	if (gdbf == NULL) {
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+
+
+void mdsdb_finish(void) {
+	int i;
+	if (!initialized) {
+		return;
+	}
+	dbf->root = NULL;
+
+	for (i = 0; i < alt_in_use; i++) {
+		free(alt_map[i].db.root);
+		alt_map[i].db.root = NULL;
+	}
+	memset(alt_map, 0, sizeof(struct alt_map) *alt_in_use);
+	alt_in_use = 0;
+	initialized = 0;
+	alt_initialized = 0;
+}
diff --git a/packdb.h b/packdb.h
new file mode 100644
index 0000000..64c1d0a
--- /dev/null
+++ b/packdb.h
@@ -0,0 +1,93 @@
+#ifndef PACKDB_H
+#define PACKDB_H
+
+#include<stdint.h>
+#include "mdigest.h"
+
+/**
+ *  Initialize the database.
+ *  This opens a database file in the objects directory named mdsd,
+ *  used to store CRCS of objects (uncompressed, excluding the header)
+ *  for hash-collision detection.
+ */
+extern void packdb_init(void);
+
+/**
+ * Check if the database has been initialized.
+ * Returns:
+ *   1 if packdb_init has been called; false otherwise.
+ */
+extern int packdb_initialized(void);
+
+/**
+ * Open the persistent database to store a copy of obj CRCs in pack index files.
+ * Nested calls are allowed, but must be balanced by calls to packdb_close.
+ * For nested calls, subsequent ones merely increment a reference count.
+ *
+ * This is used to create space-efficient storage of object CRCs that
+ * are not associated with loose objects (e.g., because they are in pack
+ * files).  Intended for use when building pack files.
+ *
+ * Note:
+ *   Interacting with another process that calls this function on the
+ *   same repository may lead to deadlock unless packdb_close is
+ *   called before that interaction.
+ */
+extern void packdb_open(void);
+
+/**
+ * Store a crc in the persistent database for creating pack index files.
+ *
+ * Arguments:
+ *   sha1 - the key for the entry (a 20-byte sha1 hash)
+ *   crc - the crc to store (the crc of an object's data)
+ * Returns:
+ *   0 if we added a new entry, 1 if the entry already exists, -1 on error
+ */
+extern int packdb_process(const unsigned char *sha1, mdigest_t *digestp);
+
+/**
+ * Lookup a CRC from a database.
+ *
+ * Arguments:
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *    digestp - a pointer to a mdigest_t to store the returned value when
+ *              an entry in the database exists.
+ * Returns:
+ *   0 if no entry, 1 if there is an existing entry.
+ */
+extern int packdb_lookup(const unsigned char *sha1, mdigest_t *digestp);
+
+/**
+ * Remove a CRC from a database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *
+ * Returns:
+ *   0 on success; -1 if the entry did not exist or if an entry
+ *   could not be deleted
+ */
+extern int packdb_remove(const unsigned char *sha1);
+
+
+/**
+ * Reorganize the database.
+ * Returns:
+ *   0 on success; -1 on failure
+ */
+extern int packdb_reorganize(void);
+
+/**
+ * Close the database file.
+ */
+extern void packdb_close(void);
+
+/**
+ * Close the database if opened and uninitialize the module.
+ * This is intended to be called when the module is no longer needed.
+ */
+extern void packdb_finish(void);
+
+#endif
-- 
1.7.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox