Git development
* [PATCH] builtin/log: remove redundant initialization
From: Michael Schubert @ 2011-12-21 12:05 UTC (permalink / raw)
  To: git

"abbrev" and "commit_format" in struct rev_info get initialized in
init_revisions - no need to reinit in cmd_log_init_defaults.

Signed-off-by: Michael Schubert <mschub@elegosoft.com>
---
 builtin/log.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 89d0cc0..7d1f6f8 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -73,8 +73,6 @@ static int decorate_callback(const struct option *opt, const char *arg, int unse
 
 static void cmd_log_init_defaults(struct rev_info *rev)
 {
-	rev->abbrev = DEFAULT_ABBREV;
-	rev->commit_format = CMIT_FMT_DEFAULT;
 	if (fmt_pretty)
 		get_commit_format(fmt_pretty, rev);
 	rev->verbose_header = 1;
-- 
1.7.8.400.g03f4


* Re: [PATCH] Specify a precision for the length of a subject string
From: Andreas Schwab @ 2011-12-21 11:26 UTC (permalink / raw)
  To: Nathan W. Panike; +Cc: git
In-Reply-To: <20111220220754.GC21353@llunet.cs.wisc.edu>

"Nathan W. Panike" <nathan.panike@gmail.com> writes:

> $ git log --pretty='%h %30s' d165204 -1

In C's formatted output this syntax denotes a minimum field width, not a
precision, so it will probably be surprising to many people.
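
To illustrate the distinction, here is a small sketch using Python's
%-formatting, which follows C's printf rules for width and precision. The
subject string is only sample data; nothing here is taken from the patch
under review.

```python
# Python's % operator uses the same width/precision rules as C's printf.
subject = "builtin/log: remove redundant initialization"

# A bare number is a minimum field width: short values are padded ...
padded = "%30s" % "short"
# ... but longer values are printed in full, never cut.
untouched = "%30s" % subject
# A number after a dot is a precision: at most that many characters survive.
truncated = "%.30s" % subject

print(repr(padded))
print(repr(untouched))
print(repr(truncated))
```

So a reader who knows printf would expect '%30s' to pad and '%.30s' to
truncate.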

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [PATCH] Use Python's "print" as a function, not as a keyword
From: Frans Klaver @ 2011-12-21  8:43 UTC (permalink / raw)
  To: Sebastian Morr; +Cc: git, srabbelier, Ævar Arnfjörð Bjarmason
In-Reply-To: <20111221021930.GA31364@thinkpad>

On Wed, Dec 21, 2011 at 3:19 AM, Sebastian Morr <sebastian@morr.cc> wrote:

> This has changed from Version 2.6 to Version 3.0. Replace all occurrences of
>
>    print ...
>
> with
>
>    print(...)
>
> and all occurrences of
>
>    print >> output, ...
>
> with
>
>    output.write(... + "\n")

While it's good to look forward, you shouldn't forget about testing on
python 2.6. Lots of people still stick to that and maybe even to
earlier versions.


>  if len(argv) < 2:
> -       print 'Usage:', argv[0], '<zipfile>...'
> +       print('Usage:', argv[0], '<zipfile>...')
>        exit(1)
>

Here I would use the % notation:
print('Usage: %s <zipfile>...' % argv[0])

Python 2.x would print a tuple:

>>> argv = ('import-zips.py',)
>>> print("Usage:", argv[0], '<zipfile>...')
('Usage:', 'import-zips.py', '<zipfile>...')

You could probably get away with

print('Usage: ' + argv[0] + ' <zipfile>...')

But that could probably become a readability issue. I would even
wonder if it's considered pythonic.
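
A minimal sketch of the single-string approach, assuming the goal is one
source file that prints the same text on 2.x and 3.x. The helper names
usage_line and labelled are made up for this example and do not appear in
the scripts being patched.

```python
import sys


def usage_line(prog):
    # Build the whole message first; print() then receives one string,
    # which Python 2 prints as text rather than as a tuple.
    return 'Usage: %s <zipfile>...' % prog


def labelled(label, value):
    # Same idea for the "label value" lines such as 'tip is', tip.
    return '%s %s' % (label, value)


print(usage_line('import-zips.py'))
print(labelled('tip is', 42))
# A bare newline written directly, which sidesteps the 2.x behaviour
# of print() with no arguments.
sys.stdout.write('\n')
```

Python 2.6 and later also accept "from __future__ import print_function"
at the top of a file, which makes the multi-argument form behave as it
does in 3.x; that only helps if nothing older than 2.6 has to be supported.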

It happens a few times more:

>  if verbose:
> -    print 'tip is', tip
> +    print('tip is', tip)

> @@ -176,27 +176,27 @@ for cset in range(int(tip) + 1):
>     os.write(fdcomment, csetcomment)
>     os.close(fdcomment)
>
> -    print '-----------------------------------------'
> -    print 'cset:', cset
> -    print 'branch:', hgbranch[str(cset)]
> -    print 'user:', user
> -    print 'date:', date
> -    print 'comment:', csetcomment
> +    print('-----------------------------------------')
> +    print('cset:', cset)
> +    print('branch:', hgbranch[str(cset)])
> +    print('user:', user)
> +    print('date:', date)
> +    print('comment:', csetcomment)
>     if parent:
> -       print 'parent:', parent
> +       print('parent:', parent)
>     if mparent:
> -        print 'mparent:', mparent
> +        print('mparent:', mparent)
>     if tag:
> -        print 'tag:', tag
> -    print '-----------------------------------------'
> +        print('tag:', tag)
> +    print('-----------------------------------------')


>
>     # checkout the parent if necessary
>     if cset != 0:
>         if hgbranch[str(cset)] == "branch-" + str(cset):
> -            print 'creating new branch', hgbranch[str(cset)]
> +            print('creating new branch', hgbranch[str(cset)])
>             os.system('git checkout -b %s %s' % (hgbranch[str(cset)], hgvers[parent]))
>         else:
> -            print 'checking out branch', hgbranch[str(cset)]
> +            print('checking out branch', hgbranch[str(cset)])
>             os.system('git checkout %s' % hgbranch[str(cset)])
>
>     # merge
> @@ -205,7 +205,7 @@ for cset in range(int(tip) + 1):
>             otherbranch = hgbranch[mparent]
>         else:
>             otherbranch = hgbranch[parent]
> -        print 'merging', otherbranch, 'into', hgbranch[str(cset)]
> +        print('merging', otherbranch, 'into', hgbranch[str(cset)])
>         os.system(getgitenv(user, date) + 'git merge --no-commit -s ours "" %s %s' % (hgbranch[str(cset)], otherbranch))
>
>     # remove everything except .git and .hg directories
> @@ -229,12 +229,12 @@ for cset in range(int(tip) + 1):
>
>     # delete branch if not used anymore...
>     if mparent and len(hgchildren[str(cset)]):
> -        print "Deleting unused branch:", otherbranch
> +        print("Deleting unused branch:", otherbranch)
>         os.system('git branch -d %s' % otherbranch)
>
>     # retrieve and record the version
>     vvv = os.popen('git show --quiet --pretty=format:%H').read()
> -    print 'record', cset, '->', vvv
> +    print('record', cset, '->', vvv)
>     hgvers[str(cset)] = vvv
>
>  if hgnewcsets >= opt_nrepack and opt_nrepack != -1:
> @@ -243,7 +243,7 @@ if hgnewcsets >= opt_nrepack and opt_nrepack != -1:
>  # write the state for incrementals
>  if state:
>     if verbose:
> -        print 'Writing state'
> +        print('Writing state')
>     f = open(state, 'w')
>     pickle.dump(hgvers, f)
>
> diff --git a/contrib/p4import/git-p4import.py b/contrib/p4import/git-p4import.py
> index b6e534b..144fafc 100644
> --- a/contrib/p4import/git-p4import.py
> +++ b/contrib/p4import/git-p4import.py
> @@ -26,11 +26,11 @@ if s != default_int_handler:
>  def die(msg, *args):
>     for a in args:
>         msg = "%s %s" % (msg, a)
> -    print "git-p4import fatal error:", msg
> +    print("git-p4import fatal error:", msg)
>     sys.exit(1)
>

I think that's it for the print(...,...) stuff. I might have missed
one or two though.


> diff --git a/git-remote-testgit.py b/git-remote-testgit.py
> index 3dc4851..9803214 100644
> --- a/git-remote-testgit.py
> +++ b/git-remote-testgit.py
> @@ -81,9 +81,9 @@ def do_capabilities(repo, args):
>     """Prints the supported capabilities.
>     """
>
> -    print "import"
> -    print "export"
> -    print "refspec refs/heads/*:%s*" % repo.prefix
> +    print("import")
> +    print("export")
> +    print("refspec refs/heads/*:%s*" % repo.prefix)
>
>     dirname = repo.get_base_path(repo.gitdir)
>
> @@ -92,11 +92,11 @@ def do_capabilities(repo, args):
>
>     path = os.path.join(dirname, 'testgit.marks')
>
> -    print "*export-marks %s" % path
> +    print("*export-marks %s" % path)
>     if os.path.exists(path):
> -        print "*import-marks %s" % path
> +        print("*import-marks %s" % path)
>
> -    print # end capabilities
> +    print() # end capabilities

print("") here. 2.x:

>>> print()
()



>
>
>  def do_list(repo, args):
> @@ -109,16 +109,16 @@ def do_list(repo, args):
>
>     for ref in repo.revs:
>         debug("? refs/heads/%s", ref)
> -        print "? refs/heads/%s" % ref
> +        print("? refs/heads/%s" % ref)
>
>     if repo.head:
>         debug("@refs/heads/%s HEAD" % repo.head)
> -        print "@refs/heads/%s HEAD" % repo.head
> +        print("@refs/heads/%s HEAD" % repo.head)
>     else:
>         debug("@refs/heads/master HEAD")
> -        print "@refs/heads/master HEAD"
> +        print("@refs/heads/master HEAD")
>
> -    print # end list
> +    print() # end list

print("")

Lots more to do, I'm afraid.

Cheers,
Frans


* Patches for message-digest support.
From: Bill Zaumen @ 2011-12-21  7:51 UTC (permalink / raw)
  To: git, peff, pclouds, gitster


I just sent a series of 6 patches, roughly similar to the ones I sent a
few weeks ago, but allowing a choice of message digests in addition to
a CRC (kept for testing purposes) - SHA-1, SHA-256, and SHA-512 with
more added easily.  The current default is SHA-256.  The use of SHA-1
for git object IDs is unchanged. Unlike the object-ID digest, the
additional digests do not include the Git object-header.  I also changed
a number of function names, using "digest" or "mdigest" in them.
Searching for the string "digest" is a good way of finding things.
Finally, I added a header to commit messages (conditionally compiled so
this can be turned off) that contains a digest of digests plus some
other fields.  I also broke it up into a series of smaller patches.
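
To make the header distinction concrete, here is a rough Python sketch (not
code from the patches). git_object_id follows Git's well-known object naming
scheme; content_digest is a hypothetical name for the additional digest,
assumed here to be SHA-256 over the raw content only.

```python
import hashlib


def git_object_id(obj_type, content):
    # Git's object ID: SHA-1 over "<type> <size>\0" followed by the content.
    size = str(len(content)).encode("ascii")
    header = obj_type.encode("ascii") + b" " + size + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def content_digest(content):
    # Additional digest (illustrative): content only, no object header.
    return hashlib.sha256(content).hexdigest()


blob = b"hello\n"
print(git_object_id("blob", blob))
print(content_digest(blob))
```

Because the two values are computed over different byte strings with
different functions, content that collides on one would also have to
collide on the other to go unnoticed.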

Just as a summary:

The first patch contains several new files.  It uses a data structure
for message digests that keeps the bytes of a digest aligned on
32 or 64 bit boundaries to allow fast comparisons. The digests are
stored along with a one-byte code indicating the digest type. The code
handles storing and looking up the digests, including support for
alternate object databases.
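
As an illustration of the "digest plus one-byte type code" idea, here is a
Python sketch rather than the C structure from the patch. The numeric codes
are the ones listed in the documentation patch (5 for SHA-1, 8 for SHA-256,
16 for SHA-512); the function names are invented for the example.

```python
import hashlib

# Type codes as listed in the documentation patch of this series.
CODE_BY_NAME = {"sha1": 5, "sha256": 8, "sha512": 16}


def encode_record(name, content):
    # One type byte followed by the raw digest bytes.
    digest = hashlib.new(name, content).digest()
    return bytes([CODE_BY_NAME[name]]) + digest


def decode_record(record):
    # Split a record back into (type code, digest bytes).
    return record[0], record[1:]


code, digest = decode_record(encode_record("sha256", b"abc"))
print(code, len(digest))
```

Each of these codes equals the digest length in 4-byte words, which fits
the word-aligned comparisons mentioned above.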

The second patch modifies some of the existing git files (the major
changes are in sha1_file.c and pack-write.c) for storing message digests
when an object or a pack index file is created.

The third patch modifies the files in the builtin directory that contain
the implementation of git commands for packing and pruning objects, and
for verifying pack files and counting objects.  The code does some
checks for hash collisions by comparing the digests.  At this point,
each git object will have a digest that can be looked up given the
object's ID.  This mapping is maintained as pack files are
created.
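
The check described here can be pictured with a toy in-memory mapping. This
is only a sketch under the assumption that a collision shows up as "same
object ID, different digest"; the class and method names are hypothetical
and unrelated to the C code.

```python
class DigestIndex:
    """Toy stand-in for the object-ID -> digest mapping."""

    def __init__(self):
        self._digests = {}

    def check_and_record(self, object_id, digest):
        # Returns False when the ID is already known with a different
        # digest, i.e. two different contents share one object ID.
        known = self._digests.get(object_id)
        if known is None:
            self._digests[object_id] = digest
            return True
        return known == digest


index = DigestIndex()
print(index.check_and_record("a" * 40, b"digest-1"))  # first sighting
print(index.check_and_record("a" * 40, b"digest-2"))  # same ID, other digest
```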

The fourth patch adds a digest header to commit messages.  This header
contains a digest of the digests for the commit's parents and for each
object in the commit's tree, and of the other fields in the commit.
The digest header, like the rest of the commit, is used in computing the
commit's object ID and matching digest.  A function verify_commit
checks the digest header by recomputing it and can be used as desired
for authentication or other purposes.
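
A rough sketch of a "digest of digests" follows. It mirrors only the
ordering given above (tree objects, then parents, then the other fields);
the byte layout, the choice of SHA-256, and the function name
commit_digest are assumptions, not the format produced by the patch.

```python
import hashlib


def commit_digest(tree_digests, parent_digests, author, committer, message):
    # Feed the per-object digests, then the parents' digests, then the
    # remaining commit fields into a single hash.
    h = hashlib.sha256()
    for d in tree_digests:
        h.update(d)
    for d in parent_digests:
        h.update(d)
    for field in (author, committer, message):
        h.update(field.encode("utf-8"))
    return h.hexdigest()


base = commit_digest([b"\x01" * 32], [], "A U Thor", "C O Mitter", "msg")
other = commit_digest([b"\x02" * 32], [], "A U Thor", "C O Mitter", "msg")
print(base != other)
```

Hashing the stored digests instead of the blob contents keeps the cost
proportional to the number of objects rather than to their total size.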

The fifth patch transfers the message digest corresponding to a SHA-1
ID during fetch or push operations to allow early detection of
collisions.  This is a fast test - a simple lookup - and can be turned
off by removing the "mds-check" capability.
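
The lookup can be sketched as below, assuming the peer sends either a bare
hex SHA-1 or "SHA1-MD" with a hyphen separator as in the documentation
patch. Both function names and the dictionary are illustrative only.

```python
def split_id(token):
    # "SHA1" or "SHA1-MD", both in hex; the MD part is optional.
    sha1, sep, md = token.partition("-")
    return sha1, (md if sep else None)


def peer_id_collides(token, local_digests):
    # local_digests maps a hex SHA-1 to the locally stored hex digest.
    sha1, md = split_id(token)
    local = local_digests.get(sha1)
    if md is None or local is None:
        return False  # nothing to compare; only the SHA-1 is available
    return md != local


local = {"a" * 40: "08" + "11" * 32}
print(peer_id_collides("a" * 40 + "-08" + "22" * 32, local))
```

When either side has no digest for an ID, the sketch reports no collision,
matching the fallback to plain SHA-1 comparison.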

The sixth patch contains documentation.

Bill


* [PATCH 6/6] Provide documentation for git message digest extensions
From: Bill Zaumen @ 2011-12-21  7:13 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

The documentation includes API documentation for commands
and technical documentation for the message-digest-related
changes.  The technical documentation is in

* Documentation/technical/collision-detect.txt
* Documentation/technical/pack-format.txt

The modified commands are documented in

* Documentation/git-count-objects.txt
* Documentation/git-index-pack.txt
* Documentation/git-verify-pack.txt

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 Documentation/git-count-objects.txt          |   12 +-
 Documentation/git-index-pack.txt             |   17 +-
 Documentation/git-verify-pack.txt            |   27 ++
 Documentation/technical/collision-detect.txt |  342 ++++++++++++++++++++++++++
 Documentation/technical/pack-format.txt      |   47 ++++
 5 files changed, 439 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/technical/collision-detect.txt

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 23c80ce..4cdbaf5 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -8,7 +8,7 @@ git-count-objects - Count unpacked number of objects and their disk consumption
 SYNOPSIS
 --------
 [verse]
-'git count-objects' [-v]
+'git count-objects' [-v] [-M]
 
 DESCRIPTION
 -----------
@@ -25,6 +25,16 @@ OPTIONS
 	objects, number of packs, disk space consumed by those packs,
 	and number of objects that can be removed by running
 	`git prune-packed`.
+-M::
+--count-md::
+	Report the number of loose objects with no stored message digests.
+	With the -v option, the number of missing "mds" files (these
+	contain the message digests for the SHA1 hashes in the corresponding
+	"idx" files) is reported, along with a count of the number of
+	mds files whose size is wrong (e.g., an index was created but the
+	existing MDS file was not updated) and a count of the number of
+	objects in pack files that do not have a stored message digest.
+	Values that are zero are not shown.
 
 GIT
 ---
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..c9389f4 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -11,14 +11,14 @@ SYNOPSIS
 [verse]
 'git index-pack' [-v] [-o <index-file>] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
-                 [<pack-file>]
+		 [-m <mds-file>] [<pack-file>]
 

 DESCRIPTION
 -----------
-Reads a packed archive (.pack) from the specified file, and
-builds a pack index file (.idx) for it.  The packed archive
-together with the pack index can then be placed in the
+Reads a packed archive (.pack) from the specified file, and builds a
+pack index file (.idx) and a pack mds file (.mds) for it.  The packed
+archive together with the pack index can then be placed in the
 objects/pack/ directory of a git repository.
 

@@ -35,6 +35,14 @@ OPTIONS
 	fails if the name of packed archive does not end
 	with .pack).
 
+-m <mds-file>::
+	Write the generated pack mds file into the specified
+	file.  Without this option, the name of the pack mds
+	file is constructed from the name of packed archive
+	file by replacing .pack with .mds (and the program
+	fails if the name of packed archive does not end
+	with .pack).
+
 --stdin::
 	When this flag is provided, the pack is read from stdin
 	instead and a copy is then written to <pack-file>. If
@@ -74,7 +82,6 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
-
 Note
 ----
 
diff --git a/Documentation/git-verify-pack.txt b/Documentation/git-verify-pack.txt
index cd23076..f69ed3f 100644
--- a/Documentation/git-verify-pack.txt
+++ b/Documentation/git-verify-pack.txt
@@ -33,6 +33,19 @@ OPTIONS
 	Do not verify the pack contents; only show the histogram of delta
 	chain length.  With `--verbose`, list of objects is also shown.
 
+-M::
+--show-mds::
+	Show the message digests along with the 40-character object names
+	(SHA1 value in hexadecimal). Ignored if --stat-only is set. If
+	--verbose is not set, only the table indexed by object names is
+	shown, although the files will be verified.  The message digests
+	printed are the actual ones - if the MDS file does not contain these,
+	the verification will fail.  The message digests will be
+	prefaced with a two-byte code indicating the type of digest.
+	The values (in hexadecimal) are 01 for a CRC, 05 for SHA-1, 08
+	for SHA-256, and 10 for SHA-512.  If the digest stored does
+	not match the actual digest, the actual one is printed as well.
+
 \--::
 	Do not interpret any more arguments as options.
 
@@ -48,6 +61,20 @@ for objects that are not deltified in the pack, and
 
 for objects that are deltified.
 
+When the -M option is used, the offset-in-pack field is followed by an
+entry giving the message digest.  The format used is:
+
+      md=0xHEX_VALUE
+
+when a message digest exists, and
+
+     <no md>
+
+when a message digest does not exist.  These entries precede the depth
+entry for deltified objects.  A non-existent message digest will be shown
+only if the MDS file is missing - while the MDS-file format allows missing
+entries, the file will not be considered valid.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/collision-detect.txt b/Documentation/technical/collision-detect.txt
new file mode 100644
index 0000000..d5a4364
--- /dev/null
+++ b/Documentation/technical/collision-detect.txt
@@ -0,0 +1,342 @@
+Hash-Collision Detection using Message Digests
+==============================================
+
+Initially Git used a SHA-1 hash as an object ID under the assumption
+that a hash collision would never occur in practice. While an
+accidental SHA-1 collision is extremely unlikely, it is possible,
+although very expensive, to generate multiple files with the same
+SHA-1 value in under 2^57 operations.  With computer performance
+increasing significantly from one year to the next, Git's assumptions
+about SHA-1 will eventually not hold in the case of a malicious
+attempt to damage a project.  One should note that just because the
+probability of a SHA-1 collision occurring accidentally is extremely
+low does not mean a priori that SHA-1 provides an adequate safety
+margin for preventing a malicious attempt to damage repositories and a
+discussion below outlines some of the issues regarding this
+possibility.
+
+While one could modify Git to use SHA-224, SHA-256, SHA-384, or
+SHA-512 instead of SHA-1, the change would have to support the
+original format as well (in order to deal with existing Git
+repositories). While one could convert an existing repository to use
+the new hash function, this would require rewriting every object,
+including trees and commits.  The outcome would be problematic given
+the existence of email and documentation that might name commits by
+their SHA-1 hashes. One should note that Git performs a byte-by-byte
+check for hash collisions when a pack file is indexed.  Unfortunately,
+during fetch or pull operations, Git tries to avoid copying objects
+when a peer already has a copy, and this is determined solely on the
+basis of SHA-1 hashes.
+
+The following describes a modification to Git's initial design that is
+(a) relatively easy to implement, (b) is compatible with and can
+interoperate with older versions of Git (both the program and the
+repositories) (c) has a small computational overhead, and (d)
+increases security substantially, with a goal of detecting hash
+collisions early and automatically.  Because the implementation is
+relatively simple and the overhead very low, it makes sense to
+incorporate this change (or some alternative) before the security
+issue becomes a serious problem.
+
+Although Git generally uses the assumption that there will never be a
+hash collision using SHA-1 in practice, under some circumstances, Git
+will detect collisions via a byte-by-byte comparison as objects are
+added to the repository or as pack files are indexed.  This test is
+performed when an index is built (via the Git index-pack command), but
+a byte-by-byte comparison was deemed too computationally expensive to
+use in all circumstances: with pack files in particular, simply
+extracting an object can require not only decompressing it, but
+handling a series of delta encodings.
+
+Collision detection has been extended by computing a message digest of
+the object's contents (i.e., excluding the Git header). These message
+digests are stored separately from Git objects and are used for an
+independent collision test - looking up the message digests using the
+SHA-1 IDs as a key can be done quickly, and comparing them is fast as
+well (the digests are aligned to allow 32-bit or 64-bit integer
+comparisons).  This extension is computationally cheap (timing the
+Git test suite (run via 'make test') showed only a small increase in
+running time) and the extension is backwards compatible with existing
+Git repositories - if an MD is not available for a SHA-1 value, the
+implementation reverts to its former behavior and simply compares
+SHA-1 values.  The implementation allows message digests to be easily
+added.
+
+The implementation creates a directory in .git/objects named "mdsd",
+which contains sub-directories and file names identical to the
+sub-directories in objects used to store loose objects: a two
+character directory name, with a 38-character file name, the
+concatenation of which gives the SHA-1 hash for the object.  The files
+in sub-directories of "mdsd", however, simply contain a one-byte code
+indicating the type of message digest, followed by the digest in its
+binary representation as a sequence of bytes.  In addition, for each
+pack file (.../objects/pack/FILE.pack), there is a corresponding file
+named .../objects/pack/FILE.mds in addition to
+.../objects/pack/FILE.idx.  The MDS file contains the MDs, stored in
+the same order as the SHA-1 hashes in .../objects/pack/FILE.idx.  The
+format of the MDS file is described in pack-format.txt.
+
+Thus, the directory structure (only part of it is shown) is as
+follows:
+
+ .git---.
+	|
+	|-objects-.
+	|	  |--XX--.
+	.	  |	 |--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+	.	  |	 .
+	.	  |	 .
+		  |	 .
+		  .
+		  .
+		  .
+		  |-mdsd-.
+		  |	 |--XX--.
+		  |	 |	|--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+		  |	 .	.
+		  |	 .	.
+		  |	 .	.
+		  |
+		  |-pack-.
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.pack
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.idx
+		  |	 |--YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.mds
+		  |	 .
+		  |	 .
+		  |	 .
+		  |
+		  `-info-.
+			 .
+			 .
+
+The mds files are relatively short - an average of N+1 bytes per MD
+of length N (provided N is a multiple of 4)
+plus some fixed overhead due to a header and trailer, with the MDs
+listed in the same order as the SHA-1 values in the matching idx file
+(a function named nth_packed_object_digest has the same signature as
+the previously-defined function nth_packed_object_offset, so the
+procedure to look up the MD value from a pack file is the same).
+
+For fetch and push operations, the commands fetch-pack, send-pack,
+receive-pack, and upload-pack were modified so that various object IDs
+can have any one of the following formats, with each number
+represented in hexadecimal:
+
+		SHA1
+		SHA1-MD
+
+where SHA1 is the SHA-1 hash of a commit and MD the message digest
+(prefixed by a code indicating the type of digest) of the commit
+(uncompressed, not including the Git object header).
+
+Both receive-pack and upload-pack send a capability named "mds-check"
+to allow the two longer object IDs.  When the MDs are available, the
+longer formats are used, but are generated only by fetch-pack and
+send-pack (because of backwards-compatibility constraints,
+receive-pack and upload-pack cannot determine the capabilities of
+fetch-pack and send-pack when connected to a remote repository).  The
+collision checks during a fetch, push, or pull command are done by
+receive-pack and upload-pack because send-pack and fetch-pack do not
+receive their peers' MD values - send-pack and fetch-pack cannot determine
+their peers' capabilities given the current design.
+
+Changes to the Commit Format
+----------------------------
+
+A new header is available, positioned as the last header. Its name is
+"digest" and its value is a hex representation of a message digest in
+which the first two bytes name the algorithm used to generate the
+digest and the remaining bytes are the digest itself.  This digest is
+a digest of other digests or SHA-1 values.  It starts by including the
+digests of each object in the commit's tree with the exception of a
+commit from a submodule - in that case, the SHA-1 value itself is
+used.  The tree is searched depth first, including the tree's root,
+which provides the first digest.  This is followed by the digest for
+each parent commit in the order listed in the commit.  Finally, the
+digest covers the author, committer, encoding, and the commit message,
+all excluding the terminating newline character in the header.
+
+Note that the digest header complicates any attempt to generate a hash
+collision for a commit message - if you change a field, you also have
+to change the digest in a specific way.
+
+Configuration
+-------------
+
+There are several variables in the Makefile:
+
+      * MDSDB should have the value 0. If an alternative implementation
+	is defined, other values will be available.
+
+      * MDIGEST_DEFAULT should be defined to explicitly set the
+	default message digest.
+
+      * PACKDB should be defined to turn on the packdb functions.
+
+      * PACKDB_TEST should be defined only for debugging.  It uses some
+	packdb functions where not necessary to verify that these work
+	properly.  Turning this test on decreases performance.
+
+      * COMMIT_DIGEST should be defined to add a digest header to
+	commit messages as described above
+
+      * COMMIT_DIGEST_TEST should be defined to force commit_tree to
+	compute the digest even when COMMIT_DIGEST is undefined, in
+	which case the digest will be computed but not included in a
+	commit.
+
+Key Functions
+-------------
+
+For general use, the functions documented in mdigest.h can extract
+data from message digests and compare them.  The function verify_commit
+will test a commit to make sure that its digest field matches the
+repository.  The function has_sha1_file_digest allows one to determine
+if a digest exists and obtain a copy of that digest.
+
+To find functions that are used in the implementation (e.g., if changes
+to the pack-file format become necessary), search for the type
+mdigest_t or variables with digest in their names (e.g., digestp).
+
+
+Implementation Details
+----------------------
+
+Functions to manipulate the message digest/MD database are declared
+in the file mdsdb.h.  The implementation as described above is in the
+file objd-mdsdb.c: it is thus easy to change the implementation of how
+these objects are stored with minimal impact on the rest of the source
+code.  The message digest structure and functions to manipulate it are
+declared in mdigest.h, with the implementation in mdigest.c.
+
+In pack-write.c, there is a new function named write_mds_file with the
+same function signature as write_idx_file.  Both are called in pairs
+(write_idx_file first) so that the idx file and mds file for the
+corresponding pack file will always be created.
+
+In commit.c, there is a new function that recursively traverses the
+tree associated with a commit and finds the "blob" entries and looks
+up those entries' message digests in order to compute a message digest
+of these message digests (which is faster than computing a message
+digest of all the bytes in the blobs associated with a commit).
+
+Various function signatures in sha1_file.c were changed to take
+two additional arguments, the first a pointer to an int used as a flag
+to indicate whether an MD exists, and the second a pointer to a
+uint32_t containing the MD.  For backwards compatibility with
+previously existing functions, those functions had their names changed
+by adding "_extended" to them, with macros in cache.h defined so that
+existing code that does not need to obtain an MD would not be
+changed. There are a few additional functions added to sha1_file.c
+such as one to determine if there is an MD for a given SHA-1
+value. Many changes in the rest of Git that result from this simply
+change the arguments to these functions.  As a convention, most such
+arguments use names like objcrc32, objcrc32p, has_objcrc32 and
+has_objcrc32p in order to make it easy to find areas of the code
+implementing hash-collision detection using the git-grep command.
+
+A few data structures (notably struct pack_idx_entry and struct
+packed_git) contain fields used to store has_objcrc32 and objcrc32
+values or data associated with MDS files.  These are used while
+building new MDS files.
+
+Some of the Git commands (count-objects, index-pack, and verify-pack)
+have additional command-line options related to the MDs and mds
+files. This makes it possible to explicitly name an mds file being
+created and to request that various listings show the MD
+values in addition to SHA-1 hashes (the MD values are not listed
+by default in case user-defined scripts assume the current behavior).
+
+For C files, changes were made to the following files (compared to
+commit f56564968 - v1.7.8-rc4) for the initial collision-detection
+implementation:
+
+       * builtin/count-objects.c
+       * builtin/fetch-pack.c
+       * builtin/index-pack.c
+       * builtin/init-db.c
+       * builtin/pack-objects.c
+       * builtin/pack-redundant.c
+       * builtin/prune-packed.c
+       * builtin/prune.c
+       * builtin/receive-pack.c
+       * builtin/send-pack.c
+       * builtin/verify-pack.c
+       * commit.c
+       * environment.c
+       * fast-import.c
+       * gdbm-packdb.c (new file)
+       * git.c
+       * hex.c
+       * http.c
+       * mdigest.c (new file)
+       * objd-mdsdb.c (new file)
+       * pack-write.c
+       * sha1_file.c
+       * upload-pack.c
+
+The other files had changes that reflected changes to function
+signatures.
+
+The header files that were modified are
+
+    * cache.h
+    * commit.h
+    * mdigest.h (new file)
+    * mdsdb.h (new file)
+    * pack.h
+    * packdb.h (new file)
+
+where the changes are mostly new function declarations, a few macros
+for backwards-compatibility, and a few additional fields in some
+data structures.
+
+Minor changes were made to the test suite: t0000-basic.sh,
+t5300-pack-object.sh, t5304-prune.sh, t5500-fetch-pack.sh, and
+t5510-fetch.sh.
+
+The packdb functions are conditionally compiled and by default are not
+used.  When used, these use GDBM to store MDs for SHA-1 hashes in
+cases in which the hash was not available - in this case the hash will
+be recomputed and stored for future use.  Testing indicates that
+packdb is not needed. It may be worth turning on during debugging to
+verify if a problem is discovered involving a missing MD. (As an
+aside, the packdb code is based on a test to see if GDBM would be
+efficient enough to store the MD values in general, thus avoiding
+the need to create "mds" files and reducing the number of files in the
+"mdsd" directory, but it turned out that performance was not
+acceptable.)
+
+Security-Issue Details
+----------------------
+
+Without hash-collision detection, Git has a possible risk of data
+corruption due to the obvious hash-collision vulnerability, so the
+issue is really whether a usable vulnerability exists. In the case
+of a single shared repository (one common repository shared by
+multiple developers), this risk is mitigated by Git's rule that
+an existing entry is never overridden. The situation is more complex
+when multiple repositories are used, as a race condition may exist.
+Also, the risk depends on whether or not developers exchange source
+files by some out-of-band mechanism (e.g., email) - when programming
+in Java, for example, it is customary to use the javadoc program to
+create API documentation from stylized comments in source files. A
+programmer might send a peer a copy of a source file to review
+and correct the javadoc comments, providing an opportunity for the
+second developer to insert a modified file into the repository. The
+first developer would presumably check that the source code has not
+changed (an automated test for this might use the java compiler's
+"-g:none" option so that line-number data does not appear in the object
+file), and review the comments, but would not care about formatting
+(location of line breaks, etc., as the javadoc program generates HTML).
+
+In any event, recent research has shown that SHA-1 collisions can be
+found in 2^63 operations or less.  While one result claimed 2^53
+operations, the paper claiming that value was withdrawn from
+publication due to an error in the estimate. Another result claimed a
+complexity of between 2^51 and 2^57 operations, and still another
+claimed a complexity of 2^57.5 SHA-1 computations. A summary is
+available at
+<http://hackipedia.org/Checksums/SHA/html/SHA-1.htm#SHA-1>.
+This is within or close to the number of computations that can be
+managed by a well-funded organization.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1803e64..a2aad5f 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -158,3 +158,50 @@ Pack file entry: <+
     corresponding packfile.
 
     20-byte SHA1-checksum of all of the above.
+
+
+= pack-*.mds files contain message digests for objects.  The digests
+  are stored in the same order as the sha-1 values in the matching idx
+  file.  These files have the following format:
+
+  - A 6-byte magic number consisting of the characters "PKMDS" followed
+    by a NULL character (0).
+
+  - A one-byte version number (= 1)
+
+  - A one-byte field-length value for message digest fields, in units
+    of 4-byte words.  (The length of the message digest fields in bytes
+    is denoted as wbsize below)
+
+  - A set of blocks, each of which contains 4 entries encoded as follows:
+
+      * four one-byte fields (wcode fields), one per entry, for which
+	a zero value indicates that a matching entry does not exist
+	and a non-zero value indicates the type of message digest
+	encoded as follows:
+
+	   + 1 for a CRC  (used as a trivial case for performance testing)
+
+	   + 5 for SHA-1
+
+	   + 8 for SHA-256
+
+	   + 16 for SHA-512
+
+      * 4 wbsize-byte fields, each containing a message digest (by
+	convention, a field must be all NULL characters if the MD does
+	not exist).  For each field, the data it contains should start
+	at the first byte, padded with NULL characters if the field is
+	longer than the digest it stores.
+
+    For the set of all blocks, the nth wcode field and the nth
+    message-digest field store the values for the nth entry in the
+    file. The format ensures that each message digest starts on a
+    32-bit boundary, allowing 32-bit integer operations to be used in
+    copying or comparing values.
+
+  - A 20 byte SHA-1 hash of the SHA-1 hashes naming the objects whose
+    message digests are being stored, in the same order as they
+    appear in the corresponding idx file.
+
+  - A 20 byte SHA-1 hash of all of the above.
-- 
1.7.1


* [PATCH 5/6] Add MD support for fetch, pull, and push.
From: Bill Zaumen @ 2011-12-21  7:12 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

A new capability, "mds-check", is defined. When it is present, a client
will (when possible and useful) send the server a SHA-1 value and a
message digest, separated by a '-'.  This is used to detect hash
collisions, with the goal of finding problems early if a malicious
attempt is made to forge a commit by placing different commits with the
same SHA-1 value in different repositories.  It is a simple, fast test:
a look-up and a comparison.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
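For readers following the wire format, here is a minimal sketch of how a
"<sha1>-<digest>" token could be split.  It is not code from this
series: split_sha1_digest() is a hypothetical helper, and it assumes the
digest is a single run of hex digits.  The receive-pack parser in this
patch appears to accept a second '-'-separated field, which the sketch
does not handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SHA1_HEX_LEN 40

/*
 * Hypothetical helper (not part of this series): split a token of the
 * form "<40-hex-sha1>" or "<40-hex-sha1>-<hex-digest>".
 * Returns 0 on success, -1 on a malformed token.  *digest_len is 0
 * and *digest is NULL when no digest follows the SHA-1.
 */
static int split_sha1_digest(const char *token, size_t len,
			     const char **digest, size_t *digest_len)
{
	size_t i;

	assert(token && digest && digest_len);
	*digest = NULL;
	*digest_len = 0;
	if (len < SHA1_HEX_LEN)
		return -1;
	for (i = 0; i < SHA1_HEX_LEN; i++)
		if (!isxdigit((unsigned char)token[i]))
			return -1;
	if (len == SHA1_HEX_LEN)
		return 0;	/* plain SHA-1, as sent by older clients */
	/* A '-' must be followed by at least one hex digit. */
	if (token[SHA1_HEX_LEN] != '-' || len == SHA1_HEX_LEN + 1)
		return -1;
	for (i = SHA1_HEX_LEN + 1; i < len; i++)
		if (!isxdigit((unsigned char)token[i]))
			return -1;
	*digest = token + SHA1_HEX_LEN + 1;
	*digest_len = len - SHA1_HEX_LEN - 1;
	return 0;
}
```

A token without a '-' is accepted unchanged, which is the property that
lets a peer lacking "mds-check" keep sending plain SHA-1 values.
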
 builtin/fetch-pack.c   |   29 +++++++++++-
 builtin/receive-pack.c |  117 +++++++++++++++++++++++++++++++++++++++++++-----
 builtin/send-pack.c    |   26 ++++++++++-
 http.c                 |   19 ++++++--
 t/t5500-fetch-pack.sh  |   10 +++--
 t/t5510-fetch.sh       |   12 ++++-
 upload-pack.c          |   11 ++++-
 7 files changed, 196 insertions(+), 28 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 6207ecd..2f5b7ef 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -18,6 +18,8 @@ static int prefer_ofs_delta = 1;
 static int no_done;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
+static int mds_check = 0;
+
 static struct fetch_pack_args args = {
 	/* .uploadpack = */ "git-upload-pack",
 };
@@ -390,9 +392,25 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	flushes = 0;
 	retval = -1;
 	while ((sha1 = get_rev())) {
-		packet_buf_write(&req_buf, "have %s\n", sha1_to_hex(sha1));
-		if (args.verbose)
-			fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		if (mds_check) {
+			mdigest_t digest;
+			if (has_sha1_file_digest(sha1, &digest)) {
+				packet_buf_write(&req_buf, "have %s\n",
+						 sha1_to_hex_digest(sha1,
+								    &digest));
+
+			} else {
+				packet_buf_write(&req_buf, "have %s\n",
+						 sha1_to_hex(sha1));
+			}
+			if (args.verbose)
+				fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		} else {
+			packet_buf_write(&req_buf, "have %s\n",
+					 sha1_to_hex(sha1));
+			if (args.verbose)
+				fprintf(stderr, "have %s\n", sha1_to_hex(sha1));
+		}
 		in_vain++;
 		if (flush_at <= ++count) {
 			int ack;
@@ -807,6 +825,11 @@ static struct ref *do_fetch_pack(int fd[2],
 			fprintf(stderr, "Server supports ofs-delta\n");
 	} else
 		prefer_ofs_delta = 0;
+	if (server_supports("mds-check")) {
+		if (args.verbose)
+			fprintf(stderr, "Server supports mds-check\n");
+		mds_check = 1;
+	}
 	if (everything_local(&ref, nr_match, match)) {
 		packet_flush(fd[1]);
 		goto all_done;
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d2dcb7e..b9d1c1f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -122,7 +122,8 @@ static int show_ref(const char *path, const unsigned char *sha1, int flag, void
 	else
 		packet_write(1, "%s %s%c%s%s\n",
 			     sha1_to_hex(sha1), path, 0,
-			     " report-status delete-refs side-band-64k",
+			     " report-status delete-refs side-band-64k"
+			     " mds-check",
 			     prefer_ofs_delta ? " ofs-delta" : "");
 	sent_capabilities = 1;
 	return 0;
@@ -709,37 +710,131 @@ static struct command *read_head_info(void)
 	struct command *commands = NULL;
 	struct command **p = &commands;
 	for (;;) {
-		static char line[1000];
+		static char line[1500];
 		unsigned char old_sha1[20], new_sha1[20];
 		struct command *cmd;
 		char *refname;
 		int len, reflen;
+		int has_old_sha1_digest = 0;
+		int has_new_sha1_digest = 0;
+		mdigest_t old_sha1_digest;
+		mdigest_t new_sha1_digest;
+		mdigest_t digest;
+		int old_hashlen = 40;
+		int new_hashlen = 40;
+		int hashlen = 81; /* includes blank between two hashes */
+		int digest_field_len = 0;
+
+		mdigest_clear(&old_sha1_digest);
+		mdigest_clear(&new_sha1_digest);
 
 		len = packet_read_line(0, line, sizeof(line));
 		if (!len)
 			break;
 		if (line[len-1] == '\n')
 			line[--len] = 0;
-		if (len < 83 ||
-		    line[40] != ' ' ||
-		    line[81] != ' ' ||
-		    get_sha1_hex(line, old_sha1) ||
-		    get_sha1_hex(line + 41, new_sha1))
+		if (len > (old_hashlen + 1) && line[old_hashlen] == '-') {
+			digest_field_len = 1
+				+ get_hex_field_size(line+old_hashlen+1);
+		}
+		if (len > (old_hashlen + digest_field_len) &&
+		    line[old_hashlen] == '-') {
+			old_hashlen += digest_field_len;
+			hashlen += digest_field_len;
+			digest_field_len = 0;
+			if (len > (old_hashlen + 1)
+			    && line[old_hashlen] == '-') {
+				digest_field_len = 1 +
+					get_hex_field_size(line+old_hashlen+1);
+			}
+			if (len > (old_hashlen + digest_field_len + 1) &&
+			    line[old_hashlen] == '-') {
+				old_hashlen += digest_field_len;
+				hashlen += digest_field_len;
+			}
+		}
+		if (line[old_hashlen] != ' ') {
+			die("protocol error: expected old/new/ref, got '%s'",
+			    line);
+		}
+		digest_field_len = 0;
+		if (len > hashlen + 1 && line[hashlen] == '-') {
+			digest_field_len = 1 +
+			  get_hex_field_size(line+hashlen+1);
+		}
+		if (len > (hashlen + digest_field_len + 1) &&
+		    line[hashlen] == '-') {
+			new_hashlen += digest_field_len;
+			hashlen += digest_field_len;
+			digest_field_len = 0;
+			if (len > (hashlen + 1)
+			    && line[hashlen] == '-') {
+				digest_field_len = 1 +
+					get_hex_field_size(line+hashlen+1);
+			}
+			if (len > (hashlen + digest_field_len + 1) &&
+			    line[hashlen] == '-') {
+				new_hashlen += digest_field_len;
+				hashlen += digest_field_len;
+			}
+		}
+		if (line[hashlen] != ' ') {
 			die("protocol error: expected old/new/ref, got '%s'",
 			    line);
+		}
+
+		if (len < hashlen + 1 ||
+		    line[old_hashlen] != ' ' ||
+		    line[hashlen] != ' ' ||
+		    get_sha1_hex_digest(line, old_sha1,
+				     &has_old_sha1_digest, &old_sha1_digest) ||
+		    get_sha1_hex_digest(line + old_hashlen + 1, new_sha1,
+					&has_new_sha1_digest,
+					&new_sha1_digest)) {
+			die("protocol error: expected old/new/ref, got '%s'",
+			    line);
+		}
+
+		if (has_old_sha1_digest &&
+		    has_sha1_file_digest(old_sha1, &digest)) {
+			if (mdigest_tst(&old_sha1_digest, &digest)) {
+				die("hash collision for %s",
+				    sha1_to_hex(old_sha1));
+			}
+		}
+
+
+		if (has_new_sha1_digest &&
+		    has_sha1_file_digest(new_sha1, &digest)) {
+			if (mdigest_tst(&new_sha1_digest, &digest)) {
+				die("hash collision for %s",
+				    sha1_to_hex(new_sha1));
+			}
+		}
 
-		refname = line + 82;
+		refname = line + hashlen + 1;
 		reflen = strlen(refname);
-		if (reflen + 82 < len) {
+		if (reflen + hashlen + 1 < len) {
 			if (strstr(refname + reflen + 1, "report-status"))
 				report_status = 1;
 			if (strstr(refname + reflen + 1, "side-band-64k"))
 				use_sideband = LARGE_PACKET_MAX;
 		}
-		cmd = xcalloc(1, sizeof(struct command) + len - 80);
+		/*
+		 * Without the additional digests,
+		 *   old_hashlen + new_hashlen = 80
+		 *   hashlen = 81,
+		 *   hashlen + 1 = 82
+		 * which puts the same numeric values into the last argument
+		 * of xcalloc, and the second & third argument of memcpy
+		 * that were used in commit
+		 * fc14b89a7e6899b8ac3b5751ec2d8c98203ea4c2.
+		 */
+		cmd = xcalloc(1, sizeof(struct command) + len
+			      - (old_hashlen + new_hashlen));
 		hashcpy(cmd->old_sha1, old_sha1);
 		hashcpy(cmd->new_sha1, new_sha1);
-		memcpy(cmd->ref_name, line + 82, len - 81);
+		memcpy(cmd->ref_name, line + hashlen + 1, len - (hashlen));
 		*p = cmd;
 		p = &cmd->next;
 	}
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index cd1115f..1eb9704 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -250,6 +250,7 @@ int send_pack(struct send_pack_args *args,
 	int allow_deleting_refs = 0;
 	int status_report = 0;
 	int use_sideband = 0;
+	int mds_check = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -263,6 +264,8 @@ int send_pack(struct send_pack_args *args,
 		args->use_ofs_delta = 1;
 	if (server_supports("side-band-64k"))
 		use_sideband = 1;
+	if (server_supports("mds-check"))
+		mds_check = 1;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
@@ -298,8 +301,27 @@ int send_pack(struct send_pack_args *args,
 		if (args->dry_run) {
 			ref->status = REF_STATUS_OK;
 		} else {
-			char *old_hex = sha1_to_hex(ref->old_sha1);
-			char *new_hex = sha1_to_hex(ref->new_sha1);
+			char *old_hex, *new_hex;
+			if (mds_check) {
+				mdigest_t digest;
+				if (has_sha1_file_digest(ref->old_sha1,
+						      &digest)) {
+					old_hex = sha1_to_hex_digest
+						(ref->old_sha1, &digest);
+				} else {
+					old_hex = sha1_to_hex(ref->old_sha1);
+				}
+				if (has_sha1_file_digest(ref->new_sha1,
+						      &digest)) {
+					new_hex = sha1_to_hex_digest
+						(ref->new_sha1, &digest);
+				} else {
+					new_hex = sha1_to_hex(ref->new_sha1);
+				}
+			} else {
+				old_hex = sha1_to_hex(ref->old_sha1);
+				new_hex = sha1_to_hex(ref->new_sha1);
+			}
 
 			if (!cmds_sent && (status_report || use_sideband)) {
 				packet_buf_write(&req_buf, "%s %s %s%c%s%s",
diff --git a/http.c b/http.c
index 0ffd79c..e4e3ec7 100644
--- a/http.c
+++ b/http.c
@@ -1014,8 +1014,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	struct packed_git **lst;
 	struct packed_git *p = preq->target;
 	char *tmp_idx;
+	char *tmp_mds;
 	struct child_process ip;
-	const char *ip_argv[8];
+	const char *ip_argv[10];
 
 	close_pack_index(p);
 
@@ -1028,14 +1029,20 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	*lst = (*lst)->next;
 
 	tmp_idx = xstrdup(preq->tmpfile);
+	tmp_mds = xstrdup(preq->tmpfile);
 	strcpy(tmp_idx + strlen(tmp_idx) - strlen(".pack.temp"),
 	       ".idx.temp");
+	strcpy(tmp_mds + strlen(tmp_mds) - strlen(".pack.temp"),
+	       ".mds.temp");
+
 
 	ip_argv[0] = "index-pack";
 	ip_argv[1] = "-o";
 	ip_argv[2] = tmp_idx;
-	ip_argv[3] = preq->tmpfile;
-	ip_argv[4] = NULL;
+	ip_argv[3] = "-m";
+	ip_argv[4] = tmp_mds;
+	ip_argv[5] = preq->tmpfile;
+	ip_argv[6] = NULL;
 
 	memset(&ip, 0, sizeof(ip));
 	ip.argv = ip_argv;
@@ -1046,20 +1053,24 @@ int finish_http_pack_request(struct http_pack_request *preq)
 	if (run_command(&ip)) {
 		unlink(preq->tmpfile);
 		unlink(tmp_idx);
+		unlink(tmp_mds);
 		free(tmp_idx);
+		free(tmp_mds);
 		return -1;
 	}
 
 	unlink(sha1_pack_index_name(p->sha1));
 
 	if (move_temp_to_file(preq->tmpfile, sha1_pack_name(p->sha1))
-	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))) {
+	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))
+	 || move_temp_to_file(tmp_mds, sha1_pack_mds_name(p->sha1))) {
 		free(tmp_idx);
 		return -1;
 	}
 
 	install_packed_git(p);
 	free(tmp_idx);
+	free(tmp_mds);
 	return 0;
 }
 
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 9bf69e9..b6632d2 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -53,8 +53,8 @@ pull_to_client () {
 			git symbolic-ref HEAD refs/heads/`echo $heads \
 				| sed -e "s/^\(.\).*$/\1/"` &&
 
-			git fsck --full &&
-
+			git fsck --full  &&
+			test -z "`git count-objects -v -M | grep MD`" &&
 			mv .git/objects/pack/pack-* . &&
 			p=`ls -1 pack-*.pack` &&
 			git unpack-objects <$p &&
@@ -142,7 +142,8 @@ test_expect_success 'fsck in shallow repo' '
 test_expect_success 'simple fetch in shallow repo' '
 	(
 		cd shallow &&
-		git fetch
+		git fetch &&
+		test -z "`git count-objects -v -M | grep MD`"
 	)
 '
 
@@ -245,7 +246,8 @@ test_expect_success 'clone shallow object count' '
 		cd shallow &&
 		git count-objects -v
 	) > count.shallow &&
-	grep "^count: 52" count.shallow
+	grep "^count: 52" count.shallow  &&
+	test -z "`git count-objects -v -M | grep MD`"
 '
 
 test_done
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index e88dbd5..5e3b8c6 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -14,6 +14,12 @@ test_bundle_object_count () {
 	test "$2" = $(grep '^[0-9a-f]\{40\} ' verify.out | wc -l)
 }
 
+test_bundle_mds_count () {
+	git verify-pack -v -M "$1" >verify.out &&
+	test "$2" = $(grep '^[0-9a-f]\{40\} ' verify.out | grep -v "<no md>" | wc -l)
+}
+
+
 test_expect_success setup '
 	echo >file original &&
 	git add file &&
@@ -214,7 +220,8 @@ test_expect_success 'bundle 1 has only 3 files ' '
 		cat
 	) <bundle1 >bundle.pack &&
 	git index-pack bundle.pack &&
-	test_bundle_object_count bundle.pack 3
+	test_bundle_object_count bundle.pack 3 &&
+	test_bundle_mds_count bundle.pack 3
 '
 
 test_expect_success 'unbundle 2' '
@@ -237,7 +244,8 @@ test_expect_success 'bundle does not prerequisite objects' '
 		cat
 	) <bundle3 >bundle.pack &&
 	git index-pack bundle.pack &&
-	test_bundle_object_count bundle.pack 3
+	test_bundle_object_count bundle.pack 3 &&
+	test_bundle_mds_count bundle.pack 3
 '
 
 test_expect_success 'bundle should be able to create a full history' '
diff --git a/upload-pack.c b/upload-pack.c
index 6f36f62..1e77826 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -320,11 +320,18 @@ static int got_sha1(char *hex, unsigned char *sha1)
 {
 	struct object *o;
 	int we_knew_they_have = 0;
+	int has_sha1_digest, has_digest;
+	mdigest_t sha1_digest, digest;
 
-	if (get_sha1_hex(hex, sha1))
+	if (get_sha1_hex_digest(hex, sha1, &has_sha1_digest, &sha1_digest))
 		die("git upload-pack: expected SHA1 object, got '%s'", hex);
 	if (!has_sha1_file(sha1))
 		return -1;
+	has_digest = has_sha1_file_digest(sha1, &digest);
+	if (has_sha1_digest && has_digest
+	    && mdigest_tst(&digest, &sha1_digest)) {
+		die("git upload-pack: SHA1 collision on MD for %s", hex);
+	}
 
 	o = lookup_object(sha1);
 	if (!(o && o->parsed))
@@ -719,7 +726,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed" " mds-check";
 	struct object *o = parse_object(sha1);
 	const char *refname_nons = strip_namespace(refname);
 
-- 
1.7.1


* [PATCH 4/6] Add digests to commit objects.
From: Bill Zaumen @ 2011-12-21  7:12 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

When COMMIT_DIGEST is defined in the Makefile, a new
header is added to commits. The header is named 'digest'
and is a digest of the digests associated with the
commit's tree (computed recursively) and parents, and
of the other fields. This digest is included in the
SHA-1 hash computation and the commit's digest.

A function named verify_commit allows the digest to
be recomputed and checked.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
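As an illustration of where the new 'digest' header sits, here is a
minimal sketch that looks up a header by name in a commit buffer.  It is
not the parser used in this patch: find_commit_header() is a
hypothetical helper, and the sketch assumes one "key value" pair per
line, with the header block ending at the first empty line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate the value of header `key` (for example
 * "digest") in a commit buffer of `len` bytes.  Only the header block,
 * which ends at the first empty line, is searched.  Returns a pointer
 * to the value and stores its length in *value_len, or returns NULL
 * if the header is absent.
 */
static const char *find_commit_header(const char *buf, size_t len,
				      const char *key, size_t *value_len)
{
	const char *p = buf, *tail = buf + len;
	size_t key_len;

	assert(buf && key && value_len);
	key_len = strlen(key);
	/* Check the bound before dereferencing p. */
	while (p < tail && *p != '\n') {
		const char *eol = memchr(p, '\n', tail - p);

		if (!eol)
			eol = tail;
		if ((size_t)(eol - p) > key_len &&
		    !memcmp(p, key, key_len) && p[key_len] == ' ') {
			*value_len = eol - p - key_len - 1;
			return p + key_len + 1;
		}
		p = eol < tail ? eol + 1 : tail;
	}
	return NULL;
}
```

Because the search stops at the empty line, a line in the commit message
that happens to start with "digest " is never mistaken for the header.
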
 commit.c |  436 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 commit.h |   11 ++
 2 files changed, 443 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index b781274..ac0d492 100644
--- a/commit.c
+++ b/commit.c
@@ -6,11 +6,395 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
 
+struct commit_mds_context {
+	unsigned long missing;
+	mdigest_context_t context;
+#ifdef PACKDB
+	int packdb_opened;
+#endif /* PACKDB */
+};
+
+#if defined(COMMIT_DIGEST)||defined(PACKDB_TEST)||defined(COMMIT_DIGEST_TEST)
+static int get_objects_mds_f(const unsigned char *sha1,
+			  const char *basebuf, int baselen,
+			  const char *path, unsigned int mode, int stage,
+			  void *context)
+{
+	struct commit_mds_context *c = (struct commit_mds_context *)context;
+	mdigest_t digest;
+	unsigned long size;
+	int type;
+
+	if (S_ISGITLINK(mode)) {
+		/*
+		 * Submodule entry - SHA-1 of a commit in the submodule
+		 * with no entry in our repository.
+		 */
+		mdigest_Update(&c->context, sha1, 20);
+		return -1;
+	}
+	if (!has_sha1_file(sha1)) {
+		c->missing++;
+		return -1;
+	}
+	if (has_sha1_file_digest(sha1, &digest)) {
+		int wcode = get_mdigest_wcode(&digest);
+		unsigned char ucwcode = (unsigned char) (wcode & 0xff);
+		mdigest_Update(&c->context, &ucwcode, 1);
+		mdigest_Update(&c->context, get_mdigest_buffer(&digest),
+			       get_mdigest_len(&digest));
+#if defined(PACKDB) && defined (PACKDB_TEST)
+		{
+		  static int firsttime = 1;
+		  mdigest_t xdigest;
+		  if (!c->packdb_opened) {
+		    packdb_open();
+		    c->packdb_opened = 1;
+		    firsttime = 0;
+		  }
+		  if (firsttime) {
+		    if (!packdb_lookup(sha1, &xdigest)) {
+		      packdb_process(sha1, &digest);
+		      packdb_lookup(sha1, &xdigest);
+		      if (mdigest_tst(&digest, &xdigest)) {
+			die("digest for %s failed with packdb\n",
+			    sha1_to_hex(sha1));
+		      }
+		    }
+		  }
+		}
+#endif
+	} else {
+		enum object_type xtype;
+		unsigned long xsize;
+		mdigest_t xdigest;
+		unsigned char xsha1[20];
+		int wcode;
+		unsigned char ucwcode;
+#ifdef PACKDB
+		if (!c->packdb_opened) {
+			packdb_open();
+			c->packdb_opened = 1;
+		}
+		if (!packdb_lookup(sha1, &xdigest)) {
+#endif /* PACKDB */
+			void *data = read_sha1_file(sha1, &xtype, &xsize);
+			hash_sha1_file_extended(data, xsize,
+						typename(xtype),
+						xsha1,
+						&xdigest);
+			free(data);
+#ifdef PACKDB
+			packdb_process(sha1, &xdigest);
+		  }
+#endif /* PACKDB */
+		wcode = get_mdigest_wcode(&xdigest);
+		ucwcode = (unsigned char) (wcode & 0xff);
+		mdigest_Update(&c->context, &ucwcode, 1);
+		mdigest_Update(&c->context, get_mdigest_buffer(&xdigest),
+			       get_mdigest_len(&xdigest));
+	}
+	type = sha1_object_info(sha1, &size);
+	switch(type) {
+	case OBJ_TREE:
+	  return (S_ISDIR(mode))? READ_TREE_RECURSIVE: 0;
+	case OBJ_BLOB:
+		return 0;
+	default:
+		if (type <= OBJ_NONE) {
+		  c->missing++;
+		}
+		return 0;
+	}
+}
+
+/*
+ * Works with a tree or a commit sha1 - recursively traverses the trees
+ * and computes the message digest of each object's message digest. With
+ * a commit sha1, the caller must provide the 'parents' list - we
+ * sometimes call this function with the commit-object's tree before the
+ * sha1 value is computed.
+ */
+static int get_objects_mds(const unsigned char *sha1,
+			   struct commit_list *parents,
+			   const char *author, size_t author_len,
+			   const char *committer, size_t committer_len,
+			   const char *encoding, size_t encoding_len,
+			   struct commit_extra_header *extra,
+			   const char *msg, size_t msg_len,
+			   mdigest_t *digestp)
+{
+	struct commit_mds_context context;
+	struct tree *tree = parse_tree_indirect(sha1);
+	struct pathspec ps;
+	mdigest_t xdigest;
+	struct commit_extra_header *extra_head = extra;
+	mdigest_Init(&context.context, MDIGEST_DEFAULT);
+	context.missing = 0;
+#ifdef PACKDB
+	context.packdb_opened = 0;
+#endif /* PACKDB */
+
+	if (tree == NULL) {
+		return -1;
+	} else {
+		init_pathspec(&ps, NULL);
+		parse_tree(tree);
+		if (has_sha1_file_digest(tree->object.sha1, &xdigest)) {
+			int wcode = get_mdigest_wcode(&xdigest);
+			unsigned char ucwcode =
+				(unsigned char) (wcode & 0xff);
+			mdigest_Update(&context.context, &ucwcode, 1);
+			mdigest_Update(&context.context,
+				       get_mdigest_buffer(&xdigest),
+				       get_mdigest_len(&xdigest));
+		} else {
+			enum object_type xtype;
+			unsigned long xsize;
+			mdigest_t xdigest;
+			unsigned char xsha1[20];
+			int wcode;
+			unsigned char ucwcode;
+#ifdef PACKDB
+			if (!context.packdb_opened) {
+				packdb_open();
+				context.packdb_opened = 1;
+			}
+			if (!packdb_lookup(sha1, &xdigest)) {
+#endif /* PACKDB */
+
+				void *data = read_sha1_file(sha1, &xtype,
+							    &xsize);
+				hash_sha1_file_extended(data, xsize,
+							typename(xtype),
+							xsha1,
+							&xdigest);
+				free(data);
+#ifdef PACKDB
+				packdb_process(sha1, &xdigest);
+			}
+#endif /* PACKDB */
+			wcode = get_mdigest_wcode(&xdigest);
+			ucwcode = (unsigned char)(wcode & 0xff);
+			mdigest_Update(&context.context, &ucwcode, 1);
+			mdigest_Update(&context.context,
+				       get_mdigest_buffer(&xdigest),
+				       get_mdigest_len(&xdigest));
+		}
+		read_tree_recursive(tree, "", 0, 0, &ps, get_objects_mds_f,
+				    &context);
+		while (parents) {
+			/*
+			 * Include the message digests of the parent commits.
+			 */
+			struct commit_list *next = parents->next;
+			if (!has_sha1_file(parents->item->object.sha1)) {
+				return -1;
+			} else if (has_sha1_file_digest
+				   (parents->item->object.sha1, &xdigest)) {
+				int wcode = get_mdigest_wcode(&xdigest);
+				unsigned char ucwcode =
+					(unsigned char) (wcode & 0xff);
+				mdigest_Update(&context.context, &ucwcode, 1);
+				mdigest_Update(&context.context,
+					       get_mdigest_buffer(&xdigest),
+					       get_mdigest_len(&xdigest));
+			} else {
+				enum object_type xtype;
+				unsigned long xsize;
+				mdigest_t xdigest;
+				unsigned char xsha1[20];
+				int wcode;
+				unsigned char ucwcode;
+#ifdef PACKDB
+				if (!context.packdb_opened) {
+					packdb_open();
+					context.packdb_opened = 1;
+				}
+				if (!packdb_lookup(sha1, &xdigest)) {
+#endif /* PACKDB */
+
+					void *data = read_sha1_file(sha1,
+								    &xtype,
+								    &xsize);
+					hash_sha1_file_extended(data, xsize,
+								typename(xtype),
+								xsha1,
+								&xdigest);
+					free(data);
+#ifdef PACKDB
+					packdb_process(sha1, &xdigest);
+				}
+#endif /* PACKDB */
+				wcode = get_mdigest_wcode(&xdigest);
+				ucwcode = (unsigned char)(wcode & 0xff);
+				mdigest_Update(&context.context, &ucwcode, 1);
+				mdigest_Update(&context.context,
+					       get_mdigest_buffer(&xdigest),
+					       get_mdigest_len(&xdigest));
+			}
+			parents = next;
+		}
+#ifdef PACKDB
+		if (context.packdb_opened) packdb_close();
+#endif /* PACKDB */
+		if (msg && author && committer && digestp) {
+			mdigest_Update(&context.context, author, author_len);
+			mdigest_Update(&context.context, committer,
+				       committer_len);
+			if (encoding) mdigest_Update(&context.context, encoding,
+						     encoding_len);
+			while (extra) {
+			  mdigest_Update(&context.context, extra->key,
+					 strlen(extra->key));
+			  mdigest_Update(&context.context, " ", 1);
+			  mdigest_Update(&context.context, extra->value,
+					 extra->len);
+			  mdigest_Update(&context.context, "\n", 1);
+			  extra = extra->next;
+			}
+			free_commit_extra_headers(extra_head);
+			mdigest_Update(&context.context, msg, msg_len);
+		}
+		if (digestp) mdigest_Final(digestp, &context.context);
+		return ((context.missing == 0)? 0: -1);
+	}
+}
+#endif /* defined(COMMIT_DIGEST)||defined(PACKDB_TEST)||defined(COMMIT_DIGEST_TEST) */
+
+int verify_commit(struct commit *commit) {
+#ifdef COMMIT_DIGEST
+	if (save_commit_buffer) {
+		mdigest_t edigest;
+		mdigest_t digest;
+		const char *author = NULL;
+		size_t author_len = 0;
+		const char *committer = NULL;
+		size_t committer_len = 0;
+		const char *encoding = NULL;
+		size_t encoding_len = 0;
+		const char *msg = NULL;
+		size_t msg_len = 0;
+		const char *mdstring = NULL;
+		size_t mdstring_len = 0;
+		const char *bufptr;
+		const char *tail;
+		struct commit_extra_header *extra = NULL;
+
+		parse_commit(commit);
+		extra = read_commit_extra_header_lines(commit->buffer,
+						       commit->buffer_len);
+		bufptr = commit->buffer;
+		tail = bufptr + commit->buffer_len;
+		while (*bufptr != '\n' && bufptr < tail) {
+			if (*bufptr == 'a') {
+				if ((bufptr + 7) < tail) {
+					if (memcmp(bufptr, "author ", 7) == 0) {
+						bufptr += 7;
+						author = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						author_len = bufptr - author;
+					} else while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else if (*bufptr == 'c') {
+				if ((bufptr + 10) < tail) {
+					if (memcmp(bufptr,
+						   "committer ", 10) == 0) {
+						bufptr += 10;
+						committer = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						committer_len =
+							bufptr - committer;
+					} else  while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else if(*bufptr == 'e') {
+				if ((bufptr + 9) < tail) {
+					if (memcmp(bufptr,
+						   "encoding ", 9) == 0) {
+						bufptr += 9;
+						encoding = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						encoding_len =
+							bufptr - encoding;
+					} else  while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else if (*bufptr == 'd') {
+				if ((bufptr + 7) < tail) {
+					if (memcmp(bufptr, "digest ", 7) == 0) {
+						bufptr += 7;
+						mdstring = bufptr;
+						while (*bufptr != '\n' &&
+						       bufptr < tail) {
+							bufptr++;
+						}
+						mdstring_len =
+							bufptr - mdstring;
+					} else  while (*bufptr != '\n' &&
+						      bufptr < tail) bufptr++;
+					if (bufptr < tail) bufptr++;
+				}
+			} else {
+				while (*bufptr != '\n' && bufptr < tail)
+					bufptr++;
+				if (bufptr < tail) bufptr++;
+			}
+		}
+		if (*bufptr == '\n' && bufptr < tail) {
+			bufptr++;
+			msg = bufptr;
+			msg_len = tail - bufptr;
+		}
+		if (mdstring &&
+		    get_mdigest_from_external_hex(&edigest, mdstring) < 0) {
+			return -1;
+		}
+		if (author && committer && msg) {
+			if (mdstring == NULL) return 0;
+			if (get_objects_mds(commit->object.sha1,
+					    commit->parents,
+					    author, author_len,
+					    committer, committer_len,
+					    encoding, encoding_len,
+					    extra,
+					    msg, msg_len,
+					    &digest)) {
+				return -1;
+			}
+			return mdigest_tst(&edigest, &digest);
+		} else {
+			return -1;
+		}
+	} else {
+	  return 0;
+	}
+#else /* COMMIT_DIGEST */
+	return 0;
+#endif /* COMMIT_DIGEST */
+}
+
 static struct commit *check_commit(struct object *obj,
 				   const unsigned char *sha1,
 				   int quiet)
@@ -325,6 +709,9 @@ int parse_commit(struct commit *item)
 	ret = parse_commit_buffer(item, buffer, size);
 	if (save_commit_buffer && !ret) {
 		item->buffer = buffer;
+#ifdef COMMIT_DIGEST
+		item->buffer_len = (size_t) size;
+#endif
 		return 0;
 	}
 	free(buffer);
@@ -916,6 +1303,9 @@ static inline int standard_header_field(const char *field, size_t len)
 {
 	return ((len == 4 && !memcmp(field, "tree ", 5)) ||
 		(len == 6 && !memcmp(field, "parent ", 7)) ||
+#ifdef COMMIT_DIGEST
+		(len == 6 && !memcmp(field, "digest ", 7)) ||
+#endif
 		(len == 6 && !memcmp(field, "author ", 7)) ||
 		(len == 9 && !memcmp(field, "committer ", 10)) ||
 		(len == 8 && !memcmp(field, "encoding ", 9)));
@@ -998,12 +1388,43 @@ int commit_tree_extended(const char *msg, unsigned char *tree,
 	int result;
 	int encoding_is_utf8;
 	struct strbuf buffer;
+	static char committer[1000];
+	const char *encoding = NULL;
+#if defined(COMMIT_DIGEST) || defined (COMMIT_DIGEST_TEST)
+	mdigest_t digest;
+#endif /* defined(COMMIT_DIGEST) || defined(COMMIT_DIGEST_TEST) */
+	/*
+	 * git_committer_info returns a static buffer of size 1000, so
+	 * we have to copy it - assume git_committer_info does necessary
+	 * buffer-overflow tests.
+	 */
+	strcpy (committer, git_committer_info(IDENT_ERROR_ON_NO_NAME));
 
 	assert_sha1_type(tree, OBJ_TREE);
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
+	if (!encoding_is_utf8)
+		encoding = git_commit_encoding;
 
+	/* Person/date information setup*/
+	if (!author)
+		author = git_author_info(IDENT_ERROR_ON_NO_NAME);
+#if defined(COMMIT_DIGEST) || defined(PACKDB_TEST) || defined(COMMIT_DIGEST_TEST)
+	/*
+	 * Have all the pieces so compute the message digest. We do it here
+	 * because the list 'parents' will be destroyed by the following loop.
+	 */
+	if (get_objects_mds(tree, parents,
+			    author, (author? strlen(author): 0),
+			    committer, strlen(committer),
+			    encoding, (encoding? strlen(encoding): 0),
+			    extra,
+			    msg, (msg? strlen(msg): 0),
+			    &digest)) {
+		die("could not compute message digest for commit");
+	}
+#endif /* defined(COMMIT_DIGEST) || defined(PACKDB_TEST)  || defined(COMMIT_DIGEST_TEST)*/
 	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
 	strbuf_addf(&buffer, "tree %s\n", sha1_to_hex(tree));
 
@@ -1023,17 +1444,19 @@ int commit_tree_extended(const char *msg, unsigned char *tree,
 	}
 
 	/* Person/date information */
-	if (!author)
-		author = git_author_info(IDENT_ERROR_ON_NO_NAME);
 	strbuf_addf(&buffer, "author %s\n", author);
-	strbuf_addf(&buffer, "committer %s\n", git_committer_info(IDENT_ERROR_ON_NO_NAME));
+	strbuf_addf(&buffer, "committer %s\n", committer);
+
 	if (!encoding_is_utf8)
 		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
-
 	while (extra) {
 		add_extra_header(&buffer, extra);
 		extra = extra->next;
 	}
+#ifdef COMMIT_DIGEST
+	strbuf_addf(&buffer, "digest %s\n", mdigest_to_external_hex(&digest));
+#endif /* COMMIT_DIGEST */
+
 	strbuf_addch(&buffer, '\n');
 
 	/* And add the comment */
@@ -1045,6 +1468,11 @@ int commit_tree_extended(const char *msg, unsigned char *tree,
 
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
+#if defined(COMMIT_DIGEST) && defined (COMMIT_DIGEST_TEST)
+	if (verify_commit(lookup_commit(ret))) {
+	    die("commit verification failed for %s\n", sha1_to_hex(ret));
+	  }
+#endif
 	return result;
 }
 
diff --git a/commit.h b/commit.h
index 3745f12..7a91519 100644
--- a/commit.h
+++ b/commit.h
@@ -5,6 +5,7 @@
 #include "tree.h"
 #include "strbuf.h"
 #include "decorate.h"
+#include "mdigest.h"
 
 struct commit_list {
 	struct commit *item;
@@ -19,6 +20,9 @@ struct commit {
 	struct commit_list *parents;
 	struct tree *tree;
 	char *buffer;
+#ifdef COMMIT_DIGEST
+	size_t buffer_len;
+#endif
 };
 
 extern int save_commit_buffer;
@@ -218,4 +222,11 @@ struct merge_remote_desc {
  */
 struct commit *get_merge_parent(const char *name);
 
+/*
+ * Returns 0 if OK or if save_commit_buffer == 0 or if COMMIT_DIGEST was
+ * not defined during compilation; non-zero otherwise.  If a commit does
+ * not have a digest field, 0 is returned.
+ */
+extern int verify_commit(struct commit *commit);
+
 #endif /* COMMIT_H */
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/6] Add MD support for packfiles, fast-import, and pruning.
From: Bill Zaumen @ 2011-12-21  7:11 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

Modify the utilities that create and query pack files, fast-import,
and the commands that prune a git repository to support message
digests (MDs), stored either in individual files or in 'mds' files
that parallel pack index files.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
 builtin/count-objects.c   |   92 +++++++++++++++++++++++++++++-
 builtin/index-pack.c      |  139 +++++++++++++++++++++++++++++++++++++++++----
 builtin/pack-objects.c    |   81 ++++++++++++++++++++++++++-
 builtin/pack-redundant.c  |   14 ++++-
 builtin/prune-packed.c    |   21 ++++++-
 builtin/prune.c           |    1 +
 builtin/verify-pack.c     |   14 ++++-
 fast-import.c             |   77 ++++++++++++++++++++++++-
 git-repack.sh             |   12 +++-
 t/t5300-pack-object.sh    |   17 +++++-
 t/t5301-sliding-window.sh |   14 ++++-
 t/t5302-pack-index.sh     |    6 +-
 t/t5304-prune.sh          |   13 +++--
 t/t9300-fast-import.sh    |    8 ++-
 14 files changed, 467 insertions(+), 42 deletions(-)
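
The size check that count-objects applies to a version 1 mds file
(see the hunk in builtin/count-objects.c below) can be read as a small
formula. The following is only a sketch under assumptions drawn from
that hunk: expected_mds_size() and the two constants are hypothetical
names that do not appear in the patch, and the reading of byte 7 as a
digest width in 4-byte words, of the fixed 8 bytes as a header, and of
the final 40 bytes as two SHA-1 values is inferred from the arithmetic
rather than stated by the author.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: the size in bytes that
 * count-objects expects a version 1 .mds file to have.
 */
enum {
	MDS_HEADER_SIZE = 8,       /* assumed header; byte 7 gives the width */
	MDS_TRAILER_SIZE = 20 * 2  /* assumed to be two 20-byte SHA-1 values */
};

static size_t expected_mds_size(size_t num_objects, unsigned char width_words)
{
	/* Per-entry payload in bytes: byte 7 of the file times 4. */
	size_t wsize = (size_t)width_words * 4;
	/* The object count is rounded up to a multiple of 4. */
	size_t padded = (num_objects / 4 + (num_objects % 4 != 0)) * 4;

	/* Each padded slot takes 1 byte plus wsize bytes. */
	return MDS_HEADER_SIZE + padded * (1 + wsize) + MDS_TRAILER_SIZE;
}
```

For example, with 5 objects and a width byte of 1, the count is padded
to 8 slots of 5 bytes each, so the expected size is 8 + 40 + 40 = 88
bytes. A file of any other size is counted as having the wrong size.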

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index c37cb98..47135e1 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -8,6 +8,12 @@
 #include "dir.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "mdsdb.h"
+
+int mdsmode = 0;
+unsigned long has_loose_mds = 0;
+unsigned long loose_mds_missing = 0;
+
 
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
@@ -53,20 +59,37 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 			continue;
 		}
 		(*loose)++;
-		if (!verbose)
+		if (!verbose) {
+			if (mdsmode) {
+				if (get_sha1_hex(hex, sha1)) {
+					die("internal error");
+				} else if (mdsdb_lookup(NULL, sha1, NULL) > 0) {
+					has_loose_mds++;
+				} else {
+					loose_mds_missing++;
+				}
+			}
 			continue;
+		}
 		memcpy(hex, path+len, 2);
 		memcpy(hex+2, ent->d_name, 38);
 		hex[40] = 0;
 		if (get_sha1_hex(hex, sha1))
 			die("internal error");
+		if (mdsmode) {
+			if (mdsdb_lookup(NULL, sha1, NULL) > 0) {
+				has_loose_mds++;
+			} else {
+				loose_mds_missing++;
+			}
+		}
 		if (has_sha1_pack(sha1))
 			(*packed_loose)++;
 	}
 }
 
 static char const * const count_objects_usage[] = {
-	"git count-objects [-v]",
+	"git count-objects [-v] [-M]",
 	NULL
 };
 
@@ -80,6 +103,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	off_t loose_size = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, "be verbose"),
+		OPT_BOOLEAN('M', "count-md", &mdsmode,
+			    "count MDs (Message Digests)"),
 		OPT_END(),
 	};
 
@@ -90,6 +115,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	memcpy(path, objdir, len);
 	if (len && objdir[len-1] != '/')
 		path[len++] = '/';
+	mdsdb_open(NULL);
 	for (i = 0; i < 256; i++) {
 		DIR *d;
 		sprintf(path + len, "%02x", i);
@@ -100,10 +126,16 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 			      &loose, &loose_size, &packed_loose, &garbage);
 		closedir(d);
 	}
+	mdsdb_close(NULL);
 	if (verbose) {
 		struct packed_git *p;
 		unsigned long num_pack = 0;
 		off_t size_pack = 0;
+		unsigned long mds_mismatched = 0;
+		unsigned long missing_mdsfile_count = 0;
+		unsigned long mds_count = 0;
+		int wsize = 0;
+		mdigest_t digest;
 		if (!packed_git)
 			prepare_packed_git();
 		for (p = packed_git; p; p = p->next) {
@@ -114,6 +146,40 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 			packed += p->num_objects;
 			size_pack += p->pack_size + p->index_size;
 			num_pack++;
+			if (!mdsmode)
+				continue;
+			if (open_pack_mds(p)) {
+				missing_mdsfile_count++;
+				continue;
+			}
+			/*
+			 * Assume mds version 1 for now. We check that
+			 * the mds file has the right size and record if it
+			 * doesn't.  If it is the right size, we go through
+			 * all the entries and count the number of sha1 hashes
+			 * for which there is a recorded CRC.  We do not
+			 * check if the CRC is the right one for the
+			 * corresponding object: run git verify-pack to do
+			 * that.
+			 */
+			if (p->mds_size > 7) {
+			  wsize = ((unsigned char *)(p->mds_data))[7] * 4;
+			}
+			if (p->mds_size == (size_t)8 +
+			    (((size_t)
+			      ((p->num_objects)/4 + (p->num_objects % 4 != 0))
+			      * (size_t)4 * (size_t)(1 + wsize)) +
+			     (size_t)(20 * 2))) {
+				for (i = 0; i < p->num_objects; i++) {
+					mds_count +=
+					  (nth_packed_object_mdigest(p,
+								      i,
+								      &digest)
+					   == 1);
+				}
+			} else {
+			  mds_mismatched++;
+			}
 		}
 		printf("count: %lu\n", loose);
 		printf("size: %lu\n", (unsigned long) (loose_size / 1024));
@@ -122,9 +188,31 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		printf("size-pack: %lu\n", (unsigned long) (size_pack / 1024));
 		printf("prune-packable: %lu\n", packed_loose);
 		printf("garbage: %lu\n", garbage);
+		if (mdsmode) {
+			if (missing_mdsfile_count) {
+				printf("missing MD (Message Digest) "
+				       "files: %lu\n",
+					missing_mdsfile_count);
+			}
+			if (mds_mismatched)
+				printf("MD (Message Digest) files with"
+				       " wrong size: %lu "
+				       "(file extension = .mds)\n",
+					mds_mismatched);
+			if (packed != mds_count) {
+				printf("missing MD (Message Digest)"
+				       " count: %lu\n",
+				       packed - mds_count);
+			}
+		}
 	}
 	else
 		printf("%lu objects, %lu kilobytes\n",
 		       loose, (unsigned long) (loose_size / 1024));
+	if (mdsmode && loose_mds_missing) {
+		assert(loose == (loose_mds_missing + has_loose_mds));
+		printf("%lu loose objects with no MD (Message Digest)\n",
+		       loose_mds_missing);
+	}
 	return 0;
 }
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 98025da..127f879 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1,3 +1,4 @@
+#include <unistd.h>
 #include "builtin.h"
 #include "delta.h"
 #include "pack.h"
@@ -23,6 +24,14 @@ struct object_entry {
 	int base_object_no;
 };
 
+static int sha1_compare(const void *_a, const void *_b)
+{
+	struct object_entry *a = (struct object_entry *)_a;
+	struct object_entry *b = (struct object_entry *)_b;
+	return hashcmp(a->idx.sha1, b->idx.sha1);
+}
+
+
 union delta_base {
 	unsigned char sha1[20];
 	off_t offset;
@@ -447,9 +456,10 @@ static void find_delta_children(const union delta_base *base,
 }
 
 static void sha1_object(const void *data, unsigned long size,
-			enum object_type type, unsigned char *sha1)
+			enum object_type type, unsigned char *sha1,
+			mdigest_t *digestp)
 {
-	hash_sha1_file(data, size, typename(type), sha1);
+	hash_sha1_file_extended(data, size, typename(type), sha1, digestp);
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
@@ -549,7 +559,8 @@ static void resolve_delta(struct object_entry *delta_obj,
 	if (!result->data)
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
-		    delta_obj->idx.sha1);
+		    delta_obj->idx.sha1, &(delta_obj->idx.digest));
+	delta_obj->idx.has_digest = 1;
 	nr_resolved_deltas++;
 }
 
@@ -643,8 +654,12 @@ static void parse_pack_objects(unsigned char *sha1)
 			nr_deltas++;
 			delta->obj_no = i;
 			delta++;
-		} else
-			sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+		} else {
+		  sha1_object(data, obj->size, obj->type, obj->idx.sha1,
+			      &(obj->idx.digest));
+		  obj->idx.has_digest = 1;
+		}
+
 		free(data);
 		display_progress(progress, i+1);
 	}
@@ -804,6 +819,7 @@ static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved)
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
+		  const char *final_mds_name, const char *curr_mds_name,
 		  const char *keep_name, const char *keep_msg,
 		  unsigned char *sha1)
 {
@@ -866,6 +882,18 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	} else
 		chmod(final_index_name, 0444);
 
+	if (final_mds_name != curr_mds_name) {
+		if (!final_mds_name) {
+			snprintf(name, sizeof(name), "%s/pack/pack-%s.mds",
+				 get_object_directory(), sha1_to_hex(sha1));
+			final_mds_name = name;
+		}
+		if (move_temp_to_file(curr_mds_name, final_mds_name))
+			die("cannot store mds file");
+	} else
+		chmod(final_mds_name, 0444);
+
+
 	if (!from_stdin) {
 		printf("%s\n", sha1_to_hex(sha1));
 	} else {
@@ -972,18 +1000,46 @@ static void read_idx_option(struct pack_idx_option *opts, const char *pack_name)
 	free(p);
 }
 
-static void show_pack_info(int stat_only)
+static void show_pack_info(int stat, int stat_only, int show_mds,
+			   int mds_file_exists, const char *path)
 {
 	int i, baseobjects = nr_objects - nr_deltas;
 	unsigned long *chain_histogram = NULL;
+	void *data = NULL;
+	size_t mds_size = 0;
+	struct packed_git pg;
+
+	if (mds_file_exists) {
+		int fd = git_open_noatime(path);
+		size_t required_size = 0;
+		struct stat st;
+		if (fd >= 0) {
+			if (fstat(fd, &st)) {
+				close(fd);
+			} else {
+				mds_size = xsize_t(st.st_size);
+				data = xmmap(NULL, mds_size,
+						PROT_READ, MAP_PRIVATE, fd, 0);
+				close(fd);
+				required_size = required_git_packed_mds_size
+					(path, data, nr_objects, mds_size);
+				if (required_size == 0) {
+					munmap(data, mds_size);
+					data = NULL;
+				}
+			}
+		}
+		if (data == NULL) mds_file_exists = 0;
+		pg.mds_data = data;
+	}
 
-	if (deepest_delta)
+	if (stat && deepest_delta)
 		chain_histogram = xcalloc(deepest_delta, sizeof(unsigned long));
 
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
 
-		if (is_delta_type(obj->type))
+		if (chain_histogram && is_delta_type(obj->type))
 			chain_histogram[obj->delta_depth - 1]++;
 		if (stat_only)
 			continue;
@@ -992,12 +1048,41 @@ static void show_pack_info(int stat_only)
 		       typename(obj->real_type), obj->size,
 		       (unsigned long)(obj[1].idx.offset - obj->idx.offset),
 		       (uintmax_t)obj->idx.offset);
+		if (show_mds) {
+			if (mds_file_exists) {
+				mdigest_t digest;
+				int has_digest = nth_packed_object_mdigest
+					(&pg, i, &digest);
+				if (has_digest) {
+					printf(" md=%s",
+					       mdigest_to_external_hex
+					       (&digest));
+					if (obj->idx.has_digest) {
+						if (mdigest_tst
+						    (&digest,
+						     &(obj->idx.digest))) {
+							printf
+							(" (should be %s) ",
+							 mdigest_to_external_hex
+							 (&(obj->idx.digest)));
+						}
+					}
+				} else {
+					printf(" <no md>      ");
+				}
+			} else {
+				printf(" <no md>      ");
+			}
+		}
 		if (is_delta_type(obj->type)) {
 			struct object_entry *bobj = &objects[obj->base_object_no];
 			printf(" %u %s", obj->delta_depth, sha1_to_hex(bobj->idx.sha1));
 		}
 		putchar('\n');
 	}
+	if (data) munmap(data, mds_size);
+	if (!stat)
+		return;
 
 	if (baseobjects)
 		printf("non delta: %d object%s\n",
@@ -1015,10 +1100,12 @@ static void show_pack_info(int stat_only)
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, stat = 0;
-	const char *curr_pack, *curr_index;
-	const char *index_name = NULL, *pack_name = NULL;
+	int show_mds = 0;
+	const char *curr_pack, *curr_index, *curr_mds;
+	const char *index_name = NULL, *pack_name = NULL, *mds_name = NULL;
 	const char *keep_name = NULL, *keep_msg = NULL;
 	char *index_name_buf = NULL, *keep_name_buf = NULL;
+	char *mds_name_buf = NULL;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
@@ -1052,6 +1139,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				verify = 1;
 				stat = 1;
 				stat_only = 1;
+			} else if (!strcmp(arg, "-M") ||
+				   !strcmp(arg, "--show-mds")) {
+				verify = 1;
+				show_mds = 1;
 			} else if (!strcmp(arg, "--keep")) {
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
@@ -1075,6 +1166,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				if (index_name || (i+1) >= argc)
 					usage(index_pack_usage);
 				index_name = argv[++i];
+			} else if (!strcmp(arg, "-m")) {
+				if (mds_name || (i+1) >= argc)
+					usage(index_pack_usage);
+				mds_name = argv[++i];
 			} else if (!prefixcmp(arg, "--index-version=")) {
 				char *c;
 				opts.version = strtoul(arg + 16, &c, 10);
@@ -1108,6 +1203,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		strcpy(index_name_buf + len - 5, ".idx");
 		index_name = index_name_buf;
 	}
+	if (!mds_name && pack_name) {
+		int len = strlen(pack_name);
+		if (!has_extension(pack_name, ".pack"))
+			die("packfile name '%s' does not end with '.pack'",
+			    pack_name);
+		mds_name_buf = xmalloc(len);
+		memcpy(mds_name_buf, pack_name, len - 5);
+		strcpy(mds_name_buf + len - 5, ".mds");
+		mds_name = mds_name_buf;
+	}
 	if (keep_msg && !keep_name && pack_name) {
 		int len = strlen(pack_name);
 		if (!has_extension(pack_name, ".pack"))
@@ -1170,24 +1275,34 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		check_objects();
 
-	if (stat)
-		show_pack_info(stat_only);
+	if (stat || show_mds) {
+		int mds_file_exists = !access(mds_name, R_OK);
+		if (mds_file_exists && show_mds) {
+		  qsort (objects, nr_objects, sizeof (struct object_entry),
+			 sha1_compare);
+		}
+		show_pack_info(stat, stat_only, show_mds, mds_file_exists,
+			       mds_name);
+	}
 
 	idx_objects = xmalloc((nr_objects) * sizeof(struct pack_idx_entry *));
 	for (i = 0; i < nr_objects; i++)
 		idx_objects[i] = &objects[i].idx;
 	curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_sha1);
+	curr_mds = write_mds_file(mds_name, idx_objects, nr_objects, &opts,pack_sha1);
 	free(idx_objects);
 
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
+		      mds_name, curr_mds,
 		      keep_name, keep_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
 	free(objects);
 	free(index_name_buf);
+	free(mds_name_buf);
 	free(keep_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 96c1680..ccfe824 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -17,6 +17,7 @@
 #include "progress.h"
 #include "refs.h"
 #include "thread-utils.h"
+#include "mdsdb.h"
 
 static const char pack_usage[] =
   "git pack-objects [ -q | --progress | --all-progress ]\n"
@@ -560,6 +561,8 @@ static struct object_entry **compute_write_order(void)
 		objects[i].filled = 0;
 		objects[i].delta_child = NULL;
 		objects[i].delta_sibling = NULL;
+		objects[i].idx.has_digest = 0;
+		mdigest_clear(&objects[i].idx.digest);
 	}
 
 	/*
@@ -684,8 +687,28 @@ static void write_pack_file(void)
 
 		if (!pack_to_stdout) {
 			struct stat st;
+#if 1
 			char tmpname[PATH_MAX];
 
+#else
+			const char *idx_tmp_name;
+			const char *mds_tmp_name;
+			char tmpname[PATH_MAX];
+
+			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
+						      &pack_idx_opts, sha1);
+			mds_tmp_name = write_mds_file(NULL, written_list, nr_written,
+						      &pack_idx_opts, sha1);
+
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
+				 base_name, sha1_to_hex(sha1));
+			free_pack_by_name(tmpname);
+			if (adjust_shared_perm(pack_tmp_name))
+				die_errno("unable to make temporary pack file readable");
+			if (rename(pack_tmp_name, tmpname))
+				die_errno("unable to rename temporary pack file");
+#endif
+
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -707,6 +730,7 @@ static void write_pack_file(void)
 						tmpname, strerror(errno));
 			}
 
+#if 1
 			/* Enough space for "-<sha-1>.pack"? */
 			if (sizeof(tmpname) <= strlen(base_name) + 50)
 				die("pack base name '%s' too long", base_name);
@@ -714,6 +738,25 @@ static void write_pack_file(void)
 			finish_tmp_packfile(tmpname, pack_tmp_name,
 					    written_list, nr_written,
 					    &pack_idx_opts, sha1);
+#else
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.idx",
+				 base_name, sha1_to_hex(sha1));
+			if (adjust_shared_perm(idx_tmp_name))
+				die_errno("unable to make temporary index file readable");
+			if (rename(idx_tmp_name, tmpname))
+				die_errno("unable to rename temporary index file");
+
+			snprintf(tmpname, sizeof(tmpname), "%s-%s.mds",
+				 base_name, sha1_to_hex(sha1));
+			if (adjust_shared_perm(mds_tmp_name))
+				die_errno("unable to make temporary mds file readable");
+			if (rename(mds_tmp_name, tmpname))
+				die_errno("unable to rename temporary mds file");
+
+
+			free((void *) idx_tmp_name);
+			free((void *) mds_tmp_name);
+#endif
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
@@ -830,6 +873,8 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	off_t found_offset = 0;
 	int ix;
 	unsigned hash = name_hash(name);
+	int hasdigest;
+	mdigest_t digest;
 
 	ix = nr_objects ? locate_object_entry_hash(sha1) : -1;
 	if (ix >= 0) {
@@ -846,7 +891,11 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 		return 0;
 
 	for (p = packed_git; p; p = p->next) {
-		off_t offset = find_pack_entry_one(sha1, p);
+		off_t offset;
+		hasdigest = 0;
+		mdigest_clear(&digest);
+		offset = find_pack_entry_one_extended(sha1, p,
+						   &hasdigest, &digest);
 		if (offset) {
 			if (!found_pack) {
 				if (!is_pack_valid(p)) {
@@ -874,7 +923,37 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 
 	entry = objects + nr_objects++;
 	memset(entry, 0, sizeof(*entry));
+	if (hasdigest == 0) {
+		/*
+		 * We pick up MDs for local objects (we already checked the
+		 * pack files).  If that doesn't work, we compute it from
+		 * scratch (which should occur rarely if at all).
+		 */
+		mdsdb_open(NULL);
+		switch (mdsdb_lookup(NULL, sha1, &digest)) {
+		case 1:
+			hasdigest = 1;
+			break;
+		default:
+			hasdigest = 0;
+		}
+		mdsdb_close(NULL);
+		if (!hasdigest) {
+		  enum object_type type;
+		  unsigned long size;
+		  unsigned char sbuf[20];
+		  void *buf = read_sha1_file(sha1, &type, &size);
+		  if (buf) {
+			const char *stype = typename(type);
+			hash_sha1_file_extended(buf, size, stype, sbuf,
+						&digest);
+			hasdigest = 1;
+		  }
+		}
+	}
 	hashcpy(entry->idx.sha1, sha1);
+	entry->idx.has_digest = hasdigest;
+	entry->idx.digest = digest;
 	entry->hash = hash;
 	if (type)
 		entry->type = type;
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index f5c6afc..c09397c 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -6,6 +6,7 @@
 *
 */
 
+#include <unistd.h>
 #include "builtin.h"
 
 #define BLKSIZE 512
@@ -682,9 +683,16 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 	}
 	pl = red = pack_list_difference(local_packs, min);
 	while (pl) {
-		printf("%s\n%s\n",
-		       sha1_pack_index_name(pl->pack->sha1),
-		       pl->pack->pack_name);
+		char *mdsfile = sha1_pack_mds_name(pl->pack->sha1);
+		if (!access(mdsfile, F_OK)) {
+		  printf("%s\n%s\n%s\n", mdsfile,
+			       sha1_pack_index_name(pl->pack->sha1),
+			       pl->pack->pack_name);
+		} else {
+			printf("%s\n%s\n",
+			       sha1_pack_index_name(pl->pack->sha1),
+			       pl->pack->pack_name);
+		}
 		pl = pl->next;
 	}
 	if (verbose)
diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index f9463de..ec5dfe8 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -43,8 +43,11 @@ void prune_packed_objects(int opts)
 {
 	int i;
 	static char pathname[PATH_MAX];
+	static char mds_pathname[PATH_MAX];
 	const char *dir = get_object_directory();
+	const char *mdsdir = get_object_mds_directory();
 	int len = strlen(dir);
+	int mdslen = strlen(mdsdir);
 
 	if (opts == VERBOSE)
 		progress = start_progress_delay("Removing duplicate objects",
@@ -55,16 +58,26 @@ void prune_packed_objects(int opts)
 	memcpy(pathname, dir, len);
 	if (len && pathname[len-1] != '/')
 		pathname[len++] = '/';
+	memcpy(mds_pathname, mdsdir, mdslen);
+	if (mdslen && mds_pathname[mdslen-1] != '/')
+		mds_pathname[mdslen++] = '/';
 	for (i = 0; i < 256; i++) {
 		DIR *d;
+		DIR *mds_d;
 
 		display_progress(progress, i + 1);
 		sprintf(pathname + len, "%02x/", i);
 		d = opendir(pathname);
-		if (!d)
-			continue;
-		prune_dir(i, d, pathname, len + 3, opts);
-		closedir(d);
+		sprintf(mds_pathname + mdslen, "%02x/", i);
+		mds_d = opendir(mds_pathname);
+		if (d) {
+			prune_dir(i, d, pathname, len + 3, opts);
+			closedir(d);
+		}
+		if (mds_d) {
+			prune_dir(i, mds_d, mds_pathname, mdslen + 3, opts);
+			closedir(mds_d);
+		}
 	}
 	stop_progress(&progress);
 }
diff --git a/builtin/prune.c b/builtin/prune.c
index 58d7cb8..25dde51 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -165,6 +165,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	mark_reachable_objects(&revs, 1, progress);
 	stop_progress(&progress);
 	prune_object_dir(get_object_directory());
+	prune_object_dir(get_object_mds_directory());
 
 	prune_packed_objects(show_only);
 	remove_temporary_files(get_object_directory());
diff --git a/builtin/verify-pack.c b/builtin/verify-pack.c
index e841b4a..b94a11e 100644
--- a/builtin/verify-pack.c
+++ b/builtin/verify-pack.c
@@ -5,14 +5,16 @@
 
 #define VERIFY_PACK_VERBOSE 01
 #define VERIFY_PACK_STAT_ONLY 02
+#define SHOW_MDS 04
 
 static int verify_one_pack(const char *path, unsigned int flags)
 {
 	struct child_process index_pack;
-	const char *argv[] = {"index-pack", NULL, NULL, NULL };
+	const char *argv[] = {"index-pack", NULL, NULL, NULL, NULL };
 	struct strbuf arg = STRBUF_INIT;
 	int verbose = flags & VERIFY_PACK_VERBOSE;
 	int stat_only = flags & VERIFY_PACK_STAT_ONLY;
+	int show_mds = ((flags & SHOW_MDS) != 0)  && !stat_only;
 	int err;
 
 	if (stat_only)
@@ -22,6 +24,8 @@ static int verify_one_pack(const char *path, unsigned int flags)
 	else
 		argv[1] = "--verify";
 
+	if (show_mds) argv[2] = "-M";
+
 	/*
 	 * In addition to "foo.pack" we accept "foo.idx" and "foo";
 	 * normalize these forms to "foo.pack" for "index-pack --verify".
@@ -31,7 +35,7 @@ static int verify_one_pack(const char *path, unsigned int flags)
 		strbuf_splice(&arg, arg.len - 3, 3, "pack", 4);
 	else if (!has_extension(arg.buf, ".pack"))
 		strbuf_add(&arg, ".pack", 5);
-	argv[2] = arg.buf;
+	argv[2 + show_mds] = arg.buf;
 
 	memset(&index_pack, 0, sizeof(index_pack));
 	index_pack.argv = argv;
@@ -46,6 +50,10 @@ static int verify_one_pack(const char *path, unsigned int flags)
 			if (!stat_only)
 				printf("%s: ok\n", arg.buf);
 		}
+	} else if (show_mds) {
+		printf("%s: listed (%s)\n-----------------\n", arg.buf,
+		       (err? "bad": "ok"));
+
 	}
 	strbuf_release(&arg);
 
@@ -67,6 +75,8 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
 			VERIFY_PACK_VERBOSE),
 		OPT_BIT('s', "stat-only", &flags, "show statistics only",
 			VERIFY_PACK_STAT_ONLY),
+		OPT_BIT('M', "show-mds", &flags, "show message digests / CRCs",
+			SHOW_MDS),
 		OPT_END()
 	};
 
diff --git a/fast-import.c b/fast-import.c
index 4b9c4b7..f672a76 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -165,6 +165,11 @@ Format of STDIN stream:
 #include "exec_cmd.h"
 #include "dir.h"
 
+#ifdef PACKDB
+#include "packdb.h"
+#endif
+
+
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
 #define DEPTH_BITS 13
@@ -558,6 +563,8 @@ static struct object_entry *new_object(unsigned char *sha1)
 
 	e = blocks->next_free++;
 	hashcpy(e->idx.sha1, sha1);
+	e->idx.has_digest = 0;
+	mdigest_clear(&e->idx.digest);
 	return e;
 }
 
@@ -904,9 +911,34 @@ static const char *create_index(void)
 	return tmpfile;
 }
 
-static char *keep_pack(const char *curr_index_name)
+static const char *create_mds(void)
+{
+	const char *tmpfile;
+	struct pack_idx_entry **mds, **c, **last;
+	struct object_entry *e;
+	struct object_entry_pool *o;
+
+	/* Build the table of object IDs. */
+	mds = xmalloc(object_count * sizeof(*mds));
+	c = mds;
+	for (o = blocks; o; o = o->next_pool)
+		for (e = o->next_free; e-- != o->entries;)
+			if (pack_id == e->pack_id)
+				*c++ = &e->idx;
+	last = mds + object_count;
+	if (c != last)
+		die("internal consistency error creating the mds file");
+
+	tmpfile = write_mds_file(NULL, mds, object_count, &pack_idx_opts, pack_data->sha1);
+	free(mds);
+	return tmpfile;
+}
+
+
+static char *keep_pack(const char *curr_index_name, const char *curr_mds_name)
 {
 	static char name[PATH_MAX];
+	static char rname[PATH_MAX];
 	static const char *keep_msg = "fast-import";
 	int keep_fd;
 
@@ -927,6 +959,13 @@ static char *keep_pack(const char *curr_index_name)
 	if (move_temp_to_file(curr_index_name, name))
 		die("cannot store index file");
 	free((void *)curr_index_name);
+
+	snprintf(rname, sizeof(rname), "%s/pack/pack-%s.mds",
+		 get_object_directory(), sha1_to_hex(pack_data->sha1));
+	if (move_temp_to_file(curr_mds_name, rname))
+		die("cannot store mds file");
+	free((void *)curr_mds_name);
+
 	return name;
 }
 
@@ -951,6 +990,7 @@ static void end_packfile(void)
 	if (object_count) {
 		unsigned char cur_pack_sha1[20];
 		char *idx_name;
+		const char *n1, *n2;
 		int i;
 		struct branch *b;
 		struct tag *t;
@@ -961,7 +1001,9 @@ static void end_packfile(void)
 				    pack_data->pack_name, object_count,
 				    cur_pack_sha1, pack_size);
 		close(pack_data->pack_fd);
-		idx_name = keep_pack(create_index());
+		n1 = create_index();
+		n2 = create_mds();
+		idx_name = keep_pack(n1, n2);
 
 		/* Register the packfile with core git's machinery. */
 		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
@@ -1021,6 +1063,8 @@ static int store_object(
 	unsigned long hdrlen, deltalen;
 	git_SHA_CTX c;
 	git_zstream s;
+	mdigest_t digest;
+	mdigest_context_t mdc;
 
 	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
 		(unsigned long)dat->len) + 1;
@@ -1028,10 +1072,28 @@ static int store_object(
 	git_SHA1_Update(&c, hdr, hdrlen);
 	git_SHA1_Update(&c, dat->buf, dat->len);
 	git_SHA1_Final(sha1, &c);
+	mdigest_Init(&mdc, MDIGEST_DEFAULT);
+	mdigest_Update(&mdc, (unsigned char *)(dat->buf), dat->len);
+	mdigest_Final(&digest, &mdc);
+
+	if (has_sha1_file(sha1)) {
+		mdigest_t old_digest;
+		if (has_sha1_file_digest(sha1, &old_digest)) {
+		  if (mdigest_tst(&digest,&old_digest)) {
+				die("hash collision on %s [fast-import]",
+				    sha1_to_hex(sha1));
+			}
+		}
+	}
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
 	e = insert_object(sha1);
+	e->idx.has_digest = 1;
+	e->idx.digest = digest;
+#ifdef PACKDB
+	packdb_process(sha1, &digest);
+#endif
 	if (mark)
 		insert_mark(mark, e);
 	if (e->idx.offset) {
@@ -1157,6 +1219,8 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	unsigned char *out_buf = xmalloc(out_sz);
 	struct object_entry *e;
 	unsigned char sha1[20];
+	mdigest_t digest;
+	mdigest_context_t mdc;
 	unsigned long hdrlen;
 	off_t offset;
 	git_SHA_CTX c;
@@ -1177,6 +1241,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 		die("impossibly large object header");
 
 	git_SHA1_Init(&c);
+	mdigest_Init(&mdc, MDIGEST_DEFAULT);
 	git_SHA1_Update(&c, out_buf, hdrlen);
 
 	crc32_begin(pack_file);
@@ -1199,6 +1264,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 				die("EOF in data (%" PRIuMAX " bytes remaining)", len);
 
 			git_SHA1_Update(&c, in_buf, n);
+			mdigest_Update(&mdc, in_buf, n);
 			s.next_in = in_buf;
 			s.avail_in = n;
 			len -= n;
@@ -1225,11 +1291,14 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	}
 	git_deflate_end(&s);
 	git_SHA1_Final(sha1, &c);
+	mdigest_Final(&digest, &mdc);
 
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
 	e = insert_object(sha1);
+	e->idx.has_digest = 1;
+	e->idx.digest =  digest;
 
 	if (mark)
 		insert_mark(mark, e);
@@ -1828,6 +1897,8 @@ static void read_marks(void)
 			if (type < 0)
 				die("object not found: %s", sha1_to_hex(sha1));
 			e = insert_object(sha1);
+			e->idx.has_digest =
+				has_sha1_file_digest(sha1, &e->idx.digest);
 			e->type = type;
 			e->pack_id = MAX_PACK_ID;
 			e->idx.offset = 1; /* just not zero! */
@@ -2896,6 +2967,8 @@ static struct object_entry *dereference(struct object_entry *oe,
 			die("object not found: %s", sha1_to_hex(sha1));
 		/* cache it! */
 		oe = insert_object(sha1);
+		oe->idx.has_digest =
+			has_sha1_file_digest(sha1, &oe->idx.digest);
 		oe->type = type;
 		oe->pack_id = MAX_PACK_ID;
 		oe->idx.offset = 1;
diff --git a/git-repack.sh b/git-repack.sh
index 624feec..7602853 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -91,6 +91,7 @@ if [ -z "$names" ]; then
 	say Nothing new to pack.
 fi
 
+
 # Ok we have prepared all new packfiles.
 
 # First see if there are packs of the same name and if so
@@ -100,7 +101,7 @@ rollback=
 failed=
 for name in $names
 do
-	for sfx in pack idx
+	for sfx in pack idx mds
 	do
 		file=pack-$name.$sfx
 		test -f "$PACKDIR/$file" || continue
@@ -148,15 +149,22 @@ do
 	fullbases="$fullbases pack-$name"
 	chmod a-w "$PACKTMP-$name.pack"
 	chmod a-w "$PACKTMP-$name.idx"
+	(chmod a-w "$PACKTMP-$name.mds" 2>/dev/null || exit 0 )
 	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
 	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
 	exit
+	if test -f "$PACKTMP-$name.mds"
+	then
+		mv -f "$PACKTMP-$name.mds"  "$PACKDIR/pack-$name.mds" \
+		    2>/dev/null || exit
+	fi
 done
 
 # Remove the "old-" files
 for name in $names
 do
 	rm -f "$PACKDIR/old-pack-$name.idx"
+	rm -f "$PACKDIR/old-pack-$name.mds"
 	rm -f "$PACKDIR/old-pack-$name.pack"
 done
 
@@ -172,7 +180,7 @@ then
 		  do
 			case " $fullbases " in
 			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+			*)	rm -f "$e.pack" "$e.idx" "$e.mds" "$e.keep" ;;
 			esac
 		  done
 		)
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 602806d..1b72d46 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -54,7 +54,7 @@ cd "$TRASH/.git2"
 
 test_expect_success \
     'check unpack without delta' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f  -print) | grep -v mdsd | grep -v packdb |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -84,7 +84,7 @@ unset GIT_OBJECT_DIRECTORY
 cd "$TRASH/.git2"
 test_expect_success \
     'check unpack with REF_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f -print) | grep -v mdsd | grep -v packdb |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -114,7 +114,7 @@ unset GIT_OBJECT_DIRECTORY
 cd "$TRASH/.git2"
 test_expect_success \
     'check unpack with OFS_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
+    '(cd ../.git && find objects -type f -print) | grep -v mdsd | grep -v packdb |
      while read path
      do
          cmp $path ../.git/$path || {
@@ -211,6 +211,17 @@ test_expect_success \
 			test-3-${packname_3}.idx'
 
 test_expect_success \
+    'verify pack -v -M' \
+    'test -z "`git verify-pack -v -M test-1-${packname_1}.idx \
+			test-2-${packname_2}.idx \
+			test-3-${packname_3}.idx | grep \<no\ md\>`" &&
+     test 0 != `git verify-pack -v -M test-1-${packname_1}.idx | grep md= | wc -l` &&
+     test -z "`git verify-pack -v -M test-1-${packname_1}.idx | grep "should be"`" &&
+     (x=`git verify-pack -v -M test-1-${packname_1}.idx | wc -l`
+     y=`git verify-pack -v -M test-1-${packname_1}.idx |grep -v \<no\ md\> | wc -l`
+     test $x = $y)'
+
+test_expect_success \
     'verify-pack catches mismatched .idx and .pack files' \
     'cat test-1-${packname_1}.idx >test-3.idx &&
      cat test-2-${packname_2}.pack >test-3.pack &&
diff --git a/t/t5301-sliding-window.sh b/t/t5301-sliding-window.sh
index 2fc5af6..ec0d72f 100755
--- a/t/t5301-sliding-window.sh
+++ b/t/t5301-sliding-window.sh
@@ -22,13 +22,19 @@ test_expect_success \
      git repack -a -d &&
      test "`git count-objects`" = "0 objects, 0 kilobytes" &&
      pack1=`ls .git/objects/pack/*.pack` &&
-     test -f "$pack1"'
+     test -f "$pack1"  &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'verify-pack -v, defaults' \
     'git verify-pack -v "$pack1"'
 
 test_expect_success \
+    'verify-pack -v -M, defaults' \
+    'git verify-pack -v -M "$pack1" | grep "<no md>" > tmp
+     test -z "`cat tmp`"'
+
+test_expect_success \
     'verify-pack -v, packedGitWindowSize == 1 page' \
     'git config core.packedGitWindowSize 512 &&
      git verify-pack -v "$pack1"'
@@ -49,12 +55,14 @@ test_expect_success \
      test "`git count-objects`" = "0 objects, 0 kilobytes" &&
      pack2=`ls .git/objects/pack/*.pack` &&
      test -f "$pack2" &&
-     test "$pack1" \!= "$pack2"'
+     test "$pack1" \!= "$pack2" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'verify-pack -v, defaults' \
     'git config --unset core.packedGitWindowSize &&
      git config --unset core.packedGitLimit &&
-     git verify-pack -v "$pack2"'
+     git verify-pack -v "$pack2" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_done
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index f8fa924..da10200 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -37,12 +37,14 @@ test_expect_success \
 test_expect_success \
     'pack-objects with index version 1' \
     'pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
-     git verify-pack -v "test-1-${pack1}.pack"'
+     git verify-pack -v "test-1-${pack1}.pack" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'pack-objects with index version 2' \
     'pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
-     git verify-pack -v "test-2-${pack2}.pack"'
+     git verify-pack -v "test-2-${pack2}.pack" &&
+     test -z "`git count-objects -v -M | grep MD`"'
 
 test_expect_success \
     'both packs should be identical' \
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index d645328..86075a7 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -37,7 +37,8 @@ test_expect_success 'prune stale packs' '
 	git prune --expire 1.day &&
 	test -f $orig_pack &&
 	test -f .git/objects/tmp_2.pack &&
-	! test -f .git/objects/tmp_1.pack
+	! test -f .git/objects/tmp_1.pack  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -50,7 +51,8 @@ test_expect_success 'prune --expire' '
 	test-chmtime =-86500 $BLOB_FILE &&
 	git prune --expire 1.day &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	! test -f $BLOB_FILE  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -64,7 +66,8 @@ test_expect_success 'gc: implicit prune --expire' '
 	test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
 	git gc &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	! test -f $BLOB_FILE  &&
+	test -z "`git count-objects -v -M | grep MD`"
 
 '
 
@@ -78,8 +81,8 @@ test_expect_success 'gc: refuse to start with invalid gc.pruneExpire' '
 test_expect_success 'gc: start with ok gc.pruneExpire' '
 
 	git config gc.pruneExpire 2.days.ago &&
-	git gc
-
+	git gc &&
+	test -z "`git count-objects -v -M | grep MD`"
 '
 
 test_expect_success 'prune: prune nonsense parameters' '
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 438aaf6..63b4a13 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -109,6 +109,12 @@ test_expect_success \
 	'A: verify pack' \
 	'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
 
+test_expect_success \
+	'A: verify pack -v -M --- all objects have MDs' \
+	'for p in .git/objects/pack/*.pack;
+	do git verify-pack -v -M $p | grep "<no md>" > tmp;
+	   test -z "`cat tmp`" || exit; done'
+
 cat >expect <<EOF
 author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
@@ -1504,7 +1510,7 @@ INPUT_END
 test_expect_success \
 	'O: blank lines not necessary after other commands' \
 	'git fast-import <input &&
-	 test 8 = `find .git/objects/pack -type f | wc -l` &&
+	 test 8 = `find .git/objects/pack -type f | grep -v .mds | wc -l` &&
 	 test `git rev-parse refs/tags/O3-2nd` = `git rev-parse O3^` &&
 	 git log --reverse --pretty=oneline O3 | sed s/^.*z// >actual &&
 	 test_cmp expect actual'
-- 
1.7.1


* [PATCH 2/6] Add caching of message digests for objects.
From: Bill Zaumen @ 2011-12-21  7:10 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

Message digests are computed when git objects are created.
The digests are stored either in their own files or in
an "mds" file that accompanies a pack file's index file.
Most of the changes are in sha1_file.c, with a function
to create an mds file in pack-write.c. Macros in cache.h
let existing callers keep using the previous function
names; some of the underlying functions now take a pointer
to a digest as an additional argument. hex.c was modified
to print message digests in hexadecimal, and a test script
was modified to account for a new directory in objects.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
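The cache.h macros mentioned above follow a simple wrapper pattern. The
sketch below illustrates it in isolation; every demo_* name is
hypothetical, and the toy XOR "hash" only stands in for the real SHA-1
and message-digest code. It is not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mdigest_t; the real type lives in mdigest.h. */
typedef struct {
	unsigned char bytes[4];
	int valid;
} demo_digest_t;

/* Extended entry point: also fills in a digest when the caller supplies one. */
static int demo_hash_extended(const void *buf, size_t len,
			      unsigned char *id, demo_digest_t *digestp)
{
	const unsigned char *p = buf;
	unsigned char sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum ^= p[i];
	*id = sum;	/* toy "object name": XOR of all bytes */

	if (digestp) {
		memset(digestp->bytes, 0, sizeof(digestp->bytes));
		for (i = 0; i < len; i++)
			digestp->bytes[i % 4] += p[i];	/* toy digest */
		digestp->valid = 1;
	}
	return 0;
}

/* The old name survives as a macro, so existing call sites compile unchanged. */
#define demo_hash(buf, len, id) \
	demo_hash_extended((buf), (len), (id), NULL)
```

An existing caller keeps writing demo_hash(buf, len, &id), while new code
calls demo_hash_extended() with a non-NULL digest pointer. One trade-off
of this approach is that the old name can no longer be used as a function
pointer, since it is now a macro.
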
 Makefile          |  121 ++++++++++++
 builtin/init-db.c |   17 ++
 cache.h           |   72 ++++++-
 environment.c     |   57 ++++++
 git.c             |   14 ++-
 hex.c             |  106 ++++++++++-
 pack-write.c      |  120 ++++++++++++
 pack.h            |    3 +
 sha1_file.c       |  560 +++++++++++++++++++++++++++++++++++++++++++++++------
 t/t0000-basic.sh  |   13 +-
 10 files changed, 1012 insertions(+), 71 deletions(-)
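
The external hex form used for digests is a 2-character hex code for the
digest type followed immediately by the digest bytes in hex. The sketch
below shows one way to encode and parse that layout. It is only an
illustration: the demo_* names are hypothetical, and the caller passes
the digest length explicitly, whereas the patch derives it from the type
code with get_mdigest_required_len().

```c
#include <stddef.h>

static const char demo_hex[] = "0123456789abcdef";

/* Returns 0-15 for a hex digit, -1 for anything else (including NUL). */
static int demo_hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/*
 * Write the type code and then the digest as lowercase hex into out,
 * which must hold at least 2 * (len + 1) + 1 bytes.
 * Returns the length of the resulting string.
 */
static size_t demo_digest_to_hex(char *out, unsigned char code,
				 const unsigned char *digest, size_t len)
{
	char *p = out;
	size_t i;

	*p++ = demo_hex[code >> 4];
	*p++ = demo_hex[code & 0xf];
	for (i = 0; i < len; i++) {
		*p++ = demo_hex[digest[i] >> 4];
		*p++ = demo_hex[digest[i] & 0xf];
	}
	*p = '\0';
	return (size_t)(p - out);
}

/*
 * Parse the type code plus exactly len digest bytes from hex.
 * Returns the number of characters consumed, or -1 on malformed or
 * truncated input. Parsing stops at the first non-hex character, so
 * the input is never read past its terminating NUL.
 */
static int demo_digest_from_hex(const char *hex, unsigned char *code,
				unsigned char *digest, size_t len)
{
	size_t i;

	for (i = 0; i <= len; i++) {
		int hi = demo_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;
		lo = demo_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		if (i == 0)
			*code = (unsigned char)((hi << 4) | lo);
		else
			digest[i - 1] = (unsigned char)((hi << 4) | lo);
	}
	return (int)(2 * (len + 1));
}
```

With an arbitrary type code of 0x02 and the three digest bytes de ad 01,
demo_digest_to_hex() produces the string "02dead01", and
demo_digest_from_hex() recovers the code and the bytes from it.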

diff --git a/Makefile b/Makefile
index 9470a10..759df5c 100644
--- a/Makefile
+++ b/Makefile
@@ -278,6 +278,92 @@ all::
 # dependency rules.
 #
 # Define NATIVE_CRLF if your platform uses CRLF for line endings.
+#
+#
+# Set MDSDB to indicate the database type for the DB mapping SHA1
+# values to the MDs (Message Digests) of the objects git stores.
+# Valid values are:
+#
+#   0 for storing each local-object MD in its own file.
+#
+# [more to be added as needed - a legal value is mandatory].
+#
+# Note: the values for MDSDB are determined by preprocessor directives
+# defined in mdsdb.h.  This constant must be defined so that necessary
+# files are compiled.
+#
+MDSDB = 0
+
+#
+# Define MDIGEST_DEFAULT to set the default type of MD for authentication and
+# hash-collision detection.  Legal values
+# are:
+#      MDIGEST_CRC - use a CRC (used only as a minimal digest for performance
+#                    testing).
+#
+#     MDIGEST_SHA1 - use a SHA-1 digest.
+#
+#   MDIGEST_SHA256 - use a SHA-256 digest.
+#
+#   MDIGEST_SHA512 - use a SHA-512 digest.
+#
+# (additional ones may be added as needed.)
+#
+# Note: the message digests computed are for uncompressed objects, not
+# including the Git object-header.  If not set, a default defined in the
+# file mdigest.h will be used.
+#
+MDIGEST_DEFAULT = MDIGEST_SHA256
+
+#
+# Define PACKDB to use a GDBM-like database for storing message
+# digests compactly when those digests are not available using the
+# normal mechanisms.  As an example, if an alternate object database
+# is used and if it was created using an older version of git, message
+# digests may not be available, and git by design cannot modify an
+# alternate object database, so the message digests cannot be added to
+# it.  If PACKDB is not defined, at certain points (e.g., during a
+# commit), the digest for an object in an alternate object database
+# will be calculated each time.  When PACKDB is defined, the object's
+# digest is calculated once and stored in the packdb database.  GDBM
+# is too slow for use in general, but it is adequate for handling
+# unusual cases.
+#
+# Valid values are:
+#
+#                0 - use GDBM to implement the database.
+#    [not defined] - do nothing.
+#
+# [more can be added as needed].
+#
+PACKDB =
+
+# Define PACKDB_TEST in order to turn on an inefficient
+# test for PACKDB functions.  This code will add an entry to the packdb
+# database during commits when such an entry is not necessary and then
+# will read it back to make sure the data was added correctly. The option
+# has no effect if PACKDB is not defined.
+#
+# NOTE: this option should not be used in a released version of Git.
+#
+PACKDB_TEST =
+
+# Define COMMIT_DIGEST to include a 'digest' header in a commit. The header
+# will contain a 2-character code indicating the digest type, followed
+# immediately by the digest.  We are delaying turning this on by default
+# until the test scripts are updated, as the test scripts include explicit
+# file lengths and SHA-1 values.
+
+COMMIT_DIGEST =
+
+# Define COMMIT_DIGEST_TEST to force get_objects_mds to be called even if
+# COMMIT_DIGEST is not defined (in which case the digest header will not
+# appear in the commit object created).
+#
+# NOTE: this option should not be used in a released version of Git.
+#
+
+COMMIT_DIGEST_TEST =
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -536,7 +622,9 @@ LIB_H += blob.h
 LIB_H += builtin.h
 LIB_H += bulk-checkin.h
 LIB_H += cache.h
+LIB_H += mdigest.h
 LIB_H += cache-tree.h
+LIB_H += mdsdb.h
 LIB_H += color.h
 LIB_H += commit.h
 LIB_H += compat/bswap.h
@@ -711,6 +799,7 @@ LIB_OBJS += sequencer.o
 LIB_OBJS += sha1-array.o
 LIB_OBJS += sha1-lookup.o
 LIB_OBJS += sha1_file.o
+LIB_OBJS += mdigest.o
 LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
@@ -836,6 +925,7 @@ BUILTIN_OBJS += builtin/write-tree.o
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
 EXTLIBS =
 
+
 #
 # Platform specific tweaks
 #
@@ -1721,6 +1811,37 @@ ifeq ($(PYTHON_PATH),)
 NO_PYTHON=NoThanks
 endif
 
+ifdef MDSDB
+BASIC_CFLAGS += -DBLOB_MDS_CHECK
+endif
+
+ifdef COMMIT_DIGEST
+BASIC_CFLAGS += -DCOMMIT_DIGEST
+endif
+
+ifdef COMMIT_DIGEST_TEST
+BASIC_CFLAGS += -DCOMMIT_DIGEST_TEST
+endif
+
+ifdef MDIGEST_DEFAULT
+BASIC_CFLAGS += -DMDIGEST_DEFAULT=$(MDIGEST_DEFAULT)
+endif
+
+ifeq ($(MDSDB), 0)
+BASIC_CFLAGS += -DMDSDB=$(MDSDB)
+LIB_OBJS += objd-mdsdb.o
+endif
+
+ifeq ($(PACKDB), 0)
+BASIC_CFLAGS += -DPACKDB
+LIB_OBJS += gdbm-packdb.o
+EXTLIBS += -lgdbm
+endif
+
+ifdef PACKDB_TEST
+BASIC_CFLAGS += -DPACKDB_TEST
+endif
+
 QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
 QUIET_SUBDIR1  =
 
diff --git a/builtin/init-db.c b/builtin/init-db.c
index d07554c..6d5ec0f 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -7,6 +7,10 @@
 #include "builtin.h"
 #include "exec_cmd.h"
 #include "parse-options.h"
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 #ifndef DEFAULT_GIT_TEMPLATE_DIR
 #define DEFAULT_GIT_TEMPLATE_DIR "/usr/share/git-core/templates"
@@ -309,6 +313,19 @@ static void create_object_directory(void)
 	strcpy(path+len, "/info");
 	safe_create_dir(path, 1);
 
+#if (MDSDB == 0)
+	strcpy(path+len, "/mdsd");
+	safe_create_dir(path, 1);
+#endif
+	/*
+	 * In case the call in environment.c failed to initialize
+	 * (missing directory?) or somehow wasn't called at all.
+	 */
+	mdsdb_init();
+	mdigest_init();
+#ifdef PACKDB
+	packdb_init();
+#endif
 	free(path);
 }
 
diff --git a/cache.h b/cache.h
index 7d93df6..17e3dd4 100644
--- a/cache.h
+++ b/cache.h
@@ -16,6 +16,7 @@
 #define git_SHA1_Final	SHA1_Final
 #endif
 
+#include "mdigest.h"
 #include <zlib.h>
 typedef struct git_zstream {
 	z_stream z;
@@ -433,6 +434,10 @@ extern int is_inside_work_tree(void);
 extern int have_git_dir(void);
 extern const char *get_git_dir(void);
 extern char *get_object_directory(void);
+extern char *get_object_mds_directory(void);
+#ifdef PACKDB
+extern char *get_object_packdb_node(void);
+#endif
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
 extern int set_git_dir(const char *path);
@@ -541,8 +546,15 @@ extern int ce_path_match(const struct cache_entry *ce, const struct pathspec *pa
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
-extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
-extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
+
+#define index_fd(sha1,fd,st,type,path,flags)			\
+	index_fd_extended((sha1), NULL, (fd), (st), (type), (path), (flags))
+extern int index_fd_extended(unsigned char *sha1, mdigest_t *mdigestp, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
+
+#define index_path(sha1, path, st, flags) \
+	index_path_extended((sha1), NULL, (path), (st), (flags))
+extern int index_path_extended(unsigned char *sha1, mdigest_t *mdigestp, const char *path
+, struct stat *st, unsigned flags);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
 #define REFRESH_REALLY		0x0001	/* ignore_valid */
@@ -670,6 +682,7 @@ extern char *git_path_submodule(const char *path, const char *fmt, ...)
 extern char *sha1_file_name(const unsigned char *sha1);
 extern char *sha1_pack_name(const unsigned char *sha1);
 extern char *sha1_pack_index_name(const unsigned char *sha1);
+extern char *sha1_pack_mds_name(const unsigned char *sha1);
 extern const char *find_unique_abbrev(const unsigned char *sha1, int);
 extern const unsigned char null_sha1[20];
 
@@ -769,9 +782,18 @@ static inline const unsigned char *lookup_replace_object(const unsigned char *sh
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern int sha1_object_info(const unsigned char *, unsigned long *);
-extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
-extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
-extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
+
+#define hash_sha1_file(buf,len,type,sha1) \
+	hash_sha1_file_extended((buf), (len), (type), (sha1), NULL)
+extern int hash_sha1_file_extended(const void *buf, unsigned long len, const char *type, unsigned char *sha1, mdigest_t *mdigestp);
+
+#define write_sha1_file(buf,len,type,return_sha1) \
+	write_sha1_file_extended((buf), (len), (type), (return_sha1), NULL)
+extern int write_sha1_file_extended(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1, mdigest_t *mdigestp);
+
+#define pretend_sha1_file(buf,len,type,sha1) \
+	pretend_sha1_file_extended((buf), (len), (type), (sha1), NULL)
+extern int pretend_sha1_file_extended(void *, unsigned long, enum object_type, unsigned char *, mdigest_t *mdigestp);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
@@ -783,13 +805,18 @@ extern int do_check_packed_object_crc;
 /* for development: log offset of pack access */
 extern const char *log_pack_access;
 
-extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
+#define check_sha1_signature(sha1,buf,size,type) \
+	check_sha1_signature_extended((sha1), NULL, (buf), (size), (type))
+extern int check_sha1_signature_extended(const unsigned char *sha1, mdigest_t *mdigestp, void *buf, unsigned long size, const char *type);
 
 extern int move_temp_to_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
+extern int has_sha1_file_digest(const unsigned char *sha1, mdigest_t *mdigestp);
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
+extern int has_loose_object_nonlocal_digest(const unsigned char *sha1,
+					    mdigest_t *mdigestp);
 
 extern int has_pack_index(const unsigned char *sha1);
 
@@ -831,8 +858,19 @@ static inline int get_sha1_with_context(const char *str, unsigned char *sha1, st
  * null-terminated string.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
-
+extern int get_sha1_hex_digest(const char *hex, unsigned char *sha1,
+			       int *has_digest, mdigest_t *digestp);
+/*
+ * get_mdigest_from_external_hex assumes hex is terminated by something that is
+ * not alphanumeric, so the string does not have to be null terminated.
+ */
+extern int get_mdigest_from_external_hex(mdigest_t *digestp, const char *hex);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern char *sha1_to_hex_digest(const unsigned char *sha1,
+				const mdigest_t *digestp); /* static buffer result! */
+
+extern int get_hex_field_size(char *hex);
+
 extern int read_ref_full(const char *filename, unsigned char *sha1,
 			 int reading, int *flags);
 extern int read_ref(const char *filename, unsigned char *sha1);
@@ -978,10 +1016,13 @@ extern struct packed_git {
 	off_t pack_size;
 	const void *index_data;
 	size_t index_size;
+	const void *mds_data;
+	size_t mds_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
 	unsigned char *bad_object_sha1;
 	int index_version;
+	int mds_version;
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
@@ -996,6 +1037,8 @@ struct pack_entry {
 	off_t offset;
 	unsigned char sha1[20];
 	struct packed_git *p;
+	int has_mdigest;
+	mdigest_t mdigest;
 };
 
 struct ref {
@@ -1050,6 +1093,11 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 
 extern void pack_report(void);
 extern int open_pack_index(struct packed_git *);
+extern int open_pack_mds(struct packed_git *p);
+extern int git_open_noatime(const char *name);
+extern size_t required_git_packed_mds_size(const char *path,
+					   void *data, uint32_t nobjects,
+					   size_t actual_size);
 extern void close_pack_index(struct packed_git *);
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
@@ -1058,8 +1106,16 @@ extern void free_pack_by_name(const char *);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *, int, int);
 extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
+extern int nth_packed_object_mdigest(const struct packed_git *p, uint32_t n,
+				      mdigest_t *mdigestp);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
-extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
+
+#define find_pack_entry_one(sha1,p) find_pack_entry_one_extended((sha1),(p), NULL, NULL)
+extern off_t find_pack_entry_one_extended(const unsigned char *,
+					  struct packed_git *,
+					  int *has_mdigestp,
+					  mdigest_t *mdigestp);
+
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
diff --git a/environment.c b/environment.c
index c93b8f4..5ced2e2 100644
--- a/environment.c
+++ b/environment.c
@@ -10,6 +10,10 @@
 #include "cache.h"
 #include "refs.h"
 #include "fmt-merge-msg.h"
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 char git_default_email[MAX_GITNAME];
 char git_default_name[MAX_GITNAME];
@@ -77,6 +81,10 @@ static size_t namespace_len;
 static const char *git_dir;
 static char *git_object_dir, *git_index_file, *git_graft_file;
 
+static char *git_object_mds_dir;
+#ifdef PACKDB
+static char *git_object_packdb_node;
+#endif
 /*
  * Repository-local GIT_* environment variables
  * Remember to update local_repo_env_size in cache.h when
@@ -118,6 +126,11 @@ static char *expand_namespace(const char *raw_namespace)
 
 static void setup_git_env(void)
 {
+	static char cwdbuf[PATH_MAX];
+	int ocn_len;
+#ifdef PACKDB
+	int opn_len;
+#endif
 	git_dir = getenv(GIT_DIR_ENVIRONMENT);
 	git_dir = git_dir ? xstrdup(git_dir) : NULL;
 	if (!git_dir) {
@@ -131,6 +144,31 @@ static void setup_git_env(void)
 		git_object_dir = xmalloc(strlen(git_dir) + 9);
 		sprintf(git_object_dir, "%s/objects", git_dir);
 	}
+	ocn_len = strlen(git_object_dir) + 8 + strlen(getcwd(cwdbuf, PATH_MAX));
+	git_object_mds_dir = xmalloc(ocn_len);
+	memset(git_object_mds_dir, 0, ocn_len);
+	sprintf(git_object_mds_dir, "%s/mdsd", git_object_dir);
+	if (git_object_mds_dir[0] != '/') {
+		int ocn_offset = (git_object_mds_dir[0] == '.' &&
+				  git_object_mds_dir[1] == '/')? 2:0;
+		memset(git_object_mds_dir, 0, ocn_len);
+		sprintf(git_object_mds_dir, "%s/%s/mdsd",
+			getcwd(cwdbuf, PATH_MAX), git_object_dir + ocn_offset);
+	}
+#ifdef PACKDB
+	opn_len = strlen(git_object_dir)
+		+ 10 + strlen(getcwd(cwdbuf, PATH_MAX));
+	git_object_packdb_node = xmalloc(opn_len);
+	memset(git_object_packdb_node, 0, opn_len);
+	sprintf(git_object_packdb_node, "%s/packdb", git_object_dir);
+	if (git_object_packdb_node[0] != '/') {
+		int opn_offset = (git_object_packdb_node[0] == '.' &&
+				  git_object_packdb_node[1] == '/')? 2:0;
+		memset(git_object_packdb_node, 0, opn_len);
+		sprintf(git_object_packdb_node, "%s/%s/packdb",
+			getcwd(cwdbuf, PATH_MAX), git_object_dir + opn_offset);
+	}
+#endif
 	git_index_file = getenv(INDEX_ENVIRONMENT);
 	if (!git_index_file) {
 		git_index_file = xmalloc(strlen(git_dir) + 7);
@@ -143,6 +181,11 @@ static void setup_git_env(void)
 		read_replace_refs = 0;
 	namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
 	namespace_len = strlen(namespace);
+	mdsdb_init();
+	mdigest_init();
+#ifdef PACKDB
+	packdb_init();
+#endif
 }
 
 int is_bare_repository(void)
@@ -210,6 +253,20 @@ char *get_object_directory(void)
 	return git_object_dir;
 }
 
+char *get_object_mds_directory(void) {
+	if (!git_object_mds_dir)
+		setup_git_env();
+	return git_object_mds_dir;
+}
+
+#ifdef PACKDB
+char *get_object_packdb_node(void) {
+	if (!git_object_packdb_node)
+		setup_git_env();
+	return git_object_packdb_node;
+}
+#endif
+
 int odb_mkstemp(char *template, size_t limit, const char *pattern)
 {
 	int fd;
diff --git a/git.c b/git.c
index fb9029c..f43328f 100644
--- a/git.c
+++ b/git.c
@@ -4,7 +4,10 @@
 #include "help.h"
 #include "quote.h"
 #include "run-command.h"
-
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 const char git_usage_string[] =
 	"git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]\n"
 	"           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]\n"
@@ -279,6 +282,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	struct stat st;
 	const char *prefix;
 
+	static int mdsdb_need_atexit = 1;
+
+	if (mdsdb_need_atexit) {
+#ifdef PACKDB
+		atexit(packdb_finish);
+#endif
+		atexit(mdsdb_finish);
+		mdsdb_need_atexit = 0;
+	}
 	prefix = NULL;
 	help = argc == 2 && !strcmp(argv[1], "-h");
 	if (!help) {
diff --git a/hex.c b/hex.c
index 9ebc050..46c8b8b 100644
--- a/hex.c
+++ b/hex.c
@@ -1,3 +1,4 @@
+#include <ctype.h>
 #include "cache.h"
 
 const signed char hexval_table[256] = {
@@ -56,10 +57,55 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 	return 0;
 }
 
+int get_mdigest_from_external_hex(mdigest_t *digestp, const char *hex)
+{
+	int max = 0;
+	int wcode = 0, blen;
+	const char *ptr = hex;
+	char ch1, ch2;
+	unsigned char *out = digestp->buffer.buffer;
+
+	if (isalnum((ch1 = *(ptr++))) && isalnum((ch2 = *(ptr++)))) {
+		unsigned int val = (hexval(ch1) << 4) | hexval(ch2);
+		if (val & ~0xff)
+			return -1;
+		wcode = (int) val;
+	}
+	blen = get_mdigest_required_len(wcode);
+
+	while (isalnum((ch1 = *(ptr++))) && isalnum((ch2 = *(ptr++)))) {
+		unsigned int val = (hexval(ch1) << 4) | hexval(ch2);
+		if (val & ~0xff)
+			return -1;
+		*(out++) = val;
+		max += 2;
+	}
+	if (max != 2 * blen) return -1;
+	mdigest_load(digestp, wcode, NULL);
+	return max + 2;		/* add the 2 chars for wcode */
+}
+
+int get_sha1_hex_digest(const char *hex, unsigned char *sha1,
+			int *has_digest, mdigest_t *digestp)
+{
+	int result = get_sha1_hex(hex, sha1);
+	if (result) return result;
+	if (hex[40] == '-') {
+		int cnt = get_mdigest_from_external_hex(digestp, hex+41 );
+		*has_digest = (cnt > 0);
+		if (!*has_digest) return -1;
+	} else {
+		*has_digest = 0;
+		mdigest_clear(digestp);
+	}
+	return 0;
+}
+
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
 	static int bufno;
-	static char hexbuffer[4][50];
+	static char hexbuffer[4][50 + 2 + (MAX_DIGEST_LENGTH * 4)];
 	static const char hex[] = "0123456789abcdef";
 	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
 	int i;
@@ -73,3 +119,61 @@ char *sha1_to_hex(const unsigned char *sha1)
 
 	return buffer;
 }
+
+char *mdigest_to_hex(const mdigest_t *digestp) {
+	static int bufno;
+	static char hexbuffer[4][(MAX_DIGEST_LENGTH *2) + 1];
+	static const char hex[] = "0123456789abcdef";
+	const unsigned char *inbuf = get_mdigest_buffer(digestp);
+	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
+	int i;
+	int len = get_mdigest_len(digestp);
+
+	for (i = 0; i < len; i++) {
+		unsigned int val = *inbuf++;
+		*buf++ = hex[val >> 4];
+		*buf++ = hex[val & 0xf];
+	}
+	*buf = '\0';
+
+	return buffer;
+
+}
+
+char *mdigest_to_external_hex(const mdigest_t *digestp) {
+	static int bufno;
+	static char hexbuffer[4][((MAX_DIGEST_LENGTH + 1) * 2) + 1];
+	static const char hex[] = "0123456789abcdef";
+	const unsigned char *inbuf = get_mdigest_buffer(digestp);
+	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
+	int i;
+	int len = get_mdigest_len(digestp);
+	int wcode = get_mdigest_wcode(digestp);
+	unsigned int wval = wcode & 0xff;
+	*buf++ = hex[wval >> 4];
+	*buf++ = hex[wval & 0xf];
+	for (i = 0; i < len; i++) {
+		unsigned int val = *inbuf++;
+		*buf++ = hex[val >> 4];
+		*buf++ = hex[val & 0xf];
+	}
+	*buf = '\0';
+
+	return buffer;
+
+}
+
+char *sha1_to_hex_digest(const unsigned char *sha1, const mdigest_t *digestp)
+{
+	char *result = sha1_to_hex(sha1);
+	sprintf(result+40, "-%s", mdigest_to_external_hex(digestp));
+	return result;
+}
+
+int get_hex_field_size(char *hex) {
+	int tmp;
+	if (!isalnum(hex[0]) || !isalnum(hex[1])) return -1;
+	unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+	tmp = get_mdigest_required_len((int) (val & 0xff));
+	return (tmp < 0)? tmp: 2 * (tmp + 1);
+}
diff --git a/pack-write.c b/pack-write.c
index de2bd01..fe461a5 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -194,6 +194,117 @@ off_t write_pack_header(struct sha1file *f, uint32_t nr_entries)
 	return sizeof(hdr);
 }
 
+const char *write_mds_file(const char *crc_name,
+			   struct pack_idx_entry **objects,
+			   int nr,
+			   const struct pack_idx_option *opts,
+			   unsigned char *sha1)
+{
+	static unsigned char buffer[4 + 4 * MAX_DIGEST_LENGTH];
+	unsigned char *base = buffer;
+	int i, j, fd;
+	struct sha1file *f;
+	int wsize = get_mdigest_wsize_by_type(MDIGEST_DEFAULT);
+	int wbsize = wsize * 4;
+
+	for (i = 0; i < nr; i++) {
+		if (objects[i]->has_digest) {
+			int ws = get_mdigest_wsize(&(objects[i]->digest));
+			if (wsize < ws) wsize = ws;
+		}
+	}
+	wbsize = wsize * 4;
+	if (nr) {
+		/*
+		 * Sort just in case objects not already sorted.
+		 */
+		qsort(objects, nr, sizeof(objects[0]), sha1_compare);
+	}
+
+	if (opts->flags & WRITE_IDX_VERIFY) {
+		assert(crc_name);
+		f = sha1fd_check(crc_name);
+		if (f == NULL) {
+			/*
+			 * For backwards compatibility, assume a missing
+			 * mds file is OK.
+			 */
+			return crc_name;
+		}
+	} else {
+		if (!crc_name) {
+			static char tmpfile[PATH_MAX];
+			fd = odb_mkstemp(tmpfile, sizeof(tmpfile),
+					 "pack/tmp_mds_XXXXXX");
+			crc_name = xstrdup(tmpfile);
+		} else {
+			unlink(crc_name);
+			fd = open(crc_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+		}
+		if (fd < 0)
+			die_errno("unable to create '%s'", crc_name);
+		f = sha1fd(fd, crc_name);
+	}
+
+	*(base++) = 'P';
+	*(base++) = 'K';
+	*(base++) = 'M';
+	*(base++) = 'D';
+	*(base++) = 'S';
+	*(base++) = 0;
+	*(base++) = 1; /* version number */
+	*(base++) = (unsigned char) wsize; /* wcode */
+	sha1write(f, buffer, base - buffer);
+	base = buffer;
+
+	for (i = 0; i < nr; i += 4) {
+		int lim = ((nr-i) > 3)? 4: nr-i;
+		int has[4];
+		mdigest_t crc[4];
+		for (j = 0; j < lim; j++) {
+			if (objects[i+j]->has_digest) {
+				has[j] = get_mdigest_wcode
+					(&(objects[i+j]->digest));
+				crc[j] = objects[i+j]->digest;
+			} else {
+				has[j] =
+				  (has_sha1_file_digest(objects[i + j]->sha1,
+							&crc[j]) == 1);
+				if (has[j]) {
+					has[j] = get_mdigest_wcode(&crc[j]);
+				}
+			}
+		}
+		for (j = 0; j < 4; j++) {
+			if (j < lim) {
+				*(base)++ = has[j];
+			} else {
+				has[j] = 0;
+				mdigest_clear(&crc[j]);
+				*(base++) = 0;
+			}
+		}
+		for (j = 0; j < 4; j += 1) {
+			if (j < lim) {
+				if (has[j])
+					mdigest_to_buffer(base, &crc[j],
+							  wbsize);
+				else
+					memset(base, 0, wbsize);
+			} else {
+				memset(base, 0, wbsize);
+			}
+			base += wbsize;
+		}
+		sha1write(f, buffer, base - buffer);
+		base = buffer;
+	}
+	sha1write(f, sha1, 20);
+	sha1close(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
+			    ? CSUM_CLOSE : CSUM_FSYNC));
+	return crc_name;
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
@@ -351,6 +462,7 @@ void finish_tmp_packfile(char *name_buffer,
 			 unsigned char sha1[])
 {
 	const char *idx_tmp_name;
+	const char *mds_tmp_name;
 	char *end_of_name_prefix = strrchr(name_buffer, 0);
 
 	if (adjust_shared_perm(pack_tmp_name))
@@ -358,8 +470,12 @@ void finish_tmp_packfile(char *name_buffer,
 
 	idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
 				      pack_idx_opts, sha1);
+	mds_tmp_name = write_mds_file(NULL, written_list, nr_written,
+				      pack_idx_opts, sha1);
 	if (adjust_shared_perm(idx_tmp_name))
 		die_errno("unable to make temporary index file readable");
+	if (adjust_shared_perm(mds_tmp_name))
+		die_errno("unable to make temporary mds file readable");
 
 	sprintf(end_of_name_prefix, "%s.pack", sha1_to_hex(sha1));
 	free_pack_by_name(name_buffer);
@@ -370,6 +486,10 @@ void finish_tmp_packfile(char *name_buffer,
 	sprintf(end_of_name_prefix, "%s.idx", sha1_to_hex(sha1));
 	if (rename(idx_tmp_name, name_buffer))
 		die_errno("unable to rename temporary index file");
+	sprintf(end_of_name_prefix, "%s.mds", sha1_to_hex(sha1));
+	if (rename(mds_tmp_name, name_buffer))
+		die_errno("unable to rename temporary mds file");
 
 	free((void *)idx_tmp_name);
+	free((void *)mds_tmp_name);
 }
diff --git a/pack.h b/pack.h
index aa6ee7d..759d2f4 100644
--- a/pack.h
+++ b/pack.h
@@ -70,6 +70,8 @@ struct pack_idx_entry {
 	unsigned char sha1[20];
 	uint32_t crc32;
 	off_t offset;
+	int has_digest;
+	mdigest_t digest;
 };
 

@@ -77,6 +79,7 @@ struct progress;
 typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
+extern const char *write_mds_file(const char *mds_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, unsigned char *sha1);
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, uint32_t);
diff --git a/sha1_file.c b/sha1_file.c
index f291f3f..e176d53 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,10 @@
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
+#include "mdsdb.h"
+#ifdef PACKDB
+#include "packdb.h"
+#endif
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -41,6 +45,7 @@ const unsigned char null_sha1[20];
  */
 static struct cached_object {
 	unsigned char sha1[20];
+	unsigned char md_as_array[sizeof (mdigest_t)];
 	enum object_type type;
 	void *buf;
 	unsigned long size;
@@ -49,6 +54,7 @@ static int cached_object_nr, cached_object_alloc;
 
 static struct cached_object empty_tree = {
 	EMPTY_TREE_SHA1_BIN_LITERAL,
+	{0,},
 	OBJ_TREE,
 	"",
 	0
@@ -223,11 +229,18 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 	return sha1_get_pack_name(sha1, &name, &base, "idx");
 }
 
+char *sha1_pack_mds_name(const unsigned char *sha1)
+{
+	static char *name, *base;
+
+	return sha1_get_pack_name(sha1, &name, &base, "mds");
+}
+
+
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
 static void read_info_alternates(const char * alternates, int depth);
-static int git_open_noatime(const char *name);
 
 /*
  * Prepare alternate object database registry.
@@ -416,6 +429,7 @@ void prepare_alt_odb(void)
 	link_alt_odb_entries(alt, alt + strlen(alt), PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
+	mdsdb_init_alts();
 }
 
 static int has_loose_object_local(const unsigned char *sha1)
@@ -442,6 +456,53 @@ static int has_loose_object(const unsigned char *sha1)
 	       has_loose_object_nonlocal(sha1);
 }
 
+static int has_loose_object_local_digest(const unsigned char *sha1,
+				      mdigest_t *digestp)
+{
+	int status;
+	mdsdb_open(NULL);
+	status = mdsdb_lookup(NULL, sha1, digestp) > 0;
+	mdsdb_close(NULL);
+	return status;
+}
+
+int has_loose_object_nonlocal_digest(const unsigned char *sha1,
+				  mdigest_t *digestp)
+{
+	struct alternate_object_database *alt;
+
+	if (digestp == NULL) return 0;
+	prepare_alt_odb();
+	for (alt = alt_odb_list; alt; alt = alt->next) {
+		fill_sha1_path(alt->name, sha1);
+		if (!access(alt->base, F_OK)) {
+			mdigest_t xdigest;
+			/* Use the crc corresponding to the hash */
+			mdsdb_t dbf;
+			int status;
+			dbf = mdsdb_open_alt(alt);
+			status = mdsdb_lookup(dbf, sha1,
+					      (digestp? digestp: &xdigest));
+			mdsdb_close(dbf);
+			switch (status) {
+			case 0: return 0;
+			case 1: return 1;
+			case -1:
+			default:
+				return 0;
+			}
+		}
+	}
+	return 0;
+}
+
+static int has_loose_object_digest(const unsigned char *sha1,
+				   mdigest_t *digestp)
+{
+	return has_loose_object_local_digest(sha1, digestp) ||
+	       has_loose_object_nonlocal_digest(sha1, digestp);
+}
+
 static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
@@ -575,6 +636,87 @@ static int check_packed_git_idx(const char *path,  struct packed_git *p)
 	return 0;
 }
 
+size_t required_git_packed_mds_size(const char *path, void *data,
+				    uint32_t nobjects,
+				    size_t actual_size) {
+	unsigned char *base;
+	int wsize, version;
+	size_t required_size;
+	if (actual_size < 8) {
+		error("mds file %s is too small", path);
+		return 0;
+	}
+
+	base = data;
+	if ((*(base++) != 'P')
+	    || (*(base++) != 'K')
+	    || (*(base++) != 'M')
+	    || (*(base++) != 'D')
+	    || (*(base++) != 'S')
+	    || (*(base++) != 0)) {
+		error("mds file %s corrupted (bad header)",
+			     path);
+		return 0;
+
+	}
+	if ((version = *(base++)) != 1) {
+		error("mds file %s uses an unrecognized version %d",
+		      path, version);
+		return 0;
+	}
+	wsize = (*(base++)) * 4;
+	if (wsize == 0) {
+		/* must be positive and a multiple of 4 */
+		error("mds file %s corrupted (bad wsize field)",
+			     path);
+		return 0;
+	}
+	required_size = (size_t)8 +
+	  ((size_t)((nobjects)/4 + (nobjects % 4 != 0))
+	   * (size_t)(4 * (1 + wsize))) + (size_t)(20 * 2);
+	if (required_size != actual_size) {
+		error("mds file %s not the right size: %ld != %ld",
+		      path, (long)actual_size, (long)required_size);
+		return 0;
+	}
+	return required_size;
+}
+
+static int check_packed_git_mds(const char *path, struct packed_git *p)
+{
+	void *mds_map;
+	size_t mds_size, required_size;
+	unsigned char *base;
+	int fd = git_open_noatime(path);
+	int version;
+	struct stat st;
+	if (fd < 0)
+		return -1;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+	mds_size = xsize_t(st.st_size);
+	if (mds_size < 8 + 20 + 20) {
+		close(fd);
+		return error("mds file %s is too small", path);
+	}
+	mds_map = xmmap(NULL, mds_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	base = mds_map;
+	version = base[6];	/* header: "PKMDS\0", version, wsize */
+	required_size = required_git_packed_mds_size(path, mds_map,
+						     p->num_objects, mds_size);
+	if (required_size == 0) {
+		munmap(mds_map, mds_size);
+		return -1;
+	}
+	p->mds_data = mds_map;
+	p->mds_size = mds_size;
+	p->mds_version = version;
+	return 0;
+}
+
 int open_pack_index(struct packed_git *p)
 {
 	char *idx_name;
@@ -590,6 +732,20 @@ int open_pack_index(struct packed_git *p)
 	return ret;
 }
 
+int open_pack_mds(struct packed_git *p) {
+	char *mds_name;
+	int ret;
+
+	if (p->mds_data)
+		return 0;
+
+	mds_name = xstrdup(p->pack_name);
+	strcpy(mds_name + strlen(mds_name) - strlen(".pack"), ".mds");
+	ret = check_packed_git_mds(mds_name, p);
+	free(mds_name);
+	return ret;
+}
+
 static void scan_windows(struct packed_git *p,
 	struct packed_git **lru_p,
 	struct pack_window **lru_w,
@@ -691,6 +847,15 @@ void close_pack_index(struct packed_git *p)
 	if (p->index_data) {
 		munmap((void *)p->index_data, p->index_size);
 		p->index_data = NULL;
+		p->index_size = 0;
+	}
+}
+
+void close_pack_mds(struct packed_git *p) {
+	if (p->mds_data) {
+		munmap((void *)p->mds_data, p->mds_size);
+		p->mds_data = NULL;
+		p->mds_size = 0;
 	}
 }
 
@@ -718,6 +883,7 @@ void free_pack_by_name(const char *pack_name)
 				pack_open_fds--;
 			}
 			close_pack_index(p);
+			close_pack_mds(p);
 			free(p->bad_object_sha1);
 			*pp = p->next;
 			free(p);
@@ -741,6 +907,10 @@ static int open_packed_git_1(struct packed_git *p)
 
 	if (!p->index_data && open_pack_index(p))
 		return error("packfile %s index unavailable", p->pack_name);
+	/*
+	 * Assume an mds file might not be available - backwards compatibility
+	 */
+	if (!p->mds_data) open_pack_mds(p);
 
 	if (!pack_max_fds) {
 		struct rlimit lim;
@@ -1142,14 +1312,23 @@ static const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
 	return NULL;
 }
 
-int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long size, const char *type)
+int check_sha1_signature_extended(const unsigned char *sha1,
+				  mdigest_t *digestp,
+				  void *map, unsigned long size,
+				  const char *type)
 {
 	unsigned char real_sha1[20];
-	hash_sha1_file(map, size, type, real_sha1);
-	return hashcmp(sha1, real_sha1) ? -1 : 0;
+	mdigest_t rdigest;
+	hash_sha1_file_extended(map, size, type, real_sha1,
+				((digestp == NULL)? NULL: &rdigest));
+	int ret = hashcmp(sha1, real_sha1) ? -1 : 0;
+	if (digestp && ret == 0) {
+		ret = mdigest_tst(digestp, &rdigest);
+	}
+	return ret;
 }
 
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
 {
 	static int sha1_file_open_flag = O_NOATIME;
 
@@ -1926,15 +2105,48 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
 	}
 }
 
-off_t find_pack_entry_one(const unsigned char *sha1,
-				  struct packed_git *p)
+int nth_packed_object_mdigest(const struct packed_git *p, uint32_t n,
+			       mdigest_t *digestp)
+{
+	int r;
+	unsigned char *base = (unsigned char *)(p->mds_data);
+	int wsize; /* bytes per MDS field; stored as a count of 32-bit words */
+	int wcode;
+
+	if (base == NULL) return 0;
+
+	base += 7;
+	wsize = (*(base++)) * 4;
+	if (wsize == 0) {
+		/* must be positive to store a digest */
+		return -1;
+	}
+	base += (n / 4) * (uint32_t)(4 * (1 + wsize));
+	r = n % 4;
+	wcode = base[r];
+	if (wcode == 0) return 0;
+	base += 4;
+	base += wsize * r;
+	mdigest_load(digestp, wcode, base);
+	return 1;
+}
+
+
+
+off_t find_pack_entry_one_extended(const unsigned char *sha1,
+				   struct packed_git *p,
+				   int *has_digestp, mdigest_t *digestp)
 {
 	const uint32_t *level1_ofs = p->index_data;
 	const unsigned char *index = p->index_data;
+	const unsigned char *mds = p->mds_data;
 	unsigned hi, lo, stride;
 	static int use_lookup = -1;
 	static int debug_lookup = -1;
 
+	if (has_digestp) *has_digestp = 0;
+	if (digestp) mdigest_clear(digestp);
+
 	if (debug_lookup < 0)
 		debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
 
@@ -1944,6 +2156,11 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		level1_ofs = p->index_data;
 		index = p->index_data;
 	}
+
+	if (!mds) {
+		open_pack_mds(p);
+	}
+
 	if (p->index_version > 1) {
 		level1_ofs += 2;
 		index += 8;
@@ -1979,8 +2196,14 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		if (debug_lookup)
 			printf("lo %u hi %u rg %u mi %u\n",
 			       lo, hi, hi - lo, mi);
-		if (!cmp)
+		if (!cmp) {
+			if (has_digestp && digestp)
+				*(has_digestp) =
+				  (nth_packed_object_mdigest(p,
+							     mi,
+							     digestp) == 1);
 			return nth_packed_object_offset(p, mi);
+		}
 		if (cmp > 0)
 			hi = mi;
 		else
@@ -2029,7 +2252,9 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
 					goto next;
 		}
 
-		offset = find_pack_entry_one(sha1, p);
+		offset = find_pack_entry_one_extended(sha1, p,
+						      &(e->has_mdigest),
+						      &(e->mdigest));
 		if (offset) {
 			/*
 			 * We are about to tell the caller where they can
@@ -2175,14 +2400,33 @@ static void *read_packed_sha1(const unsigned char *sha1,
 	return data;
 }
 
-int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
-		      unsigned char *sha1)
+int pretend_sha1_file_extended(void *buf, unsigned long len,
+			       enum object_type type,
+			       unsigned char *sha1, mdigest_t *digestp)
 {
-	struct cached_object *co;
+	struct cached_object *co = NULL;
+	mdigest_t dgst;
+	int has_dgst = 0;
 
-	hash_sha1_file(buf, len, typename(type), sha1);
-	if (has_sha1_file(sha1) || find_cached_object(sha1))
+	hash_sha1_file_extended(buf, len, typename(type), sha1, &dgst);
+	if (has_sha1_file(sha1) || (co = find_cached_object(sha1))) {
+		mdigest_t old_dgst;
+		if (!has_sha1_file_digest(sha1, &old_dgst)) {
+			if (co != NULL) {
+				memcpy(&old_dgst,co->md_as_array,
+				       sizeof (mdigest_t));
+				has_dgst = 1;
+			}
+		} else {
+			has_dgst = 1;
+		}
+		if (has_dgst && mdigest_tst(&old_dgst, &dgst)) {
+			  die("SHA1 COLLISION FOUND FOR %s "
+			      "(dummy commit when running blame?)",
+			      sha1_to_hex(sha1));
+		}
 		return 0;
+	}
 	if (cached_object_alloc <= cached_object_nr) {
 		cached_object_alloc = alloc_nr(cached_object_alloc);
 		cached_objects = xrealloc(cached_objects,
@@ -2193,8 +2437,10 @@ int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
 	co->size = len;
 	co->type = type;
 	co->buf = xmalloc(len);
+	memcpy(co->md_as_array, &dgst, sizeof (mdigest_t));
 	memcpy(co->buf, buf, len);
 	hashcpy(co->sha1, sha1);
+	if (digestp) *digestp = dgst;
 	return 0;
 }
 
@@ -2316,11 +2562,11 @@ void *read_object_with_reference(const unsigned char *sha1,
 }
 
 static void write_sha1_file_prepare(const void *buf, unsigned long len,
-                                    const char *type, unsigned char *sha1,
-                                    char *hdr, int *hdrlen)
+				    const char *type, unsigned char *sha1,
+				    mdigest_t *digestp,
+				    char *hdr, int *hdrlen)
 {
 	git_SHA_CTX c;
-
 	/* Generate the header */
 	*hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
 
@@ -2329,6 +2575,12 @@ static void write_sha1_file_prepare(const void *buf, unsigned long len,
 	git_SHA1_Update(&c, hdr, *hdrlen);
 	git_SHA1_Update(&c, buf, len);
 	git_SHA1_Final(sha1, &c);
+	if (digestp) {
+		mdigest_context_t mdc;
+		mdigest_Init(&mdc, MDIGEST_DEFAULT);
+		mdigest_Update(&mdc, buf, len);
+		mdigest_Final(digestp, &mdc);
+	}
 }
 
 /*
@@ -2384,12 +2636,13 @@ static int write_buffer(int fd, const void *buf, size_t len)
 	return 0;
 }
 
-int hash_sha1_file(const void *buf, unsigned long len, const char *type,
-                   unsigned char *sha1)
+int hash_sha1_file_extended(const void *buf, unsigned long len,
+			    const char *type,
+			    unsigned char *sha1, mdigest_t *digestp)
 {
 	char hdr[32];
 	int hdrlen;
-	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	write_sha1_file_prepare(buf, len, type, sha1, digestp, hdr, &hdrlen);
 	return 0;
 }
 
@@ -2443,10 +2696,14 @@ static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
 	return fd;
 }
 
-static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
+
+static int write_loose_object(const unsigned char *sha1, mdigest_t *digestp,
+			      char *hdr, int hdrlen,
 			      const void *buf, unsigned long len, time_t mtime)
 {
 	int fd, ret;
+	mdigest_t digest;
+	mdigest_context_t mdc;
 	unsigned char compressed[4096];
 	git_zstream stream;
 	git_SHA_CTX c;
@@ -2469,7 +2726,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	stream.next_out = compressed;
 	stream.avail_out = sizeof(compressed);
 	git_SHA1_Init(&c);
-
+	mdigest_Init(&mdc, MDIGEST_DEFAULT);
 	/* First header.. */
 	stream.next_in = (unsigned char *)hdr;
 	stream.avail_in = hdrlen;
@@ -2484,23 +2741,30 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 		unsigned char *in0 = stream.next_in;
 		ret = git_deflate(&stream, Z_FINISH);
 		git_SHA1_Update(&c, in0, stream.next_in - in0);
+		mdigest_Update(&mdc, in0, stream.next_in - in0);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
 	} while (ret == Z_OK);
+	mdigest_Final(&digest, &mdc);
 
 	if (ret != Z_STREAM_END)
-		die("unable to deflate new object %s (%d)", sha1_to_hex(sha1), ret);
+		die("unable to deflate new object %s (%d)",
+		    sha1_to_hex(sha1), ret);
 	ret = git_deflate_end_gently(&stream);
 	if (ret != Z_OK)
-		die("deflateEnd on object %s failed (%d)", sha1_to_hex(sha1), ret);
+		die("deflateEnd on object %s failed (%d)",
+		    sha1_to_hex(sha1), ret);
 	git_SHA1_Final(parano_sha1, &c);
 	if (hashcmp(sha1, parano_sha1) != 0)
-		die("confused by unstable object source data for %s", sha1_to_hex(sha1));
-
+		die("confused by unstable object source data for %s",
+		    sha1_to_hex(sha1));
+	if (digestp && mdigest_tst(digestp, &digest)) {
+		die("confused by unstable object source data "
+		    "(digest mismatch) for %s", sha1_to_hex(sha1));
+	}
 	close_sha1_file(fd);
-
 	if (mtime) {
 		struct utimbuf utb;
 		utb.actime = mtime;
@@ -2510,24 +2774,41 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 				tmpfile, strerror(errno));
 	}
 
-	return move_temp_to_file(tmpfile, filename);
+	ret = move_temp_to_file(tmpfile, filename);
+	if (ret == 0) {
+		mdsdb_open(NULL);
+		mdsdb_process((mdsdb_t)NULL, sha1, &digest);
+		mdsdb_close(NULL);
+	}
+	return ret;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
+int write_sha1_file_extended(const void *buf, unsigned long len,
+			     const char *type, unsigned char *returnsha1,
+			     mdigest_t *digestp)
 {
 	unsigned char sha1[20];
 	char hdr[32];
 	int hdrlen;
+	mdigest_t newdigest;
 
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	write_sha1_file_prepare(buf, len, type, sha1, &newdigest, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (has_sha1_file(sha1))
+	if (digestp) *digestp = newdigest;
+	if (has_sha1_file(sha1)) {
+		mdigest_t old_digest;
+		if (has_sha1_file_digest(sha1, &old_digest)) {
+			if (mdigest_tst(&newdigest, &old_digest)) {
+				die("hash collision");
+			}
+		}
 		return 0;
-	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
+	}
+	return write_loose_object(sha1, &newdigest, hdr, hdrlen, buf, len, 0);
 }
 
 int force_object_loose(const unsigned char *sha1, time_t mtime)
@@ -2538,6 +2819,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 	char hdr[32];
 	int hdrlen;
 	int ret;
+	mdigest_t * const digestp = NULL;
 
 	if (has_loose_object(sha1))
 		return 0;
@@ -2545,7 +2827,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 	if (!buf)
 		return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
 	hdrlen = sprintf(hdr, "%s %lu", typename(type), len) + 1;
-	ret = write_loose_object(sha1, hdr, hdrlen, buf, len, mtime);
+	ret = write_loose_object(sha1, digestp, hdr, hdrlen, buf, len, mtime);
 	free(buf);
 
 	return ret;
@@ -2574,6 +2856,85 @@ int has_sha1_file(const unsigned char *sha1)
 	return has_loose_object(sha1);
 }
 
+int has_sha1_file_digest(const unsigned char *sha1, mdigest_t *digestp)
+{
+	struct pack_entry e;
+	/*
+	 * builtin/send-pack.c uses a null SHA1 (all bytes zero) to
+	 * indicate that a SHA-1 hash does not exist.  We explicitly
+	 * return 0 for this case, for correct behavior even if we
+	 * somehow get that value into the database.
+	 */
+	if (!hashcmp(sha1, null_sha1)) return 0;
+	if (find_pack_entry(sha1, &e)) {
+		if (e.has_mdigest) {
+			if (digestp) *digestp = e.mdigest;
+			return 1;
+		} else {
+#ifdef PACKDB
+			if (e.p && e.p->pack_local) {
+				/*
+				 * We have a local pack file, but could not
+				 * find the CRC, so we first check if the
+				 * CRC is still stored for loose objects.
+				 * Then we try packdb (separate database for
+				 * packed objects) and if it is not there, we
+				 * compute it from scratch and add it to
+				 * packdb.
+				 */
+				if (has_loose_object_local_digest(sha1,
+							       digestp)) {
+					return 1;
+				} else {
+					int status ;
+					packdb_open();
+					status = (packdb_lookup(sha1,
+								digestp)
+						  == 1);
+					if (status == 0) {
+						unsigned long len;
+						enum object_type type;
+						mdigest_t digest;
+						mdigest_context_t mdc;
+						mdigest_Init(&mdc,
+							     MDIGEST_DEFAULT);
+						void *buf = read_sha1_file
+							(sha1, &type, &len);
+						mdigest_Update(&mdc, buf, len);
+						mdigest_Final(&digest, &mdc);
+						switch(packdb_process
+						       (sha1, &digest)) {
+						case 0:
+							if (digestp)
+								*digestp
+								 = digest;
+							status = 1;
+							break;
+						case 1:
+							error("packdb insert"
+							      " botched");
+							status = 0;
+							break;
+						case -1:
+							error("packdb failed");
+							status = 0;
+							break;
+						}
+					}
+					packdb_close();
+					return status;
+				}
+			} else {
+				return 0;
+			}
+#else
+			return has_loose_object_local_digest(sha1, digestp);
+#endif
+		}
+	}
+	return has_loose_object_digest(sha1, digestp);
+}
+
 static void check_tree(const void *buf, size_t size)
 {
 	struct tree_desc desc;
@@ -2602,7 +2963,8 @@ static void check_tag(const void *buf, size_t size)
 		die("corrupt tag");
 }
 
-static int index_mem(unsigned char *sha1, void *buf, size_t size,
+static int index_mem(unsigned char *sha1, mdigest_t *digestp,
+		     void *buf, size_t size,
 		     enum object_type type,
 		     const char *path, unsigned flags)
 {
@@ -2631,24 +2993,27 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 		if (type == OBJ_TAG)
 			check_tag(buf, size);
 	}
-
 	if (write_object)
-		ret = write_sha1_file(buf, size, typename(type), sha1);
+		ret = write_sha1_file_extended(buf, size, typename(type), sha1,
+					       digestp);
 	else
-		ret = hash_sha1_file(buf, size, typename(type), sha1);
+		ret = hash_sha1_file_extended(buf, size, typename(type), sha1,
+				     digestp);
 	if (re_allocated)
 		free(buf);
 	return ret;
 }
 
-static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
+static int index_pipe(unsigned char *sha1, mdigest_t *digestp,
+		      int fd, enum object_type type,
 		      const char *path, unsigned flags)
 {
 	struct strbuf sbuf = STRBUF_INIT;
 	int ret;
 
 	if (strbuf_read(&sbuf, fd, 4096) >= 0)
-		ret = index_mem(sha1, sbuf.buf, sbuf.len, type,	path, flags);
+		ret = index_mem(sha1, digestp, sbuf.buf, sbuf.len, type,
+				path, flags);
 	else
 		ret = -1;
 	strbuf_release(&sbuf);
@@ -2657,24 +3022,26 @@ static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
 
 #define SMALL_FILE_SIZE (32*1024)
 
-static int index_core(unsigned char *sha1, int fd, size_t size,
+static int index_core(unsigned char *sha1, mdigest_t *digestp,
+		      int fd, size_t size,
 		      enum object_type type, const char *path,
 		      unsigned flags)
 {
 	int ret;
 
 	if (!size) {
-		ret = index_mem(sha1, NULL, size, type, path, flags);
+		ret = index_mem(sha1, digestp, NULL, size, type, path, flags);
 	} else if (size <= SMALL_FILE_SIZE) {
 		char *buf = xmalloc(size);
 		if (size == read_in_full(fd, buf, size))
-			ret = index_mem(sha1, buf, size, type, path, flags);
+			ret = index_mem(sha1, digestp,
+					buf, size, type, path, flags);
 		else
 			ret = error("short read %s", strerror(errno));
 		free(buf);
 	} else {
 		void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-		ret = index_mem(sha1, buf, size, type, path, flags);
+		ret = index_mem(sha1, digestp, buf, size, type, path, flags);
 		munmap(buf, size);
 	}
 	return ret;
@@ -2692,41 +3059,122 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
  * avoid mmaping it in core is to deal with large binary blobs, and
  * by definition they do _not_ want to get any conversion.
  */
-static int index_stream(unsigned char *sha1, int fd, size_t size,
+static int index_stream(unsigned char *sha1, mdigest_t *digestp,
+			int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
-	return index_bulk_checkin(sha1, fd, size, type, path, flags);
+#if 1
+	int result = index_bulk_checkin(sha1, fd, size, type, path, flags);
+	if (digestp) {
+		if (result || !has_sha1_file_digest(sha1, digestp)) {
+			mdigest_clear(digestp);
+		}
+	}
+	return result;
+#else
+	struct child_process fast_import;
+	char export_marks[512];
+	const char *argv[] = { "fast-import", "--quiet", export_marks, NULL };
+	char tmpfile[512];
+	char fast_import_cmd[512];
+	char buf[512];
+	int len, tmpfd;
+
+	strcpy(tmpfile, git_path("hashstream_XXXXXX"));
+	tmpfd = git_mkstemp_mode(tmpfile, 0600);
+	if (tmpfd < 0)
+		die_errno("cannot create tempfile: %s", tmpfile);
+	if (close(tmpfd))
+		die_errno("cannot close tempfile: %s", tmpfile);
+	sprintf(export_marks, "--export-marks=%s", tmpfile);
+
+	memset(&fast_import, 0, sizeof(fast_import));
+	fast_import.in = -1;
+	fast_import.argv = argv;
+	fast_import.git_cmd = 1;
+	if (start_command(&fast_import))
+		die_errno("index-stream: git fast-import failed");
+
+	len = sprintf(fast_import_cmd, "blob\nmark :1\ndata %lu\n",
+		      (unsigned long) size);
+	write_or_whine(fast_import.in, fast_import_cmd, len,
+		       "index-stream: feeding fast-import");
+	while (size) {
+		char buf[10240];
+		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
+		ssize_t actual;
+
+		actual = read_in_full(fd, buf, sz);
+		if (actual < 0)
+			die_errno("index-stream: reading input");
+		if (write_in_full(fast_import.in, buf, actual) != actual)
+			die_errno("index-stream: feeding fast-import");
+		size -= actual;
+	}
+	if (close(fast_import.in))
+		die_errno("index-stream: closing fast-import");
+	if (finish_command(&fast_import))
+		die_errno("index-stream: finishing fast-import");
+
+	tmpfd = open(tmpfile, O_RDONLY);
+	if (tmpfd < 0)
+		die_errno("index-stream: cannot open fast-import mark");
+	len = read(tmpfd, buf, sizeof(buf));
+	if (len < 0)
+		die_errno("index-stream: reading fast-import mark");
+	if (close(tmpfd) < 0)
+		die_errno("index-stream: closing fast-import mark");
+	if (unlink(tmpfile))
+		die_errno("index-stream: unlinking fast-import mark");
+	if (len != 44 ||
+	    memcmp(":1 ", buf, 3) ||
+	    get_sha1_hex(buf + 3, sha1))
+		die_errno("index-stream: unexpected fast-import mark: <%s>", buf);
+	/*
+	 * since we got a sha1 value from fast-import, an mds file was
+	 * created, so we can just look up the digest.  Just in case, we
+	 * clear the digest if the lookup failed.
+	 */
+	if (digestp) {
+		if (!has_sha1_file_digest(sha1, digestp)) {
+			mdigest_clear(digestp);
+		}
+	}
+	return 0;
+#endif
 }
 
-int index_fd(unsigned char *sha1, int fd, struct stat *st,
-	     enum object_type type, const char *path, unsigned flags)
+int index_fd_extended(unsigned char *sha1, mdigest_t *digestp,
+		      int fd, struct stat *st,
+		      enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
 	size_t size = xsize_t(st->st_size);
 
 	if (!S_ISREG(st->st_mode))
-		ret = index_pipe(sha1, fd, type, path, flags);
+		ret = index_pipe(sha1, digestp, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB)
-		ret = index_core(sha1, fd, size, type, path, flags);
+		ret = index_core(sha1, digestp, fd, size, type, path, flags);
 	else
-		ret = index_stream(sha1, fd, size, type, path, flags);
+		ret = index_stream(sha1, digestp,
+				   fd, size, type, path, flags);
 	close(fd);
 	return ret;
 }
 
-int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags)
+int index_path_extended(unsigned char *sha1, mdigest_t *digestp, const char *path, struct stat *st, unsigned flags)
 {
 	int fd;
 	struct strbuf sb = STRBUF_INIT;
-
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
 		fd = open(path, O_RDONLY);
 		if (fd < 0)
 			return error("open(\"%s\"): %s", path,
 				     strerror(errno));
-		if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+		if (index_fd_extended(sha1, digestp, fd, st,
+				      OBJ_BLOB, path, flags) < 0)
 			return error("%s: failed to insert into database",
 				     path);
 		break;
@@ -2737,8 +3185,10 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 			             errstr);
 		}
 		if (!(flags & HASH_WRITE_OBJECT))
-			hash_sha1_file(sb.buf, sb.len, blob_type, sha1);
-		else if (write_sha1_file(sb.buf, sb.len, blob_type, sha1))
+			hash_sha1_file_extended(sb.buf, sb.len, blob_type, sha1,
+				       digestp);
+		else if (write_sha1_file_extended(sb.buf, sb.len, blob_type,
+						  sha1, digestp))
 			return error("%s: failed to insert into database",
 				     path);
 		strbuf_release(&sb);
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index f4e8f43..53e1b7d 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -34,17 +34,18 @@ fi
 # git init has been done in an empty repository.
 # make sure it is empty.
 
-find .git/objects -type f -print >should-be-empty
+find .git/objects -type f -a  ! -name mdsd -a ! -name packdb -print >should-be-empty
 test_expect_success \
     '.git/objects should be empty after git init in an empty repo.' \
     'cmp -s /dev/null should-be-empty'
 
-# also it should have 2 subdirectories; no fan-out anymore, pack, and info.
-# 3 is counting "objects" itself
-find .git/objects -type d -print >full-of-directories
+# also it should have 3 entries besides "objects" itself:
+# no fan-out anymore, just pack, info, and mdsd.
+# 4 (listed by find) is the result of counting "objects" as well.
+find .git/objects \( -type d -o -name mdsd  \) -print >full-of-directories
 test_expect_success \
-    '.git/objects should have 3 subdirectories.' \
-    'test $(wc -l < full-of-directories) = 3'
+    '.git/objects should have 3 subdirectories or files.' \
+    'test $(wc -l < full-of-directories) = 4'
 
 ################################################################
 # Test harness
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/6] Add the mdigest, mdsdb, and packdb modules.
From: Bill Zaumen @ 2011-12-21  7:08 UTC (permalink / raw)
  To: git, peff, pclouds, gitster

* The mdigest module manipulates and creates message digests. The
files are mdigest.h and mdigest.c. These can be modified to
include additional digests.

* The mdsdb module stores message digests for loose objects. The
files are mdsdb.h and objd-mdsdb.c. The variable MDSDB in the
Makefile selects the implementation (currently only objd-mdsdb.c, which
keeps each digest in its own file).

* The packdb module stores digests in a space-efficient form. It is
intended for unusual conditions (e.g., caching a digest for an object
in an alternative object database that does not support digests). The
implementation provided uses GDBM; however, it is easy to add alternative
implementations.  The choice of implementation is determined by the
value of the PACKDB variable in the Makefile (if undefined, packdb is
not used).

These modules are mostly self-contained - there is little interaction
with the rest of git beyond calling functions such as "die" for a few
error conditions.  Documentation for the functions in these modules
is in the header files.

The Makefile changes will be in a subsequent commit.

Signed-off-by: Bill Zaumen <bill.zaumen+git@gmail.com>
---
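A note on the "space-efficient form" mentioned above: judging from
packdb_lookup() and packdb_process() in gdbm-packdb.c, each stored value
appears to be one length byte followed by the raw digest bytes. The
following is only a minimal, self-contained sketch of that layout, not
code from this patch. The sketch_* names, SKETCH_MAX_WORDS, and the rule
that the length byte counts 32-bit words are assumptions made for
illustration; the real lengths come from get_mdigest_required_len(),
which is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound; the patch's real limit is defined elsewhere. */
#define SKETCH_MAX_WORDS 16

/*
 * Build a record: one length byte (assumed here to count 32-bit words)
 * followed by the digest bytes.  Returns the record size, or 0 if the
 * arguments do not fit.
 */
static size_t sketch_encode(unsigned char *rec, size_t reclen,
			    const unsigned char *digest, unsigned int words)
{
	size_t need = 1 + (size_t)words * 4;

	if (words == 0 || words > SKETCH_MAX_WORDS || need > reclen)
		return 0;
	rec[0] = (unsigned char)words;
	memcpy(rec + 1, digest, need - 1);
	return need;
}

/*
 * Read a record back, rejecting truncated input in the same spirit as
 * the "len + 1 > ovalue.dsize" check in packdb_lookup().  Returns the
 * digest length in bytes, or 0 if the record is malformed.
 */
static size_t sketch_decode(unsigned char *digest, size_t digestlen,
			    const unsigned char *rec, size_t reclen)
{
	size_t len;

	if (reclen < 1)
		return 0;
	len = (size_t)rec[0] * 4;
	if (len == 0 || len + 1 > reclen || len > digestlen)
		return 0;
	memcpy(digest, rec + 1, len);
	return len;
}

/* Round-trip a made-up 8-byte digest; returns 1 on success. */
int sketch_roundtrip_ok(void)
{
	const unsigned char in[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	unsigned char rec[1 + 4 * SKETCH_MAX_WORDS];
	unsigned char out[4 * SKETCH_MAX_WORDS];
	size_t n = sketch_encode(rec, sizeof(rec), in, 2);

	if (n != 9)
		return 0;
	if (sketch_decode(out, sizeof(out), rec, n - 1) != 0)
		return 0;	/* a truncated record must be refused */
	if (sketch_decode(out, sizeof(out), rec, n) != 8)
		return 0;
	return memcmp(in, out, 8) == 0;
}
```

Keeping the length in the first byte lets a reader validate a record
before touching the digest, which is roughly the check packdb_lookup()
performs before it calls mdigest_load().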
 gdbm-packdb.c |  249 +++++++++++++++++++++++++++++++++++++++++
 mdigest.c     |  221 +++++++++++++++++++++++++++++++++++++
 mdigest.h     |  334 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mdsdb.h       |  192 ++++++++++++++++++++++++++++++++
 objd-mdsdb.c  |  340 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 packdb.h      |   93 ++++++++++++++++
 6 files changed, 1429 insertions(+), 0 deletions(-)
 create mode 100644 gdbm-packdb.c
 create mode 100644 mdigest.c
 create mode 100644 mdigest.h
 create mode 100644 mdsdb.h
 create mode 100644 objd-mdsdb.c
 create mode 100644 packdb.h

diff --git a/gdbm-packdb.c b/gdbm-packdb.c
new file mode 100644
index 0000000..6443ec5
--- /dev/null
+++ b/gdbm-packdb.c
@@ -0,0 +1,249 @@
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/param.h>
+#include <stdio.h>
+#include <string.h>
+#include <malloc.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <time.h>
+#include <pthread.h>
+#include <errno.h>
+#include <gdbm.h>
+
+#include "cache.h"
+#include "packdb.h"
+
+static void nsleep(void) {
+#if _POSIX_C_SOURCE >= 199309L
+	struct timespec ts;
+	ts.tv_sec = 0;
+	ts.tv_nsec = 100000;
+	nanosleep(&ts, NULL);
+#else
+	sleep(1);
+#endif
+}
+
+static int initialized = 0;
+
+static GDBM_FILE dbf = NULL;
+char *dbf_name;
+static int dbf_depth = 0;
+
+pthread_mutex_t gdbm_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void packdb_close_nolock(void);
+
+void packdb_init(void) {
+	char *last;
+	pthread_mutex_lock(&gdbm_mutex);
+	if (initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return;
+	}
+	dbf_name = get_object_packdb_node();
+	last = strrchr(dbf_name, '/');
+	*last = 0;
+	if (!access(dbf_name, R_OK|W_OK|X_OK)) {
+		initialized = 1;
+	}
+	*last = '/';
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+int packdb_initialized(void) {
+	return initialized;
+}
+
+static void packdb_open_nolock(void) {
+	if (dbf_depth == 0) {
+	AGAIN_W:
+		dbf = gdbm_open(dbf_name, 0, GDBM_WRCREAT, PERM_GROUP, NULL);
+		if (dbf == NULL && gdbm_errno == GDBM_CANT_BE_WRITER) {
+			nsleep();
+			goto AGAIN_W;
+		}
+	}
+	dbf_depth++;
+}
+
+void packdb_open(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	packdb_open_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+
+int packdb_lookup(const unsigned char *sha1, mdigest_t *digestp) {
+	datum key;
+	datum ovalue;
+	pthread_mutex_lock(&gdbm_mutex);
+
+	if (!initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+
+	packdb_open_nolock();
+	if (dbf == NULL) {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	ovalue = gdbm_fetch(dbf, key);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+
+	if (ovalue.dptr == NULL) return 0;
+	if (digestp) {
+		int len;
+		int wsize = (int) *(unsigned char *)(ovalue.dptr);
+		unsigned char *buffer = (unsigned char *)(ovalue.dptr + 1);
+		len = get_mdigest_required_len(wsize);
+		if (len + 1 > ovalue.dsize)
+			die("existing db entry for %s corrupted [1], len = %d,"
+			    " ovalue.dsize = %d",
+			    sha1_to_hex(sha1), len, ovalue.dsize);
+		mdigest_load(digestp, wsize, buffer);
+	}
+	free(ovalue.dptr);
+	/* if (digestp) *digestp = (old_digest); */
+	return 1;
+}
+
+int packdb_remove(const unsigned char *sha1) {
+	datum key;
+	int result;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized)  || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+	packdb_open_nolock();
+	result = gdbm_delete(dbf, key);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+	return result;
+}
+
+
+int packdb_process(const unsigned char *sha1, mdigest_t *digestp) {
+	datum key;
+	datum nvalue;
+	datum ovalue;
+	mdigest_t newdigest = (*digestp);
+	mdigest_t old_digest;
+	newdigest.hdr.lhdr.wcode = get_mdigest_wcode(digestp);
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized) || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	key.dptr = (char *)sha1;
+	key.dsize = 20;
+
+	nvalue.dptr = (char *)&(newdigest.hdr.lhdr.wcode);
+	nvalue.dsize = 1 + get_mdigest_len(digestp);
+
+	packdb_open_nolock();
+	ovalue = gdbm_fetch(dbf, key);
+	if (ovalue.dptr == NULL) {
+		int status;
+		status = gdbm_store(dbf, key, nvalue, GDBM_INSERT);
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		switch (status) {
+		case 0:
+			return 0;
+		case -1:
+			error("could not enter crc into database - key = %s",
+			      sha1_to_hex(sha1));
+			return -1;
+		case 1:
+			return 1;
+		}
+		return -1;	/* should not occur */
+	} else if (ovalue.dptr == NULL) {
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		return 0;
+	} else {
+		int wcode, len;
+		unsigned char *buffer;
+		packdb_close_nolock();
+		pthread_mutex_unlock(&gdbm_mutex);
+		wcode = (int) *(unsigned char *)ovalue.dptr;
+		len = get_mdigest_required_len(wcode);
+		if (len + 1 > ovalue.dsize)
+			die("existing db entry for %s corrupted [2]",
+			    sha1_to_hex(sha1));
+		buffer = (unsigned char *) ovalue.dptr + 1;
+		mdigest_load(&old_digest, wcode, buffer);
+		free(ovalue.dptr);
+		/*
+		 * Both old_digest and newdigest are in network byte order.
+		 */
+		if (mdigest_tst(&old_digest, digestp)) {
+			die("SHA1 COLLISION WHEN INSERTING OBJECT %s",
+			    sha1_to_hex(sha1));
+			return -1;
+		}
+		return 1;
+	}
+}
+
+int packdb_reorganize(void) {
+	int status;
+	pthread_mutex_lock(&gdbm_mutex);
+	if ((!initialized)  || dbf == NULL) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return -1;
+	}
+	packdb_open_nolock();
+	status = gdbm_reorganize(dbf);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+	return status;
+}
+
+
+static void packdb_close_nolock(void) {
+	if (!initialized) {
+		return;
+	}
+	dbf_depth--;
+	if (dbf_depth == 0 && dbf != NULL) {
+		gdbm_close(dbf);
+		dbf = NULL;
+	}
+	if (dbf_depth < 0) {
+		die("packdb dbf_depth %d < 0", dbf_depth);
+	}
+	return;
+}
+
+void packdb_close(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	packdb_close_nolock();
+	pthread_mutex_unlock(&gdbm_mutex);
+}
+
+void packdb_finish(void) {
+	pthread_mutex_lock(&gdbm_mutex);
+	if (!initialized) {
+		pthread_mutex_unlock(&gdbm_mutex);
+		return;
+	}
+	if (dbf != NULL) gdbm_close(dbf);
+	dbf = NULL;
+	dbf_depth = 0;
+	initialized = 0;
+	pthread_mutex_unlock(&gdbm_mutex);
+}
diff --git a/mdigest.c b/mdigest.c
new file mode 100644
index 0000000..94c72a2
--- /dev/null
+++ b/mdigest.c
@@ -0,0 +1,221 @@
+#include "mdigest.h"
+
+struct mdigest_config {
+	const enum mdigest_type mdt; /* table key - must match index */
+	const char *name;		    /* print name */
+	const int wcode;		    /* wcode for MDS files */
+	const int wsize;			    /* word size for MDS file entry */
+	const int blen;			    /* byte length of digest (used) */
+};
+
+static const struct mdigest_config mdigest_table[] = {
+	{MDIGEST_CRC, "CRC-32", MDIGEST_CRC_WCODE, 1, 4},
+	{MDIGEST_SHA1, "SHA-1", MDIGEST_SHA1_WCODE, 5, 20},
+	{MDIGEST_SHA256, "SHA-256", MDIGEST_SHA256_WCODE, 8, 32},
+	{MDIGEST_SHA512, "SHA-512", MDIGEST_SHA512_WCODE, 16, 64},
+};
+
+/*
+ * Table indexed by wcode.
+ */
+struct mdigest_config_aux {
+	int wcode;		    /* wcode for MDS files */
+	int wsize;		    /* word size for MDS file entry */
+	int blen;		    /* byte length of digest (used) */
+	enum mdigest_type mdt;	    /* message digest type */
+};
+
+static struct mdigest_config_aux *mdigest_aux_table;
+
+void mdigest_init(void) {
+	int i;
+	int n = sizeof(mdigest_table)
+	  / sizeof(struct mdigest_config);
+	int m = 0;
+
+	for (i = 0; i < n; i++) {
+		if (m < mdigest_table[i].wcode)
+			m = mdigest_table[i].wcode;
+	}
+	m += 1;			/* table size is one more than the max index */
+	mdigest_aux_table = (struct mdigest_config_aux *)
+	  xcalloc(m, sizeof (struct mdigest_config_aux));
+
+	for (i = 0; i < n; i++) {
+		int wc, ws, bl;
+		enum mdigest_type mdt;
+		wc = mdigest_table[i].wcode;
+		ws = mdigest_table[i].wsize;
+		bl = mdigest_table[i].blen;
+		mdt = mdigest_table[i].mdt;
+
+		mdigest_aux_table[wc].wcode = wc;
+		mdigest_aux_table[wc].wsize = ws;
+		mdigest_aux_table[wc].blen = bl;
+		mdigest_aux_table[wc].mdt = mdt;
+	}
+}
+
+int get_mdigest_wsize(mdigest_t *mdigestp) {
+	return mdigest_table[mdigestp->hdr.info.mdt].wsize;
+}
+
+const char *get_mdigest_name(enum mdigest_type mdt)
+{
+	return mdigest_table[mdt].name;
+}
+
+int get_mdigest_wcode(const mdigest_t *digestp) {
+	return mdigest_table[digestp->hdr.info.mdt].wcode;
+}
+
+int get_mdigest_wcode_by_type(enum mdigest_type type) {
+	return mdigest_table[type].wcode;
+}
+
+int get_mdigest_wsize_by_type(enum mdigest_type type) {
+	return mdigest_table[type].wsize;
+}
+
+int get_mdigest_required_len_by_type(enum mdigest_type type) {
+	return mdigest_table[type].blen;
+}
+
+int mdigest_to_buffer(unsigned char *buffer, mdigest_t *digestp, int blen)
+{
+	int len = blen;
+	if (len > digestp->hdr.info.len) len = digestp->hdr.info.len;
+	memcpy(buffer, digestp->buffer.buffer, len);
+	memset(buffer + len, 0, blen - len);
+	return len;
+}
+
+
+int mdigest_tst(mdigest_t *md1p, mdigest_t *md2p)
+{
+	int i, n;
+	int n32;
+	unsigned char *x, *y;
+	uint32_t *x32, *y32;
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+	int n64;
+	uint64_t *x64, *y64;
+#endif
+	if (md1p->hdr.info.mdt != md2p->hdr.info.mdt) return -1;
+	n = md1p->hdr.info.len;
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+	n64 = n/8;
+	x64 = (md1p->buffer.buffer64);
+	y64 = (md2p->buffer.buffer64);
+	for (i = 0; i < n64; i++) {
+		if (x64[i] != y64[i]) return -1;
+	}
+	i *= 2;
+#else
+	i = 0;
+#endif
+	n32 = n/4;
+	if (i != n32) {
+		x32 = (md1p->buffer.buffer32);
+		y32 = (md2p->buffer.buffer32);
+		while (i < n32) {
+			if (x32[i] != y32[i]) return -1;
+			i++;
+		}
+		i *= 4;
+		if (i < n) {
+			x = (md1p->buffer.buffer);
+			y = (md2p->buffer.buffer);
+			while (i < n) {
+				if (x[i] != y[i]) return -1;
+				i++;
+			}
+		}
+	}
+	return 0;
+
+}
+
+void mdigest_Init(mdigest_context_t *ctx,
+			 enum mdigest_type mdt)
+{
+	ctx->mdt = mdt;
+	switch (mdt) {
+	case MDIGEST_CRC:
+		ctx->context.crc32 = crc32(0, NULL, 0);
+		break;
+	case MDIGEST_SHA1:
+		git_SHA1_Init(&ctx->context.sha1);
+		break;
+	case MDIGEST_SHA256:
+		EVP_MD_CTX_init(&ctx->context.evp);
+		EVP_DigestInit_ex(&ctx->context.evp, EVP_sha256(), NULL);
+		break;
+	case MDIGEST_SHA512:
+		EVP_MD_CTX_init(&ctx->context.evp);
+		EVP_DigestInit_ex(&ctx->context.evp, EVP_sha512(), NULL);
+		break;
+	}
+}
+
+void mdigest_Update(mdigest_context_t *ctx,
+			   const void *dataIn,
+			   unsigned long len)
+{
+	switch (ctx->mdt) {
+	case MDIGEST_CRC:
+		ctx->context.crc32 = crc32(ctx->context.crc32, dataIn, len);
+		break;
+	case MDIGEST_SHA1:
+		git_SHA1_Update(&(ctx->context.sha1), dataIn, len);
+		break;
+	case MDIGEST_SHA256:
+	case MDIGEST_SHA512:
+		EVP_DigestUpdate(&(ctx->context.evp), dataIn, len);
+		break;
+	}
+}
+
+void mdigest_Final(mdigest_t *digest,
+			  mdigest_context_t *ctx)
+{
+	enum mdigest_type mdt = ctx->mdt;
+	digest->hdr.info.mdt = mdt;
+	digest->hdr.info.len = mdigest_table[mdt].blen;
+	switch (mdt) {
+	case MDIGEST_CRC:
+		digest->buffer.crc32 = htonl(ctx->context.crc32);
+		break;
+	case MDIGEST_SHA1:
+		git_SHA1_Final(digest->buffer.buffer, &(ctx->context.sha1));
+		break;
+	case MDIGEST_SHA256:
+	case MDIGEST_SHA512:
+		EVP_DigestFinal_ex(&(ctx->context.evp),
+				  digest->buffer.buffer, NULL);
+		break;
+	}
+}
+
+int mdigest_load(mdigest_t *digestp, int wcode, unsigned char *buffer)
+{
+	int len;
+	if (mdigest_aux_table[wcode].wcode == 0) return -1;
+	digestp->hdr.info.mdt = mdigest_aux_table[wcode].mdt;
+	len = mdigest_aux_table[wcode].blen;
+	digestp->hdr.info.len = len;
+	/*
+	 * If buffer is NULL, we assume that one has already copied
+	 * the data into the buffer so we only had to set up the
+	 * other fields in the mdigest_t structure.
+	 */
+	if (buffer != NULL)
+		memcpy(digestp->buffer.buffer, buffer, len);
+	return 0;
+}
+
+int get_mdigest_required_len(int wcode) {
+	if (mdigest_aux_table[wcode].wcode == 0) return 0;
+	return mdigest_aux_table[wcode].blen;
+}
+
diff --git a/mdigest.h b/mdigest.h
new file mode 100644
index 0000000..8a1d713
--- /dev/null
+++ b/mdigest.h
@@ -0,0 +1,334 @@
+/*
+ * This file is included in cache.h, so the following is just in
+ * case cache.h is not included at all.
+ */
+#include "cache.h"
+/*
+ * Define here because cache.h needs some of the typedefs below.
+ */
+#ifndef MDIGEST_H
+#define MDIGEST_H
+#include <stdint.h>
+
+/**
+ * Enumeration to list supported message digests.
+ */
+enum mdigest_type {
+	MDIGEST_CRC,
+	MDIGEST_SHA1,
+	MDIGEST_SHA256,
+	MDIGEST_SHA512
+};
+
+/*
+ * Constants defining wcode values.  These are used in external
+ * representations of a digest to code the digest type.  The maximum
+ * number of digests supported is 255 (0 is reserved to indicate an
+ * unknown or uninitialized digest type).
+ */
+#define MDIGEST_CRC_WCODE	1
+#define	MDIGEST_SHA1_WCODE	5
+#define	MDIGEST_SHA256_WCODE	8
+#define	MDIGEST_SHA512_WCODE	16
+
+/*
+ * Standard digest.
+ */
+#ifndef MDIGEST_DEFAULT
+#define MDIGEST_DEFAULT MDIGEST_SHA256
+#endif
+
+#define MAX_DIGEST_LENGTH 64	/* set to maximum length we'll support */
+
+/**
+ * Message digest data structure.
+ * Holds a message digest along with some additional information.
+ * The hdr.info field specifies the digest type and the digest length in bytes.
+ * The hdr.lhdr.wcode field provides a code indicating the digest type
+ * as stored externally (this is set temporarily for a couple of
+ * operations and in general should not be used).
+ * The buffer union allows the buffer to be viewed as an unsigned, 32-bit
+ * integer, an array of unsigned characters, an array of 32-bit unsigned
+ * integers, or an array of 64-bit unsigned integers (on machines that
+ * support 64-bit unsigned integer arithmetic).
+ *
+ * In nearly all cases, one should use the access functions.
+ */
+typedef struct mdigest {
+	union {
+		struct mdigest_info {
+			enum mdigest_type mdt;
+			int len;
+		} info;
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+		uint64_t align64; /* to allow 64-bit operations on buffers */
+#define MDIGEST_SPACER_SIZE ((sizeof (struct mdigest_info) > 8)? \
+			     (sizeof (struct mdigest_info) - 1): 7)
+#else
+#define MDIGEST_SPACER_SIZE (sizeof (struct mdigest_info) - 1)
+#endif
+		/*
+		 * This is used destructively when a digest is
+		 * about to be written to disk (e.g., for the
+		 * loose object digests). The address of the
+		 * wcode member will provide a buffer prefaced
+		 * by a byte containing a wcode tag to indicate
+		 * the digest type.
+		 */
+		struct {
+			unsigned char spacer[MDIGEST_SPACER_SIZE];
+			unsigned char wcode;
+		} lhdr;		/* header for loose objects */
+	} hdr;
+	union {
+		uint32_t crc32;
+		unsigned char buffer[MAX_DIGEST_LENGTH];
+		uint32_t buffer32[MAX_DIGEST_LENGTH/4];
+#if (_POSIX_V6_LP64_OFF64 || _POSIX_V6_LPBIG_OFFBIG)
+		uint64_t buffer64[MAX_DIGEST_LENGTH/8];
+#endif
+	} buffer;
+} mdigest_t;
+
+/**
+ * Get the message digest type of a message digest.
+ * Arguments:
+ *     mdp - a pointer to the message digest.
+ *
+ * Returns:
+ *     the type of the message digest.
+ *
+ * Preconditions:
+ *     the message digest must have been created (by calling the function
+ *     mdigest_Final).
+ */
+static inline enum mdigest_type get_mdigest_type(const mdigest_t *mdp) {
+	return mdp->hdr.info.mdt;
+}
+
+/**
+ * Get the length of a message digest.
+ * Arguments:
+ *     mdp - a pointer to the message digest.
+ *
+ * Returns:
+ *    the length in bytes of the message digest
+ *
+ * Preconditions:
+ *     the message digest must have been created (by calling the function
+ *     mdigest_Final).
+ */
+static inline int get_mdigest_len(const mdigest_t *mdp) {
+	return mdp->hdr.info.len;
+}
+
+
+/**
+ * Get the buffer containing a message digest.
+ * Arguments:
+ *     mdp - a pointer to the message digest.
+ *
+ * Returns:
+ *    the buffer (unsigned char array) of the message digest.
+ *
+ * Preconditions:
+ *     the message digest must have been created (by calling the function
+ *     mdigest_Final).
+ */
+static inline const unsigned char *get_mdigest_buffer(const mdigest_t *mdp) {
+	return mdp->buffer.buffer;
+}
+
+/**
+ * Test to see if two message digests are identical.
+ *
+ * Arguments:
+ *       md1p - a pointer to the first digest
+ *       md2p - a pointer to the second digest
+ * Returns:
+ *   0 if the digests match; -1 otherwise
+ */
+extern int mdigest_tst(mdigest_t *md1p, mdigest_t *md2p);
+
+/**
+ * Get the print-name of a message digest.
+ * Arguments:
+ *   mdt  the message digest's type
+ * Returns:
+ *   The name of the digest, suitable for printing or displaying.
+ */
+extern const char *get_mdigest_name(enum mdigest_type mdt);
+
+
+/**
+ * Message digest context.
+ * This data structure maintains the state of a message-digest
+ * computation.  Each field in the union specifies the context needed
+ * for a particular digest or set of digests.
+ */
+typedef struct mdigest_context {
+	union {
+		uint32_t crc32;		/* minimal digest for testing. */
+		git_SHA_CTX sha1;	/* SHA-1 (git-internal impl) */
+		EVP_MD_CTX evp;		/* For openssl EVP digest functions */
+		/* Add the others later */
+	} context;
+	enum mdigest_type mdt;
+} mdigest_context_t;
+
+/**
+ * Initialize the mdigest module.
+ */
+extern void mdigest_init(void);
+
+/*
+ *  Modeled after Git SHA-1 API, which follows that used by openssl
+ */
+
+/**
+ * Initialize a message digest context.
+ * Arguments:
+ *   ctx - a pointer to the context to initialize
+ *   mdt - the type of the message digest.
+ */
+extern void mdigest_Init(mdigest_context_t *ctx,
+				enum mdigest_type mdt);
+
+
+/**
+ * Update a message digest context
+ * Arguments:
+ *      ctx - a pointer to the context to update
+ *   dataIn - the data to add to the digest
+ *      len - the length of dataIn in bytes
+ *
+ */
+extern void mdigest_Update(mdigest_context_t *ctx,
+			   const void *dataIn,
+			   unsigned long len);
+
+/**
+ * Complete and provide a message digest.
+ * Arguments:
+ *   digest - the data structure that will store the message digest.
+ *      ctx - a pointer to the context to finalize
+ */
+extern void mdigest_Final(mdigest_t *digest, mdigest_context_t *ctx);
+
+/**
+ * Initialize a digest given external data representing the digest.
+ *
+ * Arguments:
+ *    digestp - a pointer to the message digest to initialize
+ *      wcode - the code (external representation) representing the
+ *              message digest type
+ *     buffer - the digest itself, as a sequence of bytes.
+ *
+ * Returns:
+ *    0 on success, -1 on error
+ */
+extern int mdigest_load(mdigest_t *digestp, int wcode, unsigned char *buffer);
+
+/**
+ * Get the required message-digest length.
+ * Arguments:
+ *  wcode - a code (external representation) indicating the type of digest.
+ * Returns:
+ *   The message-digest length in bytes; 0 for unrecognized codes
+ */
+extern int get_mdigest_required_len(int wcode);
+
+/**
+ * Get the message-digest code for a message digest.
+ * Arguments:
+ *      digestp - a pointer to the message digest
+ * Return:
+ *    the message digest code
+ */
+extern int get_mdigest_wcode(const mdigest_t *digestp);
+
+
+/**
+ * Get the message-digest code for a message digest type.
+ * Arguments:
+ *      type - the type of a message digest
+ * Return:
+ *    the message digest code
+ */
+extern int get_mdigest_wcode_by_type(enum mdigest_type type);
+
+/**
+ * Get the word size for a digest given the digest type
+ * Arguments:
+ *   type - the type of the message digest
+ * Returns:
+ *  the size in 32-bit words of a message digest of a given type.
+ *
+ */
+extern int get_mdigest_wsize_by_type(enum mdigest_type type);
+
+/**
+ * Get the word size for a digest
+ * Arguments:
+ *   digestp - a pointer to a message digest.
+ * Returns:
+ *  the size in 32-bit words of the message digest.
+ */
+extern int get_mdigest_wsize(mdigest_t *digestp);
+
+/**
+ * Get the required message-digest length by type.
+ * Arguments:
+ *   type - the type of the message digest
+ * Returns:
+ *   The message-digest length in bytes
+ */
+extern int get_mdigest_required_len_by_type(enum mdigest_type type);
+
+/**
+ * Copy a digest to a buffer.
+ * If the buffer is longer than required, it will be padded with null
+ * bytes.  If it is shorter than required, only the number of bytes
+ * given by blen will be copied.
+ * Arguments:
+ *    buffer - the buffer to store the digest, represented as a
+ *             sequence of bytes
+ *   digestp - the digest
+ *      blen - the length of the buffer
+ * Returns:
+ *    the number of bytes from the digest copied into the buffer
+ */
+extern int mdigest_to_buffer(unsigned char *buffer,
+			     mdigest_t *digestp, int blen);
+
+/**
+ * Get the hexadecimal representation of a message digest.
+ * Arguments:
+ *    digestp - a pointer to the digest
+ * Returns:
+ *   a sequence of hexadecimal characters containing the digest
+ */
+extern char *mdigest_to_hex(const mdigest_t *digestp);  /* static buffer result! */
+
+/**
+ * Get the tagged hexadecimal representation of a message digest.
+ * The first two characters represent the wcode value giving the digest
+ * type.
+ * Arguments:
+ *    digestp - a pointer to the digest
+ * Returns:
+ *   a sequence of hexadecimal characters containing the digest, prefaced
+ *   by two hexadecimal digits representing the type of digest
+ */
+extern char *mdigest_to_external_hex(const mdigest_t *digestp);  /* static buffer result! */
+
+/**
+ * Clear a message digest
+ * Arguments:
+ *   digestp - a pointer to a message digest.
+ */
+static inline void mdigest_clear(mdigest_t *digestp) {
+	memset(digestp, 0, sizeof(mdigest_t));
+}
+
+#endif /* MDIGEST_H */
diff --git a/mdsdb.h b/mdsdb.h
new file mode 100644
index 0000000..5502985
--- /dev/null
+++ b/mdsdb.h
@@ -0,0 +1,192 @@
+#ifndef MDSDB_H
+#define MDSDB_H
+
+/**
+ * MD (Message Digest) Database Support.
+ *
+ * This module maintains a database mapping SHA-1 object keys to MDs
+ * (Message Digests) for purposes of detecting hash collisions.  The
+ * MDs are stored in the database as a sequence of bytes, prefaced
+ * by a one-byte code giving the MD type.  The functions allow for
+ * initialization, queries, adding new entries (with a collision
+ * check), and managing access to alternate databases.  The entries
+ * in the database correspond to Git loose objects.
+ *
+ * The preprocessor symbol MDSDB determines the implementation of the
+ * module.
+ * Values:
+ *   0 - implement using directories and files - the first byte of a
+ *       SHA1 hash determines a subdirectory of ../objects/mdsd, and
+ *       the remaining bytes determine the file name, with the names
+ *       consisting of the hexadecimal representation of each byte's
+ *       value. The files then contain a one-byte code that determines
+ *       the type of the MD, followed by the MD itself as a sequence of
+ *       bytes. A value of 1 implies that packdb will also be used when
+ *       creating pa
+ */
+
+#include <stdint.h>
+
+#include "cache.h"
+
+#if (MDSDB == 0)
+/**
+ * Opaque data type - because the typedef is for a pointer, we
+ * don't need the structure defined in files that use the pointer.
+ * We do need it defined somewhere, in this case in the file
+ * objd-mdsdb.c, which is the only place the fields are used.
+ */
+typedef struct objd_mdsdb *mdsdb_t;
+#endif
+
+/**
+ *  Initialize the database.
+ *  This opens a database file in the objects directory named mdsd,
+ *  used to store MDs of objects (uncompressed, excluding the header)
+ *  for hash-collision detection.
+ */
+extern void mdsdb_init(void);
+
+/**
+ * Check if the database has been initialized.
+ * Returns:
+ *   1 if mdsdb_init has been called; 0 otherwise.
+ */
+extern int mdsdb_initialized(void);
+
+/**
+ * Initializes alternative databases by adding them to a table with
+ * these databases closed.
+ */
+extern void mdsdb_init_alts(void);
+
+
+/**
+ * Open a database file.
+ *
+ * The default database can be read or written. Alternate database
+ * files are read-only databases.  Multiple calls without intervening
+ * calls to mdsdb_close for a given argument will result in the same
+ * object being returned each successive time.  The pathname must match
+ * one stored by a call to mdsdb_init_alts.
+ *
+ * Arguments:
+ *    pathname - the pathname of the file; NULL for the default db;
+ *
+ * Returns:
+ *    the database (NULL indicates the default)
+ */
+extern mdsdb_t mdsdb_open(char *pathname);
+
+/**
+ * Open a database file given an alternate object database pointer.
+ *
+ * The default database can be read or written. Alternate database
+ * files are read-only databases.  Multiple calls without intervening
+ * calls to mdsdb_close for a given argument will result in the same
+ * object being returned each successive time.  The argument must match
+ * an alternate object database pointer stored by a preceding call to
+ * mdsdb_init_alts.
+ *
+ * Arguments:
+ *    alt - an alternate object database pointer (which provides the
+ *          pathname).
+ *
+ * Returns:
+ *    the database (NULL indicates the default)
+ */
+extern mdsdb_t mdsdb_open_alt(struct alternate_object_database *alt);
+
+/**
+ * Lookup a MD from a database.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *    digestp - a pointer to an mdigest_t to store the returned value when
+ *              an entry in the database exists.
+ *
+ * Returns:
+ *   0 if no entry; 1 if an entry exists; -1 if the database is unavailable.
+ */
+extern int mdsdb_lookup(mdsdb_t dbf, const unsigned char *sha1,
+			mdigest_t *digestp);
+
+/**
+ * Remove a MD from a database.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *
+ * Returns:
+ *   0 on success; -1 if the entry did not exist or if an entry
+ *   could not be deleted
+ */
+extern int mdsdb_remove(mdsdb_t dbf, const unsigned char *sha1);
+
+/**
+ * Process a MD for a SHA-1 key.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *    digestp - the digest to store.
+ *
+ * Returns:
+ *   0 if this is a new entry; 1 if it is an existing entry, -1 if
+ *   an entry cannot be added to the database.
+ *
+ * Errors:
+ *   Will call 'die' and exit if there is a hash collision. Will call
+ *   'error' if the value cannot be entered.
+ */
+extern int mdsdb_process(mdsdb_t dbf, const unsigned char *sha1,
+			 mdigest_t *digestp);
+
+/**
+ * Reorganize a MD database.
+ *
+ * Arguments:
+ *        dbf - the MD database; NULL for the default database
+ * Returns:
+ *   0 on success; -1 on failure
+ */
+extern int mdsdb_reorganize(mdsdb_t dbf);
+
+
+/**
+ * Close a database file.
+ *
+ * If the same database was opened multiple times, a reference count is
+ * decremented and the database will not be closed until the count
+ * reaches zero.  Calls to mdsdb_open or mdsdb_open_alt must be balanced
+ * by calls to mdsdb_close or mdsdb_close_alt.
+ *
+ * Arguments:
+ *        dbf - the MD database.
+ */
+extern void mdsdb_close(mdsdb_t dbf);
+
+/**
+ * Close a database file given an alternate object database pointer.
+ *
+ * If the same database was opened multiple times, a reference count is
+ * decremented and the database will not be closed until the count
+ * reaches zero.  Calls to mdsdb_open or mdsdb_open_alt must be balanced
+ * by calls to mdsdb_close or mdsdb_close_alt.
+ *
+ * Arguments:
+ *       alt - a pointer to an alternate object database
+ */
+extern void mdsdb_close_alt(struct alternate_object_database *alt);
+
+/**
+ * Shutdown the database files.
+ * This will shut down the default database and the cached alternative
+ * databases.  All others should be closed by calling mdsdb_close_alt
+ * explicitly.
+ */
+extern void mdsdb_finish(void);
+
+#endif /*MDSDB_H */
diff --git a/objd-mdsdb.c b/objd-mdsdb.c
new file mode 100644
index 0000000..268c5c4
--- /dev/null
+++ b/objd-mdsdb.c
@@ -0,0 +1,340 @@
+#include <sys/types.h>
+#include "cache.h"
+#include "mdsdb.h"
+
+struct objd_mdsdb {
+	char *root;
+};
+
+static struct objd_mdsdb db;
+
+static mdsdb_t no_dbf = (mdsdb_t) 4;
+
+static mdsdb_t dbf = NULL;
+
+#define ALT_DBF_LIMIT  512
+
+
+struct alt_map {
+	struct objd_mdsdb db;
+	struct alternate_object_database *alt;
+	struct alt_map *refer;
+};
+
+struct alt_map alt_map[ALT_DBF_LIMIT];
+static int alt_in_use = 0;
+static int initialized = 0;
+
+
+void mdsdb_init(void) {
+	if (initialized) {
+		return;
+	}
+	dbf = &db;
+	db.root = get_object_mds_directory();
+	initialized = 1;
+}
+
+int mdsdb_initialized(void) {
+	return initialized;
+}
+
+static int setup_alt(struct alternate_object_database *alt, void *param) {
+	static char buffer[PATH_MAX];
+	int i;
+	int lim = alt->name - alt->base;
+	/* Build the alternate's mdsd path: its object directory + "mdsd". */
+	memcpy(buffer, alt->base, lim);
+	memcpy(buffer+lim, "mdsd", 4);
+	buffer[lim+4] = 0;
+	for (i = 0; i < alt_in_use; i++) {
+		if (alt_map[i].alt == alt) {
+			/* don't put in the same entry twice */
+			return 0;
+		}
+		if (strcmp(buffer, alt_map[i].db.root) == 0) {
+			break;
+		}
+	}
+	alt_map[alt_in_use].db.root = xstrdup(buffer);
+	alt_map[alt_in_use].alt = alt;
+	if (i < alt_in_use) {
+		alt_map[alt_in_use].refer = alt_map + i;
+	} else {
+		alt_map[alt_in_use].refer = NULL;
+	}
+	alt_in_use++;
+	return 0;
+}
+
+static int alt_initialized = 0;
+
+void mdsdb_init_alts(void) {
+	if (alt_initialized) return;
+	foreach_alt_odb(setup_alt, NULL);
+	alt_initialized = 1;
+}
+
+
+mdsdb_t mdsdb_open(char *name) {
+	int i;
+	if (name == NULL) return NULL;
+	for (i = 0; i < alt_in_use; i++) {
+		if (strcmp(alt_map[i].db.root, name) == 0) {
+			if (alt_map[i].refer) {
+				i = (alt_map[i].refer - alt_map);
+			}
+			return (mdsdb_t)&(alt_map[i].db);
+		}
+	}
+	return no_dbf;
+}
+
+mdsdb_t mdsdb_open_alt(struct alternate_object_database *alt) {
+	int i;
+	for (i = 0; i < alt_in_use; i++) {
+		if (alt_map[i].alt == alt) {
+			return (mdsdb_t)&(alt_map[i].db);
+		}
+	}
+	return no_dbf;
+}
+
+/* copied from sha1_file.c */
+static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
+{
+	int i;
+	for (i = 0; i < 20; i++) {
+		static char hex[] = "0123456789abcdef";
+		unsigned int val = sha1[i];
+		char *pos = pathbuf + i*2 + (i > 0);
+		*pos++ = hex[val >> 4];
+		*pos = hex[val & 0xf];
+	}
+}
+
+/*
+ * Warning: returns a static buffer so be careful about threading.
+ */
+static char *crc32_file_name(const char *path, const unsigned char *sha1)
+{
+	static char buf[PATH_MAX];
+	const char *digestdir;
+	int len;
+
+	digestdir = path;
+	len = strlen(digestdir);
+
+	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
+	if (len + 43 > PATH_MAX)
+		die("insanely long object crc directory %s", digestdir);
+	memcpy(buf, digestdir, len);
+	buf[len] = '/';
+	buf[len+3] = '/';
+	buf[len+42] = '\0';
+	fill_sha1_path(buf + len + 1, sha1);
+	return buf;
+}
+
+static int mdsdb_lookup_aux(char *path, mdigest_t *digestp)
+{
+	if (!access(path, F_OK)) {
+		if (digestp) {
+			int fd = open(path, O_RDONLY);
+			int wcode, len;
+			unsigned char wsch;
+			unsigned char buffer[MAX_DIGEST_LENGTH];
+			if (fd < 0) {
+				return 0;
+			}
+			if (read_in_full(fd, &wsch, 1) != 1) {
+				close(fd);
+				return 0;
+			}
+			wcode = wsch;
+			len = get_mdigest_required_len(wcode);
+			if (read_in_full(fd, buffer, len)
+			    != len) {
+				close(fd);
+				return 0;
+			}
+			close(fd);
+			mdigest_load(digestp, wcode, buffer);
+		}
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+
+int mdsdb_lookup(mdsdb_t gdbf, const unsigned char *sha1, mdigest_t *digestp) {
+	char *path;
+
+	if (!initialized || gdbf == no_dbf) {
+		return -1;
+	}
+	if (gdbf == NULL) gdbf = dbf;
+
+	path = crc32_file_name(gdbf->root, sha1);
+	return mdsdb_lookup_aux(path, digestp);
+}
+
+int mdsdb_remove(mdsdb_t gdbf, const unsigned char *sha1) {
+	char *path;
+	if (!initialized || gdbf == no_dbf) {
+		return -1;
+	}
+
+	if (gdbf == NULL) {
+		gdbf = dbf;
+	} else {
+		return -1;
+	}
+	path = crc32_file_name(gdbf->root, sha1);
+	return unlink(path);
+}
+
+/* copied from sha1_file.c */
+/* Size of directory component, including the ending '/' */
+static inline int directory_size(const char *filename)
+{
+	const char *s = strrchr(filename, '/');
+	if (!s)
+		return 0;
+	return s - filename + 1;
+}
+
+
+/* copied from sha1_file.c */
+static int create_tmpfile(char *buffer, size_t bufsiz, const char *filename)
+{
+	int fd, dirlen = directory_size(filename);
+
+	if (dirlen + 20 > bufsiz) {
+		errno = ENAMETOOLONG;
+		return -1;
+	}
+	memcpy(buffer, filename, dirlen);
+	strcpy(buffer + dirlen, "tmp_obj_XXXXXX");
+	fd = git_mkstemp_mode(buffer, 0444);
+	if (fd < 0 && dirlen && errno == ENOENT) {
+		/* Make sure the directory exists */
+		memcpy(buffer, filename, dirlen);
+		buffer[dirlen-1] = 0;
+		if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
+			return -1;
+
+		/* Try again */
+		strcpy(buffer + dirlen - 1, "/tmp_obj_XXXXXX");
+		fd = git_mkstemp_mode(buffer, 0444);
+	}
+	return fd;
+}
+
+/* copied from sha1_file.c */
+static int write_buffer(int fd, const void *buf, size_t len)
+{
+	if (write_in_full(fd, buf, len) < 0)
+		return error("file write error (%s)", strerror(errno));
+	return 0;
+}
+
+/* copied from sha1_file.c */
+/* Finalize a file on disk, and close it. */
+static void close_sha1_file(int fd)
+{
+	if (fsync_object_files)
+		fsync_or_die(fd, "sha1 file");
+	if (close(fd) != 0)
+		die_errno("error when closing sha1 file");
+}
+
+
+int mdsdb_process(mdsdb_t gdbf, const unsigned char *sha1,
+		  mdigest_t *digestp)
+{
+	mdigest_t old_digest;
+	int has_old_digest = 0;
+	char *path;
+	if (!initialized || gdbf == no_dbf) {
+		return -1;
+	}
+	if (gdbf == NULL) gdbf = dbf;
+	path = crc32_file_name(gdbf->root, sha1);
+	has_old_digest = mdsdb_lookup_aux(path, &old_digest);
+	if (gdbf == dbf && !has_old_digest) {
+		mdigest_t crc;
+		int len, wcode;
+		static char ctmpfile[PATH_MAX];
+		int fdc = create_tmpfile(ctmpfile, sizeof(ctmpfile), path);
+		if (fdc < 0) {
+			return -1;
+		}
+		crc = *(digestp);
+		len = get_mdigest_len(digestp);
+		wcode = get_mdigest_wcode(digestp);
+		crc.hdr.lhdr.wcode = (unsigned char)wcode;
+		if (fdc >= 0 && write_buffer(fdc, &crc.hdr.lhdr.wcode,
+					     len + 1) < 0) {
+			close_sha1_file(fdc);
+			return -1;
+		}
+		if (fdc >= 0) {
+			close_sha1_file(fdc);
+			return (move_temp_to_file(ctmpfile, path) == 0)?
+				0: -1;
+		}
+		return -1;
+	} else if (has_old_digest) {
+		if (mdigest_tst(&old_digest, digestp)) {
+			die("SHA1 COLLISION WHEN INSERTING OBJECT %s",
+			    sha1_to_hex(sha1));
+			return -1;
+		}
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+
+void mdsdb_close(mdsdb_t gdbf) {
+	return;
+}
+
+void mdsdb_close_alt(struct alternate_object_database *alt) {
+	return;
+}
+
+
+
+int mdsdb_reorganize(mdsdb_t gdbf) {
+	if (!initialized || gdbf == no_dbf) {
+		return -1;
+	}
+	if (gdbf == NULL) {
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+
+
+void mdsdb_finish(void) {
+	int i;
+	if (!initialized) {
+		return;
+	}
+	dbf->root = NULL;
+
+	for (i = 0; i < alt_in_use; i++) {
+		free(alt_map[i].db.root);
+		alt_map[i].db.root = NULL;
+	}
+	memset(alt_map, 0, sizeof(struct alt_map) *alt_in_use);
+	alt_in_use = 0;
+	initialized = 0;
+	alt_initialized = 0;
+}
diff --git a/packdb.h b/packdb.h
new file mode 100644
index 0000000..64c1d0a
--- /dev/null
+++ b/packdb.h
@@ -0,0 +1,93 @@
+#ifndef PACKDB_H
+#define PACKDB_H
+
+#include<stdint.h>
+#include "mdigest.h"
+
+/**
+ *  Initialize the database.
+ *  This opens a database file in the objects directory named mdsd,
+ *  used to store CRCs of objects (uncompressed, excluding the header)
+ *  for hash-collision detection.
+ */
+extern void packdb_init(void);
+
+/**
+ * Check if the database has been initialized.
+ * Returns:
+ *   1 if packdb_init has been called; 0 otherwise.
+ */
+extern int packdb_initialized(void);
+
+/**
+ * Open the persistent database to store a copy of obj CRCs in pack index files.
+ * Nested calls are allowed, but must be balanced by calls to packdb_close.
+ * For nested calls, subsequent ones merely increment a reference count.
+ *
+ * This is used to create space-efficient storage of object CRCs that
+ * are not associated with loose objects (e.g., because they are in pack
+ * files).  Intended for use when building pack files.
+ *
+ * Note:
+ *   Interacting with another process that calls this function on the
+ *   same repository may lead to deadlock unless packdb_close is
+ *   called before that interaction.
+ */
+extern void packdb_open(void);
+
+/**
+ * Store a crc in the persistent database for creating pack index files.
+ *
+ * Arguments:
+ *   sha1 - the key for the entry (a 20-byte sha1 hash)
+ *   digestp - the CRC to store (the CRC of an object's data)
+ * Returns:
+ *   0 if we added a new entry, 1 if the entry already exists, -1 on error
+ */
+extern int packdb_process(const unsigned char *sha1, mdigest_t *digestp);
+
+/**
+ * Lookup a CRC from a database.
+ *
+ * Arguments:
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *    digestp - a pointer to a mdigest_t to store the returned value when
+ *              an entry in the database exists.
+ * Returns:
+ *   0 if no entry, 1 if there is an existing entry.
+ */
+extern int packdb_lookup(const unsigned char *sha1, mdigest_t *digestp);
+
+/**
+ * Remove a CRC from a database.
+ *
+ * Arguments:
+ *        dbf - the CRC database; NULL for the default database
+ *       sha1 - the key for the lookup (a 20-byte SHA1 digest)
+ *
+ * Returns:
+ *   0 on success; -1 if the entry did not exist or if an entry
+ *   could not be deleted
+ */
+extern int packdb_remove(const unsigned char *sha1);
+
+
+/**
+ * Reorganize the database.
+ * Returns:
+ *   0 on success; -1 on failure
+ */
+extern int packdb_reorganize(void);
+
+/**
+ * Close the database file.
+ */
+extern void packdb_close(void);
+
+/**
+ * Close the database if opened and uninitialize the module.
+ * This is intended to be called when the module is no longer needed.
+ */
+extern void packdb_finish(void);
+
+#endif
-- 
1.7.1

* Re: Re* How to generate pull-request with info of signed tag
From: Junio C Hamano @ 2011-12-21  7:03 UTC
  To: Aneesh Kumar K.V; +Cc: Git Mailing List
In-Reply-To: <87liq6xwr8.fsf@linux.vnet.ibm.com>

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Also, can we make the remote stanza in .git/config have something like the below
>
>
>      fetch = +refs/tags/*:refs/tags/abc/*
>
> so that one can do
>
>    git fetch t-remote tag-name
>
> and have that stored as abc/tag-name?

You can do whatever you want to your own config file without asking anybody.

Having said that, the point of the recent change to allow you to pull this
way (notice the lack of "tag")

    $ git pull $url $signed_tag_name

is so that you do not have to contaminate your own ref namespace with tags
that are used to leave audit trails in the history graph.

* Re: Re* How to generate pull-request with info of signed tag
From: Aneesh Kumar K.V @ 2011-12-21  6:30 UTC
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <7vk45smilc.fsf@alter.siamese.dyndns.org>

On Mon, 19 Dec 2011 12:06:55 -0800, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Come to think of it, the last patch I sent out on request pull was very
> > wrong....
> 
> And this should fix it.
> 
> -- >8 --
> Subject: [PATCH] request-pull: do not emit "tag" before the tagname
> 
> The whole point of the recent update to allow "git pull $url $tagname" is
> so that the integrator does not have to store the (signed) tag that is
> used to convey authenticity to be recorded in the resulting merge in the
> local repository's tag namespace.  Asking for a merge to be made with "git
> pull $url tag $tagname" defeats it.
> 
> Note that the request can become ambiguous if the requestor has a branch
> with the same name as the tag, but that is not a new problem limited to
> pulling. I wouldn't mind if somebody wants to add disambiguation to the
> find_matching_ref logic in the script as a separate patch, though.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  git-request-pull.sh     |    4 +---
>  t/t5150-request-pull.sh |    2 +-
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/git-request-pull.sh b/git-request-pull.sh
> index 7b5c777..d7ba117 100755
> --- a/git-request-pull.sh
> +++ b/git-request-pull.sh
> @@ -63,10 +63,8 @@ die "fatal: No commits in common between $base and $head"
>  find_matching_ref='
>  	sub abbr {
>  		my $ref = shift;
> -		if ($ref =~ s|refs/heads/||) {
> +		if ($ref =~ s|refs/heads/|| || $ref =~ s|refs/tags/||) {
>  			return $ref;
> -		} elsif ($ref =~ s|refs/tags/||) {
> -			return "tag $ref";
>  		} else {
>  			return $ref;
>  		}
> diff --git a/t/t5150-request-pull.sh b/t/t5150-request-pull.sh
> index aec842f..da25bc2 100755
> --- a/t/t5150-request-pull.sh
> +++ b/t/t5150-request-pull.sh
> @@ -180,7 +180,7 @@ test_expect_success 'request names an appropriate branch' '
>  		read branch
>  	} <digest &&
>  	{
> -		test "$branch" = tag--full ||
> +		test "$branch" = full ||
>  		test "$branch" = master ||
>  		test "$branch" = for-upstream
>  	}

Also, can we make the remote stanza in .git/config have something like the below


     fetch = +refs/tags/*:refs/tags/abc/*

so that one can do

   git fetch t-remote tag-name

and have that stored as abc/tag-name?

-aneesh
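
As a rough illustration of the name mapping that the refspec above expresses, here is a minimal Python sketch. `map_ref` is a hypothetical helper written for this note, not git's implementation, and it handles only a single `*` on each side. It models the mapping alone; whether `git fetch` with an explicit tag argument applies a configured refspec is a separate question that the sketch does not answer.

```python
def map_ref(refspec, ref):
    """Map a remote ref name through a single-wildcard refspec.

    Returns the local ref name, or None when the ref does not match
    the source side of the refspec.
    """
    # A leading "+" only means "allow non-fast-forward updates";
    # it plays no part in the name mapping.
    if refspec.startswith("+"):
        refspec = refspec[1:]
    src, dst = refspec.split(":", 1)
    prefix, suffix = src.split("*", 1)
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    # The part of the ref that the "*" on the source side matched.
    matched = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", matched, 1)


print(map_ref("+refs/tags/*:refs/tags/abc/*", "refs/tags/tag-name"))
print(map_ref("+refs/tags/*:refs/tags/abc/*", "refs/heads/master"))
```

The first call yields a name under `refs/tags/abc/`, which is the separate namespace asked about; the second shows that branch refs fall outside the pattern.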

* Re: [PATCH] Specify a precision for the length of a subject string
From: Jeff King @ 2011-12-21  4:38 UTC
  To: Nathan W. Panike; +Cc: git
In-Reply-To: <20111220220754.GC21353@llunet.cs.wisc.edu>

On Tue, Dec 20, 2011 at 04:07:54PM -0600, Nathan W. Panike wrote:

> We can specify the precision of a subject string, so that the subjects
> shown to the user do not grow beyond a bound set by the user in a pretty
> format string.
> 
> This makes it possible to do, e.g., 
> 
> $ git log --pretty='%h %s' d165204 -1
> d165204 git-p4: fix skipSubmitEdit regression
> 
> With this patch, the user can do
> 
> $ git log --pretty='%h %30s' d165204 -1
> d165204 git-p4: fix skipSubmitEdit reg

Hmm. I think the idea of limiting is OK (though personally, I would just
pipe through a filter that truncates long lines). But I'm a bit negative
on adding a tweak like this that only affects the subject. Is there a
reason I couldn't do %30gs, or %30f, or even some other placeholder?

Also, we already have %w to handle wrapping. Could this be handled in a
similar way (perhaps it could even be considered a special form of
wrapping)?

-Peff
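
For readers comparing the proposed `%30s` placeholder with printf-style formatting, the following small Python sketch may help. Python's `%` operator follows the C conventions, so it is used here only to show the difference between a minimum field width and a precision; it says nothing about how git's pretty-format code is or should be implemented.

```python
subject = "git-p4: fix skipSubmitEdit regression"

# "%30s" is a minimum field width: longer strings pass through untouched.
padded = "%30s" % subject

# "%.30s" is a precision: at most 30 characters are emitted.
truncated = "%.30s" % subject

# A shorter string is right-aligned and padded with spaces to 30 columns.
short = "%30s" % "fix typo"

print(repr(padded))
print(repr(truncated))
print(repr(short))
```

In printf terms, the truncation shown in the quoted example corresponds to the precision form, not the width form.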

* Re: [PATCH] Use Python's "print" as a function, not as a keyword
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  2:48 UTC
  To: Sebastian Morr; +Cc: git, srabbelier
In-Reply-To: <20111221021930.GA31364@thinkpad>

On Wed, Dec 21, 2011 at 03:19, Sebastian Morr <sebastian@morr.cc> wrote:

> But, as nobody seems to have cared before: Is Git designed to be
> compatible only with versions prior to 3.0?

I'm running Debian unstable and it has Python 2.7. Most people are
still using Python 2.x as their default system Python since 3.x breaks
backwards compatibility for common constructs like print.

Does this only break Python 2.6, or all 2.x versions of Python?

What's our currently supported Python version for the Python code in
Git? For Perl it's 5.8.0; do we have any particular aim for a
supported Python version?

* Re: [PATCH] Makefile: Change the default compiler from "gcc" to "cc"
From: Linus Torvalds @ 2011-12-21  1:33 UTC
  To: Junio C Hamano; +Cc: Ævar Arnfjörð, git
In-Reply-To: <7vr4zyiyih.fsf@alter.siamese.dyndns.org>

On Tue, Dec 20, 2011 at 4:01 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Would this affect folks in BSD land negatively?

Probably not.

The people who might notice are the old-time crappy commercial unixes,
but they are all dead by now. The kinds where 'cc' isn't even ANSI C,
it's K&R and you needed to pay extra for the "real" compiler.

But if those people still exist, they probably haven't figured out CVS
yet, and still use RCS or SCCS. Their brains would explode messily if
they tried to use git.

                           Linus

* [PATCH] Use Python's "print" as a function, not as a keyword
From: Sebastian Morr @ 2011-12-21  2:19 UTC
  To: git; +Cc: srabbelier

"print" changed from a statement to a function in Python 3.0. Replace all occurrences of

    print ...

with

    print(...)

and all occurrences of

    print >> output, ...

with

    output.write(... + "\n")
---

I noticed that "make" on master spawned an "invalid syntax" error (it
was caused by the "print ..." in git_remote_helpers/Makefile) and looked
into how to fix that. I have Python 3.2.2 installed.

I'm afraid this way it breaks for Python 2.6. It seems to be possible to do

    from __future__ import print_function

which we could install in all Python scripts.

But, as nobody seems to have cared before: Is Git designed to be
compatible only with versions prior to 3.0?
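
A minimal sketch of the `__future__` approach mentioned above, assuming Python 2.6 or later (where `print_function` was introduced). The functions `report` and `report_with_write` are hypothetical names used only for this illustration. With the import in place, `print(..., file=output)` replaces the Python 2 spelling `print >> output, ...`, and the explicit `output.write(... + "\n")` form used in this patch gives the same result.

```python
from __future__ import print_function  # no effect on Python 3

import sys


def report(branch, change, output=sys.stdout):
    # print() with file= stands in for "print >> output, ...".
    print("branch %s is at change %s" % (branch, change), file=output)


def report_with_write(branch, change, output=sys.stdout):
    # The spelling used in this patch: explicit write plus newline.
    output.write("branch %s is at change %s" % (branch, change) + "\n")


report("p4/master", 1234)
report_with_write("p4/master", 1234, output=sys.stderr)
```

Note that a single-argument call such as `print("x")` also parses under Python 2 without the import, as the print statement applied to a parenthesized expression; calls with several comma-separated arguments are where the two versions diverge.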

 contrib/ciabot/ciabot.py           |    6 +-
 contrib/fast-import/git-p4         |  140 ++++++++++++++++++------------------
 contrib/fast-import/import-zips.py |    2 +-
 contrib/gitview/gitview            |    2 +-
 contrib/hg-to-git/hg-to-git.py     |   46 ++++++------
 contrib/p4import/git-p4import.py   |    6 +-
 git-remote-testgit.py              |   26 ++++----
 git_remote_helpers/Makefile        |    2 +-
 git_remote_helpers/git/exporter.py |    6 +-
 git_remote_helpers/git/git.py      |    2 +-
 git_remote_helpers/util.py         |   12 ++--
 11 files changed, 125 insertions(+), 125 deletions(-)

diff --git a/contrib/ciabot/ciabot.py b/contrib/ciabot/ciabot.py
index 9775dff..10c5d22 100755
--- a/contrib/ciabot/ciabot.py
+++ b/contrib/ciabot/ciabot.py
@@ -173,7 +173,7 @@ if __name__ == "__main__":
     try:
         (options, arguments) = getopt.getopt(sys.argv[1:], "np:V")
     except getopt.GetoptError, msg:
-        print "ciabot.py: " + str(msg)
+        print("ciabot.py: " + str(msg))
         raise SystemExit, 1
 
     mailit = True
@@ -183,7 +183,7 @@ if __name__ == "__main__":
         elif switch == '-n':
             mailit = False
         elif switch == '-V':
-            print "ciabot.py: version 3.2"
+            print("ciabot.py: version 3.2")
             sys.exit(0)
 
     # Cough and die if user has not specified a project
@@ -214,7 +214,7 @@ if __name__ == "__main__":
         if mailit:
             server.sendmail(fromaddr, [toaddr], message)
         else:
-            print message
+            print(message)
 
     if mailit:
         server.quit()
diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 5949803..bb72f71 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -495,7 +495,7 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent
         update = False
         if not gitBranchExists(remoteHead):
             if verbose:
-                print "creating %s" % remoteHead
+                print("creating %s" % remoteHead)
             update = True
         else:
             settings = extractSettingsGitLog(extractLogMessageFromGitCommit(remoteHead))
@@ -610,9 +610,9 @@ class P4Debug(Command):
     def run(self, args):
         j = 0
         for output in p4CmdList(args):
-            print 'Element: %d' % j
+            print('Element: %d' % j)
             j += 1
-            print output
+            print(output)
         return True
 
 class P4RollBack(Command):
@@ -655,14 +655,14 @@ class P4RollBack(Command):
 
                 if len(p4Cmd("changes -m 1 "  + ' '.join (['%s...@%s' % (p, maxChange)
                                                            for p in depotPaths]))) == 0:
-                    print "Branch %s did not exist at change %s, deleting." % (ref, maxChange)
+                    print("Branch %s did not exist at change %s, deleting." % (ref, maxChange))
                     system("git update-ref -d %s `git rev-parse %s`" % (ref, ref))
                     continue
 
                 while change and int(change) > maxChange:
                     changed = True
                     if self.verbose:
-                        print "%s is at %s ; rewinding towards %s" % (ref, change, maxChange)
+                        print("%s is at %s ; rewinding towards %s" % (ref, change, maxChange))
                     system("git update-ref %s \"%s^\"" % (ref, ref))
                     log = extractLogMessageFromGitCommit(ref)
                     settings =  extractSettingsGitLog(log)
@@ -672,7 +672,7 @@ class P4RollBack(Command):
                     change = settings['change']
 
                 if changed:
-                    print "%s rewound to %s" % (ref, change)
+                    print("%s rewound to %s" % (ref, change))
 
         return True
 
@@ -746,7 +746,7 @@ class P4Submit(Command, P4UserMap):
             if not user:
                 msg = "Cannot find p4 user for email %s in commit %s." % (email, id)
                 if gitConfig('git-p4.allowMissingP4Users').lower() == "true":
-                    print "%s" % msg
+                    print("%s" % msg)
                 else:
                     die("Error: %s\nSet git-p4.allowMissingP4Users to true to allow this." % msg)
 
@@ -884,7 +884,7 @@ class P4Submit(Command, P4UserMap):
                 return False
 
     def applyCommit(self, id):
-        print "Applying %s" % (read_pipe("git log --max-count=1 --pretty=oneline %s" % id))
+        print("Applying %s" % (read_pipe("git log --max-count=1 --pretty=oneline %s" % id)))
 
         (p4User, gitEmail) = self.p4UserForCommit(id)
 
@@ -961,14 +961,14 @@ class P4Submit(Command, P4UserMap):
         applyPatchCmd = patchcmd + "--check --apply -"
 
         if os.system(tryPatchCmd) != 0:
-            print "Unfortunately applying the change failed!"
-            print "What do you want to do?"
+            print("Unfortunately applying the change failed!")
+            print("What do you want to do?")
             response = "x"
             while response != "s" and response != "a" and response != "w":
                 response = raw_input("[s]kip this patch / [a]pply the patch forcibly "
                                      "and with .rej files / [w]rite the patch to a file (patch.txt) ")
             if response == "s":
-                print "Skipping! Good luck with the next patches..."
+                print("Skipping! Good luck with the next patches...")
                 for f in editedFiles:
                     p4_revert(f)
                 for f in filesToAdd:
@@ -977,16 +977,16 @@ class P4Submit(Command, P4UserMap):
             elif response == "a":
                 os.system(applyPatchCmd)
                 if len(filesToAdd) > 0:
-                    print "You may also want to call p4 add on the following files:"
-                    print " ".join(filesToAdd)
+                    print("You may also want to call p4 add on the following files:")
+                    print(" ".join(filesToAdd))
                 if len(filesToDelete):
-                    print "The following files should be scheduled for deletion with p4 delete:"
-                    print " ".join(filesToDelete)
+                    print("The following files should be scheduled for deletion with p4 delete:")
+                    print(" ".join(filesToDelete))
                 die("Please resolve and submit the conflict manually and "
                     + "continue afterwards with git-p4 submit --continue")
             elif response == "w":
                 system(diffcmd + " > patch.txt")
-                print "Patch saved to patch.txt in %s !" % self.clientPath
+                print("Patch saved to patch.txt in %s !" % self.clientPath)
                 die("Please resolve and submit the conflict manually and "
                     "continue afterwards with git-p4 submit --continue")
 
@@ -1065,7 +1065,7 @@ class P4Submit(Command, P4UserMap):
                         self.modifyChangelistUser(changelist, p4User)
             else:
                 # skip this patch
-                print "Submission cancelled, undoing p4 changes."
+                print("Submission cancelled, undoing p4 changes.")
                 for f in editedFiles:
                     p4_revert(f)
                 for f in filesToAdd:
@@ -1106,19 +1106,19 @@ class P4Submit(Command, P4UserMap):
                 die("Cannot preserve user names without p4 super-user or admin permissions")
 
         if self.verbose:
-            print "Origin branch is " + self.origin
+            print("Origin branch is " + self.origin)
 
         if len(self.depotPath) == 0:
-            print "Internal error: cannot locate perforce depot path from existing branches"
+            print("Internal error: cannot locate perforce depot path from existing branches")
             sys.exit(128)
 
         self.clientPath = p4Where(self.depotPath)
 
         if len(self.clientPath) == 0:
-            print "Error: Cannot locate perforce checkout of %s in client view" % self.depotPath
+            print("Error: Cannot locate perforce checkout of %s in client view" % self.depotPath)
             sys.exit(128)
 
-        print "Perforce checkout for depot path %s located at %s" % (self.depotPath, self.clientPath)
+        print("Perforce checkout for depot path %s located at %s" % (self.depotPath, self.clientPath))
         self.oldWorkingDirectory = os.getcwd()
 
         # ensure the clientPath exists
@@ -1126,7 +1126,7 @@ class P4Submit(Command, P4UserMap):
             os.makedirs(self.clientPath)
 
         chdir(self.clientPath)
-        print "Synchronizing p4 checkout..."
+        print("Synchronizing p4 checkout...")
         p4_sync("...")
         self.check()
 
@@ -1151,7 +1151,7 @@ class P4Submit(Command, P4UserMap):
                 break
 
         if len(commits) == 0:
-            print "All changes applied!"
+            print("All changes applied!")
             chdir(self.oldWorkingDirectory)
 
             sync = P4Sync()
@@ -1354,7 +1354,7 @@ class P4Sync(Command, P4UserMap):
             # Ideally, someday, this script can learn how to generate
             # appledouble files directly and import those to git, but
             # non-mac machines can never find a use for apple filetype.
-            print "\nIgnoring apple filetype file %s" % file['depotFile']
+            print("\nIgnoring apple filetype file %s" % file['depotFile'])
             return
 
         # Perhaps windows wants unicode, utf16 newlines translated too;
@@ -1466,7 +1466,7 @@ class P4Sync(Command, P4UserMap):
         self.branchPrefixes = branchPrefixes
 
         if self.verbose:
-            print "commit into %s" % branch
+            print("commit into %s" % branch)
 
         # start with reading files; if that fails, we should not
         # create a commit.
@@ -1500,7 +1500,7 @@ class P4Sync(Command, P4UserMap):
 
         if len(parent) > 0:
             if self.verbose:
-                print "parent %s" % parent
+                print("parent %s" % parent)
             self.gitStream.write("from %s\n" % parent)
 
         self.streamP4Files(new_files)
@@ -1513,7 +1513,7 @@ class P4Sync(Command, P4UserMap):
             labelDetails = label[0]
             labelRevisions = label[1]
             if self.verbose:
-                print "Change %s is labelled %s" % (change, labelDetails)
+                print("Change %s is labelled %s" % (change, labelDetails))
 
             files = p4CmdList(["files"] + ["%s...@%s" % (p, change)
                                                     for p in branchPrefixes])
@@ -1556,14 +1556,14 @@ class P4Sync(Command, P4UserMap):
 
         l = p4CmdList("labels %s..." % ' '.join (self.depotPaths))
         if len(l) > 0 and not self.silent:
-            print "Finding files belonging to labels in %s" % `self.depotPaths`
+            print("Finding files belonging to labels in %s" % repr(self.depotPaths))
 
         for output in l:
             label = output["label"]
             revisions = {}
             newestChange = 0
             if self.verbose:
-                print "Querying files for label %s" % label
+                print("Querying files for label %s" % label)
             for file in p4CmdList(["files"] +
                                       ["%s...@%s" % (p, label)
                                           for p in self.depotPaths]):
@@ -1575,7 +1575,7 @@ class P4Sync(Command, P4UserMap):
             self.labels[newestChange] = [output, revisions]
 
         if self.verbose:
-            print "Label changes: %s" % self.labels.keys()
+            print("Label changes: %s" % self.labels.keys())
 
     def guessProjectName(self):
         for p in self.depotPaths:
@@ -1613,8 +1613,8 @@ class P4Sync(Command, P4UserMap):
 
                     if destination in self.knownBranches:
                         if not self.silent:
-                            print "p4 branch %s defines a mapping from %s to %s" % (info["branch"], source, destination)
-                            print "but there exists another mapping from %s to %s already!" % (self.knownBranches[destination], destination)
+                            print("p4 branch %s defines a mapping from %s to %s" % (info["branch"], source, destination))
+                            print("but there exists another mapping from %s to %s already!" % (self.knownBranches[destination], destination))
                         continue
 
                     self.knownBranches[destination] = source
@@ -1685,28 +1685,28 @@ class P4Sync(Command, P4UserMap):
 
     def gitCommitByP4Change(self, ref, change):
         if self.verbose:
-            print "looking in ref " + ref + " for change %s using bisect..." % change
+            print("looking in ref " + ref + " for change %s using bisect..." % change)
 
         earliestCommit = ""
         latestCommit = parseRevision(ref)
 
         while True:
             if self.verbose:
-                print "trying: earliest %s latest %s" % (earliestCommit, latestCommit)
+                print("trying: earliest %s latest %s" % (earliestCommit, latestCommit))
             next = read_pipe("git rev-list --bisect %s %s" % (latestCommit, earliestCommit)).strip()
             if len(next) == 0:
                 if self.verbose:
-                    print "argh"
+                    print("argh")
                 return ""
             log = extractLogMessageFromGitCommit(next)
             settings = extractSettingsGitLog(log)
             currentChange = int(settings['change'])
             if self.verbose:
-                print "current change %s" % currentChange
+                print("current change %s" % currentChange)
 
             if currentChange == change:
                 if self.verbose:
-                    print "found %s" % next
+                    print("found %s" % next)
                 return next
 
             if currentChange < change:
@@ -1767,7 +1767,7 @@ class P4Sync(Command, P4UserMap):
                         filesForCommit = branches[branch]
 
                         if self.verbose:
-                            print "branch is %s" % branch
+                            print("branch is %s" % branch)
 
                         self.updatedBranches.add(branch)
 
@@ -1788,13 +1788,13 @@ class P4Sync(Command, P4UserMap):
                                         print("\n    Resuming with change %s" % change);
 
                                 if self.verbose:
-                                    print "parent determined through known branches: %s" % parent
+                                    print("parent determined through known branches: %s" % parent)
 
                         branch = self.gitRefForBranch(branch)
                         parent = self.gitRefForBranch(parent)
 
                         if self.verbose:
-                            print "looking for initial parent for %s; current parent is %s" % (branch, parent)
+                            print("looking for initial parent for %s; current parent is %s" % (branch, parent))
 
                         if len(parent) == 0 and branch in self.initialParents:
                             parent = self.initialParents[branch]
@@ -1807,11 +1807,11 @@ class P4Sync(Command, P4UserMap):
                                 self.initialParent)
                     self.initialParent = ""
             except IOError:
-                print self.gitError.read()
+                print(self.gitError.read())
                 sys.exit(1)
 
     def importHeadRevision(self, revision):
-        print "Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch)
+        print("Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch))
 
         details = {}
         details["user"] = "git perforce import user"
@@ -1869,8 +1869,8 @@ class P4Sync(Command, P4UserMap):
         try:
             self.commit(details, self.extractFilesFromCommit(details), self.branch, self.depotPaths)
         except IOError:
-            print "IO error with git fast-import. Is your git version recent enough?"
-            print self.gitError.read()
+            print("IO error with git fast-import. Is your git version recent enough?")
+            print(self.gitError.read())
 
 
     def getClientSpec(self):
@@ -1883,7 +1883,7 @@ class P4Sync(Command, P4UserMap):
                     # p4 has these %%1 to %%9 arguments in specs to
                     # reorder paths; which we can't handle (yet :)
                     if re.match('%%\d', v) != None:
-                        print "Sorry, can't handle %%n arguments in client specs"
+                        print("Sorry, can't handle %%n arguments in client specs")
                         sys.exit(1)
 
                     if v.startswith('"'):
@@ -1901,7 +1901,7 @@ class P4Sync(Command, P4UserMap):
                     # ... wildcard, then we're going to mess up the
                     # output directory, so fail gracefully.
                     if not cv.endswith('...'):
-                        print 'Sorry, client view in "%s" needs to end with wildcard' % (k)
+                        print('Sorry, client view in "%s" needs to end with wildcard' % (k))
                         sys.exit(1)
                     cv=cv[:-3]
 
@@ -1939,7 +1939,7 @@ class P4Sync(Command, P4UserMap):
 
         if self.syncWithOrigin and self.hasOrigin:
             if not self.silent:
-                print "Syncing with origin first by calling git fetch origin"
+                print("Syncing with origin first by calling git fetch origin")
             system("git fetch origin")
 
         if len(self.branch) == 0:
@@ -1963,11 +1963,11 @@ class P4Sync(Command, P4UserMap):
 
             if len(self.p4BranchesInGit) > 1:
                 if not self.silent:
-                    print "Importing from/into multiple branches"
+                    print("Importing from/into multiple branches")
                 self.detectBranches = True
 
             if self.verbose:
-                print "branches: %s" % self.p4BranchesInGit
+                print("branches: %s" % self.p4BranchesInGit)
 
             p4Change = 0
             for branch in self.p4BranchesInGit:
@@ -2004,14 +2004,14 @@ class P4Sync(Command, P4UserMap):
                 if not self.detectBranches:
                     self.initialParent = parseRevision(self.branch)
                 if not self.silent and not self.detectBranches:
-                    print "Performing incremental import into %s git branch" % self.branch
+                    print("Performing incremental import into %s git branch" % self.branch)
 
         if not self.branch.startswith("refs/"):
             self.branch = "refs/heads/" + self.branch
 
         if len(args) == 0 and self.depotPaths:
             if not self.silent:
-                print "Depot paths: %s" % ' '.join(self.depotPaths)
+                print("Depot paths: %s" % ' '.join(self.depotPaths))
         else:
             if self.depotPaths and self.depotPaths != args:
                 print ("previous import used depot path %s and now %s was specified. "
@@ -2065,8 +2065,8 @@ class P4Sync(Command, P4UserMap):
             else:
                 self.getBranchMapping()
             if self.verbose:
-                print "p4-git branches: %s" % self.p4BranchesInGit
-                print "initial parents: %s" % self.initialParents
+                print("p4-git branches: %s" % self.p4BranchesInGit)
+                print("initial parents: %s" % self.initialParents)
             for b in self.p4BranchesInGit:
                 if b != "master":
 
@@ -2104,8 +2104,8 @@ class P4Sync(Command, P4UserMap):
                 if len(args) == 0 and not self.p4BranchesInGit:
                     die("No remote p4 branches.  Perhaps you never did \"git p4 clone\" in here.");
                 if self.verbose:
-                    print "Getting p4 changes for %s...%s" % (', '.join(self.depotPaths),
-                                                              self.changeRange)
+                    print("Getting p4 changes for %s...%s" % (', '.join(self.depotPaths),
+                                                              self.changeRange))
                 changes = p4ChangesForPaths(self.depotPaths, self.changeRange)
 
                 if len(self.maxChanges) > 0:
@@ -2113,18 +2113,18 @@ class P4Sync(Command, P4UserMap):
 
             if len(changes) == 0:
                 if not self.silent:
-                    print "No changes to import!"
+                    print("No changes to import!")
                 return True
 
             if not self.silent and not self.detectBranches:
-                print "Import destination: %s" % self.branch
+                print("Import destination: %s" % self.branch)
 
             self.updatedBranches = set()
 
             self.importChanges(changes)
 
             if not self.silent:
-                print ""
+                print("")
                 if len(self.updatedBranches) > 0:
                     sys.stdout.write("Updated branches: ")
                     for b in self.updatedBranches:
@@ -2166,7 +2166,7 @@ class P4Rebase(Command):
         # the branchpoint may be p4/foo~3, so strip off the parent
         upstream = re.sub("~[0-9]+$", "", upstream)
 
-        print "Rebasing the current branch onto %s" % upstream
+        print("Rebasing the current branch onto %s" % upstream)
         oldHead = read_pipe("git rev-parse HEAD").strip()
         system("git rebase %s" % upstream)
         system("git diff-tree --stat --summary -M %s HEAD" % oldHead)
@@ -2228,7 +2228,7 @@ class P4Clone(P4Sync):
         if not self.cloneDestination:
             self.cloneDestination = self.defaultDestination(args)
 
-        print "Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination)
+        print("Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination))
 
         if not os.path.exists(self.cloneDestination):
             os.makedirs(self.cloneDestination)
@@ -2251,7 +2251,7 @@ class P4Clone(P4Sync):
                 if not self.cloneBare:
                     system("git checkout -f")
             else:
-                print "Could not detect main branch. No checkout/master branch created."
+                print("Could not detect main branch. No checkout/master branch created.")
 
         return True
 
@@ -2280,7 +2280,7 @@ class P4Branches(Command):
             log = extractLogMessageFromGitCommit("refs/remotes/%s" % branch)
             settings = extractSettingsGitLog(log)
 
-            print "%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"])
+            print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
         return True
 
 class HelpFormatter(optparse.IndentedHelpFormatter):
@@ -2294,12 +2294,12 @@ class HelpFormatter(optparse.IndentedHelpFormatter):
             return ""
 
 def printUsage(commands):
-    print "usage: %s <command> [options]" % sys.argv[0]
-    print ""
-    print "valid commands: %s" % ", ".join(commands)
-    print ""
-    print "Try %s <command> --help for command specific help." % sys.argv[0]
-    print ""
+    print("usage: %s <command> [options]" % sys.argv[0])
+    print("")
+    print("valid commands: %s" % ", ".join(commands))
+    print("")
+    print("Try %s <command> --help for command specific help." % sys.argv[0])
+    print("")
 
 commands = {
     "debug" : P4Debug,
@@ -2324,8 +2324,8 @@ def main():
         klass = commands[cmdName]
         cmd = klass()
     except KeyError:
-        print "unknown command %s" % cmdName
-        print ""
+        print("unknown command %s" % cmdName)
+        print("")
         printUsage(commands.keys())
         sys.exit(2)
 
diff --git a/contrib/fast-import/import-zips.py b/contrib/fast-import/import-zips.py
index 82f5ed3..596963b 100755
--- a/contrib/fast-import/import-zips.py
+++ b/contrib/fast-import/import-zips.py
@@ -14,7 +14,7 @@ from time import mktime
 from zipfile import ZipFile
 
 if len(argv) < 2:
-	print 'Usage:', argv[0], '<zipfile>...'
+	print('Usage:', argv[0], '<zipfile>...')
 	exit(1)
 
 branch_ref = 'refs/heads/import-zips'
diff --git a/contrib/gitview/gitview b/contrib/gitview/gitview
index 4c99dfb..c3e74f3 100755
--- a/contrib/gitview/gitview
+++ b/contrib/gitview/gitview
@@ -37,7 +37,7 @@ except ImportError:
         import gtksourceview
         have_gtksourceview = True
     except ImportError:
-        print "Running without gtksourceview2 or gtksourceview module"
+        print("Running without gtksourceview2 or gtksourceview module")
 
 re_ident = re.compile('(author|committer) (?P<ident>.*) (?P<epoch>\d+) (?P<tz>[+-]\d{4})')
 
diff --git a/contrib/hg-to-git/hg-to-git.py b/contrib/hg-to-git/hg-to-git.py
index 046cb2b..3c55a6d 100755
--- a/contrib/hg-to-git/hg-to-git.py
+++ b/contrib/hg-to-git/hg-to-git.py
@@ -38,7 +38,7 @@ hgnewcsets = 0
 
 def usage():
 
-        print """\
+        print("""\
 %s: [OPTIONS] <hgprj>
 
 options:
@@ -50,7 +50,7 @@ options:
 
 required:
     hgprj:  name of the HG project to import (directory)
-""" % sys.argv[0]
+""" % sys.argv[0])
 
 #------------------------------------------------------------------------------
 
@@ -100,22 +100,22 @@ os.chdir(hgprj)
 if state:
     if os.path.exists(state):
         if verbose:
-            print 'State does exist, reading'
+            print('State does exist, reading')
         f = open(state, 'r')
         hgvers = pickle.load(f)
     else:
-        print 'State does not exist, first run'
+        print('State does not exist, first run')
 
 sock = os.popen('hg tip --template "{rev}"')
 tip = sock.read()
 if sock.close():
     sys.exit(1)
 if verbose:
-    print 'tip is', tip
+    print('tip is', tip)
 
 # Calculate the branches
 if verbose:
-    print 'analysing the branches...'
+    print('analysing the branches...')
 hgchildren["0"] = ()
 hgparents["0"] = (None, None)
 hgbranch["0"] = "master"
@@ -151,7 +151,7 @@ for cset in range(1, int(tip) + 1):
             hgbranch[str(cset)] = "branch-" + str(cset)
 
 if not hgvers.has_key("0"):
-    print 'creating repository'
+    print('creating repository')
     os.system('git init')
 
 # loop through every hg changeset
@@ -176,27 +176,27 @@ for cset in range(int(tip) + 1):
     os.write(fdcomment, csetcomment)
     os.close(fdcomment)
 
-    print '-----------------------------------------'
-    print 'cset:', cset
-    print 'branch:', hgbranch[str(cset)]
-    print 'user:', user
-    print 'date:', date
-    print 'comment:', csetcomment
+    print('-----------------------------------------')
+    print('cset:', cset)
+    print('branch:', hgbranch[str(cset)])
+    print('user:', user)
+    print('date:', date)
+    print('comment:', csetcomment)
     if parent:
-	print 'parent:', parent
+	print('parent:', parent)
     if mparent:
-        print 'mparent:', mparent
+        print('mparent:', mparent)
     if tag:
-        print 'tag:', tag
-    print '-----------------------------------------'
+        print('tag:', tag)
+    print('-----------------------------------------')
 
     # checkout the parent if necessary
     if cset != 0:
         if hgbranch[str(cset)] == "branch-" + str(cset):
-            print 'creating new branch', hgbranch[str(cset)]
+            print('creating new branch', hgbranch[str(cset)])
             os.system('git checkout -b %s %s' % (hgbranch[str(cset)], hgvers[parent]))
         else:
-            print 'checking out branch', hgbranch[str(cset)]
+            print('checking out branch', hgbranch[str(cset)])
             os.system('git checkout %s' % hgbranch[str(cset)])
 
     # merge
@@ -205,7 +205,7 @@ for cset in range(int(tip) + 1):
             otherbranch = hgbranch[mparent]
         else:
             otherbranch = hgbranch[parent]
-        print 'merging', otherbranch, 'into', hgbranch[str(cset)]
+        print('merging', otherbranch, 'into', hgbranch[str(cset)])
         os.system(getgitenv(user, date) + 'git merge --no-commit -s ours "" %s %s' % (hgbranch[str(cset)], otherbranch))
 
     # remove everything except .git and .hg directories
@@ -229,12 +229,12 @@ for cset in range(int(tip) + 1):
 
     # delete branch if not used anymore...
     if mparent and len(hgchildren[str(cset)]):
-        print "Deleting unused branch:", otherbranch
+        print("Deleting unused branch:", otherbranch)
         os.system('git branch -d %s' % otherbranch)
 
     # retrieve and record the version
     vvv = os.popen('git show --quiet --pretty=format:%H').read()
-    print 'record', cset, '->', vvv
+    print('record', cset, '->', vvv)
     hgvers[str(cset)] = vvv
 
 if hgnewcsets >= opt_nrepack and opt_nrepack != -1:
@@ -243,7 +243,7 @@ if hgnewcsets >= opt_nrepack and opt_nrepack != -1:
 # write the state for incrementals
 if state:
     if verbose:
-        print 'Writing state'
+        print('Writing state')
     f = open(state, 'w')
     pickle.dump(hgvers, f)
 
diff --git a/contrib/p4import/git-p4import.py b/contrib/p4import/git-p4import.py
index b6e534b..144fafc 100644
--- a/contrib/p4import/git-p4import.py
+++ b/contrib/p4import/git-p4import.py
@@ -26,11 +26,11 @@ if s != default_int_handler:
 def die(msg, *args):
     for a in args:
         msg = "%s %s" % (msg, a)
-    print "git-p4import fatal error:", msg
+    print("git-p4import fatal error:", msg)
     sys.exit(1)
 
 def usage():
-    print "USAGE: git-p4import [-q|-v]  [--authors=<file>]  [-t <timezone>]  [//p4repo/path <branch>]"
+    print("USAGE: git-p4import [-q|-v]  [--authors=<file>]  [-t <timezone>]  [//p4repo/path <branch>]")
     sys.exit(1)
 
 verbosity = 1
@@ -48,7 +48,7 @@ def report(level, msg, *args):
     fd.writelines(msg)
     fd.close()
     if level <= verbosity:
-        print msg
+        print(msg)
 
 class p4_command:
     def __init__(self, _repopath):
diff --git a/git-remote-testgit.py b/git-remote-testgit.py
index 3dc4851..9803214 100644
--- a/git-remote-testgit.py
+++ b/git-remote-testgit.py
@@ -81,9 +81,9 @@ def do_capabilities(repo, args):
     """Prints the supported capabilities.
     """
 
-    print "import"
-    print "export"
-    print "refspec refs/heads/*:%s*" % repo.prefix
+    print("import")
+    print("export")
+    print("refspec refs/heads/*:%s*" % repo.prefix)
 
     dirname = repo.get_base_path(repo.gitdir)
 
@@ -92,11 +92,11 @@ def do_capabilities(repo, args):
 
     path = os.path.join(dirname, 'testgit.marks')
 
-    print "*export-marks %s" % path
+    print("*export-marks %s" % path)
     if os.path.exists(path):
-        print "*import-marks %s" % path
+        print("*import-marks %s" % path)
 
-    print # end capabilities
+    print() # end capabilities
 
 
 def do_list(repo, args):
@@ -109,16 +109,16 @@ def do_list(repo, args):
 
     for ref in repo.revs:
         debug("? refs/heads/%s", ref)
-        print "? refs/heads/%s" % ref
+        print("? refs/heads/%s" % ref)
 
     if repo.head:
         debug("@refs/heads/%s HEAD" % repo.head)
-        print "@refs/heads/%s HEAD" % repo.head
+        print("@refs/heads/%s HEAD" % repo.head)
     else:
         debug("@refs/heads/master HEAD")
-        print "@refs/heads/master HEAD"
+        print("@refs/heads/master HEAD")
 
-    print # end list
+    print() # end list
 
 
 def update_local_repo(repo):
@@ -161,7 +161,7 @@ def do_import(repo, args):
     repo = update_local_repo(repo)
     repo.exporter.export_repo(repo.gitdir, refs)
 
-    print "done"
+    print("done")
 
 
 def do_export(repo, args):
@@ -178,8 +178,8 @@ def do_export(repo, args):
         repo.non_local.push(repo.gitdir)
 
     for ref in changed:
-        print "ok %s" % ref
-    print
+        print("ok %s" % ref)
+    print()
 
 
 COMMANDS = {
diff --git a/git_remote_helpers/Makefile b/git_remote_helpers/Makefile
index 74b05dc..f65f064 100644
--- a/git_remote_helpers/Makefile
+++ b/git_remote_helpers/Makefile
@@ -23,7 +23,7 @@ endif
 
 PYLIBDIR=$(shell $(PYTHON_PATH) -c \
 	 "import sys; \
-	 print 'lib/python%i.%i/site-packages' % sys.version_info[:2]")
+	 print('lib/python%i.%i/site-packages' % sys.version_info[:2])")
 
 all: $(pysetupfile)
 	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) build
diff --git a/git_remote_helpers/git/exporter.py b/git_remote_helpers/git/exporter.py
index 9ee5f96..e6ad51e 100644
--- a/git_remote_helpers/git/exporter.py
+++ b/git_remote_helpers/git/exporter.py
@@ -38,10 +38,10 @@ class GitExporter(object):
         if not os.path.exists(dirname):
             os.makedirs(dirname)
 
-        print "feature relative-marks"
+        print("feature relative-marks")
         if os.path.exists(os.path.join(dirname, 'git.marks')):
-            print "feature import-marks=%s/git.marks" % self.repo.hash
-        print "feature export-marks=%s/git.marks" % self.repo.hash
+            print("feature import-marks=%s/git.marks" % self.repo.hash)
+        print("feature export-marks=%s/git.marks" % self.repo.hash)
         sys.stdout.flush()
 
         args = ["git", "--git-dir=" + self.repo.gitpath, "fast-export", "--export-marks=" + path]
diff --git a/git_remote_helpers/git/git.py b/git_remote_helpers/git/git.py
index 007a1bf..43f6c53 100644
--- a/git_remote_helpers/git/git.py
+++ b/git_remote_helpers/git/git.py
@@ -111,7 +111,7 @@ class GitObjectFetcher(object):
         """
         if self.queue and self.in_transit is None:
             self.in_transit = self.queue.pop(0)
-            print >> self.proc.stdin, self.in_transit[0]
+            self.proc.stdin.write(self.in_transit[0] + "\n")
 
     def push (self, obj, callback):
         """Push the given object name onto the queue.
diff --git a/git_remote_helpers/util.py b/git_remote_helpers/util.py
index fbbb01b..f6bd42e 100644
--- a/git_remote_helpers/util.py
+++ b/git_remote_helpers/util.py
@@ -32,20 +32,20 @@ DEBUG = False
 
 def notify(msg, *args):
     """Print a message to stderr."""
-    print >> sys.stderr, msg % args
+    sys.stderr.write(msg % args + "\n")
 
 def debug (msg, *args):
     """Print a debug message to stderr when DEBUG is enabled."""
     if DEBUG:
-        print >> sys.stderr, msg % args
+        sys.stderr.write(msg % args + "\n")
 
 def error (msg, *args):
     """Print an error message to stderr."""
-    print >> sys.stderr, "ERROR:", msg % args
+    sys.stderr.write("ERROR: " + msg % args + "\n")
 
 def warn(msg, *args):
     """Print a warning message to stderr."""
-    print >> sys.stderr, "warning:", msg % args
+    sys.stderr.write("warning: " + msg % args + "\n")
 
 def die (msg, *args):
     """Print as error message to stderr and exit the program."""
@@ -87,10 +87,10 @@ class ProgressIndicator(object):
         if msg is None:
             msg = self.States[self.n % len(self.States)]
         msg = self.prefix + msg
-        print >> self.f, "\r%-*s" % (self.prev_len, msg),
+        self.f.write("\r%-*s" % (self.prev_len, msg))
         self.prev_len = len(msg.expandtabs())
         if lf:
-            print >> self.f
+            self.f.write("\n")
             self.prev_len = 0
         self.n += 1
 
-- 
1.7.8.382.g9f1d9.dirty

^ permalink raw reply related
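
A note on the stream-redirection hunks in the patch above: Python 2's
"print >> stream, text" appends a newline, while the trailing-comma form
"print >> stream, text," does not, so a write() replacement needs the
newline in the first case only. The sketch below assumes that reading.
write_line and write_partial are hypothetical helper names, not part of
git_remote_helpers, and Python 2's soft-space bookkeeping is ignored.

```python
import io


def write_line(stream, text):
    # Rough stand-in for Python 2's `print >> stream, text`:
    # the text followed by a newline.
    stream.write(text + "\n")


def write_partial(stream, text):
    # Rough stand-in for `print >> stream, text,` (trailing comma):
    # no newline, so the cursor stays on the current line.
    stream.write(text)


buf = io.StringIO()
# Progress text left-justified in a field of 8 columns, as "%-*s" does.
write_partial(buf, "\r%-*s" % (8, "working"))
# An empty `print >> stream` only emits the line feed.
write_line(buf, "")
progress_output = buf.getvalue()
```

With these helpers a progress indicator keeps rewriting the same terminal
line and only moves to the next one when the line feed is requested.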

* [PATCH 4/4] Suppress "statement not reached" warnings under Sun Studio
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  1:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren, Jason Evans, David Barr,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <1324430302-22441-1-git-send-email-avarab@gmail.com>

Sun Studio 12 Update 1's brain will melt on these two occurrences of
using "goto" to jump into a loop. It'll emit these warnings:

    "read-cache.c", line 761: warning: statement not reached (E_STATEMENT_NOT_REACHED)
    "xdiff/xutils.c", line 194: warning: statement not reached (E_STATEMENT_NOT_REACHED)

Suppress these warnings by using a Sun Studio specific pragma
directive to turn them off, but only do so if __sun is defined, which
is the macro Sun Studio uses to identify itself under both its C and
C++ variants, see http://developers.sun.com/sunstudio/products/faqs/cpp.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 read-cache.c   |    6 ++++++
 xdiff/xutils.c |    6 ++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index a51bba1..0a4e895 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -758,7 +758,13 @@ int verify_path(const char *path)
 		return 0;
 
 	goto inside;
+#ifdef __sun
+#	pragma error_messages (off, E_STATEMENT_NOT_REACHED)
+#endif
 	for (;;) {
+#ifdef __sun
+#	pragma error_messages (on, E_STATEMENT_NOT_REACHED)
+#endif
 		if (!c)
 			return 1;
 		if (is_dir_sep(c)) {
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 0de084e..62c3567 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -191,7 +191,13 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 	 */
 	if (flags & XDF_IGNORE_WHITESPACE) {
 		goto skip_ws;
+#ifdef __sun
+#	pragma error_messages (off, E_STATEMENT_NOT_REACHED)
+#endif
 		while (i1 < s1 && i2 < s2) {
+#ifdef __sun
+#	pragma error_messages (on, E_STATEMENT_NOT_REACHED)
+#endif
 			if (l1[i1++] != l2[i2++])
 				return 0;
 		skip_ws:
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 3/4] Appease Sun Studio by renaming "tmpfile"
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  1:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren, Jason Evans, David Barr,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <1324430302-22441-1-git-send-email-avarab@gmail.com>

On Solaris the system headers define the "tmpfile" name, which'll
cause Git compiled with Sun Studio 12 Update 1 to whine about us
redefining the name:

    "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
    "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
    "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
    "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)

Just renaming the "tmpfile" variable to "tmp_file" in the relevant
places is the easiest way to fix this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c |    6 +++---
 fast-import.c        |    8 ++++----
 pack-write.c         |    6 +++---
 sha1_file.c          |   12 ++++++------
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 98025da..af7dc37 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -172,10 +172,10 @@ static const char *open_pack_file(const char *pack_name)
 	if (from_stdin) {
 		input_fd = 0;
 		if (!pack_name) {
-			static char tmpfile[PATH_MAX];
-			output_fd = odb_mkstemp(tmpfile, sizeof(tmpfile),
+			static char tmp_file[PATH_MAX];
+			output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
 						"pack/tmp_pack_XXXXXX");
-			pack_name = xstrdup(tmpfile);
+			pack_name = xstrdup(tmp_file);
 		} else
 			output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
 		if (output_fd < 0)
diff --git a/fast-import.c b/fast-import.c
index 4b9c4b7..6cd19e5 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -855,15 +855,15 @@ static struct tree_content *dup_tree_content(struct tree_content *s)
 
 static void start_packfile(void)
 {
-	static char tmpfile[PATH_MAX];
+	static char tmp_file[PATH_MAX];
 	struct packed_git *p;
 	struct pack_header hdr;
 	int pack_fd;
 
-	pack_fd = odb_mkstemp(tmpfile, sizeof(tmpfile),
+	pack_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
 			      "pack/tmp_pack_XXXXXX");
-	p = xcalloc(1, sizeof(*p) + strlen(tmpfile) + 2);
-	strcpy(p->pack_name, tmpfile);
+	p = xcalloc(1, sizeof(*p) + strlen(tmp_file) + 2);
+	strcpy(p->pack_name, tmp_file);
 	p->pack_fd = pack_fd;
 	p->do_not_close = 1;
 	pack_file = sha1fd(pack_fd, p->pack_name);
diff --git a/pack-write.c b/pack-write.c
index de2bd01..ca9e63b 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -73,9 +73,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 		f = sha1fd_check(index_name);
 	} else {
 		if (!index_name) {
-			static char tmpfile[PATH_MAX];
-			fd = odb_mkstemp(tmpfile, sizeof(tmpfile), "pack/tmp_idx_XXXXXX");
-			index_name = xstrdup(tmpfile);
+			static char tmp_file[PATH_MAX];
+			fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_idx_XXXXXX");
+			index_name = xstrdup(tmp_file);
 		} else {
 			unlink(index_name);
 			fd = open(index_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
diff --git a/sha1_file.c b/sha1_file.c
index f291f3f..88f2151 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2452,15 +2452,15 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	git_SHA_CTX c;
 	unsigned char parano_sha1[20];
 	char *filename;
-	static char tmpfile[PATH_MAX];
+	static char tmp_file[PATH_MAX];
 
 	filename = sha1_file_name(sha1);
-	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
+	fd = create_tmpfile(tmp_file, sizeof(tmp_file), filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error("insufficient permission for adding an object to repository database %s\n", get_object_directory());
 		else
-			return error("unable to create temporary sha1 filename %s: %s\n", tmpfile, strerror(errno));
+			return error("unable to create temporary sha1 filename %s: %s\n", tmp_file, strerror(errno));
 	}
 
 	/* Set it up */
@@ -2505,12 +2505,12 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 		struct utimbuf utb;
 		utb.actime = mtime;
 		utb.modtime = mtime;
-		if (utime(tmpfile, &utb) < 0)
+		if (utime(tmp_file, &utb) < 0)
 			warning("failed utime() on %s: %s",
-				tmpfile, strerror(errno));
+				tmp_file, strerror(errno));
 	}
 
-	return move_temp_to_file(tmpfile, filename);
+	return move_temp_to_file(tmp_file, filename);
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 2/4] Fix a bitwise negation assignment issue spotted by Sun Studio
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  1:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren, Jason Evans, David Barr,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <1324430302-22441-1-git-send-email-avarab@gmail.com>

Change direct and indirect assignments of the bitwise negation of 0 to
uint32_t variables to have a "U" suffix. I.e. ~0U instead of ~0. This
eliminates warnings under Sun Studio 12 Update 1:

    "vcs-svn/string_pool.c", line 11: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/string_pool.c", line 81: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/repo_tree.c", line 112: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/repo_tree.c", line 112: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "test-treap.c", line 34: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)

The semantics are still the same as demonstrated by this program:

    $ cat test.c && make test && ./test
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t foo = ~0;
        uint32_t bar = ~0U;

        printf("foo = <%u> bar = <%u>\n", foo, bar);

        return 0;
    }
    cc     test.c   -o test
    "test.c", line 5: warning: initializer will be sign-extended: -1
    foo = <4294967295> bar = <4294967295>
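
The same equivalence can be cross-checked outside the C compiler. The
following is only a sketch in Python, assuming that ctypes.c_uint32 is an
adequate model of uint32_t on the platform at hand: -1 (the value of ~0
as a signed int) and 0xFFFFFFFF (the value of ~0U with a 32-bit unsigned
int) end up as the same 32-bit pattern.

```python
import ctypes

# ~0 on a signed int is -1; storing it in a uint32_t wraps modulo 2**32.
foo = ctypes.c_uint32(~0).value

# ~0U with a 32-bit unsigned int is already 0xFFFFFFFF (assumed width).
bar = ctypes.c_uint32(0xFFFFFFFF).value

print("foo = <%u> bar = <%u>" % (foo, bar))
```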

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 test-treap.c          |    2 +-
 vcs-svn/repo_tree.c   |    2 +-
 vcs-svn/string_pool.c |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/test-treap.c b/test-treap.c
index ab8c951..294d7ee 100644
--- a/test-treap.c
+++ b/test-treap.c
@@ -31,7 +31,7 @@ static void strtonode(struct int_node *item, const char *s)
 int main(int argc, char *argv[])
 {
 	struct strbuf sb = STRBUF_INIT;
-	struct trp_root root = { ~0 };
+	struct trp_root root = { ~0U };
 	uint32_t item;
 
 	if (argc != 1)
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index a21d89d..c3f198d 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -109,7 +109,7 @@ static struct repo_dirent *repo_read_dirent(uint32_t revision,
 static void repo_write_dirent(const uint32_t *path, uint32_t mode,
 			      uint32_t content_offset, uint32_t del)
 {
-	uint32_t name, revision, dir_o = ~0, parent_dir_o = ~0;
+	uint32_t name, revision, dir_o = ~0U, parent_dir_o = ~0U;
 	struct repo_dir *dir;
 	struct repo_dirent *key;
 	struct repo_dirent *dent = NULL;
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
index 8af8d54..1b63b19 100644
--- a/vcs-svn/string_pool.c
+++ b/vcs-svn/string_pool.c
@@ -8,7 +8,7 @@
 #include "obj_pool.h"
 #include "string_pool.h"
 
-static struct trp_root tree = { ~0 };
+static struct trp_root tree = { ~0U };
 
 struct node {
 	uint32_t offset;
@@ -78,7 +78,7 @@ void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream)
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str)
 {
 	char *context = NULL;
-	uint32_t token = ~0;
+	uint32_t token = ~0U;
 	uint32_t length;
 
 	if (sz == 0)
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 1/4] Fix an enum assignment issue spotted by Sun Studio
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  1:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren, Jason Evans, David Barr,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <1324430302-22441-1-git-send-email-avarab@gmail.com>

In builtin/fast-export.c we'd assign to variables of the
tag_of_filtered_mode enum type with constants defined for the
signed_tag_mode enum.

We'd get the intended value since both the value we were assigning
with and the one we actually wanted had the same position within
their respective enums, but doing it this way makes no sense.
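
To spell out why the old code still worked: ABORT is the first member of
one anonymous enum and ERROR is the first member of the other, so both
have the value 0. A small sketch of that coincidence, using Python's
enum.IntEnum as a stand-in for the two C enums (the class names
SignedTagMode and TagOfFilteredMode are made up for illustration):

```python
from enum import IntEnum


class SignedTagMode(IntEnum):
    # Mirrors: enum { ABORT, VERBATIM, WARN, STRIP }
    ABORT = 0
    VERBATIM = 1
    WARN = 2
    STRIP = 3


class TagOfFilteredMode(IntEnum):
    # Mirrors: enum { ERROR, DROP, REWRITE }
    ERROR = 0
    DROP = 1
    REWRITE = 2


# Same underlying integer, which is why assigning ABORT "worked" ...
same_value = int(SignedTagMode.ABORT) == int(TagOfFilteredMode.ERROR)
# ... but the members belong to unrelated types.
same_type = type(SignedTagMode.ABORT) is type(TagOfFilteredMode.ERROR)
```

Reordering either enum would silently break the old assignment, which is
the practical reason to use the constant from the matching type.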

This issue was spotted by Sun Studio 12 Update 1:

    "builtin/fast-export.c", line 54: warning: enum type mismatch: op "=" (E_ENUM_TYPE_MISMATCH_OP)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9836e6b..08fed98 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -25,7 +25,7 @@ static const char *fast_export_usage[] = {
 
 static int progress;
 static enum { ABORT, VERBATIM, WARN, STRIP } signed_tag_mode = ABORT;
-static enum { ERROR, DROP, REWRITE } tag_of_filtered_mode = ABORT;
+static enum { ERROR, DROP, REWRITE } tag_of_filtered_mode = ERROR;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -51,7 +51,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 					  const char *arg, int unset)
 {
 	if (unset || !strcmp(arg, "abort"))
-		tag_of_filtered_mode = ABORT;
+		tag_of_filtered_mode = ERROR;
 	else if (!strcmp(arg, "drop"))
 		tag_of_filtered_mode = DROP;
 	else if (!strcmp(arg, "rewrite"))
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 0/4] Eliminate warnings under Sun Studio
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  1:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Elijah Newren, Jason Evans, David Barr,
	Ævar Arnfjörð Bjarmason

This patch series eliminates warnings under Sun Studio. The first two
patches address actual (but obviously minor) issues, the third is a
nit, and the fourth disables a warning Sun Studio gets wrong.

I'm not sure whether we want the verbose code needed in the fourth to
disable warnings under specific compilers, but since it's a rare
enough case and saves people compiling the code from wondering about
it, it's probably warranted. It's a verbose way to get rid of it
though.

I've CC'd people involved in the code touched by the first two, but
the second two are generic enough that I've decided not to bother the
original authors.

Ævar Arnfjörð Bjarmason (4):
  Fix an enum assignment issue spotted by Sun Studio
  Fix a bitwise negation assignment issue spotted by Sun Studio
  Appease Sun Studio by renaming "tmpfile"
  Suppress "statement not reached" warnings under Sun Studio

 builtin/fast-export.c |    4 ++--
 builtin/index-pack.c  |    6 +++---
 fast-import.c         |    8 ++++----
 pack-write.c          |    6 +++---
 read-cache.c          |    6 ++++++
 sha1_file.c           |   12 ++++++------
 test-treap.c          |    2 +-
 vcs-svn/repo_tree.c   |    2 +-
 vcs-svn/string_pool.c |    4 ++--
 xdiff/xutils.c        |    6 ++++++
 10 files changed, 34 insertions(+), 22 deletions(-)

-- 
1.7.7.3

^ permalink raw reply

* Re: [PATCH] Makefile: Change the default compiler from "gcc" to "cc"
From: Ævar Arnfjörð Bjarmason @ 2011-12-21  1:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds
In-Reply-To: <7vr4zyiyih.fsf@alter.siamese.dyndns.org>

On Wed, Dec 21, 2011 at 01:01, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> However unlike Linux Git is written in ANSI C and supports a multitude
>> of compilers, including Clang, Sun Studio, xlc etc. On my Linux box
>> "cc" is a symlink to clang, and on a Solaris box I have access to "cc"
>> is Sun Studio's CC.
>>
>> Both of these are perfectly capable of compiling Git, and it's
>> annoying to have to specify CC=cc on the command-line when compiling
>> Git when that's the default behavior of most other portable programs.
>
> Would this affect folks in BSD land negatively?

^ permalink raw reply

* Re: [PATCH] builtin/init-db.c: eliminate -Wformat warning on Solaris
From: Junio C Hamano @ 2011-12-21  0:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy
In-Reply-To: <1324423661-31174-1-git-send-email-avarab@gmail.com>

I will queue this directly on 'maint', as I do not think it is worth
fixing immediately on top of 2c050e0 (i18n: mark init-db messages for
translation, 2011-04-10) and merging all the way down to the now-ancient
v1.7.5.X series.

Thanks.

^ permalink raw reply

* Re: [PATCH] Makefile: Change the default compiler from "gcc" to "cc"
From: Junio C Hamano @ 2011-12-21  0:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Linus Torvalds
In-Reply-To: <1324424447-7164-1-git-send-email-avarab@gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> However unlike Linux Git is written in ANSI C and supports a multitude
> of compilers, including Clang, Sun Studio, xlc etc. On my Linux box
> "cc" is a symlink to clang, and on a Solaris box I have access to "cc"
> is Sun Studio's CC.
>
> Both of these are perfectly capable of compiling Git, and it's
> annoying to have to specify CC=cc on the command-line when compiling
> Git when that's the default behavior of most other portable programs.

Would this affect folks in BSD land negatively?

^ permalink raw reply

* [PATCH] Makefile: Change the default compiler from "gcc" to "cc"
From: Ævar Arnfjörð Bjarmason @ 2011-12-20 23:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð Bjarmason

Ever since the very first commit to git.git we've been setting CC to
"gcc". Presumably this is behavior that Linus copied from the Linux
Makefile.

However unlike Linux Git is written in ANSI C and supports a multitude
of compilers, including Clang, Sun Studio, xlc etc. On my Linux box
"cc" is a symlink to clang, and on a Solaris box I have access to "cc"
is Sun Studio's CC.

Both of these are perfectly capable of compiling Git, and it's
annoying to have to specify CC=cc on the command-line when compiling
Git when that's the default behavior of most other portable programs.

So change the default to "cc". Users who want to compile with GCC can
still add "CC=gcc" to the make(1) command-line, but those users who
don't have GCC as their "cc" will see expected behavior, and as a
bonus we'll be more likely to smoke out new compilation warnings from
our distributors since they'll be using a more varied set of compilers
by default.
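
For anyone wanting to check what "cc" resolves to on their own machine
before and after this change, here is a small sketch (Python 3.3 or later
for shutil.which; resolve_compiler is a hypothetical helper, not
something in git.git):

```python
import os
import shutil


def resolve_compiler(name):
    # Look the command up on PATH, then follow symlinks
    # (e.g. cc -> clang) to the real binary. Returns None if not found.
    path = shutil.which(name)
    if path is None:
        return None
    return os.path.realpath(path)


for candidate in ("cc", "gcc"):
    print("%s -> %s" % (candidate, resolve_compiler(candidate)))
```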

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 9470a10..958c6e6 100644
--- a/Makefile
+++ b/Makefile
@@ -336,7 +336,7 @@ pathsep = :
 
 export prefix bindir sharedir sysconfdir gitwebdir localedir
 
-CC = gcc
+CC = cc
 AR = ar
 RM = rm -f
 DIFF = diff
-- 
1.7.7.3

^ permalink raw reply related

