Git development

Git development
 help / color / mirror / Atom feed

* Re: RFC: New diff-delta.c implementation
From: Geert Bosch @ 2006-04-24 15:19 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano
In-Reply-To: <Pine.LNX.4.64.0604232327500.3603@localhost.localdomain>

On Mon, Apr 24, 2006 at 01:27:07AM -0400, Nicolas Pitre wrote:
> I took the liberty of reworking the coding style, as well as simplifying 
> some constructs a bit.
> 
> I also fixed all the format encoding and size computations bugs the 
> original code has so it works perfectly now.
Thanks for fixing that! I'll have a more detailed look at your
changes later, as not all changes are immediately obvious since the
formatting is different.
> Note that the GIT encoding is itself different (a bit denser) from the 
> xdiff encoding.
OK, I mostly looked a patch-delta.c to reverse engineer the format,
and mainly looked at libxdiff in the hope to find some format description
there.
> Please just completely drop that word fetch "optimization".  Not only 
> does it produce worse assembly code in the end due to the extra loop 
> counter and extra shifting in turn increasing register pressure, but it 
> also assumes that the target data buffer is always word aligned which is 
> not guaranteed (casting a char pointer to an unsigned and dereferencing 
> it is not portable since on some architecture such misaligned accesses 
> are not allowed).  And it even has worse performance.  For example, with 
> word fetch and shift code, packing the Linux kernel archive on my P4 at 
> 3GHz:

> user    6m33.797s
[vs.]
> user    6m13.911s
> So you must be extremely careful when trying to implement such kind of 
> optimizations.

As a compiler writer, I am more than aware of this. 
Your change results in a 70% slowdown on PowerPC. On many targets
byte-accesses are relatively slow. Indeed, I had expect this to be
mostly neutral on x86, and the 5% slowdown you saw is not unexpected.
Note that in this case aligned accesses are guaranteed, because the
index_step is guaranteed to be a multiple of the word size.

> But here comes the sad part.  Even after simplifying the code as much as 
> I could, performance is still significantly worse than the current 
> diff-delta.c code.  Repacking again the same Linux kernel repository 
> with the current code:
That's unexpected, but I can see how this could be if most files have
very few differences and are relatively small. For such cases, almost
any hash will do, and the more complicated hashing will be more compute
intensive.


I have benchmarked my original diff code on a set of large files with
lots of changes. These are hardest to get right, and hardest to get
good performance with. Just try diffing any two large (uncompressed)
tar files, and you'll see. On many of such large files, the new code
is orders of magnitude faster. On these cases, the resulting deltas
are also much smaller.

The comparison is a bit between a O(n^2) sort that is fast on small
or mostly sorted inputs (but horrible on large ones) and a more
complex O(nlogn) algorithm that is a bit slower for the simple
cases, but far faster for more complex cases.

> In the mean time I'll try replacing the adler32 hashing with your Rabin 
> polynomial hashing into the current code just to see if it helps (and I 
> think it will help quite a bit).  The best solution might be a mix of 
> both approaches.
I think the most interesting part of my code is in process_copies.
This optimization results in elimination of about half the commands
in large test cases, while taking virtually no time at all.
Also, I believe that any form of hash chaining of secondary hashing
is useless. I have done many benchmarks, and in all cases a slight
increase of hash table size was both more effective for pack size and
faster due to less cache misses.
> Of course there is the issue of reversing the object matching window 
> logic in pack-object.c to further improve things, but I consider that an 
> orthogonal improvement at this point which both approaches will benefit 
> from anyway.

Right, this will be the most important one for performance for now.

Thanks again for taking the time to look at my code and fixing the
errors in handling the git file format.

  -Geert

PS. Somehow your code had "double line spacing" :)

^ permalink raw reply

* Push over HTTP now works with refs/heads/foo instead of foo
From: Bertrand Jacquin @ 2006-04-24  9:36 UTC (permalink / raw)
  To: pasky; +Cc: Git Mailing List

Hi,

git-http-push now accept :
git-http-push http://doo.com/git/bar.git/ master:refs/heads/pr/my_passwd

instead of :
git-http-push http://doo.com/git/bar.git/ master:pr/my_passwd

And old syntax doesn't work anymore (unable to create remote branches).

Here is a patch :

diff -Nur cogito-0.17.2/cg-push cogito-0.17.2-mine/cg-push
--- cogito-0.17.2/cg-push	2006-04-08 03:02:49.000000000 +0200
+++ cogito-0.17.2-mine/cg-push	2006-04-24 11:33:34.748845600 +0200
@@ -69,8 +69,7 @@
 sprembranch=":refs/heads/$rembranch"

 if [ "${uri#http://}" != "$uri" -o "${uri#https://}" != "$uri" ]; then
-	# git-http-push doesn't like $sprembranch
-	git-http-push "$uri/" "$locbranch:$rembranch" "${tags[@]}"
+	git-http-push "$uri/" "$locbranch$sprembranch" "${tags[@]}"
 elif [ "${uri#git+ssh://}" != "$uri" ]; then
  send_pack_update "$name" "$(echo "$uri" | sed
's#^git+ssh://\([^/]*\)\(/.*\)$#\1:\2#')" "$locbranch$sprembranch"
"${tags[@]}"
 elif [ "${uri#rsync://}" != "$uri" ]; then


--
Beber
#e.fr@freenode

^ permalink raw reply

* Re: weird pull behavior as of late
From: Sergey Vlasov @ 2006-04-24  9:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: git
In-Reply-To: <20060423.175953.52710961.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

On Sun, 23 Apr 2006 17:59:53 -0700 (PDT) David S. Miller wrote:

> Fast forward
>  MAINTAINERS |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> I got 446 objects and this amounted to just a 4 line change to the
> MAINTAINERS file? :-)

I got the same problem recently and tracked it down to a stale diff.o
object file inside libgit.a - apparently "ar rcs" does not recreate the
archive from scratch.  After "make clean" the problem has vanished.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Feeding some xmms2 gitweb enhancements back?
From: Junio C Hamano @ 2006-04-24  9:01 UTC (permalink / raw)
  To: Sham Chukoury; +Cc: git, Kay Sievers

I was browsing 

	http://git.xmms.se/?p=gitweb-xmms2.git;a=shortlog

and noticed there are some nice enhancements and clean-ups that
the official gitweb distribution could probably use.  I wonder
if it is a possibility for you to feed the following to Kay for
upstream inclusion after cleaning up?

 * Navbar refactoring
 * Move CSS out of gitweb.cgi
 * Make CSS readable

   I found these quite nice, from the pain last time I tried to
   touch gitweb X-<.  I suspect it wouldn't be a straight
   cherry-pick, though...

 * Add support for code highlighting in blob view
 * Highlight more file types

   Perhaps.

 * Add category support via $GIT_DIR/category for project
 * Show project tree git url on summary page
 * Add committags support

   Perhaps.  This depends on the site organization and the
   tracking system used.

 * Snapshot links support

   Running this on kernel.org might be a bit heavyweight and
   sysadmins may not like it, but I like this.

^ permalink raw reply

* Re: git-clone --reference problem?
From: Junio C Hamano @ 2006-04-24  8:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, David Woodhouse
In-Reply-To: <Pine.LNX.4.64.0604231122490.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Maybe something like this could work.
>
> Untested, of course. It probably isn't even syntactically correct shell. 
> Whatever. You get the idea.
>
> (That while-loop could probably be simplified to just a 
>
>   cat ""$reference/objects/info/alternates" >> "$GIT_DIR/objects/info/alternates"
>
> because all alternates _should_ already be absolute paths, but hey, 
> whatever.

I was hoping that the solution would be to do the recursive
alternate resolution on the runtime side (IOW, prepare_alt_odb()
in sha1_file.c).  Then the repository you borrow from _could_
change its own alternates after you set up your repository to
borrow from it.

Currently, if the repository you borrowed from suddenly starts
borrowing from somewhere else and pruned itself by running
"repack -a -d -l", the borrower would be in trouble, even with
your patch.

^ permalink raw reply

* 50% speed-up of "git-apply && git-write-tree" sequence.
From: Junio C Hamano @ 2006-04-24  5:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7v8xpvd69s.fsf_-_@assigned-by-dhcp.cox.net>

I've fixed-up the series and they are now sitting near the tip
of "pu" branch tonight.  With some more testing I am hoping to
merge it in "next" soon.

I should recap the overall idea for general public here.

When applying a series of patches to your working tree and
making commits out of the results, what happens is roughly this:

 1. Your index file exactly matches your HEAD commit; you may
    have local modifications to some paths in your working tree
    as long as they do not interfere with the patch application.

 2. "git-apply --index" reads the index file, reads the patch,
    applies them to both index and the working tree, and writes
    the updated index file out.

 3. "git-write-tree" reads the index file, writes the index
    entries out as a set of hierarchical tree objects, the top
    of which is wrapped in a new commit object that has the
    current HEAD as the parent, and the commit becomes the new
    HEAD.

You repeat 1-3 as many times as you apply patches.

When dealing with a huge tree, with a typical patch touching
only a handful paths, this is not very efficient.  A recent
kernel source tree has about 1200 directories, each of which
needs to be made into the "tree" object format in-core from the
data in the index file, and we hash it to see if we already have
that tree, and if not we write it out after compression (if we
do have it, we do not do anything after checking).

The updated code does this:

 (1) When git-write-tree writes out trees from the index, we
     store <directory, tree SHA1> pair for all the trees we
     compute, in .git/index.aux file.

 (2) When git-write-tree starts up, it reads the index file, and
     also reads the .git/index.aux file.  The latter file
     contains a fingerprint of the former, so if the index was
     updated without updating the index.aux, what is read is
     discarded.  However, if .git/index.aux is up-to-date, what
     it records can be reused when we scan the index.

 (3) There are a few commands that updates the index file.
     git-update-index is the most well-known one, but read-tree
     and apply also update it.

     No command other than apply updates index.aux when writing
     out index (yet).  However, git-apply has been told about
     it.  It reads index and index.aux file, and when it updates
     an index entry, it _invalidates_ the directory information
     it read from index.aux file, and writes the updated
     index.aux file out when it is done.

     Note.  Earlier I sent out a fix ([PATCH 5/4]) that
     contained a two-line removal from apply.c; the code was
     doing more than just invalidating entries, but actually
     recomputing the tree.  The fixed code does not hash tree
     information when git-apply writes index out.

 (4) When git-write-tree starts up after git-apply finishes its
     job, it notices index.aux file matches the fingerprint of
     the index (so it is up-to-date) and some entries have been
     _invalidated_.  When it writes out the index, the parts of
     the directory hierarchy that have not been invalidated do
     not have to be formatted in-core into "tree" -- we already
     know what tree object they are.  Only the parts that were
     invalidated need to be computed.

For this, a new data structure "cache_tree" is introduced.  The
way to use the API is very simple.  You initialize it like this:

	struct cache_tree *active_cache_tree;
	unsigned char sha1[20];
	read_cache_1(sha1);
        active_cache_tree = read_cache_tree(sha1);

The new function read_cache_1() is similar to the traditional
read_cache(), but it gives back the index file fingerprint.
This is given to read_cache_tree() to make sure index.aux is not
stale.  Then, when you touch a path in the cache, you invalidate
the paths in the cache_tree with:

	cache_tree_invalidate_path(active_cache_tree, path);

For example, if you are updating an index entry
"fs/ext3/inode.c", it invalidates the cached tree SHA1
information for top-level tree, "fs" tree and "fs/ext3" tree.
All other information you read from index.aux are still valid
(IOW, three out of 1200 are invalidated).

When you are done, you write out the index file and the updated
index.aux file like this:

        if (write_cache_1(newfd, active_cache, active_nr, active_cache_sha1) ||
            commit_index_file(&cache_file))
                die("Unable to write new cachefile");
        write_cache_tree(active_cache_sha1, active_cache_tree);

Again, write_cache_1() has one extra parameter to receive the
updated index file fingerprint, and you call write_cache_tree()
with it.  It updates index.aux file (with invalidated entries).

One thing write-tree does different is this (note that it does
not update the index file, it just reads from it):

	if (cache_tree_update(active_cache_tree, active_cache, active_nr,
			      missing_ok))
		die("git-write-tree: error building trees");
	write_cache_tree(active_cache_sha1, active_cache_tree);

The call to cache_tree_update() takes the (possibly partially
invalidated) cache_tree, fills in the invalidated parts from the
index file (this involves creation of new "tree" objects as
needed).  So the above call to write_cache_tree() writes a
complete index.aux file out without any invalidated entries.

Obviously, the change similar to what I did to apply.c in this
series can be done to other index-updating commands.  That's
"low hanging fruit" for the week ;-).

^ permalink raw reply

* Re: RFC: New diff-delta.c implementation
From: Nicolas Pitre @ 2006-04-24  5:27 UTC (permalink / raw)
  To: Geert Bosch; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano
In-Reply-To: <20060424025741.GA636@adacore.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3400 bytes --]

On Sun, 23 Apr 2006, Geert Bosch wrote:

> Note that I sent this code as a RFC, with explicit disclaimers about
> style.

I took the liberty of reworking the coding style, as well as simplifying 
some constructs a bit.

I also fixed all the format encoding and size computations bugs the 
original code has so it works perfectly now.

> So, I did not want to sign off on this code, since I pretty
> much knew there would be some problems with the undocumented
> ("proprietary", according the libxdiff site) file format. In contrast
> the GDIFF fileformat was documented very well, and I have a version
> of this code that works flawlessly with that format.

Note that the GIT encoding is itself different (a bit denser) from the 
xdiff encoding.

> > Regarding your FIXME comment about endianess: I think you are looking
> > for htonl().  Use it to convert the values from host byte order to
> > network byte order (= big endian) and you can get rid of those ugly
> > branches.
> Ah, I'll use that. It's of course a slight change that all processing
> now is big-endian centric, but that might actually even result in
> better code in this case. I'm just assuming any decent system has
> some highly optimized macro for this and will never do a function call.
> This is used in the most performance critical loops, and doing function
> calls here will lead to horrendous performance.

Please just completely drop that word fetch "optimization".  Not only 
does it produce worse assembly code in the end due to the extra loop 
counter and extra shifting in turn increasing register pressure, but it 
also assumes that the target data buffer is always word aligned which is 
not guaranteed (casting a char pointer to an unsigned and dereferencing 
it is not portable since on some architecture such misaligned accesses 
are not allowed).  And it even has worse performance.  For example, with 
word fetch and shift code, packing the Linux kernel archive on my P4 at 
3GHz:

$ git-repack -a -f
...
real    7m5.727s
user    6m33.797s
sys     0m31.394s

And with the word fetch optimization completely ripped out:

$ git-repack -a -f
...
real    6m45.443s
user    6m13.911s
sys     0m31.434s

So you must be extremely careful when trying to implement such kind of 
optimizations.

Now disabling all assert() statements gives:

$ git-repack -a -f
...
real    6m42.478s
user    6m12.031s
sys     0m31.090s

But here comes the sad part.  Even after simplifying the code as much as 
I could, performance is still significantly worse than the current 
diff-delta.c code.  Repacking again the same Linux kernel repository 
with the current code:

$ git-repack -a -f
...
real    4m20.742s
user    3m52.255s
sys     0m28.758s

The final pack is smaller with your code but not significantly: 
117867049 bytes vs 118824550 bytes with the current code, i.e. less than 
1% difference.

You can find attached my current version of your code.

In the mean time I'll try replacing the adler32 hashing with your Rabin 
polynomial hashing into the current code just to see if it helps (and I 
think it will help quite a bit).  The best solution might be a mix of 
both approaches.

Of course there is the issue of reversing the object matching window 
logic in pack-object.c to further improve things, but I consider that an 
orthogonal improvement at this point which both approaches will benefit 
from anyway.


Nicolas

[-- Attachment #2: Type: TEXT/PLAIN, Size: 40628 bytes --]

#include <unistd.h>

#include <stdlib.h>

#include <assert.h>

#include <string.h>

#include <sys/types.h>



#undef assert

#define assert(x) do { } while (0)



/*

 * MIN_HTAB_SIZE is fixed amount to be added to the size of the hash table

 * used for indexing and must be a power of two. This allows for small files

 * to have a sparse hash table, since in that case it's cheap.

 * Hash table sizes are rounded up to a power of two to avoid integer division. 

 */

#define MIN_HTAB_SIZE 8192

#define MAX_HTAB_SIZE (1024*1024*1024)



/*

 * Diffing files of gigabyte range is impractical with the current

 * algorithm, so we're assuming 32-bit sizes everywhere.

 * Size leaves some room for expansion when diffing random files.

 */

#define MAX_SIZE (0x7eff0000)



/* Initial size of copies table, dynamically extended as needed. */

#define MAX_COPIES 4096



/*

 * Matching is done using a sliding window for which a Rabin

 * polynomial is computed. The advantage of such polynomials is

 * that they can efficiently be updated at every position.

 * The tables needed for this are precomputed, as it is desirable

 * to use the same polynomial all the time for repeatable results.

 */



#if 0	/* those don't seem to work */



#define RABIN_POLY 0x25bd5331c0d7096dULL

#define RABIN_DEGREE 61

#define RABIN_SHIFT 53

#define RABIN_WINDOW_SIZE 22



static const u_int64_t T[256] = {

	0x6ec7f55241791bb7ULL, 0x96f54cc7035c25b4ULL, 0xb3481ff6c38b2cd9ULL,

	0xdd8feaa482f2376eULL, 0xf832b99542253e03ULL, 0x0857cabfc66f4205ULL,

	0x2dea998e06b84b68ULL, 0x432d6cdc47c150dfULL, 0x66903fed871659b2ULL,

	0x9ea28678c53367b1ULL, 0xbb1fd54905e46edcULL, 0xd5d8201b449d756bULL,

	0xf065732a844a7c06ULL, 0x10af957f8cde840aULL, 0x3512c64e4c098d67ULL,

	0x5bd5331c0d7096d0ULL, 0x7e68602dcda79fbdULL, 0x865ad9b88f82a1beULL,

	0xa3e78a894f55a8d3ULL, 0xcd207fdb0e2cb364ULL, 0xe89d2ceacefbba09ULL,

	0x18f85fc04ab1c60fULL, 0x3d450cf18a66cf62ULL, 0x5382f9a3cb1fd4d5ULL,

	0x763faa920bc8ddb8ULL, 0x8e0d130749ede3bbULL, 0xabb04036893aead6ULL,

	0xc577b564c843f161ULL, 0xe0cae6550894f80cULL, 0x04e279ced96a0179ULL,

	0x215f2aff19bd0814ULL, 0x4f98dfad58c413a3ULL, 0x6a258c9c98131aceULL,

	0x92173509da3624cdULL, 0xb7aa66381ae12da0ULL, 0xd96d936a5b983617ULL,

	0xfcd0c05b9b4f3f7aULL, 0x0cb5b3711f05437cULL, 0x2908e040dfd24a11ULL,

	0x47cf15129eab51a6ULL, 0x627246235e7c58cbULL, 0x9a40ffb61c5966c8ULL,

	0xbffdac87dc8e6fa5ULL, 0xd13a59d59df77412ULL, 0xf4870ae45d207d7fULL,

	0x144decb155b48573ULL, 0x31f0bf8095638c1eULL, 0x5f374ad2d41a97a9ULL,

	0x7a8a19e314cd9ec4ULL, 0x82b8a07656e8a0c7ULL, 0xa705f347963fa9aaULL,

	0xc9c20615d746b21dULL, 0xec7f55241791bb70ULL, 0x1c1a260e93dbc776ULL,

	0x39a7753f530cce1bULL, 0x5760806d1275d5acULL, 0x72ddd35cd2a2dcc1ULL,

	0x8aef6ac99087e2c2ULL, 0xaf5239f85050ebafULL, 0xc195ccaa1129f018ULL,

	0xe4289f9bd1fef975ULL, 0x09c4f39db2d402f2ULL, 0x2c79a0ac72030b9fULL,

	0x42be55fe337a1028ULL, 0x670306cff3ad1945ULL, 0x9f31bf5ab1882746ULL,

	0xba8cec6b715f2e2bULL, 0xd44b19393026359cULL, 0xf1f64a08f0f13cf1ULL,

	0x0193392274bb40f7ULL, 0x242e6a13b46c499aULL, 0x4ae99f41f515522dULL,

	0x6f54cc7035c25b40ULL, 0x976675e577e76543ULL, 0xb2db26d4b7306c2eULL,

	0xdc1cd386f6497799ULL, 0xf9a180b7369e7ef4ULL, 0x196b66e23e0a86f8ULL,

	0x3cd635d3fedd8f95ULL, 0x5211c081bfa49422ULL, 0x77ac93b07f739d4fULL,

	0x8f9e2a253d56a34cULL, 0xaa237914fd81aa21ULL, 0xc4e48c46bcf8b196ULL,

	0xe159df777c2fb8fbULL, 0x113cac5df865c4fdULL, 0x3481ff6c38b2cd90ULL,

	0x5a460a3e79cbd627ULL, 0x7ffb590fb91cdf4aULL, 0x87c9e09afb39e149ULL,

	0xa274b3ab3beee824ULL, 0xccb346f97a97f393ULL, 0xe90e15c8ba40fafeULL,

	0x0d268a536bbe038bULL, 0x289bd962ab690ae6ULL, 0x465c2c30ea101151ULL,

	0x63e17f012ac7183cULL, 0x9bd3c69468e2263fULL, 0xbe6e95a5a8352f52ULL,

	0xd0a960f7e94c34e5ULL, 0xf51433c6299b3d88ULL, 0x057140ecadd1418eULL,

	0x20cc13dd6d0648e3ULL, 0x4e0be68f2c7f5354ULL, 0x6bb6b5beeca85a39ULL,

	0x93840c2bae8d643aULL, 0xb6395f1a6e5a6d57ULL, 0xd8feaa482f2376e0ULL,

	0xfd43f979eff47f8dULL, 0x1d891f2ce7608781ULL, 0x38344c1d27b78eecULL,

	0x56f3b94f66ce955bULL, 0x734eea7ea6199c36ULL, 0x8b7c53ebe43ca235ULL,

	0xaec100da24ebab58ULL, 0xc006f5886592b0efULL, 0xe5bba6b9a545b982ULL,

	0x15ded593210fc584ULL, 0x306386a2e1d8cce9ULL, 0x5ea473f0a0a1d75eULL,

	0x7b1920c16076de33ULL, 0x832b99542253e030ULL, 0xa696ca65e284e95dULL,

	0xc8513f37a3fdf2eaULL, 0xedec6c06632afb87ULL, 0x1389e73b65a805e4ULL,

	0x3634b40aa57f0c89ULL, 0x58f34158e406173eULL, 0x7d4e126924d11e53ULL,

	0x857cabfc66f42050ULL, 0xa0c1f8cda623293dULL, 0xce060d9fe75a328aULL,

	0xebbb5eae278d3be7ULL, 0x1bde2d84a3c747e1ULL, 0x3e637eb563104e8cULL,

	0x50a48be72269553bULL, 0x7519d8d6e2be5c56ULL, 0x8d2b6143a09b6255ULL,

	0xa8963272604c6b38ULL, 0xc651c7202135708fULL, 0xe3ec9411e1e279e2ULL,

	0x03267244e97681eeULL, 0x269b217529a18883ULL, 0x485cd42768d89334ULL,

	0x6de18716a80f9a59ULL, 0x95d33e83ea2aa45aULL, 0xb06e6db22afdad37ULL,

	0xdea998e06b84b680ULL, 0xfb14cbd1ab53bfedULL, 0x0b71b8fb2f19c3ebULL,

	0x2eccebcaefceca86ULL, 0x400b1e98aeb7d131ULL, 0x65b64da96e60d85cULL,

	0x9d84f43c2c45e65fULL, 0xb839a70dec92ef32ULL, 0xd6fe525fadebf485ULL,

	0xf343016e6d3cfde8ULL, 0x176b9ef5bcc2049dULL, 0x32d6cdc47c150df0ULL,

	0x5c1138963d6c1647ULL, 0x79ac6ba7fdbb1f2aULL, 0x819ed232bf9e2129ULL,

	0xa42381037f492844ULL, 0xcae474513e3033f3ULL, 0xef592760fee73a9eULL,

	0x1f3c544a7aad4698ULL, 0x3a81077bba7a4ff5ULL, 0x5446f229fb035442ULL,

	0x71fba1183bd45d2fULL, 0x89c9188d79f1632cULL, 0xac744bbcb9266a41ULL,

	0xc2b3beeef85f71f6ULL, 0xe70eeddf3888789bULL, 0x07c40b8a301c8097ULL,

	0x227958bbf0cb89faULL, 0x4cbeade9b1b2924dULL, 0x6903fed871659b20ULL,

	0x9131474d3340a523ULL, 0xb48c147cf397ac4eULL, 0xda4be12eb2eeb7f9ULL,

	0xfff6b21f7239be94ULL, 0x0f93c135f673c292ULL, 0x2a2e920436a4cbffULL,

	0x44e9675677ddd048ULL, 0x61543467b70ad925ULL, 0x99668df2f52fe726ULL,

	0xbcdbdec335f8ee4bULL, 0xd21c2b917481f5fcULL, 0xf7a178a0b456fc91ULL,

	0x1a4d14a6d77c0716ULL, 0x3ff0479717ab0e7bULL, 0x5137b2c556d215ccULL,

	0x748ae1f496051ca1ULL, 0x8cb85861d42022a2ULL, 0xa9050b5014f72bcfULL,

	0xc7c2fe02558e3078ULL, 0xe27fad3395593915ULL, 0x121ade1911134513ULL,

	0x37a78d28d1c44c7eULL, 0x5960787a90bd57c9ULL, 0x7cdd2b4b506a5ea4ULL,

	0x84ef92de124f60a7ULL, 0xa152c1efd29869caULL, 0xcf9534bd93e1727dULL,

	0xea28678c53367b10ULL, 0x0ae281d95ba2831cULL, 0x2f5fd2e89b758a71ULL,

	0x419827bada0c91c6ULL, 0x6425748b1adb98abULL, 0x9c17cd1e58fea6a8ULL,

	0xb9aa9e2f9829afc5ULL, 0xd76d6b7dd950b472ULL, 0xf2d0384c1987bd1fULL,

	0x02b54b669dcdc119ULL, 0x270818575d1ac874ULL, 0x49cfed051c63d3c3ULL,

	0x6c72be34dcb4daaeULL, 0x944007a19e91e4adULL, 0xb1fd54905e46edc0ULL,

	0xdf3aa1c21f3ff677ULL, 0xfa87f2f3dfe8ff1aULL, 0x1eaf6d680e16066fULL,

	0x3b123e59cec10f02ULL, 0x55d5cb0b8fb814b5ULL, 0x7068983a4f6f1dd8ULL,

	0x885a21af0d4a23dbULL, 0xade7729ecd9d2ab6ULL, 0xc32087cc8ce43101ULL,

	0xe69dd4fd4c33386cULL, 0x16f8a7d7c879446aULL, 0x3345f4e608ae4d07ULL,

	0x5d8201b449d756b0ULL, 0x783f528589005fddULL, 0x800deb10cb2561deULL,

	0xa5b0b8210bf268b3ULL, 0xcb774d734a8b7304ULL, 0xeeca1e428a5c7a69ULL,

	0x0e00f81782c88265ULL, 0x2bbdab26421f8b08ULL, 0x457a5e74036690bfULL,

	0x60c70d45c3b199d2ULL, 0x98f5b4d08194a7d1ULL, 0xbd48e7e14143aebcULL,

	0xd38f12b3003ab50bULL, 0xf6324182c0edbc66ULL, 0x065732a844a7c060ULL,

	0x23ea61998470c90dULL, 0x4d2d94cbc509d2baULL, 0x6890c7fa05dedbd7ULL,

	0x90a27e6f47fbe5d4ULL, 0xb51f2d5e872cecb9ULL, 0xdbd8d80cc655f70eULL,

	0xfe658b3d0682fe63ULL

};



static const u_int64_t U[256] = {

	0x0000000000000000ULL, 0x067b86c43d6a6cb0ULL, 0x0cf70d887ad4d960ULL,

	0x0a8c8b4c47beb5d0ULL, 0x19ee1b10f5a9b2c0ULL, 0x1f959dd4c8c3de70ULL,

	0x151916988f7d6ba0ULL, 0x1362905cb2170710ULL, 0x166165102b846cedULL,

	0x101ae3d416ee005dULL, 0x1a9668985150b58dULL, 0x1cedee5c6c3ad93dULL,

	0x0f8f7e00de2dde2dULL, 0x09f4f8c4e347b29dULL, 0x03787388a4f9074dULL,

	0x0503f54c99936bfdULL, 0x097f991197dfd0b7ULL, 0x0f041fd5aab5bc07ULL,

	0x05889499ed0b09d7ULL, 0x03f3125dd0616567ULL, 0x1091820162766277ULL,

	0x16ea04c55f1c0ec7ULL, 0x1c668f8918a2bb17ULL, 0x1a1d094d25c8d7a7ULL,

	0x1f1efc01bc5bbc5aULL, 0x19657ac58131d0eaULL, 0x13e9f189c68f653aULL,

	0x1592774dfbe5098aULL, 0x06f0e71149f20e9aULL, 0x008b61d57498622aULL,

	0x0a07ea993326d7faULL, 0x0c7c6c5d0e4cbb4aULL, 0x12ff32232fbfa16eULL,

	0x1484b4e712d5cddeULL, 0x1e083fab556b780eULL, 0x1873b96f680114beULL,

	0x0b112933da1613aeULL, 0x0d6aaff7e77c7f1eULL, 0x07e624bba0c2caceULL,

	0x019da27f9da8a67eULL, 0x049e5733043bcd83ULL, 0x02e5d1f73951a133ULL,

	0x08695abb7eef14e3ULL, 0x0e12dc7f43857853ULL, 0x1d704c23f1927f43ULL,

	0x1b0bcae7ccf813f3ULL, 0x118741ab8b46a623ULL, 0x17fcc76fb62cca93ULL,

	0x1b80ab32b86071d9ULL, 0x1dfb2df6850a1d69ULL, 0x1777a6bac2b4a8b9ULL,

	0x110c207effdec409ULL, 0x026eb0224dc9c319ULL, 0x041536e670a3afa9ULL,

	0x0e99bdaa371d1a79ULL, 0x08e23b6e0a7776c9ULL, 0x0de1ce2293e41d34ULL,

	0x0b9a48e6ae8e7184ULL, 0x0116c3aae930c454ULL, 0x076d456ed45aa8e4ULL,

	0x140fd532664daff4ULL, 0x127453f65b27c344ULL, 0x18f8d8ba1c997694ULL,

	0x1e835e7e21f31a24ULL, 0x004337779fa84bb1ULL, 0x0638b1b3a2c22701ULL,

	0x0cb43affe57c92d1ULL, 0x0acfbc3bd816fe61ULL, 0x19ad2c676a01f971ULL,

	0x1fd6aaa3576b95c1ULL, 0x155a21ef10d52011ULL, 0x1321a72b2dbf4ca1ULL,

	0x16225267b42c275cULL, 0x1059d4a389464becULL, 0x1ad55fefcef8fe3cULL,

	0x1caed92bf392928cULL, 0x0fcc49774185959cULL, 0x09b7cfb37ceff92cULL,

	0x033b44ff3b514cfcULL, 0x0540c23b063b204cULL, 0x093cae6608779b06ULL,

	0x0f4728a2351df7b6ULL, 0x05cba3ee72a34266ULL, 0x03b0252a4fc92ed6ULL,

	0x10d2b576fdde29c6ULL, 0x16a933b2c0b44576ULL, 0x1c25b8fe870af0a6ULL,

	0x1a5e3e3aba609c16ULL, 0x1f5dcb7623f3f7ebULL, 0x19264db21e999b5bULL,

	0x13aac6fe59272e8bULL, 0x15d1403a644d423bULL, 0x06b3d066d65a452bULL,

	0x00c856a2eb30299bULL, 0x0a44ddeeac8e9c4bULL, 0x0c3f5b2a91e4f0fbULL,

	0x12bc0554b017eadfULL, 0x14c783908d7d866fULL, 0x1e4b08dccac333bfULL,

	0x18308e18f7a95f0fULL, 0x0b521e4445be581fULL, 0x0d29988078d434afULL,

	0x07a513cc3f6a817fULL, 0x01de95080200edcfULL, 0x04dd60449b938632ULL,

	0x02a6e680a6f9ea82ULL, 0x082a6dcce1475f52ULL, 0x0e51eb08dc2d33e2ULL,

	0x1d337b546e3a34f2ULL, 0x1b48fd9053505842ULL, 0x11c476dc14eeed92ULL,

	0x17bff01829848122ULL, 0x1bc39c4527c83a68ULL, 0x1db81a811aa256d8ULL,

	0x173491cd5d1ce308ULL, 0x114f170960768fb8ULL, 0x022d8755d26188a8ULL,

	0x04560191ef0be418ULL, 0x0eda8adda8b551c8ULL, 0x08a10c1995df3d78ULL,

	0x0da2f9550c4c5685ULL, 0x0bd97f9131263a35ULL, 0x0155f4dd76988fe5ULL,

	0x072e72194bf2e355ULL, 0x144ce245f9e5e445ULL, 0x12376481c48f88f5ULL,

	0x18bbefcd83313d25ULL, 0x1ec06909be5b5195ULL, 0x00866eef3f509762ULL,

	0x06fde82b023afbd2ULL, 0x0c71636745844e02ULL, 0x0a0ae5a378ee22b2ULL,

	0x196875ffcaf925a2ULL, 0x1f13f33bf7934912ULL, 0x159f7877b02dfcc2ULL,

	0x13e4feb38d479072ULL, 0x16e70bff14d4fb8fULL, 0x109c8d3b29be973fULL,

	0x1a1006776e0022efULL, 0x1c6b80b3536a4e5fULL, 0x0f0910efe17d494fULL,

	0x0972962bdc1725ffULL, 0x03fe1d679ba9902fULL, 0x05859ba3a6c3fc9fULL,

	0x09f9f7fea88f47d5ULL, 0x0f82713a95e52b65ULL, 0x050efa76d25b9eb5ULL,

	0x03757cb2ef31f205ULL, 0x1017ecee5d26f515ULL, 0x166c6a2a604c99a5ULL,

	0x1ce0e16627f22c75ULL, 0x1a9b67a21a9840c5ULL, 0x1f9892ee830b2b38ULL,

	0x19e3142abe614788ULL, 0x136f9f66f9dff258ULL, 0x151419a2c4b59ee8ULL,

	0x067689fe76a299f8ULL, 0x000d0f3a4bc8f548ULL, 0x0a8184760c764098ULL,

	0x0cfa02b2311c2c28ULL, 0x12795ccc10ef360cULL, 0x1402da082d855abcULL,

	0x1e8e51446a3bef6cULL, 0x18f5d780575183dcULL, 0x0b9747dce54684ccULL,

	0x0decc118d82ce87cULL, 0x07604a549f925dacULL, 0x011bcc90a2f8311cULL,

	0x041839dc3b6b5ae1ULL, 0x0263bf1806013651ULL, 0x08ef345441bf8381ULL,

	0x0e94b2907cd5ef31ULL, 0x1df622cccec2e821ULL, 0x1b8da408f3a88491ULL,

	0x11012f44b4163141ULL, 0x177aa980897c5df1ULL, 0x1b06c5dd8730e6bbULL,

	0x1d7d4319ba5a8a0bULL, 0x17f1c855fde43fdbULL, 0x118a4e91c08e536bULL,

	0x02e8decd7299547bULL, 0x049358094ff338cbULL, 0x0e1fd345084d8d1bULL,

	0x086455813527e1abULL, 0x0d67a0cdacb48a56ULL, 0x0b1c260991dee6e6ULL,

	0x0190ad45d6605336ULL, 0x07eb2b81eb0a3f86ULL, 0x1489bbdd591d3896ULL,

	0x12f23d1964775426ULL, 0x187eb65523c9e1f6ULL, 0x1e0530911ea38d46ULL,

	0x00c55998a0f8dcd3ULL, 0x06bedf5c9d92b063ULL, 0x0c325410da2c05b3ULL,

	0x0a49d2d4e7466903ULL, 0x192b428855516e13ULL, 0x1f50c44c683b02a3ULL,

	0x15dc4f002f85b773ULL, 0x13a7c9c412efdbc3ULL, 0x16a43c888b7cb03eULL,

	0x10dfba4cb616dc8eULL, 0x1a533100f1a8695eULL, 0x1c28b7c4ccc205eeULL,

	0x0f4a27987ed502feULL, 0x0931a15c43bf6e4eULL, 0x03bd2a100401db9eULL,

	0x05c6acd4396bb72eULL, 0x09bac08937270c64ULL, 0x0fc1464d0a4d60d4ULL,

	0x054dcd014df3d504ULL, 0x03364bc57099b9b4ULL, 0x1054db99c28ebea4ULL,

	0x162f5d5dffe4d214ULL, 0x1ca3d611b85a67c4ULL, 0x1ad850d585300b74ULL,

	0x1fdba5991ca36089ULL, 0x19a0235d21c90c39ULL, 0x132ca8116677b9e9ULL,

	0x15572ed55b1dd559ULL, 0x0635be89e90ad249ULL, 0x004e384dd460bef9ULL,

	0x0ac2b30193de0b29ULL, 0x0cb935c5aeb46799ULL, 0x123a6bbb8f477dbdULL,

	0x1441ed7fb22d110dULL, 0x1ecd6633f593a4ddULL, 0x18b6e0f7c8f9c86dULL,

	0x0bd470ab7aeecf7dULL, 0x0daff66f4784a3cdULL, 0x07237d23003a161dULL,

	0x0158fbe73d507aadULL, 0x045b0eaba4c31150ULL, 0x0220886f99a97de0ULL,

	0x08ac0323de17c830ULL, 0x0ed785e7e37da480ULL, 0x1db515bb516aa390ULL,

	0x1bce937f6c00cf20ULL, 0x114218332bbe7af0ULL, 0x17399ef716d41640ULL,

	0x1b45f2aa1898ad0aULL, 0x1d3e746e25f2c1baULL, 0x17b2ff22624c746aULL,

	0x11c979e65f2618daULL, 0x02abe9baed311fcaULL, 0x04d06f7ed05b737aULL,

	0x0e5ce43297e5c6aaULL, 0x082762f6aa8faa1aULL, 0x0d2497ba331cc1e7ULL,

	0x0b5f117e0e76ad57ULL, 0x01d39a3249c81887ULL, 0x07a81cf674a27437ULL,

	0x14ca8caac6b57327ULL, 0x12b10a6efbdf1f97ULL, 0x183d8122bc61aa47ULL,

	0x1e4607e6810bc6f7ULL

};



#else	/* the original ones */



#define RABIN_WINDOW_SIZE 22

#define RABIN_SHIFT 55



static const u_int64_t T[256] = {

	0x0000000000000000ULL, 0xb15e234bd3792f63ULL, 0x62bc4697a6f25ec6ULL,

	0xd3e265dc758b71a5ULL, 0x7426ae649e9d92efULL, 0xc5788d2f4de4bd8cULL,

	0x169ae8f3386fcc29ULL, 0xa7c4cbb8eb16e34aULL, 0x59137f82ee420abdULL,

	0xe84d5cc93d3b25deULL, 0x3baf391548b0547bULL, 0x8af11a5e9bc97b18ULL,

	0x2d35d1e670df9852ULL, 0x9c6bf2ada3a6b731ULL, 0x4f899771d62dc694ULL,

	0xfed7b43a0554e9f7ULL, 0x0378dc4e0ffd3a19ULL, 0xb226ff05dc84157aULL,

	0x61c49ad9a90f64dfULL, 0xd09ab9927a764bbcULL, 0x775e722a9160a8f6ULL,

	0xc600516142198795ULL, 0x15e234bd3792f630ULL, 0xa4bc17f6e4ebd953ULL,

	0x5a6ba3cce1bf30a4ULL, 0xeb35808732c61fc7ULL, 0x38d7e55b474d6e62ULL,

	0x8989c61094344101ULL, 0x2e4d0da87f22a24bULL, 0x9f132ee3ac5b8d28ULL,

	0x4cf14b3fd9d0fc8dULL, 0xfdaf68740aa9d3eeULL, 0x06f1b89c1ffa7432ULL,

	0xb7af9bd7cc835b51ULL, 0x644dfe0bb9082af4ULL, 0xd513dd406a710597ULL,

	0x72d716f88167e6ddULL, 0xc38935b3521ec9beULL, 0x106b506f2795b81bULL,

	0xa1357324f4ec9778ULL, 0x5fe2c71ef1b87e8fULL, 0xeebce45522c151ecULL,

	0x3d5e8189574a2049ULL, 0x8c00a2c284330f2aULL, 0x2bc4697a6f25ec60ULL,

	0x9a9a4a31bc5cc303ULL, 0x49782fedc9d7b2a6ULL, 0xf8260ca61aae9dc5ULL,

	0x058964d210074e2bULL, 0xb4d74799c37e6148ULL, 0x67352245b6f510edULL,

	0xd66b010e658c3f8eULL, 0x71afcab68e9adcc4ULL, 0xc0f1e9fd5de3f3a7ULL,

	0x13138c2128688202ULL, 0xa24daf6afb11ad61ULL, 0x5c9a1b50fe454496ULL,

	0xedc4381b2d3c6bf5ULL, 0x3e265dc758b71a50ULL, 0x8f787e8c8bce3533ULL,

	0x28bcb53460d8d679ULL, 0x99e2967fb3a1f91aULL, 0x4a00f3a3c62a88bfULL,

	0xfb5ed0e81553a7dcULL, 0x0de371383ff4e864ULL, 0xbcbd5273ec8dc707ULL,

	0x6f5f37af9906b6a2ULL, 0xde0114e44a7f99c1ULL, 0x79c5df5ca1697a8bULL,

	0xc89bfc17721055e8ULL, 0x1b7999cb079b244dULL, 0xaa27ba80d4e20b2eULL,

	0x54f00ebad1b6e2d9ULL, 0xe5ae2df102cfcdbaULL, 0x364c482d7744bc1fULL,

	0x87126b66a43d937cULL, 0x20d6a0de4f2b7036ULL, 0x918883959c525f55ULL,

	0x426ae649e9d92ef0ULL, 0xf334c5023aa00193ULL, 0x0e9bad763009d27dULL,

	0xbfc58e3de370fd1eULL, 0x6c27ebe196fb8cbbULL, 0xdd79c8aa4582a3d8ULL,

	0x7abd0312ae944092ULL, 0xcbe320597ded6ff1ULL, 0x1801458508661e54ULL,

	0xa95f66cedb1f3137ULL, 0x5788d2f4de4bd8c0ULL, 0xe6d6f1bf0d32f7a3ULL,

	0x3534946378b98606ULL, 0x846ab728abc0a965ULL, 0x23ae7c9040d64a2fULL,

	0x92f05fdb93af654cULL, 0x41123a07e62414e9ULL, 0xf04c194c355d3b8aULL,

	0x0b12c9a4200e9c56ULL, 0xba4ceaeff377b335ULL, 0x69ae8f3386fcc290ULL,

	0xd8f0ac785585edf3ULL, 0x7f3467c0be930eb9ULL, 0xce6a448b6dea21daULL,

	0x1d8821571861507fULL, 0xacd6021ccb187f1cULL, 0x5201b626ce4c96ebULL,

	0xe35f956d1d35b988ULL, 0x30bdf0b168bec82dULL, 0x81e3d3fabbc7e74eULL,

	0x2627184250d10404ULL, 0x97793b0983a82b67ULL, 0x449b5ed5f6235ac2ULL,

	0xf5c57d9e255a75a1ULL, 0x086a15ea2ff3a64fULL, 0xb93436a1fc8a892cULL,

	0x6ad6537d8901f889ULL, 0xdb8870365a78d7eaULL, 0x7c4cbb8eb16e34a0ULL,

	0xcd1298c562171bc3ULL, 0x1ef0fd19179c6a66ULL, 0xafaede52c4e54505ULL,

	0x51796a68c1b1acf2ULL, 0xe027492312c88391ULL, 0x33c52cff6743f234ULL,

	0x829b0fb4b43add57ULL, 0x255fc40c5f2c3e1dULL, 0x9401e7478c55117eULL,

	0x47e3829bf9de60dbULL, 0xf6bda1d02aa74fb8ULL, 0x1bc6e2707fe9d0c8ULL,

	0xaa98c13bac90ffabULL, 0x797aa4e7d91b8e0eULL, 0xc82487ac0a62a16dULL,

	0x6fe04c14e1744227ULL, 0xdebe6f5f320d6d44ULL, 0x0d5c0a8347861ce1ULL,

	0xbc0229c894ff3382ULL, 0x42d59df291abda75ULL, 0xf38bbeb942d2f516ULL,

	0x2069db65375984b3ULL, 0x9137f82ee420abd0ULL, 0x36f333960f36489aULL,

	0x87ad10dddc4f67f9ULL, 0x544f7501a9c4165cULL, 0xe511564a7abd393fULL,

	0x18be3e3e7014ead1ULL, 0xa9e01d75a36dc5b2ULL, 0x7a0278a9d6e6b417ULL,

	0xcb5c5be2059f9b74ULL, 0x6c98905aee89783eULL, 0xddc6b3113df0575dULL,

	0x0e24d6cd487b26f8ULL, 0xbf7af5869b02099bULL, 0x41ad41bc9e56e06cULL,

	0xf0f362f74d2fcf0fULL, 0x2311072b38a4beaaULL, 0x924f2460ebdd91c9ULL,

	0x358befd800cb7283ULL, 0x84d5cc93d3b25de0ULL, 0x5737a94fa6392c45ULL,

	0xe6698a0475400326ULL, 0x1d375aec6013a4faULL, 0xac6979a7b36a8b99ULL,

	0x7f8b1c7bc6e1fa3cULL, 0xced53f301598d55fULL, 0x6911f488fe8e3615ULL,

	0xd84fd7c32df71976ULL, 0x0badb21f587c68d3ULL, 0xbaf391548b0547b0ULL,

	0x4424256e8e51ae47ULL, 0xf57a06255d288124ULL, 0x269863f928a3f081ULL,

	0x97c640b2fbdadfe2ULL, 0x30028b0a10cc3ca8ULL, 0x815ca841c3b513cbULL,

	0x52becd9db63e626eULL, 0xe3e0eed665474d0dULL, 0x1e4f86a26fee9ee3ULL,

	0xaf11a5e9bc97b180ULL, 0x7cf3c035c91cc025ULL, 0xcdade37e1a65ef46ULL,

	0x6a6928c6f1730c0cULL, 0xdb370b8d220a236fULL, 0x08d56e51578152caULL,

	0xb98b4d1a84f87da9ULL, 0x475cf92081ac945eULL, 0xf602da6b52d5bb3dULL,

	0x25e0bfb7275eca98ULL, 0x94be9cfcf427e5fbULL, 0x337a57441f3106b1ULL,

	0x8224740fcc4829d2ULL, 0x51c611d3b9c35877ULL, 0xe09832986aba7714ULL,

	0x16259348401d38acULL, 0xa77bb003936417cfULL, 0x7499d5dfe6ef666aULL,

	0xc5c7f69435964909ULL, 0x62033d2cde80aa43ULL, 0xd35d1e670df98520ULL,

	0x00bf7bbb7872f485ULL, 0xb1e158f0ab0bdbe6ULL, 0x4f36eccaae5f3211ULL,

	0xfe68cf817d261d72ULL, 0x2d8aaa5d08ad6cd7ULL, 0x9cd48916dbd443b4ULL,

	0x3b1042ae30c2a0feULL, 0x8a4e61e5e3bb8f9dULL, 0x59ac04399630fe38ULL,

	0xe8f227724549d15bULL, 0x155d4f064fe002b5ULL, 0xa4036c4d9c992dd6ULL,

	0x77e10991e9125c73ULL, 0xc6bf2ada3a6b7310ULL, 0x617be162d17d905aULL,

	0xd025c2290204bf39ULL, 0x03c7a7f5778fce9cULL, 0xb29984bea4f6e1ffULL,

	0x4c4e3084a1a20808ULL, 0xfd1013cf72db276bULL, 0x2ef27613075056ceULL,

	0x9fac5558d42979adULL, 0x38689ee03f3f9ae7ULL, 0x8936bdabec46b584ULL,

	0x5ad4d87799cdc421ULL, 0xeb8afb3c4ab4eb42ULL, 0x10d42bd45fe74c9eULL,

	0xa18a089f8c9e63fdULL, 0x72686d43f9151258ULL, 0xc3364e082a6c3d3bULL,

	0x64f285b0c17ade71ULL, 0xd5aca6fb1203f112ULL, 0x064ec327678880b7ULL,

	0xb710e06cb4f1afd4ULL, 0x49c75456b1a54623ULL, 0xf899771d62dc6940ULL,

	0x2b7b12c1175718e5ULL, 0x9a25318ac42e3786ULL, 0x3de1fa322f38d4ccULL,

	0x8cbfd979fc41fbafULL, 0x5f5dbca589ca8a0aULL, 0xee039fee5ab3a569ULL,

	0x13acf79a501a7687ULL, 0xa2f2d4d1836359e4ULL, 0x7110b10df6e82841ULL,

	0xc04e924625910722ULL, 0x678a59fece87e468ULL, 0xd6d47ab51dfecb0bULL,

	0x05361f696875baaeULL, 0xb4683c22bb0c95cdULL, 0x4abf8818be587c3aULL,

	0xfbe1ab536d215359ULL, 0x2803ce8f18aa22fcULL, 0x995dedc4cbd30d9fULL,

	0x3e99267c20c5eed5ULL, 0x8fc70537f3bcc1b6ULL, 0x5c2560eb8637b013ULL,

	0xed7b43a0554e9f70ULL

};



static const u_int64_t U[256] = {

	0x0000000000000000ULL, 0x079343d61ab9f60eULL, 0x0f2687ac3573ec1cULL,

	0x08b5c47a2fca1a12ULL, 0x1e4d0f586ae7d838ULL, 0x19de4c8e705e2e36ULL,

	0x116b88f45f943424ULL, 0x16f8cb22452dc22aULL, 0x3c9a1eb0d5cfb070ULL,

	0x3b095d66cf76467eULL, 0x33bc991ce0bc5c6cULL, 0x342fdacafa05aa62ULL,

	0x22d711e8bf286848ULL, 0x2544523ea5919e46ULL, 0x2df196448a5b8454ULL,

	0x2a62d59290e2725aULL, 0x79343d61ab9f60e0ULL, 0x7ea77eb7b12696eeULL,

	0x7612bacd9eec8cfcULL, 0x7181f91b84557af2ULL, 0x67793239c178b8d8ULL,

	0x60ea71efdbc14ed6ULL, 0x685fb595f40b54c4ULL, 0x6fccf643eeb2a2caULL,

	0x45ae23d17e50d090ULL, 0x423d600764e9269eULL, 0x4a88a47d4b233c8cULL,

	0x4d1be7ab519aca82ULL, 0x5be32c8914b708a8ULL, 0x5c706f5f0e0efea6ULL,

	0x54c5ab2521c4e4b4ULL, 0x5356e8f33b7d12baULL, 0x433659888447eea3ULL,

	0x44a51a5e9efe18adULL, 0x4c10de24b13402bfULL, 0x4b839df2ab8df4b1ULL,

	0x5d7b56d0eea0369bULL, 0x5ae81506f419c095ULL, 0x525dd17cdbd3da87ULL,

	0x55ce92aac16a2c89ULL, 0x7fac473851885ed3ULL, 0x783f04ee4b31a8ddULL,

	0x708ac09464fbb2cfULL, 0x771983427e4244c1ULL, 0x61e148603b6f86ebULL,

	0x66720bb621d670e5ULL, 0x6ec7cfcc0e1c6af7ULL, 0x69548c1a14a59cf9ULL,

	0x3a0264e92fd88e43ULL, 0x3d91273f3561784dULL, 0x3524e3451aab625fULL,

	0x32b7a09300129451ULL, 0x244f6bb1453f567bULL, 0x23dc28675f86a075ULL,

	0x2b69ec1d704cba67ULL, 0x2cfaafcb6af54c69ULL, 0x06987a59fa173e33ULL,

	0x010b398fe0aec83dULL, 0x09befdf5cf64d22fULL, 0x0e2dbe23d5dd2421ULL,

	0x18d5750190f0e60bULL, 0x1f4636d78a491005ULL, 0x17f3f2ada5830a17ULL,

	0x1060b17bbf3afc19ULL, 0x3732905adbf6f225ULL, 0x30a1d38cc14f042bULL,

	0x381417f6ee851e39ULL, 0x3f875420f43ce837ULL, 0x297f9f02b1112a1dULL,

	0x2eecdcd4aba8dc13ULL, 0x265918ae8462c601ULL, 0x21ca5b789edb300fULL,

	0x0ba88eea0e394255ULL, 0x0c3bcd3c1480b45bULL, 0x048e09463b4aae49ULL,

	0x031d4a9021f35847ULL, 0x15e581b264de9a6dULL, 0x1276c2647e676c63ULL,

	0x1ac3061e51ad7671ULL, 0x1d5045c84b14807fULL, 0x4e06ad3b706992c5ULL,

	0x4995eeed6ad064cbULL, 0x41202a97451a7ed9ULL, 0x46b369415fa388d7ULL,

	0x504ba2631a8e4afdULL, 0x57d8e1b50037bcf3ULL, 0x5f6d25cf2ffda6e1ULL,

	0x58fe6619354450efULL, 0x729cb38ba5a622b5ULL, 0x750ff05dbf1fd4bbULL,

	0x7dba342790d5cea9ULL, 0x7a2977f18a6c38a7ULL, 0x6cd1bcd3cf41fa8dULL,

	0x6b42ff05d5f80c83ULL, 0x63f73b7ffa321691ULL, 0x646478a9e08be09fULL,

	0x7404c9d25fb11c86ULL, 0x73978a044508ea88ULL, 0x7b224e7e6ac2f09aULL,

	0x7cb10da8707b0694ULL, 0x6a49c68a3556c4beULL, 0x6dda855c2fef32b0ULL,

	0x656f4126002528a2ULL, 0x62fc02f01a9cdeacULL, 0x489ed7628a7eacf6ULL,

	0x4f0d94b490c75af8ULL, 0x47b850cebf0d40eaULL, 0x402b1318a5b4b6e4ULL,

	0x56d3d83ae09974ceULL, 0x51409becfa2082c0ULL, 0x59f55f96d5ea98d2ULL,

	0x5e661c40cf536edcULL, 0x0d30f4b3f42e7c66ULL, 0x0aa3b765ee978a68ULL,

	0x0216731fc15d907aULL, 0x058530c9dbe46674ULL, 0x137dfbeb9ec9a45eULL,

	0x14eeb83d84705250ULL, 0x1c5b7c47abba4842ULL, 0x1bc83f91b103be4cULL,

	0x31aaea0321e1cc16ULL, 0x3639a9d53b583a18ULL, 0x3e8c6daf1492200aULL,

	0x391f2e790e2bd604ULL, 0x2fe7e55b4b06142eULL, 0x2874a68d51bfe220ULL,

	0x20c162f77e75f832ULL, 0x2752212164cc0e3cULL, 0x6e6520b5b7ede44aULL,

	0x69f66363ad541244ULL, 0x6143a719829e0856ULL, 0x66d0e4cf9827fe58ULL,

	0x70282feddd0a3c72ULL, 0x77bb6c3bc7b3ca7cULL, 0x7f0ea841e879d06eULL,

	0x789deb97f2c02660ULL, 0x52ff3e056222543aULL, 0x556c7dd3789ba234ULL,

	0x5dd9b9a95751b826ULL, 0x5a4afa7f4de84e28ULL, 0x4cb2315d08c58c02ULL,

	0x4b21728b127c7a0cULL, 0x4394b6f13db6601eULL, 0x4407f527270f9610ULL,

	0x17511dd41c7284aaULL, 0x10c25e0206cb72a4ULL, 0x18779a78290168b6ULL,

	0x1fe4d9ae33b89eb8ULL, 0x091c128c76955c92ULL, 0x0e8f515a6c2caa9cULL,

	0x063a952043e6b08eULL, 0x01a9d6f6595f4680ULL, 0x2bcb0364c9bd34daULL,

	0x2c5840b2d304c2d4ULL, 0x24ed84c8fcced8c6ULL, 0x237ec71ee6772ec8ULL,

	0x35860c3ca35aece2ULL, 0x32154feab9e31aecULL, 0x3aa08b90962900feULL,

	0x3d33c8468c90f6f0ULL, 0x2d53793d33aa0ae9ULL, 0x2ac03aeb2913fce7ULL,

	0x2275fe9106d9e6f5ULL, 0x25e6bd471c6010fbULL, 0x331e7665594dd2d1ULL,

	0x348d35b343f424dfULL, 0x3c38f1c96c3e3ecdULL, 0x3babb21f7687c8c3ULL,

	0x11c9678de665ba99ULL, 0x165a245bfcdc4c97ULL, 0x1eefe021d3165685ULL,

	0x197ca3f7c9afa08bULL, 0x0f8468d58c8262a1ULL, 0x08172b03963b94afULL,

	0x00a2ef79b9f18ebdULL, 0x0731acafa34878b3ULL, 0x5467445c98356a09ULL,

	0x53f4078a828c9c07ULL, 0x5b41c3f0ad468615ULL, 0x5cd28026b7ff701bULL,

	0x4a2a4b04f2d2b231ULL, 0x4db908d2e86b443fULL, 0x450ccca8c7a15e2dULL,

	0x429f8f7edd18a823ULL, 0x68fd5aec4dfada79ULL, 0x6f6e193a57432c77ULL,

	0x67dbdd4078893665ULL, 0x60489e966230c06bULL, 0x76b055b4271d0241ULL,

	0x712316623da4f44fULL, 0x7996d218126eee5dULL, 0x7e0591ce08d71853ULL,

	0x5957b0ef6c1b166fULL, 0x5ec4f33976a2e061ULL, 0x567137435968fa73ULL,

	0x51e2749543d10c7dULL, 0x471abfb706fcce57ULL, 0x4089fc611c453859ULL,

	0x483c381b338f224bULL, 0x4faf7bcd2936d445ULL, 0x65cdae5fb9d4a61fULL,

	0x625eed89a36d5011ULL, 0x6aeb29f38ca74a03ULL, 0x6d786a25961ebc0dULL,

	0x7b80a107d3337e27ULL, 0x7c13e2d1c98a8829ULL, 0x74a626abe640923bULL,

	0x7335657dfcf96435ULL, 0x20638d8ec784768fULL, 0x27f0ce58dd3d8081ULL,

	0x2f450a22f2f79a93ULL, 0x28d649f4e84e6c9dULL, 0x3e2e82d6ad63aeb7ULL,

	0x39bdc100b7da58b9ULL, 0x3108057a981042abULL, 0x369b46ac82a9b4a5ULL,

	0x1cf9933e124bc6ffULL, 0x1b6ad0e808f230f1ULL, 0x13df149227382ae3ULL,

	0x144c57443d81dcedULL, 0x02b49c6678ac1ec7ULL, 0x0527dfb06215e8c9ULL,

	0x0d921bca4ddff2dbULL, 0x0a01581c576604d5ULL, 0x1a61e967e85cf8ccULL,

	0x1df2aab1f2e50ec2ULL, 0x15476ecbdd2f14d0ULL, 0x12d42d1dc796e2deULL,

	0x042ce63f82bb20f4ULL, 0x03bfa5e99802d6faULL, 0x0b0a6193b7c8cce8ULL,

	0x0c992245ad713ae6ULL, 0x26fbf7d73d9348bcULL, 0x2168b401272abeb2ULL,

	0x29dd707b08e0a4a0ULL, 0x2e4e33ad125952aeULL, 0x38b6f88f57749084ULL,

	0x3f25bb594dcd668aULL, 0x37907f2362077c98ULL, 0x30033cf578be8a96ULL,

	0x6355d40643c3982cULL, 0x64c697d0597a6e22ULL, 0x6c7353aa76b07430ULL,

	0x6be0107c6c09823eULL, 0x7d18db5e29244014ULL, 0x7a8b9888339db61aULL,

	0x723e5cf21c57ac08ULL, 0x75ad1f2406ee5a06ULL, 0x5fcfcab6960c285cULL,

	0x585c89608cb5de52ULL, 0x50e94d1aa37fc440ULL, 0x577a0eccb9c6324eULL,

	0x4182c5eefcebf064ULL, 0x46118638e652066aULL, 0x4ea44242c9981c78ULL,

	0x49370194d321ea76ULL

};



#endif



static unsigned char rabin_window[RABIN_WINDOW_SIZE];

static unsigned rabin_pos = 0;



#define MIN(x,y) ((y)<(x) ? (y) : (x))

#define MAX(x,y) ((y)>(x) ? (y) : (x))



/*

 * The copies array is the central data structure for diff generation.

 * Data statements are implicit, for ranges not covered by any copy command.

 *

 * The sum of tgt and length for each entry must be monotonically increasing,

 * and data ranges must be non-overlapping. This is accomplished by not

 * extending matches backwards during initial matching.

 *

 * Copies may have zero length, to make it quick to delete copies during

 * optimization. However, the last copy in the list must always be a

 * non-trivial copy.

 *

 * Before committing copies, an important optimization is performed: during 

 * a backward pass through the copies array, each entry is extended backwards,

 * and redundant copies are eliminated.

 *

 * If each match were extended backwards on insertion, the same data may be

 * matched an arbitrary number of times, resulting in potentially quadratic

 * time behavior.

 */



typedef struct copyinfo {

	unsigned src;

	unsigned tgt;

	unsigned length;

} CopyInfo;



static CopyInfo *copies;

static int copy_count = 0;

static unsigned max_copies = 0; /* Dynamically increased */



static unsigned *idx;

static unsigned idx_size;

static unsigned char *idx_data;

static unsigned idx_data_len;



static void rabin_reset(void)

{

	memset(rabin_window, 0, sizeof(rabin_window));

}



static u_int64_t rabin_slide (u_int64_t fp, unsigned char m)

{

	unsigned char om;

	if (++rabin_pos == RABIN_WINDOW_SIZE) rabin_pos = 0;

	om = rabin_window[rabin_pos];

	fp ^= U[om];

	rabin_window[rabin_pos] = m;

	fp = ((fp << 8) | m) ^ T[fp >> RABIN_SHIFT];

	return fp;

}



static void init_idx (unsigned char *data, size_t len, int level)

{

	unsigned index_step = RABIN_WINDOW_SIZE / sizeof(unsigned) * sizeof(unsigned);

	size_t j, k;

	unsigned char ch = 0;

	unsigned maxofs[256];

	unsigned maxlen[256];

	unsigned maxfp[256];

	unsigned runlen = 0;

	u_int64_t fp = 0;



	assert (len <= MAX_SIZE);

	assert (level >= 0 && level <= 9);

	memset(maxofs, 0, sizeof(maxofs));

	memset(maxlen, 0, sizeof(maxlen));

	memset(maxfp, 0, sizeof(maxfp));



	/* index_step must be multiple of word size */

	if (level >= 1)

		index_step = MIN(index_step, 4 * sizeof(unsigned));

	/* Use smaller step size for higher optimization levels or smaller files */

	if (level >= 3 || len <= 65536)

		index_step = MIN (index_step, 3 * sizeof (unsigned));

	if (level >= 4 || len <= 32768)

		index_step = MIN (index_step, 2 * sizeof (unsigned));

	if (level >= 6 || len < 16384)

		index_step = MIN (index_step, 1 * sizeof (unsigned));

	assert (index_step && !(index_step % sizeof (unsigned)));



	/* Add fixed amount to hash table size, as small files will benefit

	   a lot without using significantly more memory or time. */

	idx_size = (level + 1) * (len / index_step) / 2 + MIN_HTAB_SIZE;

	idx_size = MIN (idx_size, MAX_HTAB_SIZE - 1); /* So rounding up works */



	/* Round up to next power of two, but limit to MAX_HTAB_SIZE. */

	{

		unsigned s = MIN_HTAB_SIZE;

		while (s < idx_size) s += s;

		idx_size = s;

	}



	idx_data = data;

	idx_data_len = len;

	idx = calloc(idx_size, sizeof(unsigned)); 



	/* It is tempting to first index higher addresses, so hashes of lower

	   addresses will get preference in the hash table. However, for

	   repetitive patterns with a period that is a divisor of the fingerprint

	   window, this may mean the match is not anchored at the end. 

	   Furthermore, even when using a window length that is prime, the

	   benefits are small and the irregularity of the first matches being

	   more important is not worth it. */



	rabin_reset();



	ch = 0;

	runlen = 0;



	for (j = 0; j + index_step < len; j += index_step) {

		unsigned char pch = 0;

		unsigned hash;



		for (k = 0; k < index_step; k++) {

			pch = ch;

			ch = data[j + k];

			if (ch != pch)

				runlen = 0;

			runlen++;

			fp = rabin_slide(fp, ch);

		}



		/* See if there is a word-aligned window-sized run of equal characters */

		if (runlen >= RABIN_WINDOW_SIZE + sizeof(unsigned) - 1) {

			/* Skip ahead to end of run of identical input characters */

			while (j + k < len && data[j + k] == ch) {

				k++;

				runlen++;

			}



			/* Although matches are usually anchored at the end, in the case

			   of extended runs of equal characters it is better to anchor after the

			   first RABIN_WINDOW_SIZE bytes. This allows for quick skip ahead 

			   while matching such runs, avoiding unneeded fingerprint calculations.

			   Also, when anchoring at the end, matches will be generated after

			   every word, because the fingerprint stays constant. Even though

			   all matches would get combined during match optimization, 

			   it wastes time and space. */

			if (runlen > maxlen[pch] + 4) {

				unsigned ofs;

				/* ofs points RABIN_WINDOW_SIZE bytes after the start of the run,

				   rounded up to the next word */

				ofs = j + k - runlen + RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1);

				ofs -= ofs % sizeof(unsigned);

				maxofs[pch] = ofs;

				maxlen [pch] = runlen;

				assert(maxfp[pch] == 0 || maxfp[pch] == (unsigned)fp);

				maxfp[pch] = (unsigned)fp;

			}

			/* Keep input aligned as if no special run processing had taken place */

			j += k - (k % index_step) - index_step;

			k = index_step;

		}



		/* Testing showed that avoiding collisions using secondary hashing, or

		   hash chaining had little effect and is not worth the time. */

		hash = ((unsigned)fp) & (idx_size - 1);

		idx[hash] = j + k;

	}



	/* Lastly, index the longest runs of equal characters found before.

	   This ensures we always match the longerst such runs available.  */

	for (j = 0; j < 256; j++) 

		if (maxlen[j]) 

			idx[maxfp[j] % idx_size] = maxofs[j];

}



/* Match data against the current index and record all possible copies */

static int find_copies(unsigned char *data, size_t len)

{

	size_t j = 0;

	u_int64_t fp = 0;



	rabin_reset();



	while (j < RABIN_WINDOW_SIZE && j < len)

		fp = rabin_slide(fp, data[j++]);



	while (j < len) { 

		unsigned hash, ofs, src, tgt, runlen, maxrun;



		fp = rabin_slide(fp, data[j++]);

		hash = fp & (idx_size - 1);

		ofs = idx[hash];



		/* Invariant:

		   data[0] .. data[j-1] has been processed

		   fp is fingerprint of sliding window ending at j-1

		   ofs is zero or points just past tentative match

		   ofs is a multiple of index_step */



		if (!ofs)

			continue;



		runlen = 0;

		tgt = j - 4;

		src = ofs - 4;

		maxrun = MIN(idx_data_len - src, len - tgt);



		/* Hot loop */

		while (runlen < maxrun &&

		       data[tgt + runlen] == idx_data[src + runlen])

			runlen++;

		if (runlen < 4)

			continue;



		if (copy_count == max_copies) {

			max_copies *= 2;

			if (!copies) {

				max_copies = MAX_COPIES;

				copies = malloc(max_copies * sizeof(CopyInfo));

			} else {

				copies = realloc(copies, max_copies * sizeof (CopyInfo));

			}

			if (!copies)

				return 0;

		}



		copies[copy_count].src = src;

		copies[copy_count].tgt = tgt;

		copies[copy_count].length = runlen;

		copy_count++;



		/* For runs extending more than RABIN_WINDOW_SIZE bytes beyond j,

		   skip ahead to prevent useless fingerprint computations. */

		if (tgt + runlen > j + RABIN_WINDOW_SIZE)

			j = tgt + runlen - RABIN_WINDOW_SIZE;



		/* Quickly scan ahead without looking for matches

		   until the end of this run */

		while (j < tgt + runlen)

			fp = rabin_slide(fp, data[j++]);

	}



	return 1;

}



static unsigned header_length(unsigned srclen, unsigned tgtlen)

{

	unsigned len = 0;

	assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);



	/* GIT headers start with the length of the source and target,

	   with 7 bits per byte, least significant byte first, and

	   the high bit indicating continuation. */

	do { len++; srclen >>= 7; } while (srclen);

	do { len++; tgtlen >>= 7; } while (tgtlen);



	return len;

}



static unsigned char *write_header(unsigned char *patch, unsigned srclen, unsigned tgtlen)

{

	assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);



	while (srclen >= 0x80) {

		*patch++ = srclen | 0x80;

		srclen >>= 7;

	}

	*patch++ = srclen;



	while (tgtlen >= 0x80) {

		*patch++ = tgtlen | 0x80;

		tgtlen >>= 7;

	}

	*patch++ = tgtlen;



	return patch;

}



static unsigned data_length(unsigned length)

{

	/* Can only include 0x7f data bytes per command */

	unsigned partial = length % 0x7f;

	assert (length > 0 && length <= MAX_SIZE);

	if (partial) partial++;

	return partial + (length / 0x7f) * 0x80;

}



static unsigned char *write_data(unsigned char *patch, unsigned char *data, unsigned size)

{

	assert (size > 0 && size < MAX_SIZE);

	/* The return value must be equal to patch + data_length (patch, size).

	   This correspondence is essential for calculating the patch size.  */



	/* GIT has no data commands for large data, rest is same as GDIFF */

	do {

		unsigned s = size;

		if (s > 0x7f)

			s = 0x7f;

		*patch++ = s;

		memcpy(patch, data, s);

		data += s;

		patch += s;

		size -= s;

	} while (size);



	return patch;

} 



static unsigned copy_length (unsigned offset, unsigned length)

{

	unsigned size = 0;



	assert (offset < MAX_SIZE && length < MAX_SIZE);



	/* For now we only copy a maximum of 0x10000 bytes per command.

	   Longer copies are broken into pieces of that size. */

	do {

		signed s = length;

		if (s > 0x10000)

			s = 0x10000;

		size += !!(s & 0xff) + !!(s & 0xff00);

		size += !!(offset & 0xff) + !!(offset & 0xff00) + 

			!!(offset & 0xff0000) + !!(offset & 0xff000000);

		size += 1;

		offset += s;

		length -= s;

	} while (length);



	return size;

}



static unsigned char * write_copy(unsigned char *patch, unsigned offset, unsigned size)

{

	/* The return value must be equal to patch + copy_length (patch,offset,size).

	   This correspondence is essential for calculating the patch size.  */



	do {

		unsigned char c = 0x80, *cmd = patch++;

		unsigned v, s = size;

		if (s > 0x10000)

			s = 0x10000;



		v = offset;

		if (v & 0xff) c |= 0x01, *patch++ = v;

		v >>= 8;

		if (v & 0xff) c |= 0x02, *patch++ = v;

		v >>= 8;

		if (v & 0xff) c |= 0x04, *patch++ = v;

		v >>= 8;

		if (v & 0xff) c |= 0x08, *patch++ = v;



		v = s;

		if (v & 0xff) c |= 0x10, *patch++ = v;

		v >>= 8;

		if (v & 0xff) c |= 0x20, *patch++ = v;



		*cmd = c;

		offset += s;

		size -= s;

	} while (size);



	return patch;

} 



static unsigned process_copies (unsigned char *data, unsigned length, unsigned maxlen)

{

	int j;

	unsigned ptr = length;

	unsigned patch_bytes = header_length(idx_data_len, length);



	/* Work through the copies backwards, extending each one backwards. */

	for (j = copy_count - 1; j >= 0; j--) {

		CopyInfo *copy = copies+j;

		unsigned src = copy->src;

		unsigned tgt = copy->tgt;

		unsigned len = copy->length;

		int data_follows;



		if (tgt + len > ptr) {

			/* Part of copy already covered by later one, so shorten copy. */

			if (ptr < tgt) {

				/* Copy completely disappeared, but guess that a backward extension

				   might still be useful. This extension is non-contiguous, as it is

				   irrelevant whether the skipped data would have matched or not.

				   Be careful to not extend past the beginning of the source. */

				unsigned adjust = tgt - ptr;



				tgt = ptr;

				src = (src < adjust) ? 0 : src - adjust;



				copy->tgt = tgt;

				copy->src = src;

			}



			len = ptr - tgt;

		}



		while (src && tgt && idx_data[src - 1] == data[tgt - 1]) {

			src--;

			tgt--;

		}

		len += copy->tgt - tgt;



		data_follows = (tgt + len < ptr);



		/* A short copy may cost as much as 6 bytes for the copy and

		   5 as result of an extra data command.

		   It's not worth having extra copies in order to just save a byte or two.

		   Being too smart here may hurt later compression as well. */

		if (len < (data_follows ? 16 : 10))

			len = 0;



		/* Some target data is not covered by the copies, account for

		   the DATA command that will follow the copy. */

		if (len && data_follows) 

			patch_bytes += data_length(ptr - (tgt + len));



		/* Everything about the copy is known and will not change.

		   Write back the new information and update the patch size

		   with the size of the copy instruction. */

		copy->length = len;

		copy->src = src;

		copy->tgt = tgt;



		if (len) {

			/* update patch size for copy command */

			patch_bytes += copy_length (src, len);

			ptr = tgt;

		} else if (j == copy_count - 1) {

			/* Remove empty copies at end of list. */

			copy_count--;

		}



		if (patch_bytes > maxlen)

			return 0;

	}



	/* Account for data before first copy */

	if (ptr != 0)

		patch_bytes += data_length(ptr);



	if (patch_bytes > maxlen)

		return 0;

	return patch_bytes;

}



static unsigned calculate_delta(void *to_buf, unsigned long to_size, unsigned long max_size)

{

	assert (to_size < MAX_SIZE);



	if (!find_copies(to_buf, to_size))

		return 0;

	return process_copies(to_buf, to_size, max_size);

}



static void *create_delta (unsigned char *data, unsigned len, unsigned delta_size)

{

	unsigned char *delta = malloc(delta_size);

	unsigned char *ptr = delta;

	unsigned offset = 0;

	int j;



	ptr = write_header(ptr, idx_data_len, len);



	for (j = 0; j < copy_count; j++) {

		CopyInfo *copy = copies + j;

		unsigned copylen = copy->length;



		if (!copylen)

			continue;



		if (copy->tgt > offset) {

			assert (delta_size - (ptr - delta) > data_length(copy->tgt - offset));

			ptr = write_data(ptr, data + offset, copy->tgt - offset);

		}



		assert(delta_size - (ptr - delta) >= copy_length(copy->src, copylen));

		ptr = write_copy(ptr, copy->src, copylen);

		offset = copy->tgt + copylen;

	}



	if (offset < len)

		ptr = write_data(ptr, data + offset, len - offset);



	assert(ptr - delta == delta_size);



	return delta;

}



static void finalize_idx()

{

	free(copies);

	copies = 0;

	max_copies = 0;

	copy_count = 0;

	free(idx);

	idx = 0;

	idx_size = 0;

	idx_data = 0;

	idx_data_len = 0;

}



void *diff_delta(void *from_buf, unsigned long from_size,

		 void *to_buf, unsigned long to_size,

		 unsigned long *delta_size, unsigned long max_size)

{

	unsigned char *delta = 0;

	unsigned dsize;



	assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE);

	init_idx(from_buf, from_size, 1); /* Use optimization level 1 */

	if (!max_size)

		max_size = MAX_SIZE;

	dsize = calculate_delta (to_buf, to_size, max_size);

	if (dsize)

		delta = create_delta (to_buf, to_size, dsize);

	finalize_idx ();

	if (delta)	

		*delta_size = dsize;

	return delta;

}


^ permalink raw reply

* Re: How do I quickly check what heads a particular commit is in?
From: Martin Langhoff @ 2006-04-24  4:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4q0jd1sj.fsf@assigned-by-dhcp.cox.net>

On 4/24/06, Junio C Hamano <junkio@cox.net> wrote:
> git merge-base $broken_commit "master"
>
> would show $broken_commit if "master" is a fast-forward of
> $broken_commit (i.e. "master" is a descendant).  I think that is
> what you are calling "$broken_commit is in master".

of course! Thanks! I'm in the middle of a somewhat nasty merge, and my
fingers got all tangled working up and down the history finding what
heads have this commit. I can now do a loop with bash calling all the
heads against it. Great!

 (hmmm. just had to decide to go back to basics and perform some
reverts before retrying the merge.)

m

^ permalink raw reply

* Re: How do I quickly check what heads a particular commit is in?
From: Jeff King @ 2006-04-24  4:41 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <46a038f90604232123r7f35660aufbb9da0f561f8ea@mail.gmail.com>

On Mon, Apr 24, 2006 at 04:23:24PM +1200, Martin Langhoff wrote:

> Is there a practical way to ask in what heads they are?

This should work:

$ cat <<'EOF' >find-commit
#!/bin/sh
commit=$1; shift
for i in "$@"; do
  git rev-list $i | fgrep -q $commit && echo $i
done
EOF
$ sh find-commit full-40char-sha1-of-commit head1 head2 ...

You potentially end up traversing parts of the history multiple times
(if they are shared by multiple heads) but git is fast enough that
performance is fine.

I don't know of a way to do it all in a git command without the fgrep.

-Peff

^ permalink raw reply

* Re: How do I quickly check what heads a particular commit is in?
From: Junio C Hamano @ 2006-04-24  4:40 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90604232123r7f35660aufbb9da0f561f8ea@mail.gmail.com>

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> In the middle of a merge of 2 of a bunch of closely related heads, I
> found a few odd things, so I've pickaxe'd and found the (potentially
> bogus) commits. I am using gitk, and while I do find the commit, it's
> not very clear what heads have the dodgy commit. They've probably been
> for a few months in there <sigh>.
>
> Is there a practical way to ask in what heads they are?

git merge-base $broken_commit "master"

would show $broken_commit if "master" is a fast-forward of
$broken_commit (i.e. "master" is a descendant).  I think that is
what you are calling "$broken_commit is in master".

^ permalink raw reply

* How do I quickly check what heads a particular commit is in?
From: Martin Langhoff @ 2006-04-24  4:23 UTC (permalink / raw)
  To: Git Mailing List

In the middle of a merge of 2 of a bunch of closely related heads, I
found a few odd things, so I've pickaxe'd and found the (potentially
bogus) commits. I am using gitk, and while I do find the commit, it's
not very clear what heads have the dodgy commit. They've probably been
for a few months in there <sigh>.

Is there a practical way to ask in what heads they are?

martin

^ permalink raw reply

* [PATCH 5/4] Minimum fixups to cache-tree
From: Junio C Hamano @ 2006-04-24  3:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vodyrdas9.fsf@assigned-by-dhcp.cox.net>

The first hunk is the repeat of the previous "oops".  The second
is a real fix.

With this applied, 100-patch series (going from present to past
at the Linux 2.6 kernel tip) applies correctly with the
following stats:

Trying w/o the patch...
24.57user 4.48system 0:34.47elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+322156minor)pagefaults 0swaps
Trying w/ the patch...
15.81user 3.00system 0:20.84elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+252063minor)pagefaults 0swaps

So there definitely is an improvement.  *Grin*.

The script used to bench this is attached at the end.  "linus" is the tracking
branch and currently it points at v2.6.17-rc2-g6b426e7
$HOME/git-master/bin is where I installed the "master" version,
and $J (aka ../git.junio/) is the freshly compiled area with the
patch series applied (all five of four ;-).

-- >8 --
Minimum fixups to make things usable.

---
diff --git a/apply.c b/apply.c
index 5fa2c1e..e283df3 100644
--- a/apply.c
+++ b/apply.c
@@ -1935,8 +1935,6 @@ static int apply_patch(int fd, const cha
 				  active_cache_sha1) ||
 		    commit_index_file(&cache_file))
 			die("Unable to write new cachefile");
-		cache_tree_update(active_cache_tree,
-				  active_cache, active_nr, 1);
 		write_cache_tree(active_cache_sha1, active_cache_tree);
 	}
 
diff --git a/cache-tree.c b/cache-tree.c
index 4dbdb65..f6d1dd1 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -348,10 +348,7 @@ #endif
 	}
 	for (i = 0; i < it->subtree_nr; i++) {
 		struct cache_tree_sub *down = it->down[i];
-		int len = pathlen + down->namelen;
-		memcpy(path + pathlen, down->name, down->namelen);
-		path[len] = '/';
-		buffer = write_one(down->cache_tree, path, len+1,
+		buffer = write_one(down->cache_tree, down->name, down->namelen,
 				   buffer, size, offset);
 	}
 	return buffer;

--

#!/bin/sh
J=../git.junio
M=$HOME/git-master/bin
export J M

git reset --hard linus

previous=
git rev-list -n 100 HEAD |
while read commit
do
	if test -n "$previous"
	then
		git-diff-tree -p "$previous" "$commit" >"$commit-$previous"
		echo "$commit $previous"
	fi
	previous="$commit"
done >series

git reset --hard linus
echo "Trying w/o the patch..."
/usr/bin/time sh -c '
	while read commit previous
	do
		$M/git-apply --whitespace=nowarn --index "$commit-$previous"
		$M/git-write-tree
	done <series
' >out-1
tree1=`git write-tree`

git reset --hard linus
echo "Trying w/ the patch..."
$J/git-write-tree
/usr/bin/time sh -c '
	while read commit previous
	do
		$J/git-apply --whitespace=nowarn --index "$commit-$previous"
		$J/git-write-tree
	done <series
' >out-2
tree2=`git write-tree`
test "$tree1" = "$tree2" || exit 1
cmp out-1 out-2

^ permalink raw reply related

* Re: RFC: New diff-delta.c implementation
From: Geert Bosch @ 2006-04-24  2:57 UTC (permalink / raw)
  To: Rene Scharfe; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <444A2334.3030501@lsrfire.ath.cx>

On Sat, Apr 22, 2006 at 02:36:04PM +0200, Rene Scharfe wrote:
> Could you please send your code inline, not as an attachment?  And
> possibly as a patch with a Signed-off-by: tag (see
> Documentation/SubmittingPatches)?
For various reasons, mostly to do with managing and searching huge
mailboxes, I'm using Apple Mail. What sucks though is that automatic
line wrapping can't be turned off. This never got fixed, so it's useless
for posting inline patches. That said, I now leave a synchronized copy
of the git repository on my mailserver and use mutt for this reply.
Hopefully things will be better.

Note that I sent this code as a RFC, with explicit disclaimers about
style. So, I did not want to sign off on this code, since I pretty
much knew there would be some problems with the undocumented
("proprietary", according the libxdiff site) file format. In contrast
the GDIFF fileformat was documented very well, and I have a version
of this code that works flawlessly with that format.

> Regarding your FIXME comment about endianess: I think you are looking
> for htonl().  Use it to convert the values from host byte order to
> network byte order (= big endian) and you can get rid of those ugly
> branches.
Ah, I'll use that. It's of course a slight change that all processing
now is big-endian centric, but that might actually even result in
better code in this case. I'm just assuming any decent system has
some highly optimized macro for this and will never do a function call.
This is used in the most performance critical loops, and doing function
calls here will lead to horrendous performance.
> 
> You can use "indent -npro -kr -i8 -ts8 -l80 -ss -ncs" to reformat your
> code into a similar style as used in the rest of git (settings taken
> from Lindent which is shipped with the Linux source).
Although I cringe at 8-space indenting, and find much of the GIT
code close to unreadable for lack of design-level comments, I'll
gladly reformat any code to conform to existing code standards.
Please let me know if you've got documentation on that, as it would
be helpful for me to know what the standard is. (No flame intended. :-)
> 
> After converting to htonl() "make test" ran fine on my x86 box.  Here is
> what I get when I try to repack the git repo, though:
> 
>    $ git repack -a -d
>    Generating pack...
>    Done counting 18985 objects.
>    Deltifying 18985 objects.
>    git-pack-objects: diff-delta.c:766: create_delta: Assertion `ptr -
> delta == (int)delta_size' failed.
> 
> Please let me know if you need more details.
This was a result of incorrect calculation of the size of copy and
data commands. I fixed this in a follow-up patch sent to the list.
For any bug reports, they're easiest to fix if you can find a reproducer
using test-delta.

  -Geert

^ permalink raw reply

* Re: [PATCH 4/4] Invalidate cache-tree entries for touched paths in git-apply.
From: Junio C Hamano @ 2006-04-24  2:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vodyrdas9.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Junio C Hamano <junkio@cox.net> writes:
>
>>  * ... then the big rock falls.  With this, I tried to apply and
>>    then write-tree "diff-tree -p $commit^1 $commit" on top of
>>    "$commit^1" for the last 20 or so commits in the kernel tree.
>>    The "master" version takes 0.15 second per patch on my Duron
>>    750 with 700MB, while this one does that in 0.06 second.
>>    This also helps the memory pressure because we do not have to
>>    regenerate unchanged trees.  810 minor faults with the patch
>>    vs 2150 minor faults without.
>
> Sorry, but not really.  The patch is wrong and the measurement
> was flawed.

Again, sorry, but there are some more bugs in the cache-tree
code that I need to fix and re-measure.

In the meantime, please do not use it on your production
repositories.  It does not seem to produce corrupt trees, but it
creates broken index.aux for no good reason.

^ permalink raw reply

* Re: weird pull behavior as of late
From: Kyle McMartin @ 2006-04-24  1:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David S. Miller, git
In-Reply-To: <7vy7xvdb3x.fsf@assigned-by-dhcp.cox.net>

On Sun, Apr 23, 2006 at 06:18:58PM -0700, Junio C Hamano wrote:
> "David S. Miller" <davem@davemloft.net> writes:
> 
> > Updating from f4ffaa452e71495a06376f12f772342bc57051fc to 6b426e785cb81e53dc2fc4dcf997661472b470ef
> > Fast forward
> >  MAINTAINERS |    4 ++++
> >  1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > I got 446 objects and this amounted to just a 4 line change to the
> > MAINTAINERS file? :-)
> 
> That is weird, certainly, and does not match what I am getting...
>

Hmm, I saw this when I did a git diff linus..master on my tree on hera
before I sent the pull request, but when I did git request-pull it 
had All The Right Stuff (tm), so I assumed I just mucked something up
with my command-line options. When I looked at the tree in gitk,
everything looked sane as well.

Hope I didn't break anything...

Cheers,
	Kyle 

^ permalink raw reply

* Re: [PATCH 4/4] Invalidate cache-tree entries for touched paths in git-apply.
From: Junio C Hamano @ 2006-04-24  1:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7v3bg3etnv.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

>  * ... then the big rock falls.  With this, I tried to apply and
>    then write-tree "diff-tree -p $commit^1 $commit" on top of
>    "$commit^1" for the last 20 or so commits in the kernel tree.
>    The "master" version takes 0.15 second per patch on my Duron
>    750 with 700MB, while this one does that in 0.06 second.
>    This also helps the memory pressure because we do not have to
>    regenerate unchanged trees.  810 minor faults with the patch
>    vs 2150 minor faults without.

Sorry, but not really.  The patch is wrong and the measurement
was flawed.

It was doing unnecessarily more work in git-apply, which made
git-write-tree a no-op, and I was measuring only git-write-tree.

The following goes on top of the series to remove that
unnecessary work from git-apply.  Unfortunately this makes the
overall combination a bit slower than before X-<.

But I have not optimized cache-tree.c for speed; for example,
its subtree lists are not even sorted.  Once that is done we may
get decent speedups from the combo.

---
diff --git a/apply.c b/apply.c
index 5fa2c1e..e283df3 100644
--- a/apply.c
+++ b/apply.c
@@ -1935,8 +1935,6 @@ static int apply_patch(int fd, const cha
 				  active_cache_sha1) ||
 		    commit_index_file(&cache_file))
 			die("Unable to write new cachefile");
-		cache_tree_update(active_cache_tree,
-				  active_cache, active_nr, 1);
 		write_cache_tree(active_cache_sha1, active_cache_tree);
 	}

^ permalink raw reply related

* Re: weird pull behavior as of late
From: Junio C Hamano @ 2006-04-24  1:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: git
In-Reply-To: <20060423.175953.52710961.davem@davemloft.net>

"David S. Miller" <davem@davemloft.net> writes:

> Updating from f4ffaa452e71495a06376f12f772342bc57051fc to 6b426e785cb81e53dc2fc4dcf997661472b470ef
> Fast forward
>  MAINTAINERS |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> I got 446 objects and this amounted to just a 4 line change to the
> MAINTAINERS file? :-)

That is weird, certainly, and does not match what I am getting...

Updating from f4ffaa452e71495a06376f12f772342bc57051fc to 6b426e785cb81e53dc2fc4dcf997661472b470ef
Fast forward
 MAINTAINERS                                 |    4
 arch/parisc/Kconfig                         |   31 +
 arch/parisc/defconfig                       |  494 +++++++++++------
 ... (many lines later)
 sound/ppc/tumbler.c                         |    2
 91 files changed, 3008 insertions(+), 1286 deletions(-)
 create mode 100644 drivers/char/tpm/tpm_tis.c

^ permalink raw reply

* weird pull behavior as of late
From: David S. Miller @ 2006-04-24  0:59 UTC (permalink / raw)
  To: git

I just pulled from Linus's linux-2.6.git tree knowing there was a bit
of activity over the past day:

Generating pack...
Done counting 446 objects.
Deltifying 446 objects.
 100% (446/446) done
Unpacking 446 objects
Total 446, written 446 (delta 235), reused 0 (delta 0)
 100% (446/446) done
* refs/heads/origin: fast forward to branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
  from f4ffaa452e71495a06376f12f772342bc57051fc to 6b426e785cb81e53dc2fc4dcf997661472b470ef

Updating from f4ffaa452e71495a06376f12f772342bc57051fc to 6b426e785cb81e53dc2fc4dcf997661472b470ef
Fast forward
 MAINTAINERS |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

I got 446 objects and this amounted to just a 4 line change to the
MAINTAINERS file? :-)

If I do a "gitk ORIG_HEAD.." all the changes are there, just the
diffstat it spits out during the pull is weird.

Just FYI...

^ permalink raw reply

* [PATCH 4/4] Invalidate cache-tree entries for touched paths in git-apply.
From: Junio C Hamano @ 2006-04-23 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This updates git-apply to maintain cache-tree information.  With
this and the previous write-tree patch, repeated "apply --index"
followed by "write-tree" on a huge tree will hopefully become
faster.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * ... then the big rock falls.  With this, I tried to apply and
   then write-tree "diff-tree -p $commit^1 $commit" on top of
   "$commit^1" for the last 20 or so commits in the kernel tree.
   The "master" version takes 0.15 second per patch on my Duron
   750 with 700MB, while this one does that in 0.06 second.
   This also helps the memory pressure because we do not have to
   regenerate unchanged trees.  810 minor faults with the patch
   vs 2150 minor faults without.

 apply.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

94005d7bb4b55e9984f6505a8916d23d304ccc4d
diff --git a/apply.c b/apply.c
index 269210a..f7cdefa 100644
--- a/apply.c
+++ b/apply.c
@@ -8,9 +8,14 @@
  */
 #include <fnmatch.h>
 #include "cache.h"
+#include "cache-tree.h"
 #include "quote.h"
 #include "blob.h"
 
+static unsigned char active_cache_sha1[20];
+static struct cache_tree *active_cache_tree;
+
+
 //  --check turns on checking that the working tree matches the
 //    files that are being modified, but doesn't apply the patch
 //  --stat does just a diffstat, and doesn't actually apply
@@ -1717,6 +1722,7 @@ static void remove_file(struct patch *pa
 	if (write_index) {
 		if (remove_file_from_cache(patch->old_name) < 0)
 			die("unable to remove %s from index", patch->old_name);
+		cache_tree_invalidate_path(active_cache_tree, patch->old_name);
 	}
 	unlink(patch->old_name);
 }
@@ -1815,6 +1821,7 @@ static void create_file(struct patch *pa
 		mode = S_IFREG | 0644;
 	create_one_file(path, mode, buf, size);	
 	add_index_file(path, mode, buf, size);
+	cache_tree_invalidate_path(active_cache_tree, path);
 }
 
 static void write_out_one_result(struct patch *patch)
@@ -1912,8 +1919,9 @@ static int apply_patch(int fd, const cha
 	if (write_index)
 		newfd = hold_index_file_for_update(&cache_file, get_index_file());
 	if (check_index) {
-		if (read_cache() < 0)
+		if (read_cache_1(active_cache_sha1) < 0)
 			die("unable to read index file");
+		active_cache_tree = read_cache_tree(active_cache_sha1);
 	}
 
 	if ((check || apply) && check_patch_list(list) < 0)
@@ -1923,9 +1931,13 @@ static int apply_patch(int fd, const cha
 		write_out_results(list, skipped_patch);
 
 	if (write_index) {
-		if (write_cache(newfd, active_cache, active_nr) ||
+		if (write_cache_1(newfd, active_cache, active_nr,
+				  active_cache_sha1) ||
 		    commit_index_file(&cache_file))
 			die("Unable to write new cachefile");
+		cache_tree_update(active_cache_tree,
+				  active_cache, active_nr, 1);
+		write_cache_tree(active_cache_sha1, active_cache_tree);
 	}
 
 	if (show_index_info)
-- 
1.3.0.g623a

^ permalink raw reply related

* [PATCH 3/4] Update write-tree to use cache-tree.
From: Junio C Hamano @ 2006-04-23 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

The updated write-tree reads from $GIT_DIR/index.aux to pick up
subtree objects information, updates the cache-tree with the
index, and updates index.aux file after writing a tree out of
the index file.

Until update-index and other programs that modify the index are
updated to maintain index.aux file, the index.aux file written
by the last write-tree will become stale immediately after they
update the index, which will result in the whole tree
recomputation just like the original write-tree.

The idea is to convert those commands to invalidate cache-tree
whenever they touch the index entries, and write updated
index.aux out.  After the index is updated with them, write-tree
will be able to reuse the parts of the cache-tree that have not
been touched.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * And this is the "Willie the Coyote waiting for the big rock
   to fail" patch.  Right now nobody uses the cache but...

 write-tree.c |  135 +++++-----------------------------------------------------
 1 files changed, 11 insertions(+), 124 deletions(-)

ff283e7fafdd0740f74dfa65d8efaaaf9a5a0bbf
diff --git a/write-tree.c b/write-tree.c
index dcad6e6..dbcfe8c 100644
--- a/write-tree.c
+++ b/write-tree.c
@@ -5,96 +5,23 @@
  */
 #include "cache.h"
 #include "tree.h"
+#include "cache-tree.h"
 
-static int missing_ok = 0;
-
-static int check_valid_sha1(unsigned char *sha1)
-{
-	int ret;
-
-	/* If we were anal, we'd check that the sha1 of the contents actually matches */
-	ret = has_sha1_file(sha1);
-	if (ret == 0)
-		perror(sha1_file_name(sha1));
-	return ret ? 0 : -1;
-}
-
-static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
-{
-	unsigned char subdir_sha1[20];
-	unsigned long size, offset;
-	char *buffer;
-	int nr;
-
-	/* Guess at some random initial size */
-	size = 8192;
-	buffer = xmalloc(size);
-	offset = 0;
-
-	nr = 0;
-	while (nr < maxentries) {
-		struct cache_entry *ce = cachep[nr];
-		const char *pathname = ce->name, *filename, *dirname;
-		int pathlen = ce_namelen(ce), entrylen;
-		unsigned char *sha1;
-		unsigned int mode;
-
-		/* Did we hit the end of the directory? Return how many we wrote */
-		if (baselen >= pathlen || memcmp(base, pathname, baselen))
-			break;
-
-		sha1 = ce->sha1;
-		mode = ntohl(ce->ce_mode);
-
-		/* Do we have _further_ subdirectories? */
-		filename = pathname + baselen;
-		dirname = strchr(filename, '/');
-		if (dirname) {
-			int subdir_written;
+static unsigned char active_cache_sha1[20];
+static struct cache_tree *active_cache_tree;
 
-			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
-			nr += subdir_written;
-
-			/* Now we need to write out the directory entry into this tree.. */
-			mode = S_IFDIR;
-			pathlen = dirname - pathname;
-
-			/* ..but the directory entry doesn't count towards the total count */
-			nr--;
-			sha1 = subdir_sha1;
-		}
-
-		if (!missing_ok && check_valid_sha1(sha1) < 0)
-			exit(1);
-
-		entrylen = pathlen - baselen;
-		if (offset + entrylen + 100 > size) {
-			size = alloc_nr(offset + entrylen + 100);
-			buffer = xrealloc(buffer, size);
-		}
-		offset += sprintf(buffer + offset, "%o %.*s", mode, entrylen, filename);
-		buffer[offset++] = 0;
-		memcpy(buffer + offset, sha1, 20);
-		offset += 20;
-		nr++;
-	}
-
-	write_sha1_file(buffer, offset, tree_type, returnsha1);
-	free(buffer);
-	return nr;
-}
+static int missing_ok = 0;
 
 static const char write_tree_usage[] = "git-write-tree [--missing-ok]";
 
 int main(int argc, char **argv)
 {
-	int i, funny;
 	int entries;
-	unsigned char sha1[20];
 	
 	setup_git_directory();
 
-	entries = read_cache();
+	entries = read_cache_1(active_cache_sha1);
+	active_cache_tree = read_cache_tree(active_cache_sha1);
 	if (argc == 2) {
 		if (!strcmp(argv[1], "--missing-ok"))
 			missing_ok = 1;
@@ -108,51 +35,11 @@ int main(int argc, char **argv)
 	if (entries < 0)
 		die("git-write-tree: error reading cache");
 
-	/* Verify that the tree is merged */
-	funny = 0;
-	for (i = 0; i < entries; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (ce_stage(ce)) {
-			if (10 < ++funny) {
-				fprintf(stderr, "...\n");
-				break;
-			}
-			fprintf(stderr, "%s: unmerged (%s)\n", ce->name, sha1_to_hex(ce->sha1));
-		}
-	}
-	if (funny)
-		die("git-write-tree: not able to write tree");
-
-	/* Also verify that the cache does not have path and path/file
-	 * at the same time.  At this point we know the cache has only
-	 * stage 0 entries.
-	 */
-	funny = 0;
-	for (i = 0; i < entries - 1; i++) {
-		/* path/file always comes after path because of the way
-		 * the cache is sorted.  Also path can appear only once,
-		 * which means conflicting one would immediately follow.
-		 */
-		const char *this_name = active_cache[i]->name;
-		const char *next_name = active_cache[i+1]->name;
-		int this_len = strlen(this_name);
-		if (this_len < strlen(next_name) &&
-		    strncmp(this_name, next_name, this_len) == 0 &&
-		    next_name[this_len] == '/') {
-			if (10 < ++funny) {
-				fprintf(stderr, "...\n");
-				break;
-			}
-			fprintf(stderr, "You have both %s and %s\n",
-				this_name, next_name);
-		}
-	}
-	if (funny)
-		die("git-write-tree: not able to write tree");
+	if (cache_tree_update(active_cache_tree, active_cache, active_nr,
+			      missing_ok))
+		die("git-write-tree: error building trees");
+	write_cache_tree(active_cache_sha1, active_cache_tree);
 
-	/* Ok, write it out */
-	if (write_tree(active_cache, entries, "", 0, sha1) != entries)
-		die("git-write-tree: internal error");
-	printf("%s\n", sha1_to_hex(sha1));
+	printf("%s\n", sha1_to_hex(active_cache_tree->sha1));
 	return 0;
 }
-- 
1.3.0.g623a

^ permalink raw reply related

* [PATCH 2/4] Add cache-tree.
From: Junio C Hamano @ 2006-04-23 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

The cache_tree data structure is to cache tree object names that
would result from the current index file.

The idea is to have an optional file to record each tree object
name that corresponds to a directory path in the cache when we
run write_cache(), and read it back when we run read_cache().
During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy.

We could perhaps make the cache-tree data an optional part of
the index file, but that would involve the index format updates,
so unless we need it for performance reasons, the current plan
is to use a separate file, $GIT_DIR/index.aux to store this
information and link it with the index file with the checksum
that is already used for index file integrity check.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * This is just the data structure and its maintenance, and not
   tied to the rest of the system yet, which will be done with
   the next two patches.

 Makefile     |    2 
 cache-tree.c |  522 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 cache-tree.h |   29 +++
 3 files changed, 552 insertions(+), 1 deletions(-)
 create mode 100644 cache-tree.c
 create mode 100644 cache-tree.h

c37a2dc93fb49dada5eec78f002eea9354e4fb4f
diff --git a/Makefile b/Makefile
index d9a3a82..518c3c1 100644
--- a/Makefile
+++ b/Makefile
@@ -204,7 +204,7 @@ DIFF_OBJS = \
 	diffcore-delta.o log-tree.o
 
 LIB_OBJS = \
-	blob.o commit.o connect.o csum-file.o \
+	blob.o commit.o connect.o csum-file.o cache-tree.o \
 	date.o diff-delta.o entry.o exec_cmd.o ident.o index.o \
 	object.o pack-check.o patch-delta.o path.o pkt-line.o \
 	quote.o read-cache.o refs.o run-command.o \
diff --git a/cache-tree.c b/cache-tree.c
new file mode 100644
index 0000000..c79da33
--- /dev/null
+++ b/cache-tree.c
@@ -0,0 +1,522 @@
+#include "cache.h"
+#include "tree.h"
+#include "cache-tree.h"
+
+#define DEBUG 0
+
+struct cache_tree *cache_tree(void)
+{
+	struct cache_tree *it = xcalloc(1, sizeof(struct cache_tree));
+	it->entry_count = -1; 
+	return it;
+}
+
+void cache_tree_free(struct cache_tree *it)
+{
+	int i;
+
+	if (!it)
+		return;
+	for (i = 0; i < it->subtree_nr; i++)
+		cache_tree_free(it->down[i]->cache_tree);
+	free(it->down);
+	free(it);
+}
+
+static struct cache_tree_sub *find_subtree(struct cache_tree *it,
+					   const char *path,
+					   int pathlen,
+					   int create)
+{
+	int i;
+	struct cache_tree_sub *down;
+	for (i = 0; i < it->subtree_nr; i++) {
+		down = it->down[i];
+		if (down->namelen == pathlen &&
+		    !memcmp(down->name, path, pathlen))
+			return down;
+	}
+	if (!create)
+		return NULL;
+	if (it->subtree_alloc <= it->subtree_nr) {
+		it->subtree_alloc = alloc_nr(it->subtree_alloc);
+		it->down = xrealloc(it->down, it->subtree_alloc *
+				    sizeof(*it->down));
+	}
+	down = xmalloc(sizeof(*down) + pathlen + 1);
+	down->cache_tree = NULL; /* cache_tree(); */
+	down->namelen = pathlen;
+	memcpy(down->name, path, pathlen);
+	down->name[pathlen] = 0; /* not strictly needed */
+	it->down[it->subtree_nr++] = down;
+	return down;
+}
+
+void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
+{
+	/* a/b/c
+	 * ==> invalidate self
+	 * ==> find "a", have it invalidate "b/c"
+	 * a
+	 * ==> invalidate self
+	 * ==> if "a" exists as a subtree, remove it.
+	 */
+	const char *slash;
+	int namelen;
+	struct cache_tree_sub *down;
+
+	if (!it)
+		return;
+	slash = strchr(path, '/');
+	it->entry_count = -1; 
+	if (!slash) {
+		int i;
+		namelen = strlen(path);
+		for (i = 0; i < it->subtree_nr; i++) {
+			if (it->down[i]->namelen == namelen &&
+			    !memcmp(it->down[i]->name, path, namelen))
+				break;
+		}
+		if (i < it->subtree_nr) {
+			cache_tree_free(it->down[i]->cache_tree);
+			free(it->down[i]);
+			/* 0 1 2 3 4 5
+			 *       ^     ^subtree_nr = 6
+			 *       i
+			 * move 4 and 5 up one place (2 entries)
+			 * 2 = 6 - 3 - 1 = subtree_nr - i - 1
+			 */
+			memmove(it->down+i, it->down+i+1,
+				sizeof(struct cache_tree_sub *) *
+				(it->subtree_nr - i - 1));
+			it->subtree_nr--;
+		}
+		return;
+	}
+	namelen = slash - path;
+	down = find_subtree(it, path, namelen, 0);
+	if (down)
+		cache_tree_invalidate_path(down->cache_tree, slash + 1);
+}
+
+static int verify_cache(struct cache_entry **cache,
+			int entries)
+{
+	int i, funny;
+
+	/* Verify that the tree is merged */
+	funny = 0;
+	for (i = 0; i < entries; i++) {
+		struct cache_entry *ce = cache[i];
+		if (ce_stage(ce)) {
+			if (10 < ++funny) {
+				fprintf(stderr, "...\n");
+				break;
+			}
+			fprintf(stderr, "%s: unmerged (%s)\n",
+				ce->name, sha1_to_hex(ce->sha1));
+		}
+	}
+	if (funny)
+		return -1;
+
+	/* Also verify that the cache does not have path and path/file
+	 * at the same time.  At this point we know the cache has only
+	 * stage 0 entries.
+	 */
+	funny = 0;
+	for (i = 0; i < entries - 1; i++) {
+		/* path/file always comes after path because of the way
+		 * the cache is sorted.  Also path can appear only once,
+		 * which means conflicting one would immediately follow.
+		 */
+		const char *this_name = cache[i]->name;
+		const char *next_name = cache[i+1]->name;
+		int this_len = strlen(this_name);
+		if (this_len < strlen(next_name) &&
+		    strncmp(this_name, next_name, this_len) == 0 &&
+		    next_name[this_len] == '/') {
+			if (10 < ++funny) {
+				fprintf(stderr, "...\n");
+				break;
+			}
+			fprintf(stderr, "You have both %s and %s\n",
+				this_name, next_name);
+		}
+	}
+	if (funny)
+		return -1;
+	return 0;
+}
+
+static void discard_unused_subtrees(struct cache_tree *it)
+{
+	struct cache_tree_sub **down = it->down;
+	int nr = it->subtree_nr;
+	int dst, src;
+	for (dst = src = 0; src < nr; src++) {
+		struct cache_tree_sub *s = down[src];
+		if (s->used)
+			down[dst++] = s;
+		else {
+			cache_tree_free(s->cache_tree);
+			free(s);
+			it->subtree_nr--;
+		}
+	}
+}
+
+static int update_one(struct cache_tree *it,
+		      struct cache_entry **cache,
+		      int entries,
+		      const char *base,
+		      int baselen,
+		      int missing_ok)
+{
+	unsigned long size, offset;
+	char *buffer;
+	int i;
+
+	if (0 <= it->entry_count)
+		return it->entry_count;
+
+	/*
+	 * We first scan for subtrees and update them; we start by
+	 * marking existing subtrees -- the ones that are unmarked
+	 * should not be in the result.
+	 */
+	for (i = 0; i < it->subtree_nr; i++)
+		it->down[i]->used = 0;
+
+	/*
+	 * Find the subtrees and update them.
+	 */
+	for (i = 0; i < entries; i++) {
+		struct cache_entry *ce = cache[i];
+		struct cache_tree_sub *sub;
+		const char *path, *slash;
+		int pathlen, sublen, subcnt;
+
+		path = ce->name;
+		pathlen = ce_namelen(ce);
+		if (pathlen <= baselen || memcmp(base, path, baselen))
+			break; /* at the end of this level */
+
+		slash = strchr(path + baselen, '/');
+		if (!slash)
+			continue;
+		/*
+		 * a/bbb/c (base = a/, slash = /c)
+		 * ==>
+		 * path+baselen = bbb/c, sublen = 3
+		 */
+		sublen = slash - (path + baselen);
+		sub = find_subtree(it, path + baselen, sublen, 1);
+		if (!sub->cache_tree)
+			sub->cache_tree = cache_tree();
+		subcnt = update_one(sub->cache_tree,
+				    cache + i, entries - i,
+				    path,
+				    baselen + sublen + 1,
+				    missing_ok);
+		i += subcnt - 1;
+		sub->used = 1;
+	}
+
+	discard_unused_subtrees(it);
+
+	/*
+	 * Then write out the tree object for this level.
+	 */
+	size = 8192;
+	buffer = xmalloc(size);
+	offset = 0;
+
+	for (i = 0; i < entries; i++) {
+		struct cache_entry *ce = cache[i];
+		struct cache_tree_sub *sub;
+		const char *path, *slash;
+		int pathlen, entlen;
+		const unsigned char *sha1;
+		unsigned mode;
+
+		path = ce->name;
+		pathlen = ce_namelen(ce);
+		if (pathlen <= baselen || memcmp(base, path, baselen))
+			break; /* at the end of this level */
+
+		slash = strchr(path + baselen, '/');
+		if (slash) {
+			entlen = slash - (path + baselen);
+			sub = find_subtree(it, path + baselen, entlen, 0);
+			if (!sub)
+				die("cache-tree.c: '%.*s' in '%s' not found",
+				    entlen, path + baselen, path);
+			i += sub->cache_tree->entry_count - 1;
+			sha1 = sub->cache_tree->sha1;
+			mode = S_IFDIR;
+		}
+		else {
+			sha1 = ce->sha1;
+			mode = ntohl(ce->ce_mode);
+			entlen = pathlen - baselen;
+		}
+		if (!missing_ok && !has_sha1_file(sha1))
+			return error("invalid object %s", sha1_to_hex(sha1));
+
+		if (!ce->ce_mode)
+			continue; /* entry being removed */
+
+		if (size < offset + entlen + 100) {
+			size = alloc_nr(offset + entlen + 100);
+			buffer = xrealloc(buffer, size);
+		}
+		offset += sprintf(buffer + offset,
+				  "%o %.*s", mode, entlen, path + baselen);
+		buffer[offset++] = 0;
+		memcpy(buffer + offset, sha1, 20);
+		offset += 20;
+
+#if DEBUG
+		fprintf(stderr, "cache-tree %o %.*s\n",
+			mode, entlen, path + baselen);
+#endif
+	}
+
+	write_sha1_file(buffer, offset, tree_type, it->sha1);
+	free(buffer);
+	it->entry_count = i;
+#if DEBUG
+	fprintf(stderr, "cache-tree (%d ent, %d subtree) %s\n",
+		it->entry_count, it->subtree_nr,
+		sha1_to_hex(it->sha1));
+#endif
+	return i;
+}
+
+int cache_tree_update(struct cache_tree *it,
+		      struct cache_entry **cache,
+		      int entries,
+		      int missing_ok)
+{
+	int i;
+	i = verify_cache(cache, entries);
+	if (i)
+		return i;
+	i = update_one(it, cache, entries, "", 0, missing_ok);
+	if (i < 0)
+		return i;
+	return 0;
+}
+
+static void *write_one(struct cache_tree *it,
+		       char *path,
+		       int pathlen,
+		       char *buffer,
+		       unsigned long *size,
+		       unsigned long *offset)
+{
+	int i;
+
+	/* One "cache-tree" entry consists of the following:
+	 * path (NUL terminated)
+	 * entry_count, subtree_nr ("%d %d\n")
+	 * tree-sha1 (missing if invalid)
+	 * subtree_nr "cache-tree" entries for subtrees.
+	 */
+	if (*size < *offset + pathlen + 100) {
+		*size = alloc_nr(*offset + pathlen + 100);
+		buffer = xrealloc(buffer, *size);
+	}
+	*offset += sprintf(buffer + *offset, "%.*s%c%d %d\n",
+			   pathlen, path, 0,
+			   it->entry_count, it->subtree_nr);
+
+#if DEBUG
+	if (0 <= it->entry_count)
+		fprintf(stderr, "cache-tree <%.*s> (%d ent, %d subtree) %s\n",
+			pathlen, path, it->entry_count, it->subtree_nr,
+			sha1_to_hex(it->sha1));
+	else
+		fprintf(stderr, "cache-tree <%.*s> (%d subtree) invalid\n",
+			pathlen, path, it->subtree_nr);
+#endif
+
+	if (0 <= it->entry_count) {
+		memcpy(buffer + *offset, it->sha1, 20);
+		*offset += 20;
+	}
+	for (i = 0; i < it->subtree_nr; i++) {
+		struct cache_tree_sub *down = it->down[i];
+		int len = pathlen + down->namelen;
+		memcpy(path + pathlen, down->name, down->namelen);
+		path[len] = '/';
+		buffer = write_one(down->cache_tree, path, len+1,
+				   buffer, size, offset);
+	}
+	return buffer;
+}
+
+static void *cache_tree_write(const unsigned char *cache_sha1,
+			      struct cache_tree *root,
+			      unsigned long *offset_p)
+{
+	char path[PATH_MAX];
+	unsigned long size = 8192;
+	char *buffer = xmalloc(size);
+
+	/* the cache checksum of the corresponding index file. */
+	memcpy(buffer, cache_sha1, 20);
+	*offset_p = 20;
+	path[0] = 0;
+	return write_one(root, path, 0, buffer, &size, offset_p);
+}
+
+static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
+{
+	const char *buf = *buffer;
+	unsigned long size = *size_p;
+	struct cache_tree *it;
+	int i, subtree_nr;
+
+	it = NULL;
+	/* skip name, but make sure name exists */
+	while (size && *buf) {
+		size--;
+		buf++;
+	}
+	if (!size)
+		goto free_return;
+	buf++; size--;
+	it = cache_tree();
+	if (sscanf(buf, "%d %d\n", &it->entry_count, &subtree_nr) != 2)
+		goto free_return;
+	while (size && *buf && *buf != '\n') {
+		size--;
+		buf++;
+	}
+	if (!size)
+		goto free_return;
+	buf++; size--;
+	if (0 <= it->entry_count) {
+		if (size < 20)
+			goto free_return;
+		memcpy(it->sha1, buf, 20);
+		buf += 20;
+		size -= 20;
+	}
+
+#if DEBUG
+	if (0 <= it->entry_count)
+		fprintf(stderr, "cache-tree <%s> (%d ent, %d subtree) %s\n",
+			*buffer, it->entry_count, subtree_nr,
+			sha1_to_hex(it->sha1));
+	else
+		fprintf(stderr, "cache-tree <%s> (%d subtrees) invalid\n",
+			*buffer, subtree_nr);
+#endif
+
+	/*
+	 * Just a heuristic -- we do not add directories that often but
+	 * we do not want to have to extend it immediately when we do,
+	 * hence +2.
+	 */
+	it->subtree_alloc = subtree_nr + 2;
+	it->down = xcalloc(it->subtree_alloc, sizeof(struct cache_tree_sub *));
+	for (i = 0; i < subtree_nr; i++) {
+		/* read each subtree */
+		struct cache_tree *sub;
+		const char *name = buf;
+		int namelen;
+		sub = read_one(&buf, &size);
+		if (!sub)
+			goto free_return;
+		namelen = strlen(name);
+		it->down[i] = find_subtree(it, name, namelen, 1);
+		it->down[i]->cache_tree = sub;
+	}
+	if (subtree_nr != it->subtree_nr)
+		die("cache-tree: internal error");
+	*buffer = buf;
+	*size_p = size;
+	return it;
+
+ free_return:
+	cache_tree_free(it);
+	return NULL;
+}
+
+static struct cache_tree *cache_tree_read(unsigned char *sha1,
+					  const char *buffer,
+					  unsigned long size)
+{
+	/* check the cache-tree matches the index */
+	if (memcmp(buffer, sha1, 20))
+		return NULL; /* checksum mismatch */
+	if (buffer[20])
+		return NULL; /* not the whole tree */
+	buffer += 20;
+	size -= 20;
+	return read_one(&buffer, &size);
+}
+
+struct cache_tree *read_cache_tree(unsigned char *sha1)
+{
+	int fd;
+	struct stat st;
+	char path[PATH_MAX];
+	unsigned long size = 0;
+	void *map;
+	struct cache_tree *it;
+
+	sprintf(path, "%s.aux", get_index_file());
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return cache_tree();
+
+	if (fstat(fd, &st))
+		return cache_tree();
+	size = st.st_size;
+	map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (map == MAP_FAILED)
+		return cache_tree();
+	it = cache_tree_read(sha1, map, size);
+	munmap(map, size);
+	if (!it)
+		return cache_tree();
+	return it;
+}
+
+int write_cache_tree(const unsigned char *sha1, struct cache_tree *root)
+{
+	char path[PATH_MAX];
+	unsigned long size = 0;
+	void *buf, *buffer;
+	int fd, ret = -1;
+
+	sprintf(path, "%s.aux", get_index_file());
+	if (!root) {
+		unlink(path);
+		return -1;
+	}
+	fd = open(path, O_WRONLY|O_CREAT, 0666);
+	if (fd < 0)
+		return -1;
+	buffer = buf = cache_tree_write(sha1, root, &size);
+	while (size) {
+		int written = xwrite(fd, buf, size);
+		if (written <= 0)
+			goto fail;
+		buf += written;
+		size -= written;
+	}
+	ret = 0;
+
+ fail:
+	close(fd);
+	free(buffer);
+	if (ret)
+		unlink(path);
+	return ret;
+}
diff --git a/cache-tree.h b/cache-tree.h
new file mode 100644
index 0000000..7b149af
--- /dev/null
+++ b/cache-tree.h
@@ -0,0 +1,29 @@
+#ifndef CACHE_TREE_H
+#define CACHE_TREE_H
+
+struct cache_tree;
+struct cache_tree_sub {
+	struct cache_tree *cache_tree;
+	int namelen;
+	int used;
+	char name[FLEX_ARRAY];
+};
+
+struct cache_tree {
+	int entry_count; /* negative means "invalid" */
+	unsigned char sha1[20];
+	int subtree_nr;
+	int subtree_alloc;
+	struct cache_tree_sub **down;
+};
+
+struct cache_tree *cache_tree(void);
+void cache_tree_free(struct cache_tree *);
+void cache_tree_invalidate_path(struct cache_tree *, const char *);
+
+int write_cache_tree(const unsigned char *, struct cache_tree *);
+struct cache_tree *read_cache_tree(unsigned char *);
+int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int);
+
+
+#endif
-- 
1.3.0.g623a

^ permalink raw reply related

* [PATCH 1/4] read-cache/write-cache: optionally return cache checksum SHA1.
From: Junio C Hamano @ 2006-04-23 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

read_cache_1() and write_cache_1() takes an extra parameter
*sha1 that returns the checksum of the index file when non-NULL.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * This is a low-impact small preparation for the following
   three, which is an interesting optimization.

 cache.h      |    5 ++++-
 read-cache.c |   35 +++++++++++++++++++++++++++--------
 2 files changed, 31 insertions(+), 9 deletions(-)

c2a742f3ac14e93704c917e4d9b08881d6032281
diff --git a/cache.h b/cache.h
index 69801b0..8c9947e 100644
--- a/cache.h
+++ b/cache.h
@@ -138,8 +138,11 @@ extern const char *prefix_filename(const
 #define alloc_nr(x) (((x)+16)*3/2)
 
 /* Initialize and use the cache information */
+extern int read_cache_1(unsigned char *);
+extern int write_cache_1(int, struct cache_entry **, int, unsigned char *);
 extern int read_cache(void);
-extern int write_cache(int newfd, struct cache_entry **cache, int entries);
+extern int write_cache(int, struct cache_entry **, int);
+
 extern int cache_name_pos(const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
diff --git a/read-cache.c b/read-cache.c
index f97f92d..50e094e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -496,10 +496,12 @@ int add_cache_entry(struct cache_entry *
 	return 0;
 }
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(struct cache_header *hdr, unsigned long size, unsigned char *sha1)
 {
 	SHA_CTX c;
-	unsigned char sha1[20];
+	unsigned char sha1_buf[20];
+	if (!sha1)
+		sha1 = sha1_buf;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error("bad signature");
@@ -513,7 +515,7 @@ static int verify_hdr(struct cache_heade
 	return 0;
 }
 
-int read_cache(void)
+int read_cache_1(unsigned char *cache_sha1)
 {
 	int fd, i;
 	struct stat st;
@@ -547,7 +549,7 @@ int read_cache(void)
 		die("index file mmap failed (%s)", strerror(errno));
 
 	hdr = map;
-	if (verify_hdr(hdr, size) < 0)
+	if (verify_hdr(hdr, size, cache_sha1) < 0)
 		goto unmap;
 
 	active_nr = ntohl(hdr->hdr_entries);
@@ -595,7 +597,7 @@ static int ce_write(SHA_CTX *context, in
  	return 0;
 }
 
-static int ce_flush(SHA_CTX *context, int fd)
+static int ce_flush(SHA_CTX *context, int fd, unsigned char *sha1)
 {
 	unsigned int left = write_buffer_len;
 
@@ -612,7 +614,8 @@ static int ce_flush(SHA_CTX *context, in
 	}
 
 	/* Append the SHA1 signature at the end */
-	SHA1_Final(write_buffer + left, context);
+	SHA1_Final(sha1, context);
+	memcpy(write_buffer + left, sha1, 20);
 	left += 20;
 	if (write(fd, write_buffer, left) != left)
 		return -1;
@@ -663,11 +666,14 @@ static void ce_smudge_racily_clean_entry
 	}
 }
 
-int write_cache(int newfd, struct cache_entry **cache, int entries)
+int write_cache_1(int newfd, struct cache_entry **cache, int entries,
+		  unsigned char *cache_sha1)
 {
 	SHA_CTX c;
 	struct cache_header hdr;
 	int i, removed;
+	int status;
+	unsigned char sha1[20];
 
 	for (i = removed = 0; i < entries; i++)
 		if (!cache[i]->ce_mode)
@@ -691,5 +697,18 @@ int write_cache(int newfd, struct cache_
 		if (ce_write(&c, newfd, ce, ce_size(ce)) < 0)
 			return -1;
 	}
-	return ce_flush(&c, newfd);
+	status = ce_flush(&c, newfd, sha1);
+	if (cache_sha1)
+		memcpy(cache_sha1, sha1, 20);
+	return status;
+}
+
+int read_cache(void)
+{
+	return read_cache_1(NULL);
+}
+
+int write_cache(int newfd, struct cache_entry **cache, int entries)
+{
+	return write_cache_1(newfd, cache, entries, NULL);
 }
-- 
1.3.0.g623a

^ permalink raw reply related

* Re: git-clone --reference problem?
From: Linus Torvalds @ 2006-04-23 18:40 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604231122490.3701@g5.osdl.org>

On Sun, 23 Apr 2006, Linus Torvalds wrote:
>
> I also forget whether we decided that non-absolute paths were 
> relative to the $reference directory, or to the $reference/objects/ 
> directory.

Just checked. It should be relative to $reference/objects, and we should 
strip out lines that start with '#', since we allow comments.

Blah. That whole shell-thing is cerainly convenient for prototyping, but 
it would be wonderful if we had some enterprising young soul that rewrote 
these things in C and made them built-ins. It may be wasteful to use C for 
this, but it would make portability easier, and we have all those nice 
routines for validating and building alternate db lists already that do it 
right.

Oh, well.

		Linus

^ permalink raw reply

* Re: git-clone --reference problem?
From: Linus Torvalds @ 2006-04-23 18:34 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git
In-Reply-To: <1145810080.16166.223.camel@shinybook.infradead.org>

On Sun, 23 Apr 2006, David Woodhouse wrote:
>
> Should I expect cloning with alternates using '--reference' to be
> transitive?

Well, certainly not now.

> Using '--reference clone1 --reference linux-2.6' for the second clone
> works OK, but surely it ought to work with just '--reference clone1'? 

Actually, I don't think using "--reference clone1 --reference linux-2.6" 
works OK at all - in the sense that it doesn't do what you _think_ it 
does.

It doesn't use two reference repos at all: it uses just one, which is the 
last one (ie linux-2.6).

Maybe something like this could work.

Untested, of course. It probably isn't even syntactically correct shell. 
Whatever. You get the idea.

(That while-loop could probably be simplified to just a 

  cat ""$reference/objects/info/alternates" >> "$GIT_DIR/objects/info/alternates"

because all alternates _should_ already be absolute paths, but hey, 
whatever. I also forget whether we decided that non-absolute paths were 
relative to the $reference directory, or to the $reference/objects/ 
directory. The while-loop should be fixed to match whatever that decision 
was (or just not accept anything but absolute paths: make it use a

	grep '^/' ..

instead of "cat"?)

		Linus

---
diff --git a/git-clone.sh b/git-clone.sh
index 0805168..df4c135 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -221,6 +221,13 @@ then
 		fi
 		reference=$(cd "$reference" && pwd)
 		echo "$reference/objects" >"$GIT_DIR/objects/info/alternates"
+		if test -s "$reference/objects/info/alternates"
+		then
+			while read alt
+			do
+				echo $(cd "$reference" && cd "$alt" && pwd) >> "$GIT_DIR/objects/info/alternates"
+			done < "$reference/objects/info/alternates"
+		fi
 		(cd "$reference" && tar cf - refs) |
 		(cd "$GIT_DIR/refs" &&
 		 mkdir reference-tmp &&

^ permalink raw reply related

* Re: git-clone --reference problem?
From: Junio C Hamano @ 2006-04-23 17:50 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git
In-Reply-To: <1145810080.16166.223.camel@shinybook.infradead.org>

David Woodhouse <dwmw2@infradead.org> writes:

> Should I expect cloning with alternates using '--reference' to be
> transitive?

Not with the current code and design, but a patch that cleanly
does that can be considered for inclusion.  Adjusting relative
paths obtained from other repositories correctly, and avoiding
circular alternates by mistake would be tricky, though.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox