Git development
 help / color / mirror / Atom feed
* Re: [PATCH] write-tree performance problems
From: Chris Mason @ 2005-04-20 16:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504200833580.6467@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1059 bytes --]

On Wednesday 20 April 2005 11:40, Linus Torvalds wrote:
> On Wed, 20 Apr 2005, Chris Mason wrote:
> > Thanks for looking at this.  Your new tree is faster, it gets the commit
> > 100 patches time down from 1m5s to 50s.
>
> It really _shouldn't_ be faster. It still does the compression, and throws
> the end result away.

Well, that's a little odd.  I had thought about making sure you did this 
change and forgotten.  1 minute benchmarks are a horrible idea since they run 
into noise with cache writebacks.  I should know better...

At any rate, the time for a single write-tree is pretty consistent.  Before it 
was around .5 seconds, and with this change it goes down to .128s.  My patch 
was .024.

The 100 patch time is down to 32s (3 run average).  This is close enough that 
I don't think my patch is worth it if no other part of git can benefit from 
having trees in the index.

>
> To actually go faster, it _should_ need this patch. Untested. See if it
> works..

Thanks. This one missed the filling in the returnsha1.  New patch attached.

-chris

[-- Attachment #2: file-check.diff --]
[-- Type: text/x-diff, Size: 1059 bytes --]

diff -u linus.back/sha1_file.c linus/sha1_file.c
--- linus.back/sha1_file.c	2005-04-20 12:31:00.240181016 -0400
+++ linus/sha1_file.c	2005-04-20 12:13:56.339837528 -0400
@@ -173,12 +173,27 @@
 	z_stream stream;
 	unsigned char sha1[20];
 	SHA_CTX c;
+	char *filename;
+	int fd;
 
 	/* Sha1.. */
 	SHA1_Init(&c);
 	SHA1_Update(&c, buf, len);
 	SHA1_Final(sha1, &c);
 
+	filename = sha1_file_name(sha1);
+	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	if (fd < 0) {
+		if (errno != EEXIST)
+			return -1;
+
+		/*
+		 * We might do collision checking here, but we'd need to
+		 * uncompress the old file and check it. Later.
+		 */
+		goto out;
+	}
+
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
 	deflateInit(&stream, Z_BEST_COMPRESSION);
@@ -195,8 +210,10 @@
 	deflateEnd(&stream);
 	size = stream.total_out;
 
-	if (write_sha1_buffer(sha1, compressed, size) < 0)
-		return -1;
+	if (write(fd, compressed, size) != size)
+		die("unable to write file");
+	close(fd);
+out:		
 	if (returnsha1)
 		memcpy(returnsha1, sha1, 20);
 	return 0;

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Martin Uecker @ 2005-04-20 16:33 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Martin Uecker
In-Reply-To: <20050420155734.GA13575@macavity>

[-- Attachment #1: Type: text/plain, Size: 1002 bytes --]

On Wed, Apr 20, 2005 at 05:57:34PM +0200, Martin Uecker wrote:
> On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote:
> 
> > Yes, I guess this is the detail I was going to abandon. =)
> > 
> > I viewed the fact that the top-level hash was dependent on the exact chunk 
> > makeup a 'misfeature', because it doesn't allow easy interoperability with 
> > existing non-chunked repos.
> 
> I thought this as a misfeature too before I realized how
> many advantages this has.

To make it more clear: Ofcourse it is a bug if the
hash depends on unimportant implementation details.

But a hash which is calculated recusively from
subhashes is a lot more usefull than a hash
which can only be calculated from the entire data
at once. And if this hash can be recalculated
cheaply from subhashes even if some data was
inserted somewhere this is an even more usefull
thing.

Martin

-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: Linus Torvalds @ 2005-04-20 16:33 UTC (permalink / raw)
  To: Chris Mason; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504200833580.6467@ppc970.osdl.org>



On Wed, 20 Apr 2005, Linus Torvalds wrote:
> 
> To actually go faster, it _should_ need this patch. Untested. See if it 
> works..

NO! Don't see if this works. For the "sha1 file already exists" file, it 
forgot to return the SHA1 value in "returnsha1", and would thus corrupt 
the trees it wrote.

So don't apply, don't test. You won't corrupt your archive (you'll just
write bogus tree objects), but if you commit the bogus trees you're going
to be in a world of hurt and will have to undo everything you did.

It's a good test for "fsck" though. It core-dumps because it tries to add 
references to NULL objects.

		Linus

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: Linus Torvalds @ 2005-04-20 16:21 UTC (permalink / raw)
  To: C. Scott Ananian; +Cc: git
In-Reply-To: <Pine.LNX.4.61.0504201147280.2630@cag.csail.mit.edu>



On Wed, 20 Apr 2005, C. Scott Ananian wrote:
> 
> OK, sure.  But how 'bout chunking trees?  Are you grown happy with the new 
> trees-reference-other-trees paradigm, or is there a deep longing in your 
> heart for the simplicity of 'trees-reference-blobs-period'?

I'm pretty sure we do better chunking on a subdirectory basis, especially 
as it allows us to do various optimizations (avoid diffing common parts).

Yes, you could try to do the same optimizations with chunking, but then 
you'd need to make sure that the chunking was always on a full tree entry 
boundary etc - ie much harder than blob chunking. 

But hey, numbers talk, bullshit walks. 

		Linus

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: David Willmore @ 2005-04-20 16:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Mason, git
In-Reply-To: <Pine.LNX.4.58.0504200833580.6467@ppc970.osdl.org>

On 4/20/05, Linus Torvalds <torvalds@osdl.org> wrote:
> It really _shouldn't_ be faster. It still does the compression, and throws
> the end result away.

Am I misunderstanding or is the proglem that doing:
<file with unknown status> -> compress -> sha1 -> compare with existing hash

is expensive?

What about doing:
<file it's supposed to be equal to> -> uncompress -> compare with
unknown status file

It's more file I/O, but the uncompress is much cheaper than the compress.

On a second issue, what's the format of the main 'index' file?  Is it:
<pathspec> <sha1hash>
<pathspec> <sha1hash> 
?
If so, that's not going to compress well.  A file like:
<pathspec1>
<pathspec2>

<sha1hash1>
<sha1hash2>

Will compress better.

Stop me if I'm way off base--I'm just following the mailing list, I
haven't tried out the code.

Cheers,
David

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Martin Uecker @ 2005-04-20 15:57 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <Pine.LNX.4.61.0504201121490.2630@cag.csail.mit.edu>

[-- Attachment #1: Type: text/plain, Size: 2419 bytes --]

On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote:

Hi,
 
> A merkle-tree (which I think you initially pointed me at) makes the hash 
> of the internal nodes be a hash of the chunk's hashes; ie not a straight 
> content hash.  This is roughly what my current implementation does, but
> I would like to identify each subtree with the hash of the 
> *(expanded) contents of that subtree* (ie no explicit reference to 
> subtree hashes).  This makes it interoperable with non-chunked or 
> differently-chunked representations, in that the top-level hash is *just 
> the hash of the complete content*, not some hash-of-subtree-hashes.  Does 
> that make more sense?

Yes, thank you. But I would like to argue against this:

You can make the representations interoperable
if you calculate the hash for the non-chunked
representations exactly as if this file is stored
chunked but simple do not store it in that way.

Of course this is not backward compatible to the
monolithic hash and not compatible with a differently
chunked representation (but you could store subtrees
unchunked if you think your chunks are too small).

> The code I posted doesn't demonstrate this very well, but now that Linus 
> has abandoned the 'hash of compressed content' stuff, my next code posting 
> should show this more clearly.

I think the hash of the treap piece should be calculated
from the hash of the prefix and suffix tree and the already
calculated hash of the uncompressed data. This makes hashing
nearly as cheap as in Linus version which is important
because checking whether a given file has identically
content as a stored version should be fast.

> >If I don't miss anything essential, you can validate
> >each treap piece at the moment you get it from the
> >network with its SHA1 hash and then proceed with
> >downloading the prefix and suffix tree (in parallel
> >if you have more than one peer a la bittorrent).
> 
> Yes, I guess this is the detail I was going to abandon. =)
> 
> I viewed the fact that the top-level hash was dependent on the exact chunk 
> makeup a 'misfeature', because it doesn't allow easy interoperability with 
> existing non-chunked repos.

I thought this as a misfeature too before I realized how
many advantages this has.

Martin
 

-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: enforcing DB immutability
From: Erik Mouw @ 2005-04-20 15:57 UTC (permalink / raw)
  To: linux; +Cc: git, linux-kernel, mingo
In-Reply-To: <20050420084115.2699.qmail@science.horizon.com>

On Wed, Apr 20, 2005 at 08:41:15AM -0000, linux@horizon.com wrote:
> [A discussion on the git list about how to provide a hardlinked file
> that *cannot* me modified by an editor, but must be replaced by
> a new copy.]

Some time ago there was somebody working on copy-on-write links: once
you modify a cow-linked file, the file contents are copied, the file is
unlinked and you can safely work on the new file. It has some horrible
semantics in that the inode number of the opened file changes, I don't
know if applications are or should be aware of that.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: C. Scott Ananian @ 2005-04-20 15:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504200840240.6467@ppc970.osdl.org>

On Wed, 20 Apr 2005, Linus Torvalds wrote:

>> I was considering using a chunked representation for *all* files (not just
>> blobs), which would avoid the original 'trees must reference other trees
>> or they become too large' issue -- and maybe the performance issue you're
>> referring to, as well?
> No. The most common index file operation is reading, and that's the one
> that has to be _fast_. And it is - it's a single "mmap" and some parsing.

OK, sure.  But how 'bout chunking trees?  Are you grown happy with the new 
trees-reference-other-trees paradigm, or is there a deep longing in your 
heart for the simplicity of 'trees-reference-blobs-period'?  I'm fairly
certain that chunking could get you the space-savings you need without 
multi-level trees, if the simplicity of that is still appealing.

Not necessarily for rev.1 of the chunking code, but I'm curious as to 
whether it's still of interest at all.  I don't know exactly how far
ingrained multilevel trees have become since they were adopted.
  --scott

Japan explosion BLUEBIRD Honduras jihad D5 SLBM Diplomat overthrow 
JMTIDE CABOUNCE AMTHUG ESODIC Kennedy AVBRANDY CLOWER mail drop PHOENIX
                          ( http://cscott.net/ )

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: Linus Torvalds @ 2005-04-20 15:46 UTC (permalink / raw)
  To: C. Scott Ananian; +Cc: Chris Mason, git
In-Reply-To: <Pine.LNX.4.61.0504201128550.2630@cag.csail.mit.edu>



On Wed, 20 Apr 2005, C. Scott Ananian wrote:
> 
> Hmm.  Are our index files too large, or is there some other factor?

They _are_ pretty large, but they have to be,

For the kernel, the index file is about 1.6MB. That's 

 - 17,000+ files and filenames
 - stat information for all of them
 - the sha1 for them all

ie for the kernel it averages to 93.5 bytes per file. Which is actually 
pretty dense (just the sha1 and stat information is about half of it, and 
those are required).

> I was considering using a chunked representation for *all* files (not just 
> blobs), which would avoid the original 'trees must reference other trees 
> or they become too large' issue -- and maybe the performance issue you're 
> referring to, as well?

No. The most common index file operation is reading, and that's the one 
that has to be _fast_. And it is - it's a single "mmap" and some parsing.

In fact, writing it is pretty fast too, exactly because the index file is 
totally linear and isn't compressed or anything fancy like that. It's a 
_lot_ faster than the "tree objects", exactly because it doesn't need to 
be as careful.

The main cost of the index file is probably the fact that I add a sha1 
signature of the file into itself to verify that it's ok. The advantage is 
that the signature means that the file is ok, and the parsing of it can be 
much more relaxed. You win some, you lose some.

		Linus

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: Linus Torvalds @ 2005-04-20 15:40 UTC (permalink / raw)
  To: Chris Mason; +Cc: git
In-Reply-To: <200504201122.35448.mason@suse.com>



On Wed, 20 Apr 2005, Chris Mason wrote:
> 
> Thanks for looking at this.  Your new tree is faster, it gets the commit 100 
> patches time down from 1m5s to 50s.

It really _shouldn't_ be faster. It still does the compression, and throws
the end result away.

To actually go faster, it _should_ need this patch. Untested. See if it 
works..

		Linus
---
sha1_file.c: 40c00b77d0e52b31dda1696f10026fe6f92bc082
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -173,12 +173,27 @@ int write_sha1_file(char *buf, unsigned 
 	z_stream stream;
 	unsigned char sha1[20];
 	SHA_CTX c;
+	char *filename;
+	int fd;
 
 	/* Sha1.. */
 	SHA1_Init(&c);
 	SHA1_Update(&c, buf, len);
 	SHA1_Final(sha1, &c);
 
+	filename = sha1_file_name(sha1);
+	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	if (fd < 0) {
+		if (errno != EEXIST)
+			return -1;
+
+		/*
+		 * We might do collision checking here, but we'd need to
+		 * uncompress the old file and check it. Later.
+		 */
+		return 0;
+	}
+
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
 	deflateInit(&stream, Z_BEST_COMPRESSION);
@@ -195,8 +210,10 @@ int write_sha1_file(char *buf, unsigned 
 	deflateEnd(&stream);
 	size = stream.total_out;
 
-	if (write_sha1_buffer(sha1, compressed, size) < 0)
-		return -1;
+	if (write(fd, compressed, size) != size)
+		die("unable to write file");
+	close(fd);
+		
 	if (returnsha1)
 		memcpy(returnsha1, sha1, 20);
 	return 0;

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: C. Scott Ananian @ 2005-04-20 15:30 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git
In-Reply-To: <200504201122.35448.mason@suse.com>

On Wed, 20 Apr 2005, Chris Mason wrote:

> With the basic changes I described before, the  100 patch time only goes down
> to 40s.  Certainly not fast enough to justify the changes.  In this case, the
> bulk of the extra time comes from write-tree writing the index file, so I
> split write-tree.c up into libwrite-tree.c, and created update-cache
> --write-tree.

Hmm.  Are our index files too large, or is there some other factor?
I was considering using a chunked representation for *all* files (not just 
blobs), which would avoid the original 'trees must reference other trees 
or they become too large' issue -- and maybe the performance issue you're 
referring to, as well?
  --scott

Boston MI6 quiche LPMEDLEY BLUEBIRD PBSUCCESS jihad biowarfare non-violent protest 
Yakima NRA EZLN DES hack SARANAC KMPLEBE Echelon PBCABOOSE security
                          ( http://cscott.net/ )

^ permalink raw reply

* Re: [PATCH 1/4] Accept commit in some places when tree is needed.
From: Linus Torvalds @ 2005-04-20 15:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vis2ikmj0.fsf@assigned-by-dhcp.cox.net>



On Tue, 19 Apr 2005, Junio C Hamano wrote:
> 
> This patch lifts the tree-from-tree-or-commit logic from
> diff-cache.c and moves it to sha1_file.c, which is a common
> library source for the SHA1 storage part.

I don't think that's a good interface. It changes the sha1 passed into it: 
that may actually be nice, since you may want to know what it changed to, 
but I think you'd want to have that as an (optional) separate 
"sha1_result" parameter. 

Also, the "type" or "size" things make no sense to have as a parameter 
at all.

IOW, it was fine when it was an internal hacky thing in diff-cache, but 
once it's promoted to be a real library function it should definitely be 
cleaned up to have sane interfaces that make sense in general, and not 
just within the original context.

		Linus

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: C. Scott Ananian @ 2005-04-20 15:28 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Git Mailing List
In-Reply-To: <20050420151902.GA13175@macavity>

On Wed, 20 Apr 2005, Martin Uecker wrote:

>> You can (and my code demonstrates/will demonstrate) still use a whole-file
>> hash to use chunking.  With content prefixes, this takes O(N ln M) time
>> (where N is the file size and M is the number of chunks) to compute all
>> hashes; if subtrees can share the same prefix, then you can do this in
>> O(N) time (ie, as fast as possible, modulo a constant factor, which is
>> '2').  You don't *need* internal hashing functions.
>
> I don't understand this paragraph. What is an internal
> hash function? Your code seems to do exactly what I want.
> The hashes are computed recusively as in a hash tree
> with O(N ln N). The only difference between your design
> and a design based on a conventional (binary) hash tree
> seems to be that data is stored in the intermediate nodes
> too.

A merkle-tree (which I think you initially pointed me at) makes the hash 
of the internal nodes be a hash of the chunk's hashes; ie not a straight 
content hash.  This is roughly what my current implementation does, but
I would like to identify each subtree with the hash of the 
*(expanded) contents of that subtree* (ie no explicit reference to 
subtree hashes).  This makes it interoperable with non-chunked or 
differently-chunked representations, in that the top-level hash is *just 
the hash of the complete content*, not some hash-of-subtree-hashes.  Does 
that make more sense?

The code I posted doesn't demonstrate this very well, but now that Linus 
has abandoned the 'hash of compressed content' stuff, my next code posting 
should show this more clearly.

> If I don't miss anything essential, you can validate
> each treap piece at the moment you get it from the
> network with its SHA1 hash and then proceed with
> downloading the prefix and suffix tree (in parallel
> if you have more than one peer a la bittorrent).

Yes, I guess this is the detail I was going to abandon. =)

I viewed the fact that the top-level hash was dependent on the exact chunk 
makeup a 'misfeature', because it doesn't allow easy interoperability with 
existing non-chunked repos.
  --scott

WTO atomic operation Mossad Castro overthrow FSF fissionable HTAUTOMAT 
LCPANES MKDELTA Bush non-violent protest OVER THE HORIZON RADAR KUPALM
                          ( http://cscott.net/ )

^ permalink raw reply

* Re: [PATCH] write-tree performance problems
From: Chris Mason @ 2005-04-20 15:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504192337120.6467@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On Wednesday 20 April 2005 02:43, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Chris Mason wrote:
> > I'll finish off the patch once you ok the basics below.  My current code
> > works like this:
>
> Chris, before you do anything further, let me re-consider.
>
> Assuming that the real cost of write-tree is the compression (and I think
> it is), I really suspect that this ends up being the death-knell to my
> "use the sha1 of the _compressed_ object" approach. 

Thanks for looking at this.  Your new tree is faster, it gets the commit 100 
patches time down from 1m5s to 50s.  I've attached my patch from last night, 
which is mostly a rough guess of the changes we would need, I haven't 
validated or cleaned things up.

With the basic changes I described before, the  100 patch time only goes down 
to 40s.  Certainly not fast enough to justify the changes.  In this case, the 
bulk of the extra time comes from write-tree writing the index file, so I 
split write-tree.c up into libwrite-tree.c, and created update-cache 
--write-tree.

This gets our time back down to 21s.

The attached patch is not against your latest revs.  After updating I would 
need to sprinkle a few S_ISDIR checks into diff-cache.c and checkout-cache.c, 
but the changes should be small.

-chris

[-- Attachment #2: fast-dirs-3.diff --]
[-- Type: text/x-diff, Size: 15962 bytes --]

Index: Makefile
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/Makefile  (mode:100644 sha1:6a04941a337ec50da06cf4cf52aa58f3b1435776)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/Makefile  (mode:100644 sha1:2ba6d49196e8a2335cfcd77ec0dbe9cda3e402dd)
@@ -29,7 +29,7 @@
 
 VERSION= VERSION
 
-LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o
+LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o libwrite-tree.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h
 
Index: cache.h
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/cache.h  (mode:100644 sha1:c182ea0c5c1def37d899f9a05f8884ebe17c9d92)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/cache.h  (mode:100644 sha1:0882b713222b71e67c9dab5d58ab6f15c3c49ed6)
@@ -74,7 +74,7 @@
 #define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
-#define create_ce_mode(mode) htonl(S_IFREG | ce_permissions(mode))
+#define create_ce_mode(mode) htonl((mode & (S_IFREG|S_IFDIR)) | ce_permissions(mode))
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
Index: libwrite-tree.c
===================================================================
--- /dev/null  (tree:dbeacafeb442bcfd39dfdc90c360d47d4215c185)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/libwrite-tree.c  (mode:100644 sha1:52202930d02b3721f5a388ae1178c5a4d99ec1b4)
@@ -0,0 +1,174 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+#include "cache.h"
+
+struct new_ce {
+	struct new_ce *next;
+	struct cache_entry ce;
+};
+
+static struct new_ce *add_list = NULL;
+
+static int check_valid_sha1(unsigned char *sha1)
+{
+	char *filename = sha1_file_name(sha1);
+	int ret;
+
+	/* If we were anal, we'd check that the sha1 of the contents actually matches */
+	ret = access(filename, R_OK);
+	if (ret)
+		perror(filename);
+	return ret;
+}
+
+static int prepend_integer(char *buffer, unsigned val, int i)
+{
+	buffer[--i] = '\0';
+	do {
+		buffer[--i] = '0' + (val % 10);
+		val /= 10;
+	} while (val);
+	return i;
+}
+
+#define ORIG_OFFSET (40)	/* Enough space to add the header of "tree <size>\0" */
+
+static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
+{
+	unsigned char subdir_sha1[20];
+	unsigned long size, offset;
+	char *buffer;
+	int i, nr;
+
+	/* Guess at some random initial size */
+	size = 8192;
+	buffer = malloc(size);
+	offset = ORIG_OFFSET;
+
+	nr = 0;
+	do {
+		struct cache_entry *ce = cachep[nr];
+		const char *pathname = ce->name, *filename, *dirname;
+		int pathlen = ce_namelen(ce), entrylen;
+		unsigned char *sha1;
+		unsigned int mode;
+
+		/* Did we hit the end of the directory? Return how many we wrote */
+		if (baselen >= pathlen || memcmp(base, pathname, baselen))
+			break;
+
+		sha1 = ce->sha1;
+		mode = ntohl(ce->ce_mode);
+
+		/* Do we have _further_ subdirectories? */
+		filename = pathname + baselen;
+		dirname = strchr(filename, '/');
+		if (dirname) {
+			int subdir_written;
+			int len = dirname - pathname;
+			unsigned int size = cache_entry_size(len);
+			struct new_ce *new_ce = malloc(size + sizeof(struct new_ce *));
+			struct cache_entry *c = &new_ce->ce;
+			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
+			nr += subdir_written - 1;
+
+			/* Now we need to write out the directory entry into this tree.. */
+			mode = S_IFDIR;
+			pathlen = dirname - pathname;
+
+			sha1 = subdir_sha1;
+
+			memset(c, 0, size);
+
+			/* create a new cache entry for what we just calculated.
+			 * place the new entry on a list for adding later so we
+			 * don't change the size of the cache right now.
+			 */
+			c->ce_mode = create_ce_mode(mode);
+			c->ce_flags = create_ce_flags(len, 0);
+			memcpy(c->name, pathname, len);
+			c->name[len] = '\0';
+			memcpy(c->sha1, sha1, 20);
+			new_ce->next = add_list;
+			add_list = new_ce;
+		} else if (mode & S_IFDIR) {
+			/* eat all the entries below this directory */
+			while(++nr < maxentries) {
+				struct cache_entry *c = cachep[nr];
+				
+				if (strlen(c->name) < pathlen)
+					break;
+				if (memcmp(c->name, pathname, pathlen) ||
+				    c->name[pathlen] != '/')
+					break;
+			}
+			/* our loop went too far by 1 */
+			nr--;
+			mode = S_IFDIR;
+		}
+
+		if (check_valid_sha1(sha1) < 0)
+			exit(1);
+
+		entrylen = pathlen - baselen;
+		if (offset + entrylen + 100 > size) {
+			size = alloc_nr(offset + entrylen + 100);
+			buffer = realloc(buffer, size);
+		}
+		offset += sprintf(buffer + offset, "%o %.*s", mode, entrylen, filename);
+		buffer[offset++] = 0;
+		memcpy(buffer + offset, sha1, 20);
+		offset += 20;
+		nr++;
+	} while (nr < maxentries);
+
+	i = prepend_integer(buffer, offset - ORIG_OFFSET, ORIG_OFFSET);
+	i -= 5;
+	memcpy(buffer+i, "tree ", 5);
+
+	write_sha1_file(buffer + i, offset - i, returnsha1);
+	free(buffer);
+	return nr;
+}
+
+void write_full_tree(int entries) {
+	unsigned char sha1[20];
+	int i, unmerged;
+
+	if (entries <= 0)
+		die("write-tree: no cache contents to write");
+
+	/* Verify that the tree is merged */
+	unmerged = 0;
+	for (i = 0; i < entries; i++) {
+		struct cache_entry *ce = active_cache[i];
+		if (ntohs(ce->ce_flags) & ~CE_NAMEMASK) {
+			if (++unmerged > 10) {
+				fprintf(stderr, "...\n");
+				break;
+			}
+			fprintf(stderr, "%s: unmerged (%s)\n", ce->name, sha1_to_hex(ce->sha1));
+		}
+	}
+	if (unmerged)
+		die("write-tree: not able to write tree");
+
+	/* Ok, write it out */
+	if (write_tree(active_cache, entries, "", 0, sha1) != entries)
+		die("write-tree: internal error");
+	printf("%s\n", sha1_to_hex(sha1));
+	if (add_list) {
+		struct new_ce *nc = add_list;
+		while(nc) {
+			add_cache_entry(&nc->ce, 1);
+			nc = nc->next;
+		}
+	}
+}
+
+int write_tree_updated_cache(void) {
+	return add_list != NULL;
+}
Index: merge-cache.c
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/merge-cache.c  (mode:100644 sha1:96c86c26d06837bf604a70caf9dd2133884a63bc)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/merge-cache.c  (mode:100644 sha1:1b45bfc8cf5b4c610a4b149f3a4295081bc0e00f)
@@ -63,8 +63,16 @@
 	 * If it already exists in the cache as stage0, it's
 	 * already merged and there is nothing to do.
 	 */
-	if (pos < 0)
-		merge_entry(-pos-1, path);
+	if (pos < 0) {
+		pos = -pos-1;
+		while(pos > 0) {
+			int mode = htonl(active_cache[pos]->ce_mode);
+			if (S_ISREG(mode))
+				break;
+			pos--;
+		}
+		merge_entry(pos, path);
+	}
 }
 
 static void merge_all(void)
@@ -74,7 +82,8 @@
 		struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
 			continue;
-		i += merge_entry(i, ce->name)-1;
+		if (S_ISREG(htonl(ce->ce_mode)))
+			i += merge_entry(i, ce->name)-1;
 	}
 }
 
Index: read-cache.c
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/read-cache.c  (mode:100644 sha1:17d4d2284e79d3f1070b51200d797115f2d09d6a)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/read-cache.c  (mode:100644 sha1:d2fc8e35f5ef602f7baa1a4c83c74dbf373fc3e0)
@@ -96,11 +96,30 @@
 	return 1;
 }
 
+static void invalidate_trees(char *path) {
+	char *p;
+	int len = strlen(path);
+	int pos;
+	extern void *memrchr(__const void *, int, size_t);
+	while(1) {
+		p = memrchr(path, '/', len);
+		if (!p || p == path)
+			return;
+		len = p-path;
+		pos = cache_name_pos(path, len);
+		if (pos < 0)
+			return;
+		remove_entry_at(pos);
+	}	
+}
+
 int remove_file_from_cache(char *path)
 {
 	int pos = cache_name_pos(path, strlen(path));
-	if (pos >= 0)
+	if (pos >= 0) {
 		remove_entry_at(pos);
+		invalidate_trees(path);
+	}
 	return 0;
 }
 
@@ -113,12 +132,16 @@
 int add_cache_entry(struct cache_entry *ce, int ok_to_add)
 {
 	int pos;
+	int invalidate = S_ISREG(htonl(ce->ce_mode));
 
 	pos = cache_name_pos(ce->name, htons(ce->ce_flags));
 
 	/* existing match? Just replace it */
 	if (pos >= 0) {
 		active_cache[pos] = ce;
+		if (invalidate) {
+			invalidate_trees(ce->name);
+		}
 		return 0;
 	}
 	pos = -pos-1;
@@ -149,6 +172,9 @@
 	if (active_nr > pos)
 		memmove(active_cache + pos + 1, active_cache + pos, (active_nr - pos - 1) * sizeof(ce));
 	active_cache[pos] = ce;
+	if (invalidate) {
+		invalidate_trees(ce->name);
+	}
 	return 0;
 }
 
Index: read-tree.c
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/read-tree.c  (mode:100644 sha1:a573a3155e532081a8be0dab60f1ec35ea159ddf)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/read-tree.c  (mode:100644 sha1:fa497650937a62caadd043897032cf5f9e07dea2)
@@ -63,7 +63,6 @@
 				free(buffer);
 				return -1;
 			}
-			continue;
 		}
 		if (read_one_entry(sha1, base, baselen, path, mode) < 0) {
 			free(buffer);
Index: show-diff.c
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/show-diff.c  (mode:100644 sha1:007dabd2978de4c58f49050d3969ca353278dbb6)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/show-diff.c  (mode:100644 sha1:a7b0a0bca00c173f591a0d8ef0dfbcbdd96ef8a9)
@@ -163,6 +163,8 @@
 		if (1 < argc &&
 		    ! matches_pathspec(ce, argv+1, argc-1))
 			continue;
+		if (S_ISDIR(htonl(ce->ce_mode)))
+			continue;
 		matched++;
 
 		if (ce_stage(ce)) {
Index: update-cache.c
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/update-cache.c  (mode:100644 sha1:11388582a830a6161d1c769aa8616bed6f593b8a)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/update-cache.c  (mode:100644 sha1:cab4e8e1fa7aceff287cfb3464710b1dd52f3a5f)
@@ -4,6 +4,7 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include "cache.h"
+#include "write-tree.h"
 
 /*
  * Default to not allowing changes to the list of files. The
@@ -12,7 +13,7 @@
  * like "update-cache *" and suddenly having all the object
  * files be revision controlled.
  */
-static int allow_add = 0, allow_remove = 0;
+static int allow_add = 0, allow_remove = 0, write_tree = 0;
 
 static int index_fd(unsigned char *sha1, int fd, struct stat *st)
 {
@@ -182,7 +183,6 @@
 
 	if (stat(ce->name, &st) < 0)
 		return NULL;
-
 	changed = cache_match_stat(ce, &st);
 	if (!changed)
 		return ce;
@@ -191,12 +191,11 @@
 	 * If the mode has changed, there's no point in trying
 	 * to refresh the entry - it's not going to match
 	 */
-	if (changed & MODE_CHANGED)
+	if (changed & MODE_CHANGED) {
 		return NULL;
-
+	}
 	if (compare_data(ce, st.st_size))
 		return NULL;
-
 	size = ce_size(ce);
 	updated = malloc(size);
 	memcpy(updated, ce, size);
@@ -222,7 +221,8 @@
 
 		new = refresh_entry(ce);
 		if (!new) {
-			printf("%s: needs update\n", ce->name);
+			if (S_ISREG(ntohl(ce->ce_mode)))
+				printf("%s: needs update\n", ce->name);
 			continue;
 		}
 		/* You can NOT just free active_cache[i] here, since it
@@ -336,6 +336,10 @@
 				i += 3;
 				continue;
 			}
+			if (!strcmp(path, "--write-tree")) {
+				write_tree = 1;
+				continue;
+			}
 			die("unknown option %s", path);
 		}
 		if (!verify_path(path)) {
@@ -345,6 +349,9 @@
 		if (add_file_to_cache(path))
 			die("Unable to add %s to database", path);
 	}
+	if (write_tree) {
+		write_full_tree(active_nr);
+	}
 	if (write_cache(newfd, active_cache, active_nr) ||
 	    rename(".git/index.lock", ".git/index"))
 		die("Unable to write new cachefile");
Index: write-tree.c
===================================================================
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/write-tree.c  (mode:100644 sha1:fb046aa6ce6b9fce6a523a1e36ff43adab9bdd93)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/write-tree.c  (mode:100644 sha1:c3d62242f0d5fff0c245cd46ac8a5b72d4aef4cd)
@@ -4,127 +4,28 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include "cache.h"
-
-static int check_valid_sha1(unsigned char *sha1)
-{
-	char *filename = sha1_file_name(sha1);
-	int ret;
-
-	/* If we were anal, we'd check that the sha1 of the contents actually matches */
-	ret = access(filename, R_OK);
-	if (ret)
-		perror(filename);
-	return ret;
-}
-
-static int prepend_integer(char *buffer, unsigned val, int i)
-{
-	buffer[--i] = '\0';
-	do {
-		buffer[--i] = '0' + (val % 10);
-		val /= 10;
-	} while (val);
-	return i;
-}
-
-#define ORIG_OFFSET (40)	/* Enough space to add the header of "tree <size>\0" */
-
-static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
-{
-	unsigned char subdir_sha1[20];
-	unsigned long size, offset;
-	char *buffer;
-	int i, nr;
-
-	/* Guess at some random initial size */
-	size = 8192;
-	buffer = malloc(size);
-	offset = ORIG_OFFSET;
-
-	nr = 0;
-	do {
-		struct cache_entry *ce = cachep[nr];
-		const char *pathname = ce->name, *filename, *dirname;
-		int pathlen = ce_namelen(ce), entrylen;
-		unsigned char *sha1;
-		unsigned int mode;
-
-		/* Did we hit the end of the directory? Return how many we wrote */
-		if (baselen >= pathlen || memcmp(base, pathname, baselen))
-			break;
-
-		sha1 = ce->sha1;
-		mode = ntohl(ce->ce_mode);
-
-		/* Do we have _further_ subdirectories? */
-		filename = pathname + baselen;
-		dirname = strchr(filename, '/');
-		if (dirname) {
-			int subdir_written;
-
-			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
-			nr += subdir_written;
-
-			/* Now we need to write out the directory entry into this tree.. */
-			mode = S_IFDIR;
-			pathlen = dirname - pathname;
-
-			/* ..but the directory entry doesn't count towards the total count */
-			nr--;
-			sha1 = subdir_sha1;
-		}
-
-		if (check_valid_sha1(sha1) < 0)
-			exit(1);
-
-		entrylen = pathlen - baselen;
-		if (offset + entrylen + 100 > size) {
-			size = alloc_nr(offset + entrylen + 100);
-			buffer = realloc(buffer, size);
-		}
-		offset += sprintf(buffer + offset, "%o %.*s", mode, entrylen, filename);
-		buffer[offset++] = 0;
-		memcpy(buffer + offset, sha1, 20);
-		offset += 20;
-		nr++;
-	} while (nr < maxentries);
-
-	i = prepend_integer(buffer, offset - ORIG_OFFSET, ORIG_OFFSET);
-	i -= 5;
-	memcpy(buffer+i, "tree ", 5);
-
-	write_sha1_file(buffer + i, offset - i, returnsha1);
-	free(buffer);
-	return nr;
-}
+#include "write-tree.h"
 
 int main(int argc, char **argv)
 {
-	int i, unmerged;
 	int entries = read_cache();
-	unsigned char sha1[20];
+	int newfd;
+	newfd = open(".git/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
+	if (newfd < 0)
+		die("unable to create new cachefile");
+
 
 	if (entries <= 0)
 		die("write-tree: no cache contents to write");
 
-	/* Verify that the tree is merged */
-	unmerged = 0;
-	for (i = 0; i < entries; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (ntohs(ce->ce_flags) & ~CE_NAMEMASK) {
-			if (++unmerged > 10) {
-				fprintf(stderr, "...\n");
-				break;
-			}
-			fprintf(stderr, "%s: unmerged (%s)\n", ce->name, sha1_to_hex(ce->sha1));
-		}
+	write_full_tree(entries);
+	if (write_tree_updated_cache()) {
+		if (write_cache(newfd, active_cache, active_nr) ||
+		    rename(".git/index.lock", ".git/index"))
+			die("Unable to write new cachefile");
+	} else {
+		close(newfd);
+		unlink(".git/index.lock");
 	}
-	if (unmerged)
-		die("write-tree: not able to write tree");
-
-	/* Ok, write it out */
-	if (write_tree(active_cache, entries, "", 0, sha1) != entries)
-		die("write-tree: internal error");
-	printf("%s\n", sha1_to_hex(sha1));
 	return 0;
 }
Index: write-tree.h
===================================================================
--- /dev/null  (tree:dbeacafeb442bcfd39dfdc90c360d47d4215c185)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/write-tree.h  (mode:100644 sha1:0ad5fe36126577e56544e08e0f4dfa766350e841)
@@ -0,0 +1,3 @@
+
+void write_full_tree(int);
+int write_tree_updated_cache(void);

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Martin Uecker @ 2005-04-20 15:19 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <Pine.LNX.4.61.0504201025030.2630@cag.csail.mit.edu>

[-- Attachment #1: Type: text/plain, Size: 2752 bytes --]

On Wed, Apr 20, 2005 at 10:30:15AM -0400, C. Scott Ananian wrote:

Hi,

your code looks pretty cool. thank you!

> On Wed, 20 Apr 2005, Martin Uecker wrote:
> 
> >The other thing I don't like is the use of a sha1
> >for a complete file. Switching to some kind of hash
> >tree would allow to introduce chunks later. This has
> >two advantages:
> 
> You can (and my code demonstrates/will demonstrate) still use a whole-file 
> hash to use chunking.  With content prefixes, this takes O(N ln M) time 
> (where N is the file size and M is the number of chunks) to compute all 
> hashes; if subtrees can share the same prefix, then you can do this in 
> O(N) time (ie, as fast as possible, modulo a constant factor, which is 
> '2').  You don't *need* internal hashing functions.

I don't understand this paragraph. What is an internal
hash function? Your code seems to do exactly what I want.
The hashes are computed recusively as in a hash tree
with O(N ln N). The only difference between your design
and a design based on a conventional (binary) hash tree
seems to be that data is stored in the intermediate nodes
too. 

> >It would allow git to scale to repositories of large
> >binary files. And it would allow to build a very cool
> >content transport algorithm for those repositories.
> >This algorithm could combine all the advantages of
> >bittorrent and rsync (without the cpu load).
> 
> Yes, the big benefit of internal hashing is that it lets you check 
> validity of a chunk w/o having the entire file available.  I'm not sure 
> that's terribly useful in this case.  [And, if it is, then it can 
> obviously be done w/ other means.]

If I don't miss anything essential, you can validate
each treap piece at the moment you get it from the
network with its SHA1 hash and then proceed with
downloading the prefix and suffix tree (in parallel
if you have more than one peer a la bittorrent).

> >And it would allow trivial merging of patches which
> >apply to different chunks of a file in exact the same
> >way as merging changesets which apply to different
> >files in a tree.
> 
> I'm not sure anyone should be looking at chunks.  To me, at least, they 
> are an object-store-implementation detail only.  For merging, etc, we 
> should be looking at whole files, or (better) the whole repository.
> The chunking algorithm is guaranteed not to respect semantic boundaries 
> (for *some* semantics of *some* file).

You might be right. I just wanted to point out this
possibility because it would allow to avoid calling
external merging code for a lot of trivial merges.

bye,
Martin



-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Linus Torvalds @ 2005-04-20 14:59 UTC (permalink / raw)
  To: David Woodhouse; +Cc: H. Peter Anvin, Git Mailing List, Chris Mason
In-Reply-To: <1114006429.5877.42.camel@localhost.localdomain>



On Thu, 21 Apr 2005, David Woodhouse wrote:
> 
> The reason for doing this is that without it, we can't ever have a full
> history actually connected to the current trees. There'd always be a
> break at 2.6.12-rc2, at which point you'd have to switch to an entirely
> different git repository.

Quite frankly, I'd _much_ rather have a notion of "external references" 
than start depending on external hashes.

IOW, I'd be happier with a new line in the header (after the normal
"author"/"committer" lines) that just pointed to an external tree, aka

	external linux-2.6.12-rc2-tree

and then people could literally use this to link whatever they wanted, and 
it would not force one particular version of an external tree on you.

Why? Because we can't keep re-generating trees.

However, the second part of that plan is that once you do that, you might 
as well make the "external" linkages be external to the repository itself. 
IOW, you could just make a file that the git tools can parse that say

	external-parent <root-hash> <external-parent-ID>
		comment for this parent

	external-parent <commit-hash> <external-parent-ID>
		comment for this parent

and the nice thing about that is that now that information allows you to 
add external parents at any point. 

Why do it like this? First off, I think that the "initial import" ends up
being just one special case of the much more _generic_ issue of having
patches come in from other source control systems (ie the above would
actually work with the darcs issues too, and allow people to track the
dependencies between a tree maintained in git and maintained elsewhere).

Secondly, we do need something like this for pruning off history anyway, 
so that the tools have a better way of saying "history has been pruned 
off" than just hitting a missing commit. That's not a big deal right now, 
since I'm not planning on letting people prune their history (or at least 
I'm planning on having tools complain loudly), but it _will_ be an issue. 
I think history pruning is wonderful, but I do want to have some mechanism 
to say "it was pruned" as opposed to "it was lost".

Thirdly, I don't actually want my new tree to depend on a conversion of
the old BK tree.

Two reasons: if it's a really full conversion, there are definitely going
to be issues with BitMover. They do not want people to try to reverse
engineer how they do namespace merges, which is why they have the "don't
look at git and do another SCM at the same time" clause in the first
place. Namespace merges (and probably other things too, for that matter)
tend to be the thing they tend to do better than anybody else. The kernel
probably does not actually have a lot of those so it might be ok by them,
but the keyword is _might_, and I don't want to cloud git by another
flamewar.

The other reason is just the really obvious one: in the last week, I've
already changed the format _twice_ in ways that change the hash. As long
as it's 119MB of data, it's not going to be too nasty to do again. If it's
3+GB of data, I'm going to feel really constrained about the kind of
conversions I can do. It's one thing to have something that takes a few
minutes and that anybody can do. It's another thing entirely to do
something that requires the convertee to dedicate tons of diskspace and
hours of work on it.

Let's face it, I doubt we did our last conversion ever. I still think that
the git data model is the best model _ever_ for an SCM, but it's not all
the minute details I'm proud over, it's the general big things. For
example, let's see how the "blobs are sequences of smaller hashes" thing
works out. I was doubtful, but Scott's first chunking code doesn't make me
hurl chunks, and I've been wrong before.

And the thing is, I'm ok with being wrong. Especially if I can fix things 
up later.

So I've got tons of reasons (that you may not agree with, obviously) for
why I don't think it's a good idea to base the kernel on a large
conversion. Some (or all) of those reasons may become moot in another week
or month, but I'd definitely _not_ that interested in doing it now. If it
turns out later that we do want to re-base the kernel, we can do any
conversion we want at a later time - it's not that it's necessarily the
wrong thing to do, but I think it is the wrogn thing to do _now_.

		Linus

^ permalink raw reply

* Re: enforcing DB immutability
From: Nick Craig-Wood @ 2005-04-20 14:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git
In-Reply-To: <20050420074948.GA22620@elte.hu>

On Wed, Apr 20, 2005 at 09:49:48AM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > perhaps having a new 'immutable hardlink' feature in the Linux VFS 
> > would help? I.e. a hardlink that can only be readonly followed, and 
> > can be removed, but cannot be chmod-ed to a writeable hardlink. That i 
> > think would be a large enough barrier for editors/build-tools not to 
> > play the tricks they already do that makes 'readonly' files virtually 
> > meaningless.
> 
> immutable hardlinks have the following advantage: a hardlink by design 
> hides the information where the link comes from. So even if an editor 
> wanted to play stupid games and override the immutability - it doesnt 
> know where the DB object is. (sure, it could find it if it wants to, but 
> that needs real messing around - editors wont do _that_)

This has already been implemented for the linux vserver project.  Take
a look in the patch here :-

  http://vserver.13thfloor.at/Experimental/patch-2.6.11.7-vs1.9.5.x5.diff.bz2

(Its not split out, but search for IMMUTABLE and you'll see what I mean)

It implements immutable linkage invert, which basically allows people
to delete hardlinks to immutable files, but not do anything else to
them.  It uses another bit out of the attributes to "invert" the
immutability of the linkage of immutable files.

Its used in the vserver project so that individual vservers (which are
basically just fancy chroots) can share libraries, binaries and hence
memory, can't muck each other up, but can upgrade their libs/binaries.

-- 
Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: C. Scott Ananian @ 2005-04-20 14:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jon, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504200725110.6467@ppc970.osdl.org>

On Wed, 20 Apr 2005, Linus Torvalds wrote:

> - _keep_ the same compression format, but notice that we already have an
>   object by looking at the uncompressed one.

With a chunked file, you can also skip writing certain *subtrees* of the 
file as soon as you notice it's already present on disk.  I can code this 
up if you are interested.

Of course, the paranoid folks will give up any performance benefit you 
obtain if they keep their "yes the SHA1 matches, but is the file *really* 
the same" code.  But maybe they're willing to be slow -- and they can do 
an uncompress rather than a compress in order to do the comparison, which 
will give *some* performance improvement.
  --scott

LCPANGS Serbian MKSEARCH security KUCLUB LCPANES Saddam Hussein Secretary 
Delta Force AMLASH ESMERALDITE TPAJAX plutonium ESGAIN Ft. Meade India
                          ( http://cscott.net/ )

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: C. Scott Ananian @ 2005-04-20 14:30 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Git Mailing List
In-Reply-To: <20050420132446.GA10126@macavity>

On Wed, 20 Apr 2005, Martin Uecker wrote:

> The other thing I don't like is the use of a sha1
> for a complete file. Switching to some kind of hash
> tree would allow to introduce chunks later. This has
> two advantages:

You can (and my code demonstrates/will demonstrate) still use a whole-file 
hash to use chunking.  With content prefixes, this takes O(N ln M) time 
(where N is the file size and M is the number of chunks) to compute all 
hashes; if subtrees can share the same prefix, then you can do this in 
O(N) time (ie, as fast as possible, modulo a constant factor, which is 
'2').  You don't *need* internal hashing functions.

> It would allow git to scale to repositories of large
> binary files. And it would allow to build a very cool
> content transport algorithm for those repositories.
> This algorithm could combine all the advantages of
> bittorrent and rsync (without the cpu load).

Yes, the big benefit of internal hashing is that it lets you check 
validity of a chunk w/o having the entire file available.  I'm not sure 
that's terribly useful in this case.  [And, if it is, then it can 
obviously be done w/ other means.]

> And it would allow trivial merging of patches which
> apply to different chunks of a file in exact the same
> way as merging changesets which apply to different
> files in a tree.

I'm not sure anyone should be looking at chunks.  To me, at least, they 
are an object-store-implementation detail only.  For merging, etc, we 
should be looking at whole files, or (better) the whole repository.
The chunking algorithm is guaranteed not to respect semantic boundaries 
(for *some* semantics of *some* file).
  --scott

explosion JMTRAX DC KUBARK biowarfare LCFLUTTER ESMERALDITE for Dummies 
Hager Nader Israel General ZRMETAL Castro cryptographic Indonesia
                          ( http://cscott.net/ )

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Linus Torvalds @ 2005-04-20 14:29 UTC (permalink / raw)
  To: jon; +Cc: Git Mailing List
In-Reply-To: <2cfc4032050420050655265d3a@mail.gmail.com>



On Wed, 20 Apr 2005, Jon Seymour wrote:
> 
> Am I correct to understand that with this change, all the objects in the 
> database are still being compressed (so no net performance benefit), but by 
> doing the SHA1 calculations before compression you are keeping open the 
> possibility that at some point in the future you may use a different 
> compression technique (including none at all) for some or all of the 
> objects?

Correct. There is zero performance benefit to this right now, and the only 
reason for doing it is because it will allow other things to happen.

Note that the other things include:
 - change the compression format to make it cheaper
 - _keep_ the same compression format, but notice that we already have an 
   object by looking at the uncompressed one.

I'm actually leaning towards just #2 at this time. I like how things
compress, and it sure is simple. The fact that we use the equivalent of
"-9" may be expensive, but the thing is, we don't actually write new files
that often, and it's "just" CPU time (no seeking on disk or anything like
that), which tends to get cheaper over time.

So I suspect that once I optimize the tree writing to notice that "oh, I
already have this tree object", and thus build it up but never compressing
it, "write-tree" performance will go up _hugely_ even without removing the
compressioin. Because most of the time, write-tree actually only needs to
create a couple of small new tree objects.

			Linus

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: David Woodhouse @ 2005-04-20 14:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Git Mailing List, Chris Mason
In-Reply-To: <Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org>

On Wed, 2005-04-20 at 02:08 -0700, Linus Torvalds wrote:
> I converted my git archives (kernel and git itself) to do the SHA1
> hash _before_ the compression phase.

I'm happy to see that -- because I'm going to be asking you to make
another change which will also require a simple repository conversion. 

We are working on getting the complete history since 2.4.0 into git
form. When it's done and checked (which should be RSN) I'd like you to
edit the first commit object in your tree -- the import of 2.6.12-rc2,
and give it a parent. That parent will be the sha1 hash of the
2.6.12-rc2 commit in the newly-provided history, and of course will
change the sha1 hash of your first commit, and all subsequent commits. 
We'll provide a tool to do that, of course.

The history itself will be absent from your tree. Obviously we'll need
to make sure that the tools can cope with an absentee parent, probably
by just treating that case as if no parent exists. That won't be hard,
it'll be useful for people to prune their trees of unwanted older
history in the general case too. That history won't be lost or undone --
it'll just be archived elsewhere.

The reason for doing this is that without it, we can't ever have a full
history actually connected to the current trees. There'd always be a
break at 2.6.12-rc2, at which point you'd have to switch to an entirely
different git repository.

-- 
dwmw2


^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Jon Seymour @ 2005-04-20 13:41 UTC (permalink / raw)
  To: Martin Uecker, Git Mailing List
In-Reply-To: <20050420132446.GA10126@macavity>

> The main point is not about trying different compression
> techniques but that you don't need to compress at all just
> to calculate the hash of some data. (to know if it is
> unchanged for example)
> 

Ah, ok, I didn't understand that there were extra compresses being
performed for that reason. Thanks for the explanation.

jon.

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Morten Welinder @ 2005-04-20 13:35 UTC (permalink / raw)
  To: Martin Uecker, Git Mailing List
In-Reply-To: <20050420132446.GA10126@macavity>

On 4/20/05, Martin Uecker <muecker@gmx.de> wrote:

> The storage method of the database of a collection of
> files in the underlying file system. Because of the
> random nature of the hashes this leads to a horrible
> amount of seeking for all operations which walk the
> logical structure of some tree stored in the database.
> 
> Why not store all objects linearized in one or more
> flat file?

I've been thinking along the same lines and it doesn't look too hard
to factor out the
"back end", i.e., provide methods to
read/write/stat/remove/mmap/whatever objects.
(Note the mmap there.  Apart from that, the backend could be an http connection
or worse.)

It will, however, seriously break rsync as transport for people who
commit to their trees.
Thus you need an alternative in place before you can present it as an
alternative.

Morten

^ permalink raw reply

* Blob chunking code. [First look.]
From: C. Scott Ananian @ 2005-04-20 13:30 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Linus Torvalds
In-Reply-To: <Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org>

So I wrote up my ideas regarding blob chunking as code; see attached.
This is against git-0.4 (I know, ancient, but I had to start somewhere.)

The idea here is that blobs are chunked using a rolling checksum (so the 
chunk boundaries are content-dependent and stay fixed even if you mutate 
pieces of the file).  The chunks are then tree-structured as 'treaps', 
which will ensure that chunk trees can be profitably reused.  (If you 
create a flat 'chunk index' instead of tree-structuring it, then you need 
to write two files even if you make a small change to a small file.  If 
you use a full binary tree, then insertions at the beginning (say) still 
change the entire tree structure.  The treap ensures that on avg O(ln N) 
chunks need to be written per change, where N is the number of chunks in 
the file).  More details are in the code.

Compatibility with existing archives in git-0.4 was tricky, because of 
git's 'compress-before-hash' thingy.  Moving to 'hash before compress' is 
*much* better, although because the file size is included in the hash, I will 
need to perform (the equivalent of) O(ln N) hashes of the complete file.
If the file size weren't included, or if it were put at the end, then 2 
hashes would suffice. (Basically, we can save work hashing subranges which 
are prefix-identical, but including the length means that no subtrees are
prefix-identical.)

I'll work on bringing this forward to the latest git, but I thought I'd 
post it here for early reviews and comments.  My informal testing shows 
that 1) my chunk size is currently too small, and 2) subrange sharing 
works well even on relatively small files.  I'll be working on getting 
concrete numbers for larger archives.
  --scott

DNC NRA Kojarena ESCOBILLA QKENCHANT STANDEL shotgun ESGAIN KGB Mossad 
overthrow ASW cracking HOPEFUL KUBARK counter-intelligence Yakima
                          ( http://cscott.net/ )
------- begin chunk.c ----------
#include <stdlib.h>

/* we could be clever and do this even if we don't fit in memory...
  * ... but we're going to be quick and dirty. */

/* C source has approx 5 bits per character of entropy.
  * We'd like to get 32 bits of good entropy; that means 7 bytes is a
  * reasonable minimum for the window size. */
#define ROLLING_WINDOW 30
#define CHUNK_SIZE 1023 /* desired block size */

#include <assert.h>
#include "cache.h"

/*
  * This file implements a treap-based chunked content store.  The
  * idea is that every stored file is broken down into tree-structured
  * chunks (that is, every chunk has an optional 'prefix' and 'suffix'
  * chunk), and these chunks are put in the object store.  This way
  * similar files will be expected to share chunks, saving space.
  * Files less than one disk block long are expected to fit in a single
  * chunk, so there is no extra indirection overhead for this case.
  */

/* First, some data structures: */
struct chunk {
     /* a chunk represents some range of the underlying file */
     size_t start /* inclusive */, end /*exclusive*/;
     unsigned char sha1[20]; /* sha1 for this chunk; used as the heap key */
};
struct chunklist {
     /* a dynamically-sized list of chunks */
     struct chunk *chunk; /* an array of chunks */
     size_t num_items; /* how many items are currently in the list */
     size_t allocd;    /* how many items we've allocated space for */
};
struct treap {
     /* A treap node represents a run of consecutive chunks. */

     struct chunk *chunk; /* some chunk in the run. */
     /* treaps representing the run before 'chunk' (left) and
      * after 'chunk' (right).  */
     struct treap *left, *right;
     /* sha1 for the run represented by this treap */
     unsigned char sha1[20];
};

static struct chunklist *
create_chunklist(int expected_items) {
     struct chunklist *cl = malloc(sizeof(*cl));
     cl->num_items = 0;
     cl->allocd = expected_items;
     cl->chunk = malloc(sizeof(cl->chunk[0]) * cl->allocd);
     return cl;
}
static void
free_chunklist(struct chunklist *cl) {
     free(cl->chunk);
     free(cl);
}

/* Add a chunk to the chunk list, calculating its SHA1 in the process. */
/* The chunk includes buf[start] to buf[end-1].                        */
static void
add_chunk(struct chunklist *cl, char *buf, size_t start, size_t end) {
     struct chunk *ch;
     SHA_CTX c;
     assert(start<end); assert(cl); assert(buf);
     if (cl->num_items >= cl->allocd) {
 	cl->allocd = cl->allocd*3/2;
 	cl->chunk = realloc(cl->chunk, cl->allocd * sizeof(*(cl->chunk)));
     }
     assert(cl->num_items < cl->allocd);
     ch = cl->chunk + (cl->num_items++);
     ch->start = start;
     ch->end = end;
     // compute SHA-1
     SHA1_Init(&c);
     SHA1_Update(&c, buf+start, end-start);
     SHA1_Final(ch->sha1, &c);
     // done!
}

/* Split a buffer into chunks, using a rolling checksum over ROLLING_WINDOW
  * bytes to determine chunk boundaries.  We try to split chunks into pieces
  * whose size averages out to be 'CHUNK_SIZE'. */
static void
chunkify(struct chunklist *cl, char *buf, size_t size) {
     int i, rsync_s1, rsync_s2, last=-1;
     /* Make seed non-zero so that leading 0s don't create 1-char chunks. */
     rsync_s1 = rsync_s2 = 0xBABE; /* arbitrary */
     /* While window is filling: */
     for (i=0; i<ROLLING_WINDOW && i<size; i++) {
 	rsync_s1 = (rsync_s1 + ((unsigned char)buf[i])) & 0xFFFF;
 	rsync_s2 = (rsync_s2 + rsync_s1) & 0xFFFF;
 	/* Is this the end of a chunk? */
 	if (0 == ((rsync_s1 + rsync_s2) % CHUNK_SIZE)) {
 	    add_chunk(cl, buf, last+1, i+1);
 	    last = i;
 	}
     }
     /* After window is full: */
     for ( ; i<size; i++) {
 	/* Old character out */
 	rsync_s1 = (rsync_s1 - ((unsigned char)buf[i-ROLLING_WINDOW]))& 0xFFFF;
 	rsync_s2 = (rsync_s2 - ROLLING_WINDOW * (unsigned char)buf[i-ROLLING_WINDOW]) & 0xFFFF;
 	/* New character in */
 	rsync_s1 = (rsync_s1 + ((unsigned char)buf[i])) & 0xFFFF;
 	rsync_s2 = (rsync_s2 + rsync_s1) & 0xFFFF;
 	/* Is this the end of a chunk? */
 	if (0 == ((rsync_s1 + rsync_s2) % CHUNK_SIZE)) {
 	    add_chunk(cl, buf, last+1, i+1);
 	    last = i;
 	}
     }
     /* One last chunk at the end: */
     if (last+1!=size)
 	add_chunk(cl, buf, last+1, size);
     /* done! */
}

/* A treap is a 'heap-ordered tree'.  There are two constraints maintained:
  *   left tree key < this tree key < right tree key
  * and
  *   this heap key < left and right heap keys.
  * We use the sha1 of the chunk (chunk->sha1) as the heap key and the
  * file location (chunk->start) as the tree key.
  * For more info on treaps, see:
  *   C. R. Aragon and R. G. Seidel, "Randomized search trees",
  *   Proc. 30th IEEE FOCS (1989), 540-545.
  * There are many possible binary trees we could build; enforcing the
  * heap constraint ensures that similar files will build similar trees.
  */

/* Assertion helper: check tree and heap constraints. */
static int
treap_valid(struct treap *t) {
     int valid = 1;
     if (!t) return 1;
     if (t->chunk==NULL) return 0;
     if (t->left!=NULL) {
 	/* Tree constraint. */
 	valid = valid && (t->left->chunk->start < t->chunk->start);
 	/* Heap constraint. */
 	valid = valid && (memcmp(t->chunk->sha1, t->left->chunk->sha1,
 				 sizeof(t->chunk->sha1)) < 0);
     }
     if (t->right!=NULL) {
 	/* Tree constraint. */
 	valid = valid && (t->chunk->start < t->right->chunk->start);
 	/* Heap constraint. */
 	valid = valid && (memcmp(t->chunk->sha1, t->right->chunk->sha1,
 				 sizeof(t->chunk->sha1)) < 0);
     }
     return valid;
}

/* Restore heap constraint without disturbing tree ordering. */
/* Only the root of the given treap will violate the heap constraint. */
static struct treap *
treapify(struct treap *t) {
     struct treap *x, *y, *a, *b, *c;
     int left_ok, right_ok, rotate_left;
     assert(treap_valid(t->left));
     assert(treap_valid(t->right));
     left_ok = (t->left == NULL) ||
 	(memcmp(t->chunk->sha1, t->left->chunk->sha1,
 		sizeof(t->chunk->sha1)) < 0);
     right_ok = (t->right == NULL) ||
 	(memcmp(t->chunk->sha1, t->right->chunk->sha1,
 		sizeof(t->chunk->sha1)) < 0);
     if (left_ok && right_ok) { /* well, that's easy */
 	assert(treap_valid(t));
 	return t;
     }
     /* okay, someone needs to rotate */
     rotate_left = (!left_ok) &&
 	(right_ok || /* if neither is okay, the rotate smallest up */
 	 memcmp(t->left->chunk->sha1, t->right->chunk->sha1,
 		sizeof(t->chunk->sha1)) < 0);
     /*   Rotation:
      *     y   -bring left up->  x
      *    / \                   / \
      *   x   c                 a   y
      *  / \                       / \
      * a   b <-bring right up-   b   c
      */
     if (rotate_left) {
 	y = t;  x = y->left;  c = y->right;  a = x->left;  b = x->right;
 	y->left = b;
 	y->right = c;
 	x->left = a;
 	x->right = treapify(y); // recurse to check heap constraint
 	assert(treap_valid(x));
 	return x;
     } else {
 	x = t;  a = x->left;  y = x->right;  b = y->left;  c = y->right;
 	x->left = a;
 	x->right = b;
 	y->right = c;
 	y->left = treapify(x); // recurse to check heap constraint.
 	assert(treap_valid(y));
 	return y;
     }
}

/* Use list of chunks to build treap bottom-up, calling treapify to
  * restore heap order on the subtree after we add each interior node.
  * This is O(N), where N is the number of chunks. */
static struct treap *
build_treap(struct chunklist *cl, int chunk_st, int chunk_end) {
     struct treap *result;
     /* Some treaps are trivial to build: */
     if (chunk_st >= chunk_end) return NULL;
     /* Claim a chunk in the middle for ourself. */
     int c = (chunk_st + chunk_end)/2;
     result = (struct treap *)malloc(sizeof(*result));
     result->chunk = &(cl->chunk[c]);
     /* Divide and conquer: build well-formed treaps for our kids.*/
     result->left = build_treap(cl, chunk_st, c);
     result->right = build_treap(cl, c+1, chunk_end);
     /* Now we need to ensure that the heap constraint is satisfied; that is,
      * result->chunk->sha1 < result->left->chunk->sha1  and
      * result->chunk->sha1 < result->right->chunk->sha1.
      */
     assert(treap_valid(result->left));
     assert(treap_valid(result->right));
     return treapify(result);
}

static void
free_treap(struct treap *t) {
     if (!t) return;
     if (t->left) free_treap(t->left);
     if (t->right) free_treap(t->right);
     free(t);
}

/* Now that we've broken it down into treap-structured pieces, let's write
  * them to the object store. */

/* Write a single treap piece to the object store.  Note that 't' may be
  * NULL for the special case of a zero-byte file.  Writes the hash of
  * this piece back to 'sha1', which must be non-NULL. Returns 0 on success.*/
static int
write_one(struct treap *t, char *buf, unsigned char *sha1) {
/* two hundred bytes is two 20-byte SHA1 hashes, two presence bytes,
  * six bytes of type, one null, and plus 10^151 file length. (Conservative.) */
#define MAX_METADATA_LEN 200
     z_stream stream;
     size_t max_out_bytes;
     size_t chunk_size = t ? (t->chunk->end - t->chunk->start) : 0;
     size_t content_size = chunk_size;
     char metadata[MAX_METADATA_LEN];
     void *out;
     SHA_CTX c;

     memset(&stream, 0, sizeof(stream));
     deflateInit(&stream, Z_BEST_COMPRESSION);
     max_out_bytes = deflateBound(&stream, chunk_size+sizeof(metadata));
     out = malloc(max_out_bytes);
     stream.next_out = out;
     stream.avail_out = max_out_bytes;
     /*
      * Metadata: Type, ASCII size, null byte, then left & right hashes.
      */
     content_size = chunk_size+2; /* prefix/suffix delimiters */
     if (t && t->left) content_size += sizeof(t->left->sha1);
     if (t && t->right) content_size += sizeof(t->right->sha1);

     stream.next_in = metadata;
     stream.avail_in = 1+sprintf(metadata, "chunk %lu",
 				(unsigned long) content_size);
     if (t && t->left) { /* left hash */
 	stream.next_in[stream.avail_in++] = 1;
 	memcpy(stream.next_in + stream.avail_in,
 	       t->left->sha1, sizeof(t->left->sha1));
 	stream.avail_in += sizeof(t->left->sha1);
     } else
 	stream.next_in[stream.avail_in++] = 0; /* no prefix chunk */
     if (t && t->right) { /* right hash */
 	stream.next_in[stream.avail_in++] = 1;
 	memcpy(stream.next_in + stream.avail_in,
 	       t->right->sha1, sizeof(t->right->sha1));
 	stream.avail_in += sizeof(t->right->sha1);
     } else
 	stream.next_in[stream.avail_in++] = 0; /* no suffix chunk */

     while (deflate(&stream, 0) == Z_OK)
 	/* nothing */;
     /*
      * Chunk content.
      */
     stream.next_in = buf + ( t ? t->chunk->start : 0);
     stream.avail_in = chunk_size; /* possibly zero */
     while (deflate(&stream, Z_FINISH) == Z_OK)
 	/* nothing */;

     deflateEnd(&stream);

     SHA1_Init(&c);
     SHA1_Update(&c, out, stream.total_out);
     SHA1_Final(sha1, &c);

     return write_sha1_buffer(sha1, out, stream.total_out);
}

/* Write a sub-treap to disk, setting the 'sha1' fields of all nodes
  * as we go. */
static int
write_treap(struct treap *t, char *buf, unsigned char *sha1) {
     /* First write children (which initializes their SHA1 info). */
     if (t && t->left)
 	if (write_treap(t->left, buf, NULL) < 0)
 	    return -1; /* failure. */
     if (t && t->right)
 	if (write_treap(t->right, buf, NULL) < 0)
 	    return -1; /* failure. */
     /* Now write us.  Note t may == NULL for a zero-byte file. */
     if (write_one(t, buf, t ? t->sha1 : sha1) < 0)
 	return -1; /* failure. */
     if (t && sha1)
 	memcpy(sha1, t->sha1, sizeof(t->sha1));
     return 0;
}

/* EXPORTED FUNCTION: write the file open on file descriptor 'fd'
  * and described by 'ce' and 'st' to the object store.   Return
  * 0 on success, -1 on failure. */
/* This does the same thing as 'index_fd' in Linus' update-cache.c */
int
chunk_index_fd(struct cache_entry *ce, int fd, struct stat *st) {
     struct chunklist *cl;
     struct treap *t;
     char *in;

     /* We expect there to be 'file length / CHUNK_SIZE' chunks.  Over-estimate
      * a little, and do the initial chunk list allocation. */
     cl = create_chunklist(1 + ((3 * st->st_size) / (2 * CHUNK_SIZE)));
     /* Split the file into chunks. */
     in = mmap(NULL, st->st_size, PROT_READ, MAP_PRIVATE, fd, 0);
     if (!in) return -1;
     chunkify(cl, in, st->st_size);
     /* Build the treap. */
     t = build_treap(cl, 0, cl->num_items);
     assert(treap_valid(t));
     /* Now write all the pieces, updating SHA1 for this file in the process. */
     if (write_treap(t, in, ce->sha1) < 0)
 	return -1;
     /* Free everything; we're done. */
     free_treap(t);
     free_chunklist(cl);
     munmap(in, st->st_size);
     close(fd);
     return 0; /* success! */
}

/*** Functions to read a chunked file into a contiguous buffer. ***/

struct read_chunk {
     void *data, *chunk_data;
     unsigned long chunk_size, total_size;
     struct read_chunk *left, *right;
};
static struct read_chunk *
read_chunk2(const unsigned char *sha1, void *data, unsigned long size);

static struct read_chunk *
read_chunk(const unsigned char *sha1) {
     void *data;
     unsigned long size;
     char type[10];
     data = read_sha1_file(sha1, type, &size);
     assert(strcmp(type, "chunk")==0);
     return read_chunk2(sha1, data, size); 
}
static struct read_chunk *
read_chunk2(const unsigned char *sha1, void *data, unsigned long size) {
     unsigned char *cp;
     struct read_chunk *result = malloc(sizeof(*result));
     cp = result->data = data;
     printf("CHUNK %s (%d bytes)\n", sha1_to_hex(sha1), size);
     /* Parse the chunk data. */
     result->left = result->right = NULL;
     if (*cp++) {
 	result->left = read_chunk(cp); cp+=20;
     }
     if (*cp++) {
 	result->right = read_chunk(cp); cp+=20;
     }
     result->chunk_data = cp;
     result->chunk_size = size - (result->chunk_data - result->data);
     result->total_size = result->chunk_size +
 	(result->left ? result->left->total_size : 0) +
 	(result->right ? result->right->total_size : 0);
     return result;
}
static void
copy_read_chunk(void *dest, struct read_chunk *rc) {
     if (rc->left) {
 	copy_read_chunk(dest, rc->left);
 	dest += rc->left->total_size;
     }
     memcpy(dest, rc->chunk_data, rc->chunk_size);
     if (rc->right)
 	copy_read_chunk(dest + rc->chunk_size, rc->right);
}
static void
free_read_chunk(struct read_chunk *rc) {
     if (rc->left) free_read_chunk(rc->left);
     if (rc->right) free_read_chunk(rc->right);
     free(rc->data);
     free(rc);
}

/* This does the same thing as 'read_sha1_file' in Linus' read_cache.c,
  * except that it knows about the 'chunk' encoding and will transparently
  * stitch together the appropriate prefix and suffix chunks and pass it
  * off as a 'blob'. */
void *
chunk_read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size) {
     struct read_chunk *rc;
     void *result = read_sha1_file(sha1, type, size);
     if (strcmp(type, "chunk")!=0) return result;
     /* This is a 'chunk' object; get the rest of the pieces. */
     rc = read_chunk2(sha1, result, *size);
     /* Now concatenate them together. */
     strcpy(type, "blob");
     *size = rc->total_size;
     result = malloc(*size);
     copy_read_chunk(result, rc);
     /* done! */
     free_read_chunk(rc);
     return result;
}

/* Exercise this code. */
int main(int argc, char **argv) {
     struct cache_entry ce;
     struct stat st;
     char *buf, type[10];
     unsigned long size;
     int fd;
     fd = open(argv[1], O_RDONLY);
     if (fd < 0) exit(1);
     if (fstat(fd, &st) < 0) exit(1);
     if (chunk_index_fd(&ce, fd, &st) < 0) exit(1);
     /* seemed to work! */
     buf = chunk_read_sha1_file(ce.sha1, type, &size);
     if (!buf) exit(1);
     printf("Read file %s, of type %s (%lu bytes):\n",
 	   sha1_to_hex(ce.sha1), type, size);
     fwrite(buf, size, 1, stdout);
     /* done! */
     return 0;
}

^ permalink raw reply

* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Martin Uecker @ 2005-04-20 13:24 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <2cfc403205042005116484231c@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

On Wed, Apr 20, 2005 at 10:11:10PM +1000, Jon Seymour wrote:
> On 4/20/05, Linus Torvalds <torvalds@osdl.org> wrote:
> > 
> > 
> > I converted my git archives (kernel and git itself) to do the SHA1 hash
> > _before_ the compression phase.
> > 
> 
> Linus,
>  
>  Am I correct to understand that with this change, all the objects in
> the database are still being compressed (so no net performance benefit
> now), but by doing the SHA1 calculations before compression you are
> keeping open the possibility that at some point in the future you may
> use a different compression technique (including none at all) for some
> or all of the objects?

The main point is not about trying different compression
techniques but that you don't need to compress at all just
to calculate the hash of some data. (to know if it is
unchanged for example)

There are still some other design decisions I am worried
about:

The storage method of the database of a collection of
files in the underlying file system. Because of the
random nature of the hashes this leads to a horrible
amount of seeking for all operations which walk the
logical structure of some tree stored in the database.

Why not store all objects linearized in one or more
flat file?


The other thing I don't like is the use of a sha1
for a complete file. Switching to some kind of hash
tree would allow to introduce chunks later. This has
two advantages:

It would allow git to scale to repositories of large
binary files. And it would allow to build a very cool
content transport algorithm for those repositories.
This algorithm could combine all the advantages of
bittorrent and rsync (without the cpu load).


And it would allow trivial merging of patches which
apply to different chunks of a file in exact the same
way as merging changesets which apply to different
files in a tree.


Martin

-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox