Git development
* [PATCH] Tweak git-diff-tree -v output further (take 2).
From: Junio C Hamano @ 2005-05-06 19:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

(This one is simpler than the previous one I just sent out)

The first hunk of this is a pure bugfix---it guards us against a
commit message that does not end with a newline.

This adds the full header information to git-diff-tree -v output
in addition to the log message it already produces.  It also
stops indenting the log message to match what git-export does.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff-tree.c |   22 ++++++++++++----------
1 files changed, 12 insertions(+), 10 deletions(-)

# - linus-mirror: diff-tree: add "verbose header" mode
# + (working tree)
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -278,7 +278,7 @@ static int get_one_line(const char *msg,
 
 	while (len--) {
 		ret++;
-		if (*msg++ == '\n')
+		if (!*msg || *msg++ == '\n')
 			break;
 	}
 	return ret;
@@ -287,12 +287,14 @@ static int get_one_line(const char *msg,
 static char *generate_header(const char *commit, const char *parent, const char *msg, unsigned long len)
 {
 	static char this_header[1000];
-	int offset;
 
-	offset = sprintf(this_header, "%s%s (from %s)\n", header_prefix, commit, parent);
-	if (verbose_header) {
+	if (!verbose_header)
+		sprintf(this_header, "%s%s (from %s)\n", header_prefix,
+			commit, parent);
+	else {
+		int offset;
 		int hdr = 1;
-
+		offset = sprintf(this_header, "Id: %s\n", commit);
 		for (;;) {
 			const char *line = msg;
 			int linelen = get_one_line(msg, len);
@@ -306,11 +308,11 @@ static char *generate_header(const char 
 			len -= linelen;
 			if (linelen == 1)
 				hdr = 0;
-			if (hdr)
-				continue;
-			memset(this_header + offset, ' ', 4);
-			memcpy(this_header + offset + 4, line, linelen);
-			offset += linelen + 4;
+			memcpy(this_header + offset, line, linelen);
+			if (hdr && !strncmp(line, "parent ", 7) &&
+			    !strncmp(line+7, parent, 40))
+				this_header[offset + 6] = '*';
+			offset += linelen;
 		}
 		this_header[offset++] = '\n';
 		this_header[offset] = 0;
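
The guard in the first hunk can be tried in isolation. The following is a minimal standalone sketch, not the code from diff-tree.c: the name line_length and the size_t types are assumptions made for illustration. It keeps the same loop shape, so the scan stops at a NUL byte even when the caller's length claims more data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of the first line in msg, looking at no more than
 * len bytes.  A trailing '\n' is counted as part of the line.  If a NUL
 * byte is reached first, the scan stops there (the NUL itself is
 * counted) instead of reading past the end of the string.
 */
size_t line_length(const char *msg, size_t len)
{
	size_t ret = 0;

	assert(msg != NULL);
	while (len--) {
		ret++;
		if (!*msg || *msg++ == '\n')
			break;
	}
	return ret;
}
```

With this sketch, line_length("ab\ncd", 5) yields 3, and a message with no final newline and an overlong length, such as line_length("abc", 10), stops at the terminator rather than running off the end of the buffer.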



* Re: How do I...
From: Thomas Kolejka @ 2005-05-06 19:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505061158520.2233@ppc970.osdl.org>


I've written a script that shows all commits in which a certain file was
changed.

It walks back through the commits, looks into each tree, and prints the
commit whenever a change is found.


Thomas



--- /dev/null	1970-01-01 01:00:00.000000000 +0100
+++ git-file-history-script	2005-05-06 21:24:41.000000000 +0200
@@ -0,0 +1,59 @@
+#!/bin/sh
+# 
+# Copyright (C) 2005 Thomas Kolejka
+#
+# usage - $0 [commit] pathname
+
+
+if [ $# -gt 1 ]
+then
+	HEAD=$1
+	shift
+else
+	HEAD=`cat $SHA1_FILE_DIRECTORY/../HEAD`
+fi
+
+git-cat-file commit $HEAD >> /dev/null
+
+if [ $? -ne 0 ]
+then
+	exit
+fi
+
+f_name=$1
+
+
+echo "starting from commit $HEAD"
+
+last_sha1="last-revision"
+last_commit=$HEAD
+
+git-rev-list $HEAD | while read the_commit
+do
+
+	the_tree=`git-cat-file commit $the_commit|head -n1 | awk '{ print $2 }'`
+
+
+	the_sha1=`git-ls-tree -r $the_tree|grep -w "${f_name}$"|awk '{ print $3 }'`
+
+	if [ -z "$the_sha1" ]
+	then
+		continue
+	fi
+
+
+	if [ $last_sha1 != $the_sha1 ]
+	then
+		echo commit $the_commit - tree $the_tree - sha1 $the_sha1
+		echo " "
+		# echo "$the_sha1 -> $last_sha1"
+		last_sha1=$the_sha1
+		git-cat-file commit $last_commit
+
+		echo " "
+		echo " "
+		echo " "
+	fi
+
+	last_commit=$the_commit
+done



* [PATCH] Tweak git-diff-tree -v output further.
From: Junio C Hamano @ 2005-05-06 19:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

The first hunk of this is a pure bugfix---it guards us against a
commit message that does not end with a newline.

This adds the full header information to git-diff-tree -v output
in addition to the log message it already produces.

Maybe we want to stop indenting so that it matches what
git-export produces better.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff-tree.c |   30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)

# - linus-mirror: diff-tree: add "verbose header" mode
# + (working tree)
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -278,7 +278,7 @@ static int get_one_line(const char *msg,
 
 	while (len--) {
 		ret++;
-		if (*msg++ == '\n')
+		if (!*msg || *msg++ == '\n')
 			break;
 	}
 	return ret;
@@ -287,12 +287,14 @@ static int get_one_line(const char *msg,
 static char *generate_header(const char *commit, const char *parent, const char *msg, unsigned long len)
 {
 	static char this_header[1000];
-	int offset;
 
-	offset = sprintf(this_header, "%s%s (from %s)\n", header_prefix, commit, parent);
-	if (verbose_header) {
+	if (!verbose_header)
+		sprintf(this_header, "%s%s (from %s)\n", header_prefix,
+			commit, parent);
+	else {
+		int offset;
 		int hdr = 1;
-
+		offset = sprintf(this_header, "Id: %s\n", commit);
 		for (;;) {
 			const char *line = msg;
 			int linelen = get_one_line(msg, len);
@@ -306,11 +308,19 @@ static char *generate_header(const char 
 			len -= linelen;
 			if (linelen == 1)
 				hdr = 0;
-			if (hdr)
-				continue;
-			memset(this_header + offset, ' ', 4);
-			memcpy(this_header + offset + 4, line, linelen);
-			offset += linelen + 4;
+			if (hdr) {
+				memcpy(this_header + offset, line, linelen);
+				if (!strncmp(line, "parent ", 7) &&
+				    !strncmp(line+7, parent, 40))
+					this_header[offset + 6] = '*';
+				offset += linelen;
+					
+			}
+			else {
+				memset(this_header + offset, ' ', 4);
+				memcpy(this_header + offset + 4, line, linelen);
+				offset += linelen + 4;
+			}
 		}
 		this_header[offset++] = '\n';
 		this_header[offset] = 0;



* Re: How do I...
From: David Woodhouse @ 2005-05-06 19:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Frank Sorenson, git
In-Reply-To: <Pine.LNX.4.58.0505061158520.2233@ppc970.osdl.org>

On Fri, 2005-05-06 at 11:59 -0700, Linus Torvalds wrote:
> I don't do no steenking GUI's. That's for others, and you'll need to
> parse the git-rev-list output yourself to do that.

When I said 'show' I meant merely provide sufficient information. 

I believe that what you get at the moment from the command you showed is
an unconnected list of commits, each of which may be in different
branches, where the _parent_ of each commit you show may not even be
included in the list. 

I.e. you're giving them a bag of unconnected objects which each look
something like this...

		 (COMMIT)--> 
or
		 (COMMIT)--->
		         \-->


But you need to give them a _graph_...

	/------> (COMMIT) --> (COMMIT) --\
 (COMMIT)                                 -> (COMMIT) 
	\-----------> (COMMIT) ----------/

Leaving the pretty GUI as an exercise for the reader is fine. But we do
actually have to give enough information to allow them to connect the
bits together.

My recursive script attempted this but wasn't quite good enough -- we
had enough information to track _merges_ but not _branches_. Hence I was
only giving this much information...

	/------> (COMMIT) --> (COMMIT) --\
 (COMMIT)                                 -> (COMMIT) 
	              (COMMIT) ----------/

(...or this one, depending on which order I parsed the tree...)

	         (COMMIT) --> (COMMIT) --\
 (COMMIT)                                 -> (COMMIT) 
	\-----------> (COMMIT) ----------/

-- 
dwmw2
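
To illustrate the point about connecting the bits together: each commit object records its parents, so a list of (commit, parents) records is enough to recover the edges of the graph. The sketch below is only an illustration under that assumption; struct commit_rec, format_edges and the "child -> parent" output format are hypothetical and not part of any git tool.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PARENTS 2	/* enough for the two-way merges drawn above */

struct commit_rec {
	const char *id;
	const char *parents[MAX_PARENTS];
	int nr_parents;
};

/*
 * Write one "child -> parent" line per edge into out and return the
 * number of edges, or -1 if the output buffer is too small.
 */
int format_edges(const struct commit_rec *list, int nr,
		 char *out, size_t outlen)
{
	int edges = 0;
	size_t used = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (int i = 0; i < nr; i++) {
		for (int p = 0; p < list[i].nr_parents; p++) {
			int n = snprintf(out + used, outlen - used,
					 "%s -> %s\n",
					 list[i].id, list[i].parents[p]);
			if (n < 0 || (size_t)n >= outlen - used)
				return -1;
			used += (size_t)n;
			edges++;
		}
	}
	return edges;
}
```

A commit whose parent is missing from the list still produces an edge, which is exactly the dangling arrow in the "bag of unconnected objects" picture; a consumer can detect it by checking whether the parent id ever appears as a record of its own.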



* Re: How do I...
From: Frank Sorenson @ 2005-05-06 19:07 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Git Mailing List
In-Reply-To: <1115402331.10460.16.camel@localhost>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dave Kleikamp wrote:
> I'm not sure if I was clear.  cg-pull should be "cg-pull origin".

Got it, and it does just what I wanted.

> I take this to mean you're seeing problems with cg-update too. cg-update
> simply runs cg-pull & cg-merge together, so running them separately
> shouldn't make any difference.

Yes.  cg-pull and cg-update have both shown odd breakage of this sort,
putting my tree into a bad state.  Sometimes deleting files fixes it,
but more often than not, I've needed to just start a new tree again in
order to fix it.  This is probably due to inexperience with git, but
tree corruption probably shouldn't occur like this.

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCe8BxaI0dwg4A47wRAk2OAJ915s2KHTNxrpi6k3wa7HiDhpOFGgCgrMxp
73JzFxl7lg2c9korTF8L7Ek=
=ILPh
-----END PGP SIGNATURE-----


* Re: Version of dirdiff to display diffs between git trees
From: Linus Torvalds @ 2005-05-06 19:07 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: Paul Mackerras, git
In-Reply-To: <m3d5s4jieh.fsf@defiant.localdomain>



On Fri, 6 May 2005, Krzysztof Halasa wrote:
>
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > 	cat .git/ORIG_HEAD > .git/HEAD
> > 	git-read-tree -m HEAD
> > 	git-checkout-cache -f -a
> > 	git-update-cache --refresh
> >
> > and you're back to your original head (the above is basically "unpull").
> 
> So, is "git-read-tree -m HEAD" actually equivalent to "git-read-tree HEAD"
> and does it simply write complete index (ignoring the old one)
> corresponding to given HEAD?

Yes, "git-read-tree -m HEAD" is 100% equivalent to the version without
"-m", except that it reads the old index file and picks up the file stat
information from there if the name/SHA1 pair matches.

This has two implications:

 - "git-read-tree -m HEAD" is a lot better than the non "-m" version, 
   since it means that if most of the files are unchanged between the new
   and the old tree, _most_ of the index is still up-to-date.

   You still need to do "git-update-cache --refresh" to make sure the 
   index is fully up-to-date, but now the refresh has to do a _lot_ less.

 - If your old index file has crap in it, it won't work. If you have a 
   corrupt index file, you can't use "-m". And in particular, if the merge 
   _failed_ and you have unmerged entries in your index file, you can't
   use "-m" (I might change that, and let the single-merge case just 
   ignore unmerged entries).

So the rule is: normally you probably want to use "-m", but if you want to
start from a totally clean slate because something went wrong, you should
skip the "-m" which then does the "reset the whole index" without the
merge of the old index information.

(Also, if the new tree you're reading is totally different from the old 
one, or you don't have anything checked out, you're better off without the 
"-m", since it will just add overhead for no gain).

			Linus
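
The "pick up the stat information if the name/SHA1 pair matches" rule can be sketched as follows. This is an illustration only, not the read-tree code: struct entry, its single mtime field and carry_over_stat are hypothetical, and the quadratic lookup is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <string.h>

struct entry {
	const char *name;
	unsigned char sha1[20];
	long mtime;		/* 0 means "no cached stat information" */
};

/*
 * For every entry of the freshly read tree, reuse the cached stat data
 * from the old index when both the name and the SHA1 match.  Returns
 * the number of entries that did not lose their stat information.
 */
size_t carry_over_stat(struct entry *new_index, size_t n_new,
		       const struct entry *old_index, size_t n_old)
{
	size_t reused = 0;

	assert(new_index != NULL && old_index != NULL);
	for (size_t i = 0; i < n_new; i++) {
		new_index[i].mtime = 0;
		for (size_t j = 0; j < n_old; j++) {
			if (!strcmp(new_index[i].name, old_index[j].name) &&
			    !memcmp(new_index[i].sha1, old_index[j].sha1, 20)) {
				new_index[i].mtime = old_index[j].mtime;
				reused++;
				break;
			}
		}
	}
	return reused;
}
```

Entries left with mtime 0 in this sketch correspond to the files that a later "git-update-cache --refresh" would still have to look at; when most files are unchanged between the two trees, that set is small.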


* Re: How do I...
From: Linus Torvalds @ 2005-05-06 18:59 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Junio C Hamano, Frank Sorenson, git
In-Reply-To: <1115404771.16187.343.camel@hades.cambridge.redhat.com>



On Fri, 6 May 2005, David Woodhouse wrote:
>
> On Fri, 2005-05-06 at 10:09 -0700, Linus Torvalds wrote:
> > So now you can do
> > 
> >         git-rev-list HEAD --max-count=10 | git-diff-tree --stdin update-cache.c
> > 
> > to see which of the last 10 commits changed "update-cache.c".
> 
> Now show the graph of revision history which connects those commits.

I don't do no steenking GUI's. That's for others, and you'll need to parse
the git-rev-list output yourself to do that.

		Linus


* Re: How do I...
From: Linus Torvalds @ 2005-05-06 18:58 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: git
In-Reply-To: <427B3DB3.4000507@tuxrocks.com>



On Fri, 6 May 2005, Frank Sorenson wrote:
> 
> Okay, I've got some "How can I?" questions.  I hope I'm not the only one
> still working to "git it".
> 
> How can I git a list of commits that have modified a particular file?
> For example, I'd like to do something like this:
> # git-file-revs Makefile
> f7eb55878f11575281add2a5726e483aed5e45bb
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Ok, I now have the perfect interface for _most_ uses of this thing.

For example, let's say that you are interested in "what changed in the 
USB input layer lately". You can now do

	git-rev-list HEAD | git-diff-tree -p -v --stdin drivers/usb/input | less -S

and you'll get some very readable output that tells you _exactly_ what has 
changed to any file in that subdirectory.

Or, let's say that you're the author of the gt96100 ethernet driver (just
to pick one that has both a .c and a .h file and that has had changes in
the current git tree). You'd do:

	git-rev-list HEAD | git-diff-tree -p -v --stdin drivers/net/gt96100eth.* | less -S

and it gives you _exactly_ what you want (ie thanks to how diff-tree
works, you can give it any number of files or directories you're
interested in).

Normally, this thing will ignore merge commits, but if you want to see the
merges that the changes came through (a merge _can_ have real changes of
its own too), add the "-m" flag to the git-diff-tree thing.

Try out the above examples on the current kernel tree (and with my most
current git version as of five minutes ago - it shows up at least on
gitweb, but I don't know if it's mirrored out with rsync yet). Very
pretty. Very useful.

		Linus


* Re: How do I...
From: Frank Sorenson @ 2005-05-06 18:56 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.21.0505061321590.30848-100000@iabervon.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Barkalow wrote:
> On Fri, 6 May 2005, Frank Sorenson wrote:
> 
> 
>>Note that I could be just thinking about this all wrong, so my
>>terminology could be in left field.  Here, I'm mostly just interested in
>>the case where "Hey, something broke with drivers/char/i8k.c.  When was
>>this changed?  Who changed what?"
> 
> 
> The tricky thing is that you want to *not* see commits where somebody
> adopted somebody else's change to drivers/char/i8k.c; you want to ignore
> those commits in favor of the commits where the original author of the
> changes made the changes. Otherwise, you mostly see merges with people
> submitting lines where they didn't change that file.

True.  At least usually.  Sometimes, though, we'll want to see the
entire history of the file, so we can see when it went (for example)
into Greg K-H's tree, then when Linus pulls into his tree, etc.  I guess
that makes "just when the file itself has actually changed" a special
case of the entire history of a particular file.

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCe73XaI0dwg4A47wRAvxaAJ9E1mFepuHmTfvVfwr8zMwpqqcSZACgsz0M
NVAd1f2ZGzu+NPqD3zDQ3Yo=
=I3EA
-----END PGP SIGNATURE-----


* Re: Version of dirdiff to display diffs between git trees
From: Krzysztof Halasa @ 2005-05-06 18:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, git
In-Reply-To: <Pine.LNX.4.58.0505060916320.2233@ppc970.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> 	cat .git/ORIG_HEAD > .git/HEAD
> 	git-read-tree -m HEAD
> 	git-checkout-cache -f -a
> 	git-update-cache --refresh
>
> and you're back to your original head (the above is basically "unpull").

So, is "git-read-tree -m HEAD" actually equivalent to "git-read-tree HEAD"
and does it simply write complete index (ignoring the old one)
corresponding to given HEAD?
-- 
Krzysztof Halasa


* Re: How do I...
From: David Woodhouse @ 2005-05-06 18:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Frank Sorenson, git
In-Reply-To: <Pine.LNX.4.58.0505061006060.2233@ppc970.osdl.org>

On Fri, 2005-05-06 at 10:09 -0700, Linus Torvalds wrote:
> So now you can do
> 
>         git-rev-list HEAD --max-count=10 | git-diff-tree --stdin update-cache.c
> 
> to see which of the last 10 commits changed "update-cache.c".

Now show the graph of revision history which connects those commits.

-- 
dwmw2



* Re: How do I...
From: Dave Kleikamp @ 2005-05-06 17:58 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Git Mailing List
In-Reply-To: <427B9DC5.9060905@tuxrocks.com>

On Fri, 2005-05-06 at 10:39 -0600, Frank Sorenson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Dave Kleikamp wrote:
> > On Fri, 2005-05-06 at 03:49 -0600, Frank Sorenson wrote:
> > 
> > 
> >>After doing a cg-update, can I cg-log just the changes since the last
> >>update?  Alternatively, how can I tell cg-log I'm caught up, and don't
> >>need anything historical?
> > 
> > 
> > (Assuming pulling from "origin")
> > > Instead of doing cg-update, do cg-pull.  Then "cg-log :origin" will give
> > > you the changesets you just pulled.

I'm not sure if I was clear.  cg-pull should be "cg-pull origin".

> Super.  This works great.  Thanks.
> 
> > "cg-merge origin" will then
> > complete operation, thereby catching you up.
> 
> Okay, not quite so great.  Here's the output when I ran it to update my
> kernel this morning.  Note that I haven't made any local modifications.
>  I'm seeing this sort of thing often enough that I'm blowing away my
> whole git tree and regenerating it to get back to a stable state once or
> twice a week.  I'm sure there's another way, but without me making
> modifications on my end, I wouldn't expect this to happen.  Suggestions
> are welcome! :)

I take this to mean you're seeing problems with cg-update too. cg-update
simply runs cg-pull & cg-merge together, so running them separately
shouldn't make any difference.

> # cg-merge origin
> Fast-forwarding 6741f3a7f9922391cd02b3ca1329e669497dc22f ->
> 2512809255d018744fe6c2f5e996c83769846c07
>         on top of 6741f3a7f9922391cd02b3ca1329e669497dc22f...
> patching file fs/proc/Makefile
> patching file fs/proc/array.c
> patching file fs/proc/base.c
> patching file fs/proc/generic.c
> patching file fs/proc/inode-alloc.txt
> patching file fs/proc/inode.c
> patching file fs/proc/internal.h
> patching file fs/proc/kcore.c
> patching file fs/proc/kmsg.c
> patching file fs/proc/mmu.c
> patching file fs/proc/nommu.c
> patching file fs/proc/proc_devtree.c
> patching file fs/proc/proc_misc.c
> patching file fs/proc/proc_tty.c
> patching file fs/proc/root.c
> patching file fs/proc/task_mmu.c
> patching file fs/proc/task_nommu.c
> touch: cannot touch `fs/proc/Makefile': No such file or directory
> touch: cannot touch `fs/proc/array.c': No such file or directory
> touch: cannot touch `fs/proc/base.c': No such file or directory
> touch: cannot touch `fs/proc/generic.c': No such file or directory
> touch: cannot touch `fs/proc/inode-alloc.txt': No such file or directory
> touch: cannot touch `fs/proc/inode.c': No such file or directory
> touch: cannot touch `fs/proc/internal.h': No such file or directory
> touch: cannot touch `fs/proc/kcore.c': No such file or directory
> touch: cannot touch `fs/proc/kmsg.c': No such file or directory
> touch: cannot touch `fs/proc/mmu.c': No such file or directory
> touch: cannot touch `fs/proc/nommu.c': No such file or directory
> touch: cannot touch `fs/proc/proc_devtree.c': No such file or directory
> touch: cannot touch `fs/proc/proc_misc.c': No such file or directory
> touch: cannot touch `fs/proc/proc_tty.c': No such file or directory
> touch: cannot touch `fs/proc/root.c': No such file or directory
> touch: cannot touch `fs/proc/task_mmu.c': No such file or directory
> touch: cannot touch `fs/proc/task_nommu.c': No such file or directory
> rm: cannot remove `fs/proc/Makefile': No such file or directory
> rm: cannot remove `fs/proc/array.c': No such file or directory
> rm: cannot remove `fs/proc/base.c': No such file or directory
> rm: cannot remove `fs/proc/generic.c': No such file or directory
> rm: cannot remove `fs/proc/inode-alloc.txt': No such file or directory
> rm: cannot remove `fs/proc/inode.c': No such file or directory
> rm: cannot remove `fs/proc/internal.h': No such file or directory
> rm: cannot remove `fs/proc/kcore.c': No such file or directory
> rm: cannot remove `fs/proc/kmsg.c': No such file or directory
> rm: cannot remove `fs/proc/mmu.c': No such file or directory
> rm: cannot remove `fs/proc/nommu.c': No such file or directory
> rm: cannot remove `fs/proc/proc_devtree.c': No such file or directory
> rm: cannot remove `fs/proc/proc_misc.c': No such file or directory
> rm: cannot remove `fs/proc/proc_tty.c': No such file or directory
> rm: cannot remove `fs/proc/root.c': No such file or directory
> rm: cannot remove `fs/proc/task_mmu.c': No such file or directory
> rm: cannot remove `fs/proc/task_nommu.c': No such file or directory
> fs/proc/Makefile: needs update
> fs/proc/array.c: needs update
> fs/proc/base.c: needs update
> fs/proc/generic.c: needs update
> fs/proc/inode-alloc.txt: needs update
> fs/proc/inode.c: needs update
> fs/proc/internal.h: needs update
> fs/proc/kcore.c: needs update
> fs/proc/kmsg.c: needs update
> fs/proc/mmu.c: needs update
> fs/proc/nommu.c: needs update
> fs/proc/proc_devtree.c: needs update
> fs/proc/proc_misc.c: needs update
> fs/proc/proc_tty.c: needs update
> fs/proc/root.c: needs update
> fs/proc/task_mmu.c: needs update
> fs/proc/task_nommu.c: needs update

I've seen some isolated problems running cg-update/cg-merge to a clean
tree with files that have been deleted.

Saw this this morning:

shaggy@kleikamp linus-clean $ cg-merge origin
Fast-forwarding bfd4bda097f8758d28e632ff2035e25577f6b060 ->
2512809255d018744fe6c2f5e996c83769846c07
        on top of bfd4bda097f8758d28e632ff2035e25577f6b060...
patching file drivers/video/intelfb/intelfb.h
shaggy@kleikamp linus-clean $ cg-status
? drivers/video/intelfb/intelfbdrv.h

Removing the file manually appears to fix it.

> Thanks,
> Frank
> - --
> Frank Sorenson - KD7TZK
> Systems Manager, Computer Science Department
> Brigham Young University
> frank@tuxrocks.com

-- 
David Kleikamp
IBM Linux Technology Center



* [PATCH] don't load and decompress objects twice with parse_object()
From: Nicolas Pitre @ 2005-05-06 17:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

It turns out that parse_object() loads and decompresses the given
object, only to free it just before calling the type-specific parsing
function, which then maps and decompresses the same object again. This
patch introduces the ability to parse specific objects directly from a
memory buffer.

Without this patch, running git-fsck-cache on the kernel repository takes:

	real    0m13.006s
	user    0m11.421s
	sys     0m1.218s

With this patch applied:

	real    0m8.060s
	user    0m7.071s
	sys     0m0.710s

The performance increase is significant, and this is kind of a 
prerequisite for sane delta object support with fsck.

Signed-off-by: Nicolas Pitre <nico@cam.org>

Index: git/tag.c
===================================================================
--- git.orig/tag.c
+++ git/tag.c
@@ -21,11 +21,8 @@
         return (struct tag *) obj;
 }
 
-int parse_tag(struct tag *item)
+int parse_tag_buffer(struct tag *item, void *data, unsigned long size)
 {
-        char type[20];
-        void *data, *bufptr;
-        unsigned long size;
 	int typelen, taglen;
 	unsigned char object[20];
 	const char *type_line, *tag_line, *sig_line;
@@ -33,20 +30,11 @@
         if (item->object.parsed)
                 return 0;
         item->object.parsed = 1;
-        data = bufptr = read_sha1_file(item->object.sha1, type, &size);
-        if (!data)
-                return error("Could not read %s",
-                             sha1_to_hex(item->object.sha1));
-        if (strcmp(type, tag_type)) {
-		free(data);
-                return error("Object %s not a tag",
-                             sha1_to_hex(item->object.sha1));
-	}
 
 	if (size < 64)
-		goto err;
+		return -1;
 	if (memcmp("object ", data, 7) || get_sha1_hex(data + 7, object))
-		goto err;
+		return -1;
 
 	item->tagged = parse_object(object);
 	if (item->tagged)
@@ -54,29 +42,47 @@
 
 	type_line = data + 48;
 	if (memcmp("\ntype ", type_line-1, 6))
-		goto err;
+		return -1;
 
 	tag_line = strchr(type_line, '\n');
 	if (!tag_line || memcmp("tag ", ++tag_line, 4))
-		goto err;
+		return -1;
 
 	sig_line = strchr(tag_line, '\n');
 	if (!sig_line)
-		goto err;
+		return -1;
 	sig_line++;
 
 	typelen = tag_line - type_line - strlen("type \n");
 	if (typelen >= 20)
-		goto err;
+		return -1;
 	taglen = sig_line - tag_line - strlen("tag \n");
 	item->tag = xmalloc(taglen + 1);
 	memcpy(item->tag, tag_line + 4, taglen);
 	item->tag[taglen] = '\0';
 
-	free(data);
 	return 0;
+}
+
+int parse_tag(struct tag *item)
+{
+        char type[20];
+        void *data;
+        unsigned long size;
+	int ret;
 
-err:
+        if (item->object.parsed)
+                return 0;
+        data = read_sha1_file(item->object.sha1, type, &size);
+        if (!data)
+                return error("Could not read %s",
+                             sha1_to_hex(item->object.sha1));
+        if (strcmp(type, tag_type)) {
+		free(data);
+                return error("Object %s not a tag",
+                             sha1_to_hex(item->object.sha1));
+	}
+	ret = parse_tag_buffer(item, data, size);
 	free(data);
-	return -1;
+	return ret;
 }
Index: git/tree.c
===================================================================
--- git.orig/tree.c
+++ git/tree.c
@@ -88,24 +88,14 @@
 	return (struct tree *) obj;
 }
 
-int parse_tree(struct tree *item)
+int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 {
-	char type[20];
-	void *buffer, *bufptr;
-	unsigned long size;
+	void *bufptr = buffer;
 	struct tree_entry_list **list_p;
+
 	if (item->object.parsed)
 		return 0;
 	item->object.parsed = 1;
-	buffer = bufptr = read_sha1_file(item->object.sha1, type, &size);
-	if (!buffer)
-		return error("Could not read %s",
-			     sha1_to_hex(item->object.sha1));
-	if (strcmp(type, tree_type)) {
-		free(buffer);
-		return error("Object %s not a tree",
-			     sha1_to_hex(item->object.sha1));
-	}
 	list_p = &item->entries;
 	while (size) {
 		struct object *obj;
@@ -115,10 +105,8 @@
 		char *path = strchr(bufptr, ' ');
 		unsigned int mode;
 		if (size < len + 20 || !path || 
-		    sscanf(bufptr, "%o", &mode) != 1) {
-			free(buffer);
+		    sscanf(bufptr, "%o", &mode) != 1)
 			return -1;
-		}
 
 		entry = xmalloc(sizeof(struct tree_entry_list));
 		entry->name = strdup(path + 1);
@@ -144,6 +132,28 @@
 		*list_p = entry;
 		list_p = &entry->next;
 	}
-	free(buffer);
 	return 0;
 }
+
+int parse_tree(struct tree *item)
+{
+	 char type[20];
+	 void *buffer;
+	 unsigned long size;
+	 int ret;
+
+	if (item->object.parsed)
+		return 0;
+	buffer = read_sha1_file(item->object.sha1, type, &size);
+	if (!buffer)
+		return error("Could not read %s",
+			     sha1_to_hex(item->object.sha1));
+	if (strcmp(type, tree_type)) {
+		free(buffer);
+		return error("Object %s not a tree",
+			     sha1_to_hex(item->object.sha1));
+	}
+	ret = parse_tree_buffer(item, buffer, size);
+	free(buffer);
+	return ret;
+}
Index: git/blob.c
===================================================================
--- git.orig/blob.c
+++ git/blob.c
@@ -22,21 +22,29 @@
 	return (struct blob *) obj;
 }
 
+int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
+{
+        item->object.parsed = 1;
+	return 0;
+}
+
 int parse_blob(struct blob *item)
 {
         char type[20];
         void *buffer;
         unsigned long size;
+	int ret;
+
         if (item->object.parsed)
                 return 0;
-        item->object.parsed = 1;
         buffer = read_sha1_file(item->object.sha1, type, &size);
         if (!buffer)
                 return error("Could not read %s",
                              sha1_to_hex(item->object.sha1));
-	free(buffer);
         if (strcmp(type, blob_type))
                 return error("Object %s not a blob",
                              sha1_to_hex(item->object.sha1));
-	return 0;
+	ret = parse_blob_buffer(item, buffer, size);
+	free(buffer);
+	return ret;
 }
Index: git/tag.h
===================================================================
--- git.orig/tag.h
+++ git/tag.h
@@ -13,6 +13,7 @@
 };
 
 extern struct tag *lookup_tag(unsigned char *sha1);
+extern int parse_tag_buffer(struct tag *item, void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
 
 #endif /* TAG_H */
Index: git/commit.c
===================================================================
--- git.orig/commit.c
+++ git/commit.c
@@ -41,24 +41,14 @@
 	return date;
 }
 
-int parse_commit(struct commit *item)
+int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
 {
-	char type[20];
-	void * buffer, *bufptr;
-	unsigned long size;
+	void *bufptr = buffer;
 	unsigned char parent[20];
+
 	if (item->object.parsed)
 		return 0;
 	item->object.parsed = 1;
-	buffer = bufptr = read_sha1_file(item->object.sha1, type, &size);
-	if (!buffer)
-		return error("Could not read %s",
-			     sha1_to_hex(item->object.sha1));
-	if (strcmp(type, commit_type)) {
-		free(buffer);
-		return error("Object %s not a commit",
-			     sha1_to_hex(item->object.sha1));
-	}
 	get_sha1_hex(bufptr + 5, parent);
 	item->tree = lookup_tree(parent);
 	if (item->tree)
@@ -74,10 +64,32 @@
 		bufptr += 48;
 	}
 	item->date = parse_commit_date(bufptr);
-	free(buffer);
 	return 0;
 }
 
+int parse_commit(struct commit *item)
+{
+	char type[20];
+	void *buffer;
+	unsigned long size;
+	int ret;
+
+	if (item->object.parsed)
+		return 0;
+	buffer = read_sha1_file(item->object.sha1, type, &size);
+	if (!buffer)
+		return error("Could not read %s",
+			     sha1_to_hex(item->object.sha1));
+	if (strcmp(type, commit_type)) {
+		free(buffer);
+		return error("Object %s not a commit",
+			     sha1_to_hex(item->object.sha1));
+	}
+	ret = parse_commit_buffer(item, buffer, size);
+	free(buffer);
+	return ret;
+}
+
 void commit_list_insert(struct commit *item, struct commit_list **list_p)
 {
 	struct commit_list *new_list = xmalloc(sizeof(struct commit_list));
Index: git/object.c
===================================================================
--- git.orig/object.c
+++ git/object.c
@@ -104,6 +104,7 @@
 	unsigned long mapsize;
 	void *map = map_sha1_file(sha1, &mapsize);
 	if (map) {
+		struct object *obj;
 		char type[100];
 		unsigned long size;
 		void *buffer = unpack_sha1_file(map, mapsize, type, &size);
@@ -112,26 +113,27 @@
 			return NULL;
 		if (check_sha1_signature(sha1, buffer, size, type) < 0)
 			printf("sha1 mismatch %s\n", sha1_to_hex(sha1));
-		free(buffer);
 		if (!strcmp(type, "blob")) {
-			struct blob *ret = lookup_blob(sha1);
-			parse_blob(ret);
-			return &ret->object;
+			struct blob *blob = lookup_blob(sha1);
+			parse_blob_buffer(blob, buffer, size);
+			obj = &blob->object;
 		} else if (!strcmp(type, "tree")) {
-			struct tree *ret = lookup_tree(sha1);
-			parse_tree(ret);
-			return &ret->object;
+			struct tree *tree = lookup_tree(sha1);
+			parse_tree_buffer(tree, buffer, size);
+			obj = &tree->object;
 		} else if (!strcmp(type, "commit")) {
-			struct commit *ret = lookup_commit(sha1);
-			parse_commit(ret);
-			return &ret->object;
+			struct commit *commit = lookup_commit(sha1);
+			parse_commit_buffer(commit, buffer, size);
+			obj = &commit->object;
 		} else if (!strcmp(type, "tag")) {
-			struct tag *ret = lookup_tag(sha1);
-			parse_tag(ret);
-			return &ret->object;
+			struct tag *tag = lookup_tag(sha1);
+			parse_tag_buffer(tag, buffer, size);
+			obj = &tag->object;
 		} else {
-			return NULL;
+			obj = NULL;
 		}
+		free(buffer);
+		return obj;
 	}
 	return NULL;
 }
Index: git/tree.h
===================================================================
--- git.orig/tree.h
+++ git/tree.h
@@ -25,6 +25,8 @@
 
 struct tree *lookup_tree(unsigned char *sha1);
 
+int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
+
 int parse_tree(struct tree *tree);
 
 #endif /* TREE_H */
Index: git/blob.h
===================================================================
--- git.orig/blob.h
+++ git/blob.h
@@ -11,6 +11,8 @@
 
 struct blob *lookup_blob(unsigned char *sha1);
 
+int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
+
 int parse_blob(struct blob *item);
 
 #endif /* BLOB_H */
Index: git/commit.h
===================================================================
--- git.orig/commit.h
+++ git/commit.h
@@ -20,6 +20,8 @@
 
 struct commit *lookup_commit(unsigned char *sha1);
 
+int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size);
+
 int parse_commit(struct commit *item);
 
 void commit_list_insert(struct commit *item, struct commit_list **list_p);

* final report: chunking
From: C. Scott Ananian @ 2005-05-06 17:48 UTC (permalink / raw)
  To: git

So it looks like the xdelta/zdelta approach is the clear win for 
repository compression.  I had been working on a merkle-hash-treap 
chunking scheme, and I thought I'd present my results for the record, even 
though the benefits turned out to be not-that-great.

Basically, the chunking scheme aims to save space by only storing *one* 
copy of each unique 'chunk' in the file.  The chunking points are decided 
using a content-sensitive checksum over a window, so 
additions/deletions/mutations of the file won't cause all the chunking 
points to move.  The chunks are then arranged in a heap-structured tree 
(treap) with content stored in the nodes as well as at the leaves, so 
that: 1) a one-chunk file stays in one file (there's no separate 'chunk 
index'), and 2) versions of the file which share entire subtrees of chunks 
can share those subtrees.
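
The content-sensitive chunking described above can be sketched roughly as
follows.  This is a minimal illustration, not the code from the patch
below: it uses a plain sum over the window rather than an Adler-style
checksum, and the names split_points, WINDOW and DIVISOR are made up for
the example.  The point is that a boundary decision depends only on the
last WINDOW bytes, so inserting a byte near the start of a file shifts
later boundaries by one instead of moving them all.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW  16   /* hypothetical window size, in bytes */
#define DIVISOR 64   /* hypothetical: expect a boundary about every 64 bytes */

/* Record the end offset (exclusive) of every chunk in buf[0..len).
 * Returns the number of chunks found; at most max_out offsets are stored. */
static size_t
split_points(const unsigned char *buf, size_t len, size_t *out, size_t max_out)
{
    unsigned long sum = 0;   /* sum of the last WINDOW bytes seen */
    size_t i, n = 0, last = 0;

    assert(buf && out);
    for (i = 0; i < len; i++) {
        if (i >= WINDOW)
            sum -= buf[i - WINDOW];   /* oldest byte leaves the window */
        sum += buf[i];                /* newest byte enters the window */
        /* Only test once the window is full, so the decision depends on
         * the window contents alone and not on the position i. */
        if (i + 1 >= WINDOW && sum % DIVISOR == 0) {
            if (n < max_out)
                out[n] = i + 1;
            n++;
            last = i + 1;
        }
    }
    if (last != len) {                /* trailing partial chunk */
        if (n < max_out)
            out[n] = len;
        n++;
    }
    return n;
}
```

Running split_points() on a buffer and on the same buffer with one byte
prepended should give the same boundaries offset by one, apart from
possibly one extra boundary inside the first window.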

Space savings are expected due to the chunk-treap sharing; balanced 
against increases in space caused by:
  1) filesystem blocking (small files waste space)
  2) headers on each chunk (more of a problem the smaller the chunks get)
and
  3) zlib compression doesn't work as well on small chunks as it does on
     large files.
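
To make point 1 concrete, here is a small sketch of the rounding a
block-allocating filesystem applies.  It assumes whole-block allocation
and ignores inode and directory overhead; on_disk_size is a name invented
for this example.

```c
#include <assert.h>

/* Bytes occupied on disk by a file of 'bytes' bytes when space is handed
 * out in units of 'block' bytes (data blocks only; an assumption). */
static unsigned long
on_disk_size(unsigned long bytes, unsigned long block)
{
    assert(block > 0);
    return ((bytes + block - 1) / block) * block;   /* round up */
}
```

With 4096-byte blocks, a 5000-byte file occupies 8192 bytes, but the same
content split into chunks of 1700, 1700 and 1600 bytes occupies 12288.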

The last was addressed by realizing that interior nodes of the tree can 
use their entire left subtree as a zlib 'dictionary', which greatly 
improves compression efficiency.  The tree structure constrains this:
about half of the nodes are expected to be leaves, which can't use 
dictionaries, another 1/4 are one level up, so they can only use a 
single-chunk dictionary, etc.  Nevertheless, this dictionary hack 
basically solves the issue, at little cost in decompression time.

The 'chunk headers' issue was addressed by using a very efficient chunk 
representation.  The 'type' of the chunk was set to the one-character 
string '0', '1', '2', or '3' (depending on whether it had left/right 
children), and the SHA1 of the child chunks was suffixed to the end of the 
file in binary form, so that zlib wouldn't try to 'compress' this 
uncompressible data.
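
A sketch of how such a one-character type could be derived.  The text
above does not say which digit corresponds to which combination of
children, so the bit assignment here (bit 0 for a left child, bit 1 for a
right child) is purely an assumption, as are the function names.

```c
#include <assert.h>
#include <stddef.h>

#define SHA1_LEN 20   /* a raw SHA1 is 20 bytes */

/* Assumed encoding: bit 0 set = has left child, bit 1 set = has right. */
static char
chunk_type_char(int has_left, int has_right)
{
    return (char)('0' + (has_left ? 1 : 0) + (has_right ? 2 : 0));
}

/* Number of raw SHA1 bytes appended after the compressed content. */
static size_t
trailer_len(char type)
{
    int bits;
    assert(type >= '0' && type <= '3');
    bits = type - '0';
    return (size_t)SHA1_LEN * (size_t)((bits & 1) + ((bits >> 1) & 1));
}
```

Under this assumed encoding a leaf is type '0' with no trailer, and a
node with both children is type '3' with a 40-byte trailer.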

Using a benchmark repository consisting of all released versions of the 
2.4, 2.5, and 2.6 kernels, using these two techniques the repository was 
compressed from 821 Mb in 176,251 files to 802 Mb in 337,273 files, using
an 'expected chunk size' of 16,381 bytes.  (I ran benchmarks with chunk 
sizes ranging from ~500 bytes to ~65k bytes; this was the best.)
For this 2% 'ideal' (or 'network') size improvement, the 'real' disk space 
(on an ext3 filesystem) increased from 1.2G to 1.6G.
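
The percentages can be checked from the figures quoted above.  The helper
below is only a worked-arithmetic sketch (percent_change_tenths is an
invented name); it returns the change in tenths of a percent using
integer arithmetic.

```c
#include <assert.h>

/* Change from 'before' to 'after', in tenths of a percent, truncated
 * toward zero.  Negative means the size went down. */
static long
percent_change_tenths(long before, long after)
{
    assert(before > 0);
    return (after - before) * 1000 / before;
}
```

percent_change_tenths(821, 802) gives -23, i.e. about a 2.3% reduction in
'ideal' size, and percent_change_tenths(12, 16) gives 333, i.e. roughly a
33% increase in real disk usage (the 1.2G and 1.6G figures are rounded,
so this second number is approximate).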

The 'rolling checksum' scheme of chunking files yields a geometric size 
distribution, which is rather suboptimal, as you have lots of very small 
pieces and a long tail extending to rather large chunk sizes.  You'd 
rather have something like a normal distribution centered around something 
just shy of a disk block.  Further, since chunking is done before 
compression, the compressed chunk sizes are very unpredictable.  This 
makes it hard to pack them efficiently in disk blocks.
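
To get a feel for the geometric distribution, one can use an idealized
model in which every byte position ends a chunk independently with
probability 1/mean.  That independence is an assumption made for this
sketch, not something measured, and fraction_at_most is an invented name.

```c
#include <assert.h>

/* Fraction of chunks expected to be at most k bytes long when every byte
 * position ends a chunk independently with probability 1/mean. */
static double
fraction_at_most(unsigned long k, unsigned long mean)
{
    double q, survive = 1.0;
    unsigned long i;

    assert(mean > 1);
    q = 1.0 - 1.0 / (double)mean;   /* chance a position is no boundary */
    for (i = 0; i < k; i++)
        survive *= q;               /* (1 - 1/mean)^k */
    return 1.0 - survive;
}
```

With mean = 16381, this model puts roughly 22% of chunks at or under 4096
bytes and roughly 63% at or under the mean, while about 5% are longer
than three times the mean.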

Also, it's not clear that a 'binary' tree structure is best; most on-disk 
structures benefit from higher arity.

So, the final improvement to the algorithm was to 'block' subtrees 
together so that their *compressed* size was as large as possible w/o 
overflowing the 'block size'.  This necessitated a tweak to the on-disk 
block header format as well.  The algorithm was then much less sensitive 
to 'chunk size', although a number of chunk sizes and block sizes were 
benchmarked.  On the same repository as described before, the best space 
savings were obtained using an expected average chunk size of 16,381 
bytes, and a 4k 'block size'.  This yielded 275,105 files taking up 803 Mb 
ideal/network, and 1.4Gb 'real' ext3 disk space.

I've got a lot more numbers, if anyone is interested.

Note that I'd expect to do slightly better on the benchmarks if I were 
looking at 'smaller' changesets.  Since I'm only looking at 'released' 
kernels, the changes between one version of the file and the next are 
rather big, which means a rather larger number of chunks are unshareable 
between versions.
  --scott

jihad Sugar Grove LCPANES QKFLOWAGE ESSENCE milita TASS Marxist Justice 
BOND HTPLUME JMTIDE JMWAVE MHCHAOS security Saddam Hussein genetic
                          ( http://cscott.net/ )
------
This is the code, against (ancient) git head
5750e913cfe75e20d0bbee4e368b6f1321014877

--- /dev/null	2005-04-27 10:20:44.511990864 -0400
+++ git.repo/chunk.c	2005-05-06 09:54:06.529832174 -0400
@@ -0,1 +1,900 @@
+/*
+ * This file implements a treap-based chunked content store.  The
+ * idea is that every stored file is broken down into tree-structured
+ * chunks (that is, every chunk has an optional 'prefix' and 'suffix'
+ * chunk), and these chunks are put in the object store.  This way
+ * similar files will be expected to share chunks, saving space.
+ * Files less than one disk block long are expected to fit in a single
+ * chunk, so there is no extra indirection overhead for this case.
+ *
+ * Copyright (C) 2005 C. Scott Ananian <cananian@alumni.princeton.edu>
+ */
+
+/*
+ * We assume that the file and the chunk information all fits in memory.
+ * A slightly more-clever implementation would work even if the file
+ * didn't fit.  Basically, we could scan it and keep the
+ * 'N' lowest heap keys (chunk hashes), where 'N' is chosen to fit
+ * comfortably in memory.  These would form the root and top
+ * of the resulting treap, constructing it top-down.  Then we'd scan
+ * again and only keep the next 'N' lowest heap keys, etc.
+ *
+ * But we're going to keep things simple.  We do try to maintain locality
+ * where possible, so if you need to swap, things still shouldn't be too bad.
+ */
+
+#include <assert.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "chunk.h"
+
+typedef unsigned long ch_size_t;
+
+/* Our magic numbers: these can be tuned without breaking files already
+ * in the archive, although space re-use is only expected between files which
+ * have these constants set to the same values. */
+
+/* The window size determines how much context we use when looking for a
+ * chunk boundary.
+ * C source has approx 5 bits per character of entropy.
+ * We'd like to get 32 bits of good entropy into our boundary checksum;
+ * that means 7 bytes is a rough minimum for the window size.
+ * 30 bytes is what 'rsyncable zlib' uses; that should be fine. */
+#define ROLLING_WINDOW 33
+/* The ideal chunk size will fit most chunks into a disk block.  A typical
+ * disk block size is 4k, and we expect (say) 50% compression. */
+/* some primes: 61 127 251 509 1021 2039 4091 8191 16381 32749 65521 */
+//#define CHUNK_SIZE 7901 /* primes are nice to use */
+#define CHUNK_SIZE 16381
+/* This is the ideal size of a compressed on-disk chunk, which will include
+ * several CHUNK_SIZE pieces.  Typically larger than CHUNK_SIZE. */
+#define FILE_BLOCK_SIZE 4096
+
+#define WINDOW_MAGIC 0x0000 /* less than CHUNK_SIZE */
+
+/* Data structures: */
+struct chunk {
+    /* a chunk represents some range of the underlying file */
+    ch_size_t start /* inclusive */, end /*exclusive*/;
+    unsigned char sha1[20]; /* sha1 for this chunk; used as the heap key */
+};
+struct chunklist {
+    /* a dynamically-sized list of chunks */
+    struct chunk *chunk; /* an array of chunks */
+    ch_size_t num_items; /* how many items are currently in the list */
+    ch_size_t allocd;    /* how many items we've allocated space for */
+};
+struct treap {
+    /* A treap node represents a run of consecutive chunks. */
+
+    /* the start and end of the run: */
+    ch_size_t start /* inclusive */, end /*exclusive*/;
+    struct chunk *chunk; /* some chunk in the run. */
+    /* treaps representing the run before 'chunk' (left) and
+     * after 'chunk' (right).  */
+    struct treap *left, *right;
+    /* sha1 for the run represented by this treap */
+    unsigned char sha1[20];
+    /* is this the root of a 'large enough' run? */
+    int block_root;
+};
+
+static struct chunklist *
+create_chunklist(int expected_items) {
+    struct chunklist *cl = malloc(sizeof(*cl));
+    assert(expected_items > 0);
+    cl->num_items = 0;
+    cl->allocd = expected_items;
+    cl->chunk = malloc(sizeof(cl->chunk[0]) * cl->allocd);
+    return cl;
+}
+static void
+free_chunklist(struct chunklist *cl) {
+    free(cl->chunk);
+    free(cl);
+}
+
+/* Add a chunk to the chunk list, calculating its SHA1 in the process. */
+/* The chunk includes buf[start] to buf[end-1].                        */
+static void
+add_chunk(struct chunklist *cl, char *buf, ch_size_t start, ch_size_t end) {
+    struct chunk *ch;
+    SHA_CTX c;
+    assert(start<end); assert(cl); assert(buf);
+    if (cl->num_items >= cl->allocd) {
+	cl->allocd *= 2;
+	cl->chunk = realloc(cl->chunk, cl->allocd * sizeof(*(cl->chunk)));
+    }
+    assert(cl->num_items < cl->allocd);
+    ch = cl->chunk + (cl->num_items++);
+    ch->start = start;
+    ch->end = end;
+    /* compute SHA-1 of the chunk. */
+    SHA1_Init(&c);
+    SHA1_Update(&c, buf+start, end-start);
+    SHA1_Final(ch->sha1, &c);
+    /* done! */
+}
+
+/* Split a buffer into chunks, using an adler-32 checksum over ROLLING_WINDOW
+ * bytes to determine chunk boundaries.  We try to split chunks into pieces
+ * whose size averages out to be 'CHUNK_SIZE' (nice if this is a prime).
+ * Note however that we get a geometric distribution of chunk sizes, with
+ * a preponderance of 'small' chunks, and a very long tail occasionally
+ * yielding very large chunks.   Our later treap-blocking pass attempts to
+ * normalize this somewhat by lumping together small chunks. */
+static void
+chunkify(struct chunklist *cl, char *buf, ch_size_t size) {
+    int i, adler_s1=1, adler_s2=0, last=-1;
+
+    for (i=0; i<size; i++) {
+	if (i >= ROLLING_WINDOW) { /* After window is full: */
+	    /* Old character out */
+	    adler_s1 = (65521 + adler_s1 - (unsigned char)buf[i-ROLLING_WINDOW]) % 65521;
+	    adler_s2 = (65521 + adler_s2 - ROLLING_WINDOW * (unsigned char)buf[i-ROLLING_WINDOW]) % 65521;
+	}
+	/* New character in */
+	adler_s1 = (adler_s1 + (unsigned char)buf[i]) % 65521;
+	adler_s2 = (adler_s2 + adler_s1) % 65521;
+	/* Is this the end of a chunk? */
+	if (WINDOW_MAGIC == ((adler_s1 + adler_s2*65536) % CHUNK_SIZE)) {
+	    add_chunk(cl, buf, last+1, i+1);
+	    last = i;
+	}
+    }
+    /* One last chunk at the end: */
+    if (last+1!=size)
+	add_chunk(cl, buf, last+1, size);
+    /* done! */
+}
+
+/* A treap is a 'heap-ordered tree'.  There are two constraints maintained:
+ *   left tree key < this tree key < right tree key
+ * and
+ *   this heap key < left and right heap keys.
+ * We use the sha1 of the chunk (chunk->sha1) as the heap key and the
+ * file location (chunk->start) as the tree key.
+ * For more info on treaps, see:
+ *   C. R. Aragon and R. G. Seidel, "Randomized search trees",
+ *   Proc. 30th IEEE FOCS (1989), 540-545.
+ * There are many possible binary trees we could build; enforcing the
+ * heap constraint ensures that similar files will build similar trees.
+ * (The root of the constructed tree will always be the chunk with the
+ *  smallest hash key; its left child will be the chunk with the smallest
+ *  hash among those chunks before the root in file order; and so on
+ *  recursively.)
+ */
+
+/* Compare the 'heap keys' of two chunks. */
+static int
+chunk_hash_cmp(struct chunk *c1, struct chunk *c2) {
+    int c = memcmp(c1->sha1, c2->sha1, sizeof(c1->sha1));
+    if (c!=0) return c;
+    /* Use file location to break ties (caused by repeated content w/in
+     * a single file).  This ensures that our heap keys are unique. */
+    return (c1->start > c2->start) - (c1->start < c2->start);
+}
+
+/* Assertion helper: check tree and heap constraints. */
+static int
+treap_valid(struct treap *t) {
+    if (!t) return 1;
+    if (t->chunk==NULL) return 0;
+    if (t->left!=NULL) {
+	/* Tree constraint. */
+	assert(t->left->chunk->start < t->chunk->start);
+	/* Heap constraint. */
+	assert(chunk_hash_cmp(t->chunk, t->left->chunk) < 0);
+	/* 'start' validity */
+	assert(t->start == t->left->start);
+    } else
+	assert(t->start == t->chunk->start);
+    if (t->right!=NULL) {
+	/* Tree constraint. */
+	assert(t->chunk->start < t->right->chunk->start);
+	/* Heap constraint. */
+	assert(chunk_hash_cmp(t->chunk, t->right->chunk) < 0);
+	/* 'end' validity. */
+	assert(t->end == t->right->end);
+    } else
+	assert(t->end == t->chunk->end);
+    return 1;
+}
+
+/* Restore heap constraint without disturbing tree ordering. */
+/* Only the root of the given treap will violate the heap constraint. */
+static struct treap *
+treapify(struct treap *t) {
+    struct treap *x, *y, *a, *b, *c;
+    int left_ok, right_ok, rotate_left;
+    assert(treap_valid(t->left));
+    assert(treap_valid(t->right));
+    left_ok = (t->left == NULL) ||
+	(chunk_hash_cmp(t->chunk, t->left->chunk) < 0);
+    right_ok = (t->right == NULL) ||
+	(chunk_hash_cmp(t->chunk, t->right->chunk) < 0);
+    if (left_ok && right_ok) { /* well, that's easy */
+	assert(treap_valid(t));
+	return t;
+    }
+    /* okay, someone needs to rotate */
+    rotate_left = (!left_ok) &&
+	(right_ok || /* if neither is okay, then rotate smallest up */
+	 chunk_hash_cmp(t->left->chunk, t->right->chunk) < 0);
+    /*   Rotation: (note that tree order is maintained)
+     *     y   -bring left up->  x 
+     *    / \                   / \
+     *   x   c                 a   y
+     *  / \                       / \
+     * a   b <-bring right up-   b   c
+     */
+    if (rotate_left) {
+	y = t;  x = y->left;  c = y->right;  a = x->left;  b = x->right;
+	y->left = b;
+	y->right = c;
+	y->start = y->left ? y->left->start : y->chunk->start;
+	y->end = y->right ? y->right->end : y->chunk->end;
+	x->left = a;
+	x->right = treapify(y); // recurse to check heap constraint
+	x->start = x->left ? x->left->start : x->chunk->start;
+	x->end = x->right ? x->right->end : x->chunk->end;
+	assert(treap_valid(x));
+	return x;
+    } else {
+	x = t;  a = x->left;  y = x->right;  b = y->left;  c = y->right;
+	x->left = a;
+	x->right = b;
+	x->start = x->left ? x->left->start : x->chunk->start;
+	x->end = x->right ? x->right->end : x->chunk->end;
+	y->right = c;
+	y->left = treapify(x); // recurse to check heap constraint.
+	y->start = y->left ? y->left->start : y->chunk->start;
+	y->end = y->right ? y->right->end : y->chunk->end;
+	assert(treap_valid(y));
+	return y;
+    }
+}
+
+/* Use list of chunks to build treap bottom-up, calling treapify to
+ * restore heap order on the subtree after we add each interior node.
+ * This is O(N), where N is the number of chunks. */
+static struct treap *
+build_treap(struct chunklist *cl, int chunk_st, int chunk_end) {
+    struct treap *result;
+    /* Some treaps are trivial to build: */
+    if (chunk_st >= chunk_end) return NULL;
+    /* Claim a chunk in the middle for ourself. */
+    int c = (chunk_st + chunk_end)/2;
+    result = (struct treap *)malloc(sizeof(*result));
+    result->chunk = &(cl->chunk[c]);
+    /* Divide and conquer: build well-formed treaps for our kids.*/
+    result->left = build_treap(cl, chunk_st, c);
+    result->right = build_treap(cl, c+1, chunk_end);
+    result->start = result->left ? result->left->start : result->chunk->start;
+    result->end = result->right ? result->right->end : result->chunk->end;
+    result->block_root = 0;
+    /* Now we need to ensure that the heap constraint is satisfied; that is,
+     * result->chunk->sha1 < result->left->chunk->sha1  and
+     * result->chunk->sha1 < result->right->chunk->sha1.
+     */
+    assert(treap_valid(result->left));
+    assert(treap_valid(result->right));
+    return treapify(result);
+}
+
+static void
+free_treap(struct treap *t) {
+    if (!t) return;
+    free_treap(t->left);
+    free_treap(t->right);
+    free(t);
+}
+
+static int
+treap_depth(struct treap *t) {
+    int l, r;
+    if (!t) return 0;
+    l = treap_depth(t->left);
+    r = treap_depth(t->right);
+    return 1 + ((l > r) ? l : r);
+}
+
+/* Fill in the treap hashes.  This will be O(N ln M), where N is the
+ * file length and M is the number of chunks.  We could actually do
+ * this in 2*N time if the subtree hashes were prefix-identical.
+ * Since we need to include the chunk length in the hash prefix,
+ * we can't reuse the hashing context and we need to pay the extra
+ * O(ln M) factor. */
+static void
+do_treap_hash(struct treap *t, void *data, SHA_CTX *accum, int accum_len) {
+    char prefix[200];
+    SHA_CTX *cp;
+    int i;
+
+    assert(treap_valid(t));
+    if (!t) return;
+
+    if (t->block_root) {
+	/* Start a new treap context. */
+	cp = &(accum[accum_len++]);
+	SHA1_Init(cp);
+	/* Sticking the size in the prefix makes me unhappy. =( */
+	SHA1_Update
+	    (cp, prefix, 1+sprintf(prefix, "blob %lu", t->end - t->start));
+    }
+    /* Recurse on the left. */
+    do_treap_hash(t->left, data, accum, accum_len);
+    /* Add in our chunk. */
+    for (i=0; i<accum_len; i++)
+	SHA1_Update(accum + i, data + t->chunk->start,
+		    t->chunk->end - t->chunk->start);
+    /* Recurse on the right. */
+    do_treap_hash(t->right, data, accum, accum_len);
+    /* Finalize and write it to t->sha1. */
+    if (t->block_root)
+	SHA1_Final(t->sha1, cp);
+    /* Done! */
+}
+/* Helper method. */
+static void
+compute_treap_hashes(struct treap *t, void *data) {
+    /* Allocate space for each level of the treap to have its own context. */
+    SHA_CTX contexts[treap_depth(t)];
+    do_treap_hash(t, data, contexts, 0);
+}
+/* Yuck. */
+static const char *
+compute_null_treap_hash() {
+    static const char fixed[] = { "blob 0" };
+    static char sha1[20], *cp=NULL;
+    SHA_CTX c;
+    if (cp) return cp;
+    SHA1_Init(&c);
+    SHA1_Update(&c, fixed, sizeof(fixed));
+    SHA1_Final(sha1, &c);
+    cp = sha1;
+    return cp;
+}
+
+/* Chunk-blocking code. */
+/* Traverse tree in-order to find largest subtrees whose compressed stream
+ * is less than FILE_BLOCK_SIZE. */
+struct block_info {
+    z_stream z_ctxt; /* compression state (dictionaries, etc) */
+    int size; /* size of the compressed subtree, including header info */
+    int has_data; /* have we put anything in this block yet? */
+};
+#define SIZE_IS_OKAY(block) ((block)->size < (FILE_BLOCK_SIZE-20/*slop*/))
+
+static void
+init_block(struct block_info *block, int is_reset) {
+    if (is_reset) {
+	deflateReset(&(block->z_ctxt));
+	block->has_data = 0;
+    } else {
+	memset(block, 0, sizeof(*block));
+	deflateInit(&(block->z_ctxt), Z_BEST_COMPRESSION);
+    }
+    /* 1 byte blocked-chunk header, possible leading/trailing zeros */
+    block->size = 3;
+}
+
+static void
+block_add_chunk(struct chunk *c, void *data, struct block_info *block) {
+    char buf[512];
+    block->z_ctxt.next_in = data + c->start;
+    block->z_ctxt.avail_in = c->end - c->start;
+    while (SIZE_IS_OKAY(block)) {
+	block->z_ctxt.next_out = buf;
+	block->z_ctxt.avail_out = sizeof(buf);
+	if (deflate(&(block->z_ctxt), 0)!=Z_OK) break; // done.
+	block->size += sizeof(buf) - block->z_ctxt.avail_out;
+    }
+    block->size += 2; // chunk size bytes (estimate)
+    block->has_data = 1;
+}
+static void
+block_add_treap_chunk(struct treap *t, void *data, struct block_info *block) {
+    if (t->left && !block->has_data) // use better dictionary.
+	deflateSetDictionary(&(block->z_ctxt), data + t->left->start,
+			     t->left->end - t->left->start);
+    block_add_chunk(t->chunk, data, block);
+}
+
+static void
+block_add_subtree(struct treap *t, void *data, struct block_info *block) {
+    if (!SIZE_IS_OKAY(block)) return; // bail early
+    if (t==NULL) return;
+    if (t->block_root) {
+	block->size+=20; // SHA for the omitted root
+	return;
+    }
+    if (t->left)
+	block_add_subtree(t->left, data, block);
+    block_add_chunk(t->chunk, data, block);
+    if (t->right)
+	block_add_subtree(t->right, data, block);
+}
+
+static void
+copy_finish_and_free(struct block_info *dst, struct block_info *src) {
+    char buf[512];
+    if (dst) {
+	*dst = *src;
+	deflateCopy(&(dst->z_ctxt), &(src->z_ctxt));
+    }
+    while (SIZE_IS_OKAY(src)) {
+	src->z_ctxt.next_out = buf;
+	src->z_ctxt.avail_out = sizeof(buf);
+	if (deflate(&(src->z_ctxt), Z_FINISH)!=Z_OK) break; // done.
+	src->size += sizeof(buf) - src->z_ctxt.avail_out;
+    }
+    deflateEnd(&(src->z_ctxt));
+}
+
+/* Mark subtrees such that each can be represented on disk in no more than
+ * FILE_BLOCK_SIZE bytes.  (Oversized chunks might create larger blocks.)
+ * Return size and compression context information about the subtree rooted
+ * at 't' in 'block', which should represent an empty block when this function
+ * is invoked. */
+static void
+block_mark_treap_roots(struct treap *t, void *data,
+		       struct block_info *block, int leave_open) {
+    struct block_info empty_block, copy_block, copy2_block;
+    assert(!block->has_data);
+    assert(t);
+
+    /* Recurse on the left.  On return, 'block' will contain the compressed
+     * size of the left subtree (excluding already-blocked subtrees). */
+    if (t->left)
+	block_mark_treap_roots(t->left, data, block, 1);
+
+    if (SIZE_IS_OKAY(block)) /* skip this if we're going to redo it anyway */
+	block_add_treap_chunk(t, data, block);
+    else assert (t->left);
+
+    if (t->left) {
+	copy_finish_and_free(&copy_block, block);
+	if (!SIZE_IS_OKAY(block)) {
+	    /* Left subtree plus this chunk doesn't fit in a block.  Make
+	     * left subtree a block root. */
+	    t->left->block_root = 1;
+	    deflateEnd(&(copy_block.z_ctxt));
+	    init_block(block, 0);
+	    block->size += 20; /* reserve space to reference left-hand block */
+	    block_add_treap_chunk(t, data, block);
+	} else
+	    *block = copy_block;
+    }
+
+    /* Recurse on the right, creating blocks as necessary. */
+    if (!t->right) {
+	if (!leave_open) deflateEnd(&(block->z_ctxt));
+	return; /* save some work: we're done already */
+    }
+    init_block(&empty_block, 0);
+    block_mark_treap_roots(t->right, data, &empty_block, 0);
+
+    /* Now that we've marked subtrees on the right, add unmarked chunks to
+     * our current block, then check our size. */
+
+    if (!SIZE_IS_OKAY(block)) {
+	if (!leave_open) deflateEnd(&(block->z_ctxt));
+	goto skip_deflate; /* avoid doing unnecessary work */
+    }
+    /* save a copy of our block state, in case this doesn't work out */
+    if (leave_open) {
+	copy2_block = *block;
+	deflateCopy(&(copy2_block.z_ctxt), &(block->z_ctxt));
+    }
+    /* okay, add the chunks on the right to this block */
+    block_add_subtree(t->right, data, block);
+
+    copy_finish_and_free(leave_open ? &copy_block : NULL, block);
+    if (SIZE_IS_OKAY(block)) {
+	if (!leave_open) return;
+	deflateEnd(&(copy2_block.z_ctxt)); /* free saved state */
+	*block = copy_block;
+    } else {
+	if (!leave_open) goto skip_deflate;
+	/* Didn't work out; restore our block state */
+	deflateEnd(&(copy_block.z_ctxt));
+	*block = copy2_block;
+    skip_deflate:
+	/* Right-hand subtree didn't fit.  Make it a block root. */
+	t->right->block_root = 1;
+	block->size += 20; /* Space needed for the reference */
+    }
+
+    /* done! */
+}
+static void
+mark_block_roots(struct treap *t, void *data) {
+    struct block_info empty_block;
+    if (!t) return;
+    init_block(&empty_block, 0);
+    block_mark_treap_roots(t, data, &empty_block, 0);
+    t->block_root = 1; /* topmost block */
+}
+
+
+/* Now that we've broken it down into blocked treap-structured pieces, let's
+ * write them to the object store. */
+
+/* Write as 0x50 ( <size> <hash> )* <size> <compressed content>
+ * where <size> is 7 bit chunks of the big-endian (size*2), with the MSB set
+ * on all but the last chunk, and the LSB of the last chunk set unless
+ * this is the last <size> in the header.  <hash> is a 20-byte SHA-1
+ * hash, of course.  The contents should be reconstituted by weaving together
+ * <size> bytes of the uncompressed content, the contents of the
+ * referenced block/blob, the next <size> bytes of the uncompressed content,
+ * and so on. */
+
+static void
+emit_num(ch_size_t num, unsigned char **buf, unsigned char *buf_end) {
+    unsigned char scratch[sizeof(num)*2];
+    int i;
+    for (i=0; num>0 || i==0; num>>=7, i++)
+	scratch[i] = 0x80 | (num & 0x7F);
+    scratch[0] &= 0x7F; /* zero indicates 'last byte' */
+    for (i--; i>=0; i--, (*buf)++)
+	if (*buf < buf_end)
+	    **buf = scratch[i];
+}
+
+/* returns the uncompressed size of this block's data, and the header space
+ * needed (indirectly, in buf). */
+static int
+write_block_header(struct treap *t,
+		   unsigned char **buf, unsigned char *buf_end,
+		   ch_size_t *trailing_chunk, int follow) {
+    ch_size_t csz = t->chunk->end - t->chunk->start, sz=csz;
+    assert(t);
+    /* write the left-hand header */
+    if (follow && t->left)
+	sz += write_block_header(t->left, buf, buf_end, trailing_chunk, !t->left->block_root);
+
+    /* now write us. */
+    if (follow)
+	*trailing_chunk += csz;
+    else {
+	assert(t->block_root);
+	/* emit chunk size (even if zero) */
+	emit_num(((*trailing_chunk)<<1) | 1, buf, buf_end);
+	*trailing_chunk = 0;
+	/* emit sha1 hash */
+	if (*buf + sizeof(t->sha1) < buf_end)
+	    memcpy(*buf, t->sha1, sizeof(t->sha1));
+	*buf += sizeof(t->sha1);
+    }
+
+    /* write the right-hand header */
+    if (follow && t->right)
+	sz += write_block_header(t->right, buf, buf_end, trailing_chunk, !t->right->block_root);
+
+    /* write trailing size (even if zero) if this is the topmost call */
+    if (follow && t->block_root)
+	emit_num((*trailing_chunk)<<1, buf, buf_end);
+
+    /* done */
+    return follow ? sz : 0;
+}
+static void
+write_block_data(struct treap *t, char *data, z_streamp streamp, int first,
+		 int bufsize) {
+    /* Write left-hand block data */
+    if (t->left && !t->left->block_root) {
+	write_block_data(t->left, data, streamp, first, bufsize);
+	first = 0;
+    }
+    /* Compress this chunk */
+    if (first && t->left) {
+	/* Use left subtree as dictionary to improve compression. */
+	assert(t->left->block_root);
+	deflateSetDictionary(streamp, data + t->left->start,
+			     t->left->end - t->left->start);
+    }
+    streamp->next_in = data + t->chunk->start;
+    streamp->avail_in = t->chunk->end - t->chunk->start;
+    while (streamp->avail_in > 0 && deflate(streamp, 0) == Z_OK)
+	if (streamp->avail_out==0) {
+		streamp->next_out -= bufsize;
+		streamp->avail_out = bufsize;
+	}
+    /* Write right-hand block data */
+    if (t->right && !t->right->block_root)
+	write_block_data(t->right, data, streamp, 0, bufsize);
+}
+
+/* Write a single treap block to the object store.  */
+static int
+write_one(struct treap *t, char *data) {
+    z_stream stream;
+    int n;
+ 
+    assert(t); assert(data);
+
+    memset(&stream, 0, sizeof(stream));
+    deflateInit(&stream, Z_BEST_COMPRESSION);
+
+    for (n=FILE_BLOCK_SIZE; ;) {
+	unsigned char buf[n], *cp=buf, *buf_end=buf+sizeof(buf);
+	ch_size_t zero = 0;
+	int sz;
+
+	/* This is a little tricky: all zlib streams start with 0xW8, where W
+	 * is usually '7'.  Write a tag byte which can be differentiated
+	 * from this. */
+	assert(n>0);
+	*cp++ = BLOCK_HEADER_BYTE;
+	/* Now write the header (chunk sizes and block references). */
+	sz = write_block_header(t, &cp, buf_end, &zero, 1);
+	/* If we didn't allocate enough space the first time, go back and
+	 * do so now. */
+	if (cp > buf_end) {
+	    n = (cp-buf)+deflateBound(&stream, sz);
+	    continue;
+	}
+	/* Now let's compress our chunk content. */
+	stream.next_out = cp;
+	stream.avail_out = buf_end - cp;
+	stream.total_out = cp-buf;
+	write_block_data(t, data, &stream, 1, n);
+	/* Finish off the zlib stream (write trailer). */
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+	    if (stream.avail_out==0) {
+		stream.next_out = buf;
+		stream.avail_out = sizeof(buf);
+	    }
+	/* If we didn't allocate enough space, go back and do so now. */
+	if (stream.total_out > n) {
+	    n = stream.total_out;
+	    deflateReset(&stream);
+	    continue;
+	}
+	/* hey, success! our block data is in 'buf' */
+	deflateEnd(&stream);
+	return write_sha1_buffer(t->sha1, buf, stream.total_out);
+    }
+}
+
+/* Write all treap blocks to disk. */
+static int
+write_block(struct treap *t, char *buf, char *sha1_ret) {
+    const char *sha1 = t ? (const char*)t->sha1 : compute_null_treap_hash();
+    /* Provide sha1 to parent, if asked for. */
+    if (sha1_ret) memcpy(sha1_ret, sha1, sizeof(t->sha1));
+    /* Write us, if we're a block root. */
+    if (t==NULL) {
+	unsigned char empty_block[] = { BLOCK_HEADER_BYTE, 0 };
+	return write_sha1_buffer(sha1, empty_block, sizeof(empty_block));
+    }
+    if (t->block_root) {
+	if (write_one(t, buf) < 0)
+	    return -1; /* failure. */
+	/* We don't need to write children if this already existed. */
+	if (errno == EEXIST) return 0;
+    }
+    /* No such luck.  Write our children. */
+    if (t->left)
+	if (write_block(t->left, buf, NULL) < 0)
+	    return -1; /* failure. */
+    if (t->right)
+	if (write_block(t->right, buf, NULL) < 0)
+	    return -1; /* failure. */
+    /* Now write us.  Note t may == NULL for a zero-byte file. */
+    /* Write back sha1, if wanted. */
+    errno = 0;
+    return 0;
+}
+
+static int
+chunky_write_buffer(unsigned char *sha1, void *buffer, unsigned long size,
+		    int force_write) {
+    struct chunklist *cl;
+    struct treap *t;
+    int st = 0;
+    /* We expect there to be 'file length / CHUNK_SIZE' chunks.  Over-estimate
+     * a little, and do the initial chunk list allocation. */
+    cl = create_chunklist(1 + ((3 * size) / (2 * CHUNK_SIZE)));
+    /* Split the file into chunks. */
+    chunkify(cl, buffer, size);
+    /* Build the treap. */
+    t = build_treap(cl, 0, cl->num_items);
+    assert(treap_valid(t));
+    /* Block it. */
+    mark_block_roots(t, buffer);
+    /* Compute all the hashes. */
+    compute_treap_hashes(t, buffer);
+    /* Now write all the pieces, updating SHA1 for this file in the process. */
+    st = write_block(t, buffer, sha1);
+    if (force_write && st==0 && errno == EEXIST)
+	if (unlink(sha1_file_name(sha1))==0)
+	    st = write_block(t, buffer, sha1);
+    /* Free everything; we're done. */
+    free_treap(t);
+    free_chunklist(cl);
+    return st;
+}
+
+/* EXPORTED FUNCTION: write the file open on file descriptor 'fd'
+ * and described by 'ce' and 'st' to the object store.   Return
+ * 0 on success, -1 on failure. */
+/* This does the same thing as 'index_fd' in Linus' update-cache.c */
+int
+chunk_index_fd(unsigned char *sha1, int fd, struct stat *st) {
+    char *in; int rc;
+
+    in = "";
+    if (st->st_size)
+	in = mmap(NULL, st->st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+    close(fd);
+    if (in==MAP_FAILED) return -1;
+
+    rc = chunky_write_buffer(sha1, in, st->st_size, 0/* don't force write*/);
+
+    if (st->st_size)
+	munmap(in, st->st_size);
+
+    return rc;
+}
+
+/*** A similar function: this just chunkifies an existing blob. */
+void
+chunkify_blob(void *buffer, unsigned long size, unsigned char *sha1) {
+    unsigned char t[20];
+    chunky_write_buffer(sha1?sha1:t, buffer, size, 1/*force write*/);
+}
+
+
+/*** Functions to read a chunked file into a contiguous buffer. ***/
+
+struct read_block {
+    void *block_map, *chunk_start;
+    ch_size_t mapsize, total_size;
+    unsigned num_refs;
+    struct read_block *ref[0];
+};
+static void
+free_read_block(struct read_block *rb) {
+    int i;
+    for (i=0; i<rb->num_refs; i++)
+	free_read_block(rb->ref[i]);
+    if (rb->block_map) munmap(rb->block_map, rb->mapsize);
+    free(rb);
+}
+static struct read_block *read_block_header_map(void *, ch_size_t);
+
+static struct read_block *
+read_block_header(unsigned char *sha1) {
+    void *block_map;
+    ch_size_t mapsize;
+    /* Memory map block file. */
+    block_map = map_sha1_file(sha1, &mapsize);
+    if (!block_map) return NULL; /* error! */
+    // XXX should check & accept blobs, too.
+    return read_block_header_map(block_map, mapsize);
+}
+static struct read_block *
+read_block_header_map(void *block_map, ch_size_t mapsize) {
+    struct read_block *result;
+    ch_size_t sz;
+    unsigned char *cp;
+    unsigned num_refs;
+
+    assert(PACKED_IS_BLOCK(block_map, mapsize));
+    /* Count the number of external references. */
+    for (cp=block_map+1, num_refs=0; *cp & 0x81 ; cp++)
+	if (!(*cp & 0x80)) cp+=20, num_refs++;
+    /* Allocate the result */
+    result = malloc(sizeof(*result) + num_refs*sizeof(*(result->ref)));
+    result->block_map = block_map;
+    result->chunk_start = (cp+1);
+    result->mapsize = mapsize;
+    result->num_refs = num_refs;
+    result->total_size = 0;
+    /* Now calculate size (and recurse to create read_block tree) */
+    for (cp=block_map+1, num_refs=0, sz=0; ; cp++) {
+	sz = (sz<<7) | (*cp & 0x7F);
+	if (0 == (*cp & 0x80)) {
+	    result->total_size += (sz>>1); /* ignore lsb */
+	    sz = 0; /* reset */
+	    if (0 == (*cp & 1)) break; /* done */
+	    result->ref[num_refs] = read_block_header(cp+1);
+	    if (!result->ref[num_refs]) return NULL; /* error! */
+	    result->total_size += result->ref[num_refs]->total_size;
+	    num_refs++; cp+=20;
+	}
+    }
+    /* Done. */
+    return result;
+}
+
+/* Decompress this block's data into 'buffer', splicing in referenced blocks. */
+static void
+read_block_data(struct read_block *rb, void *buffer) {
+    z_stream stream;
+    unsigned char *cp;
+    unsigned num_refs;
+    ch_size_t sz;
+    memset(&stream, 0, sizeof(stream));
+    stream.next_in = rb->chunk_start;
+    stream.avail_in = (rb->block_map + rb->mapsize) - rb->chunk_start;
+    stream.next_out = buffer;
+    stream.avail_out = 0;
+    inflateInit(&stream);
+    for (cp=rb->block_map+1, num_refs=0, sz=0; ; cp++) {
+	sz = (sz<<7) | (*cp & 0x7F);
+	if (0 == (*cp & 0x80)) {
+	    /* decompress 'sz>>1' bytes. */
+	    stream.avail_out = (sz>>1);
+	    while (stream.avail_out > 0)
+		switch(inflate(&stream, 0)) {
+		case Z_NEED_DICT:
+		    /* use previous chunk as dictionary */
+		    assert(num_refs>0);
+		    sz = rb->ref[num_refs-1]->total_size;
+		    inflateSetDictionary(&stream, stream.next_out-sz, sz);
+		    break;
+		case Z_OK: case Z_STREAM_END: break;
+		default:
+		    return; /* error! bail. */
+		}
+	    sz = 0; /* reset */
+	    if (0 == (*cp & 1)) break; /* done */
+	    /* splice in referenced chunk */
+	    read_block_data(rb->ref[num_refs], stream.next_out);
+	    stream.next_out += rb->ref[num_refs]->total_size;
+	    num_refs++; cp+=20;
+	}
+    }
+}
+
+/* This is called from unpack_sha1_file in sha1_file.c as a lookaside
+ * when a 'block' type file is found.  It will transparently stick
+ * together the prefix and suffix chunks and pass the result off as a
+ * 'blob'. */
+void * unpack_block_file(void *map, unsigned long mapsize, char *type, unsigned long *size) {
+    struct read_block *rb;
+    void *result;
+    assert(PACKED_IS_BLOCK(map, mapsize));
+    /* This is a 'blocked-chunk' object; get the rest of the pieces. */
+    rb = read_block_header_map(map, mapsize);
+    if (!rb) return NULL; /* error! */
+    /* Now concatenate them together. */
+    strcpy(type, "blob");
+    *size = rb->total_size;
+    result = malloc(*size);
+    read_block_data(rb, result);
+    /* done! */
+    rb->block_map = NULL; /* don't unmap top-level block memory */
+    free_read_block(rb);
+    return result;
+
+}
+
+
+#if 0
+/* Exercise this code. */
+int main(int argc, char **argv) {
+    struct cache_entry ce;
+    struct stat st;
+    char *buf, type[10];
+    unsigned long size;
+    int fd;
+    fd = open(argv[1], O_RDONLY);
+    if (fd < 0) exit(1);
+    if (fstat(fd, &st) < 0) exit(1);
+    if (chunk_index_fd(ce.sha1, fd, &st) < 0) exit(1);
+    printf("Wrote file %s.\n", sha1_to_hex(ce.sha1));
+    /* seemed to work! */
+    buf = read_sha1_file(ce.sha1, type, &size);
+    if (!buf) exit(1);
+    printf("Read file %s, of type %s (%lu bytes):\n",
+	   sha1_to_hex(ce.sha1), type, size);
+    fwrite(buf, size, 1, stdout);
+    /* done! */
+    return 0;
+}
+#endif
--- /dev/null	2005-04-27 10:20:44.511990864 -0400
+++ git.repo/chunk.h	2005-05-03 13:27:12.000000000 -0400
@@ -0,0 +1,27 @@
+#ifndef CHUNK_H
+#define CHUNK_H
+
+/* note: zlib streams all start with 0xW8YZ, where W is usually 7, and
+ * (0xW8YZ % 31) is zero.  We will start with a distinguishable header
+ * byte. */
+
+#define BLOCK_HEADER_BYTE (0x50) /* low nybble must not be '8' */
+#define PACKED_IS_BLOCK(map,mapsize) \
+	((mapsize) > 0 && BLOCK_HEADER_BYTE == (*((unsigned char*)(map))))
+
+extern int
+chunk_index_fd(unsigned char *sha1, int fd, struct stat *st);
+
+void *
+unpack_block_file(void *map, unsigned long mapsize,
+		  char *type, unsigned long *size);
+
+void
+chunkify_blob(void *buffer, unsigned long size, unsigned char *sha1);
+
+/* Avoid encoding the file length explicitly for files smaller than this.
+ * Should always be large enough to hold all the file metadata (type, length
+ * in ASCII, and a null byte) at least. */
+#define SMALL_FILE_LIMIT 16384
+
+#endif /* CHUNK_H */
diff -ruHp -x .dircache -x .git -x '*.o' -x '*~' -x 'blow-chunks.?.*' git.repo.orig/Makefile git.repo/Makefile
--- git.repo.orig/Makefile	2005-04-21 15:45:39.000000000 -0400
+++ git.repo/Makefile	2005-05-03 15:01:03.000000000 -0400
@@ -16,16 +17,17 @@ AR=ar
  PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
  	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
  	check-files ls-tree merge-base merge-cache unpack-file git-export \
-	diff-cache convert-cache
+	diff-cache convert-cache \
+	blow-chunks

  all: $(PROG)

  install: $(PROG)
  	install $(PROG) $(HOME)/bin/

-LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o
+LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o chunk.o
  LIB_FILE=libgit.a
-LIB_H=cache.h object.h
+LIB_H=cache.h object.h chunk.h

  $(LIB_FILE): $(LIB_OBJS)
  	$(AR) rcs $@ $(LIB_OBJS)
@@ -91,7 +93,11 @@ diff-cache: diff-cache.o $(LIB_FILE)
  convert-cache: convert-cache.o $(LIB_FILE)
  	$(CC) $(CFLAGS) -o convert-cache convert-cache.o $(LIBS)

+blow-chunks: blow-chunks.o $(LIB_FILE)
+	$(CC) $(CFLAGS) -o $@ $< $(LIBS)
+blow-chunks.o chunk-ref.o chunk-size.o: $(LIB_H)
+
  blob.o: $(LIB_H)
  cat-file.o: $(LIB_H)
  check-files.o: $(LIB_H)
diff -ruHp -x .dircache -x .git -x '*.o' -x '*~' -x 'blow-chunks.?.*' git.repo.orig/sha1_file.c git.repo/sha1_file.c
--- git.repo.orig/sha1_file.c	2005-04-21 15:45:41.000000000 -0400
+++ git.repo/sha1_file.c	2005-05-03 12:59:21.000000000 -0400
@@ -8,6 +8,7 @@
   */
  #include <stdarg.h>
  #include "cache.h"
+#include "chunk.h"

  const char *sha1_file_directory = NULL;

@@ -120,8 +121,13 @@ void * unpack_sha1_file(void *map, unsig
  {
  	int ret, bytes;
  	z_stream stream;
 	char buffer[8192];
  	char *buf;

+	/* Specially-process 'block' (chunked) files. */
+	if (PACKED_IS_BLOCK(map, mapsize))
+	    return unpack_block_file(map, mapsize, type, size);
+
  	/* Get the data stream */
  	memset(&stream, 0, sizeof(stream));
  	stream.next_in = map;
@@ -215,6 +225,7 @@ int write_sha1_file(char *buf, unsigned

  	if (write(fd, compressed, size) != size)
  		die("unable to write file");
+	free(compressed);
  	close(fd);

  	return 0;
@@ -259,9 +270,11 @@ int write_sha1_buffer(const unsigned cha
  		if (collision_check(filename, buf, size))
  			return error("SHA1 collision detected!"
  					" This is bad, bad, BAD!\a\n");
+		errno = EEXIST; /* indicate to caller that this exists */
  		return 0;
  	}
  	write(fd, buf, size);
  	close(fd);
+	errno = 0;
  	return 0;
  }
diff -ruHp -x .dircache -x .git -x '*.o' -x '*~' -x 'blow-chunks.?.*' git.repo.orig/update-cache.c git.repo/update-cache.c
--- git.repo.orig/update-cache.c	2005-04-21 15:45:41.000000000 -0400
+++ git.repo/update-cache.c	2005-04-20 17:32:07.000000000 -0400
@@ -4,6 +4,7 @@
   * Copyright (C) Linus Torvalds, 2005
   */
  #include "cache.h"
+#include "chunk.h"

  /*
   * Default to not allowing changes to the list of files. The
@@ -23,6 +24,7 @@ static int index_fd(unsigned char *sha1,
  	void *metadata = malloc(200);
  	int metadata_size;
  	void *in;
+	int retval, err;
  	SHA_CTX c;

  	in = "";
@@ -51,6 +53,7 @@ static int index_fd(unsigned char *sha1,
  	stream.avail_out = max_out_bytes;
  	while (deflate(&stream, 0) == Z_OK)
  		/* nothing */;
+	free(metadata);

  	/*
  	 * File content
@@ -62,7 +65,11 @@ static int index_fd(unsigned char *sha1,

  	deflateEnd(&stream);

-	return write_sha1_buffer(sha1, out, stream.total_out);
+	retval = write_sha1_buffer(sha1, out, stream.total_out);
+	err = errno;
+	free(out);
+	errno = err;
+	return retval;
  }

  /*
@@ -113,8 +120,8 @@ static int add_file_to_cache(char *path)
  	ce->ce_mode = create_ce_mode(st.st_mode);
  	ce->ce_flags = htons(namelen);

-	if (index_fd(ce->sha1, fd, &st) < 0)
+	if (chunk_index_fd(ce->sha1, fd, &st) < 0)
  		return -1;

  	return add_cache_entry(ce, allow_add);
--- /dev/null	2005-04-27 10:20:44.511990864 -0400
+++ git.repo/blow-chunks.c	2005-04-20 21:33:33.000000000 -0400
@@ -0,1 +1,35 @@
+#include <stdlib.h>
+#include "cache.h"
+#include "chunk.h"
+
+/* For every file on the command-line, if it is a blob, convert it to a chunk.
+ */
+void convert_one(char *filename) {
+    char type[10];
+    int fd = open(filename, O_RDONLY);
+    struct stat st;
+    void *map, *buf;
+    unsigned long size;
+    if (fd < 0) { perror(filename); return; }
+    if (fstat(fd, &st) < 0) { perror(filename); close(fd); return; }
+    map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+    close(fd);
+    if (map == MAP_FAILED) { perror("mmap failed"); return; }
+    buf = unpack_sha1_file(map, st.st_size, type, &size);
+    munmap(map, st.st_size);
+    if (buf == NULL) { fprintf(stderr, "%s: couldn't unpack object\n", filename); return; }
+    if (strcmp(type, "blob")==0) {
+	unsigned char sha1[20];
+	/* a-ha! */
+	chunkify_blob(buf, size, sha1);
+    }
+    free(buf);
+}
+
+int main(int argc, char **argv) {
+    int i;
+    for (i=1; i<argc; i++)
+	convert_one(argv[i]);
+    return 0;
+}
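
The size/reference header that read_block_header_map() walks can be tried
out in isolation.  What follows is a minimal sketch of one reading of that
loop, not code from the patch: parse_block_header() and struct
header_summary are hypothetical names, and the bounds checks are an
addition the patch does not have.  Each field is a base-128 number (0x80
marks a continuation byte); the value shifted right by one is a count of
inline bytes, and a set low bit means a 20-byte SHA1 reference follows.

```c
#include <stddef.h>

#define SHA1_LEN 20

/* Hypothetical summary of one block header (not a type from the patch). */
struct header_summary {
    unsigned long inline_bytes; /* uncompressed bytes stored in this block */
    unsigned num_refs;          /* number of 20-byte SHA1 references */
    size_t header_len;          /* bytes consumed, including the type byte */
};

/* Walk the header in p[0..len).  Returns 0 on success, or -1 if the
 * header would run past the end of the buffer. */
int parse_block_header(const unsigned char *p, size_t len,
                       struct header_summary *out)
{
    size_t i = 1;               /* skip the leading type byte */
    unsigned long sz = 0;

    out->inline_bytes = 0;
    out->num_refs = 0;
    out->header_len = 0;
    for (;;) {
        unsigned char c;

        if (i >= len)
            return -1;
        c = p[i++];
        sz = (sz << 7) | (c & 0x7F);
        if (c & 0x80)
            continue;           /* more bytes of this number follow */
        out->inline_bytes += sz >> 1;   /* low bit is a flag, not size */
        sz = 0;
        if (!(c & 1))
            break;              /* no reference follows: header ends */
        if (len - i < SHA1_LEN)
            return -1;
        i += SHA1_LEN;          /* skip the referenced block's SHA1 */
        out->num_refs++;
    }
    out->header_len = i;
    return 0;
}
```

For instance, the bytes 0x50 0x0B <20-byte SHA1> 0x81 0x48 describe 5
inline bytes, one reference, then 100 more inline bytes, with the
compressed data starting at offset 24.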


* Re: How do I...
From: Daniel Barkalow @ 2005-05-06 17:31 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Junio C Hamano, Linus Torvalds, git
In-Reply-To: <427B9FB3.1040002@tuxrocks.com>

On Fri, 6 May 2005, Frank Sorenson wrote:

> Note that I could be just thinking about this all wrong, so my
> terminology could be in left field.  Here, I'm mostly just interested in
> the case where "Hey, something broke with drivers/char/i8k.c.  When was
> this changed?  Who changed what?"

The tricky thing is that you want to *not* see commits where somebody
adopted somebody else's change to drivers/char/i8k.c; you want to ignore
those commits in favor of the commits where the original author of the
changes made the changes. Otherwise, you mostly see merges with people
submitting lines where they didn't change that file.

	-Daniel
*This .sig left intentionally blank*



* Re: [PATCH] fix compare symlink against readlink not data
From: Greg KH @ 2005-05-06 17:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kay Sievers, git
In-Reply-To: <Pine.LNX.4.58.0505061009340.2233@ppc970.osdl.org>

On Fri, May 06, 2005 at 10:11:49AM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 6 May 2005, Greg KH wrote:
> > 
> > Odd.  If I reclone the whole tree from the udev kernel.org tree, then it
> > works just fine.  If I create a new copy of my local tree, I still have
> > the same problem.  Diffing the trees shows no difference in the objects
> > at all...
> 
> You've not updated your cache.
> 
> Guys, remember this command:
> 
> 	git-diff-files
> 
> Just like that, with no arguments. It shows you what is different in your 
> cache. If you get a lot of output, it means that your index file isn't 
> up-to-date.
> 
> The other magic command is
> 
> 	git-update-cache --refresh

Damn, I was still using update-cache and checkout-cache from an old git
version.  That was my problem.

Sorry for the noise, it works just fine.

thanks,

greg k-h


* Re: [PATCH] fix compare symlink against readlink not data
From: Linus Torvalds @ 2005-05-06 17:11 UTC (permalink / raw)
  To: Greg KH; +Cc: Kay Sievers, git
In-Reply-To: <20050506163603.GA17766@kroah.com>



On Fri, 6 May 2005, Greg KH wrote:
> 
> Odd.  If I reclone the whole tree from the udev kernel.org tree, then it
> works just fine.  If I create a new copy of my local tree, I still have
> the same problem.  Diffing the trees shows no difference in the objects
> at all...

You've not updated your cache.

Guys, remember this command:

	git-diff-files

Just like that, with no arguments. It shows you what is different in your 
cache. If you get a lot of output, it means that your index file isn't 
up-to-date.

The other magic command is

	git-update-cache --refresh

and you need to do that after you merge.

If you use cogito, and cogito doesn't refresh after pulls etc, that would
be a cogito bug. But if you do things like "cp -a" of a git tree, and 
you forget to refresh the cache in the new tree, then that is _your_ bug.

		Linus
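
Why a "cp -a" copy looks dirty until it is refreshed: the index caches
stat information for each file, and a copied tree gets new inode numbers
even when contents and mtimes are preserved.  A minimal sketch of that
kind of comparison follows, assuming a cut-down cache entry with only
three fields.  struct cached_stat, remember_stat() and looks_changed()
are hypothetical names, not git's own.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical, trimmed-down stand-in for the stat data an index
 * entry caches. */
struct cached_stat {
    time_t mtime;
    off_t size;
    ino_t ino;
};

/* Record the fields we will later compare against. */
void remember_stat(struct cached_stat *cs, const struct stat *st)
{
    cs->mtime = st->st_mtime;
    cs->size = st->st_size;
    cs->ino = st->st_ino;
}

/* Returns nonzero when the file at 'path' no longer matches the cached
 * stat data (or cannot be examined at all). */
int looks_changed(const struct cached_stat *cs, const char *path)
{
    struct stat st;

    if (lstat(path, &st) < 0)
        return 1;
    return cs->mtime != st.st_mtime ||
           cs->size != st.st_size ||
           cs->ino != st.st_ino;
}
```

A refresh step would re-read the stat data for entries whose contents
are in fact unchanged, so that later comparisons come out clean again.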


* Re: How do I...
From: Linus Torvalds @ 2005-05-06 17:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Frank Sorenson, git
In-Reply-To: <7vsm10cnx3.fsf@assigned-by-dhcp.cox.net>



On Fri, 6 May 2005, Junio C Hamano wrote:
> 
> LT> Guys - whoever wrote one of the scripts, can you please send out your 
> LT> current version to the git list and cc me, and explain why yours is 
> LT> superior to the other people's version. Please?
> 
> I think I mentioned and posted an interactive version called
> jit-trackdown.  It is part of JIT found at [*1*].  You may have
> an older version that does not have the command.

Actually, I spent some time just now to improve the interface to 
git-diff-tree, allowing it to take a list of trees to compare from stdin.

And if you only give it one entry on stdin, it assumes it is a commit,
and lists the tree difference between that and the parent(s). If it's a 
merge commit, it will ignore it unless you use "-m".

So now you can do

	git-rev-list HEAD --max-count=10 | git-diff-tree --stdin update-cache.c

to see which of the last 10 commits changed "update-cache.c".

And if you add the "-p" flag, it will show it as a diff too.

Damn, I'm good. It was just a few lines of code added to git-diff-tree.

		Linus


* Re: [PATCH] fix compare symlink against readlink not data
From: Kay Sievers @ 2005-05-06 16:59 UTC (permalink / raw)
  To: Greg KH; +Cc: Linus Torvalds, git
In-Reply-To: <20050506163603.GA17766@kroah.com>

On Fri, 2005-05-06 at 09:36 -0700, Greg KH wrote:
> On Fri, May 06, 2005 at 06:23:34PM +0200, Kay Sievers wrote:
> > On Fri, 2005-05-06 at 09:03 -0700, Greg KH wrote:
> > > On Fri, May 06, 2005 at 03:45:01PM +0200, Kay Sievers wrote:
> > > > Fix update-cache to compare the blob of a symlink against the link-target
> > > > and not the file it points to. Also ignore all permissions applied to
> > > > links.
> > > > Thanks to Greg for recognizing this while he added our list of symlinks
> > > > back to the udev repository.
> > > 
> > > Hm, even with this patch applied (it's in Linus's tree right now), I
> > > still get the following with a clean checked out udev tree:
> > >  $ cg-diff
> > >  Index: test/sys/block/cciss!c0d0/device
> > >  ===================================================================
> > 
> > > I can't reproduce this. Are you sure that the git-core binaries are
> > called and not the cogito ones?
> > 
> >   git-update-cache --refresh
> >   git-diff-cache -r HEAD
> > from the core-git should print nothing.
> 
> Odd.  If I reclone the whole tree from the udev kernel.org tree, then it
> works just fine.  If I create a new copy of my local tree, I still have
> the same problem.  Diffing the trees shows no difference in the objects
> at all...
> 
> Can you add a symlink to a local tree and see if you can duplicate this?

Works as it should.
What happens when you throw away your old .git/index with the following?
  git-read-tree HEAD

Kay



* Re: How do I...
From: Frank Sorenson @ 2005-05-06 16:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7vsm10cnx3.fsf@assigned-by-dhcp.cox.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Junio C Hamano wrote:
> Yourself or not, I think it is a good idea to do something that
> does exactly what Frank wants, namely, just list commits.  Even
> better would be to take commits with multiple parents into
> account and list <commit> <parent> pairs, like:
> 
>     $ git-file-revs Makefile
>     f7eb55....... aaaaaa.......
>     f7eb55....... bbbbbb.......
>     aaaaaa....... dddddd.......
> 
> which shows commit f7eb55... changed it relative to both of its
> parents aaaaaa... and bbbbbb...

Note that I could be just thinking about this all wrong, so my
terminology could be in left field.  Here, I'm mostly just interested in
the case where "Hey, something broke with drivers/char/i8k.c.  When was
this changed?  Who changed what?"

Thanks,

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCe5+yaI0dwg4A47wRAhIwAKD7R/D3ZU7JZz0mytEO04u04OTVZwCbBFhG
sEYSTYiLLIMuLxU+r1mNuGw=
=Mcg2
-----END PGP SIGNATURE-----


* Re: How do I...
From: Frank Sorenson @ 2005-05-06 16:39 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Git Mailing List
In-Reply-To: <1115390221.10459.4.camel@localhost>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dave Kleikamp wrote:
> On Fri, 2005-05-06 at 03:49 -0600, Frank Sorenson wrote:
> 
> 
>>After doing a cg-update, can I cg-log just the changes since the last
>>update?  Alternatively, how can I tell cg-log I'm caught up, and don't
>>need anything historical?
> 
> 
> (Assuming pulling from "origin")
> Instead of doing cg-update, do cg-pull.  Then "cg-log :origin" will give
> you the changesets you just pulled.

Super.  This works great.  Thanks.

> "cg-merge origin" will then
> complete the operation, thereby catching you up.

Okay, not quite so great.  Here's the output when I ran it to update my
kernel this morning.  Note that I haven't made any local modifications.
 I'm seeing this sort of thing often enough that I'm blowing away my
whole git tree and regenerating it to get back to a stable state once or
twice a week.  I'm sure there's another way, but without me making
modifications on my end, I wouldn't expect this to happen.  Suggestions
are welcome! :)

# cg-merge origin
Fast-forwarding 6741f3a7f9922391cd02b3ca1329e669497dc22f ->
2512809255d018744fe6c2f5e996c83769846c07
        on top of 6741f3a7f9922391cd02b3ca1329e669497dc22f...
patching file fs/proc/Makefile
patching file fs/proc/array.c
patching file fs/proc/base.c
patching file fs/proc/generic.c
patching file fs/proc/inode-alloc.txt
patching file fs/proc/inode.c
patching file fs/proc/internal.h
patching file fs/proc/kcore.c
patching file fs/proc/kmsg.c
patching file fs/proc/mmu.c
patching file fs/proc/nommu.c
patching file fs/proc/proc_devtree.c
patching file fs/proc/proc_misc.c
patching file fs/proc/proc_tty.c
patching file fs/proc/root.c
patching file fs/proc/task_mmu.c
patching file fs/proc/task_nommu.c
touch: cannot touch `fs/proc/Makefile': No such file or directory
touch: cannot touch `fs/proc/array.c': No such file or directory
touch: cannot touch `fs/proc/base.c': No such file or directory
touch: cannot touch `fs/proc/generic.c': No such file or directory
touch: cannot touch `fs/proc/inode-alloc.txt': No such file or directory
touch: cannot touch `fs/proc/inode.c': No such file or directory
touch: cannot touch `fs/proc/internal.h': No such file or directory
touch: cannot touch `fs/proc/kcore.c': No such file or directory
touch: cannot touch `fs/proc/kmsg.c': No such file or directory
touch: cannot touch `fs/proc/mmu.c': No such file or directory
touch: cannot touch `fs/proc/nommu.c': No such file or directory
touch: cannot touch `fs/proc/proc_devtree.c': No such file or directory
touch: cannot touch `fs/proc/proc_misc.c': No such file or directory
touch: cannot touch `fs/proc/proc_tty.c': No such file or directory
touch: cannot touch `fs/proc/root.c': No such file or directory
touch: cannot touch `fs/proc/task_mmu.c': No such file or directory
touch: cannot touch `fs/proc/task_nommu.c': No such file or directory
rm: cannot remove `fs/proc/Makefile': No such file or directory
rm: cannot remove `fs/proc/array.c': No such file or directory
rm: cannot remove `fs/proc/base.c': No such file or directory
rm: cannot remove `fs/proc/generic.c': No such file or directory
rm: cannot remove `fs/proc/inode-alloc.txt': No such file or directory
rm: cannot remove `fs/proc/inode.c': No such file or directory
rm: cannot remove `fs/proc/internal.h': No such file or directory
rm: cannot remove `fs/proc/kcore.c': No such file or directory
rm: cannot remove `fs/proc/kmsg.c': No such file or directory
rm: cannot remove `fs/proc/mmu.c': No such file or directory
rm: cannot remove `fs/proc/nommu.c': No such file or directory
rm: cannot remove `fs/proc/proc_devtree.c': No such file or directory
rm: cannot remove `fs/proc/proc_misc.c': No such file or directory
rm: cannot remove `fs/proc/proc_tty.c': No such file or directory
rm: cannot remove `fs/proc/root.c': No such file or directory
rm: cannot remove `fs/proc/task_mmu.c': No such file or directory
rm: cannot remove `fs/proc/task_nommu.c': No such file or directory
fs/proc/Makefile: needs update
fs/proc/array.c: needs update
fs/proc/base.c: needs update
fs/proc/generic.c: needs update
fs/proc/inode-alloc.txt: needs update
fs/proc/inode.c: needs update
fs/proc/internal.h: needs update
fs/proc/kcore.c: needs update
fs/proc/kmsg.c: needs update
fs/proc/mmu.c: needs update
fs/proc/nommu.c: needs update
fs/proc/proc_devtree.c: needs update
fs/proc/proc_misc.c: needs update
fs/proc/proc_tty.c: needs update
fs/proc/root.c: needs update
fs/proc/task_mmu.c: needs update
fs/proc/task_nommu.c: needs update


Thanks,
Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCe53EaI0dwg4A47wRAt5dAJ4wEG8KmRvEnqLMOtDiNrZqRhURMgCfTUaE
JLGGFnRN4YGhix/7SkOwAtg=
=aDQu
-----END PGP SIGNATURE-----


* Re: [PATCH] fix compare symlink against readlink not data
From: Greg KH @ 2005-05-06 16:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kay Sievers, Linus Torvalds, git
In-Reply-To: <7vy8ascod4.fsf@assigned-by-dhcp.cox.net>

On Fri, May 06, 2005 at 09:26:15AM -0700, Junio C Hamano wrote:
> 
> Also could you try the low-level git command, git-diff-cache -p,
> against the tree you are comparing?  The built-in diff stuff
> might get this wrong too.

No, 'git-diff-cache -r HEAD' shows me:

$ git-diff-cache -r HEAD
*120000->100644 blob    2d78258b1a0fe49afabc8c16a352117df5dc338a->2d78258b1a0fe49afabc8c16a352117df5dc338a      test/sys/block/cciss!c0d0/device
*120000->100644 blob    2d78258b1a0fe49afabc8c16a352117df5dc338a->2d78258b1a0fe49afabc8c16a352117df5dc338a      test/sys/block/rd!c0d0/device
*120000->100644 blob    2d78258b1a0fe49afabc8c16a352117df5dc338a->2d78258b1a0fe49afabc8c16a352117df5dc338a      test/sys/block/sda/device
*120000->100644 blob    1c776568bdc9dc750addd0885dded6b008a44460->1c776568bdc9dc750addd0885dded6b008a44460      test/sys/bus/pci/devices/0000:00:09.0
*120000->100644 blob    e000c77614a23ad57fed284bd007ed7c1cb7872e->e000c77614a23ad57fed284bd007ed7c1cb7872e      test/sys/bus/pci/devices/0000:00:1e.0
*120000->100644 blob    630d35bf617944a4ba6afc90ca5176cb342a2662->630d35bf617944a4ba6afc90ca5176cb342a2662      test/sys/bus/pci/devices/0000:02:05.0
*120000->100644 blob    bd644e0e9d0c2f289bc4a3e3a034d528d5d671cc->bd644e0e9d0c2f289bc4a3e3a034d528d5d671cc      test/sys/bus/pci/drivers/aic7xxx/0000:02:05.0
*120000->100644 blob    ebb65b3bac36ef935a55a7f1010e4d3a242188eb->ebb65b3bac36ef935a55a7f1010e4d3a242188eb      test/sys/bus/scsi/devices/0:0:0:0
*120000->100644 blob    239003f712dd9112171e635a44160da1898f5996->239003f712dd9112171e635a44160da1898f5996      test/sys/bus/scsi/drivers/sd/0:0:0:0
*120000->100644 blob    b7733a68e08e564300212a22c9f81888c12bb55a->b7733a68e08e564300212a22c9f81888c12bb55a      test/sys/bus/usb-serial/devices/ttyUSB0
*120000->100644 blob    177f109e4899cf4008b9413933392d4f07832fdc->177f109e4899cf4008b9413933392d4f07832fdc      test/sys/bus/usb-serial/drivers/PL-2303/ttyUSB0
*120000->100644 blob    9137978832942ecce572d376f14244c1588748a2->9137978832942ecce572d376f14244c1588748a2      test/sys/bus/usb/devices/3-0:1.0
*120000->100644 blob    e47b4d58c4e5406bdba3ea1384c0c3efe007b8f6->e47b4d58c4e5406bdba3ea1384c0c3efe007b8f6      test/sys/bus/usb/devices/3-1
*120000->100644 blob    f519185eb36af29f79ca89d4b3d51011756b6837->f519185eb36af29f79ca89d4b3d51011756b6837      test/sys/bus/usb/devices/3-1:1.0
*120000->100644 blob    fb1919e7c9794ce31a257b50621f71f6f4f8bdef->fb1919e7c9794ce31a257b50621f71f6f4f8bdef      test/sys/bus/usb/devices/usb3
*120000->100644 blob    2bc160c20cd950c52e34d4bab30e1e25d6f4df34->2bc160c20cd950c52e34d4bab30e1e25d6f4df34      test/sys/bus/usb/drivers/hub/3-0:1.0
*120000->100644 blob    49d32d5abd7e26766f4c905f1d4edf1e28f8b322->49d32d5abd7e26766f4c905f1d4edf1e28f8b322      test/sys/bus/usb/drivers/pl2303/3-1:1.0
*120000->100644 blob    03c76193e99a93c7ff45c9ac986d2bc8e0706b0b->03c76193e99a93c7ff45c9ac986d2bc8e0706b0b      test/sys/bus/usb/drivers/usb/3-1
*120000->100644 blob    61dc52a61345178c8c171ecfe96df9646af2f16c->61dc52a61345178c8c171ecfe96df9646af2f16c      test/sys/bus/usb/drivers/usb/usb3
*120000->100644 blob    b7733a68e08e564300212a22c9f81888c12bb55a->b7733a68e08e564300212a22c9f81888c12bb55a      test/sys/class/tty/ttyUSB0/device
*120000->100644 blob    9ff2c81f529a95bd93ddaf66de6a72c74166c268->9ff2c81f529a95bd93ddaf66de6a72c74166c268      test/sys/devices/pci0000:00/0000:00:09.0/usb3/3-1/ttyUSB0/driver

thanks,

greg k-h


* Re: How do I...
From: David Woodhouse @ 2005-05-06 16:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Frank Sorenson, git
In-Reply-To: <Pine.LNX.4.58.0505060905090.2233@ppc970.osdl.org>

On Fri, 2005-05-06 at 09:13 -0700, Linus Torvalds wrote:
> There have been at least two different scripts for this posted, and one C
> source code version.
> 
> I just haven't integrated them, because I'm an idiot, and too much choice 
> makes me run around in small circles and clucking.
> 
> Guys - whoever wrote one of the scripts, can you please send out your 
> current version to the git list and cc me, and explain why yours is 
> superior to the other people's version. Please?

I already explained why mine sucks and shouldn't be merged. It was a
proof of concept; hoping for the stone soup effect.

I haven't seen a C version or indeed anything which actually does the
right thing, although I outlined how it would work and _threatened_ to
do one. I had a half-arsed attempt at it on the way home from
linux.conf.au but my brain tends to melt while I'm on airplanes so I
didn't get very far.

-- 
dwmw2



* Re: [PATCH] fix compare symlink against readlink not data
From: Greg KH @ 2005-05-06 16:36 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, git
In-Reply-To: <1115396614.32065.23.camel@localhost.localdomain>

On Fri, May 06, 2005 at 06:23:34PM +0200, Kay Sievers wrote:
> On Fri, 2005-05-06 at 09:03 -0700, Greg KH wrote:
> > On Fri, May 06, 2005 at 03:45:01PM +0200, Kay Sievers wrote:
> > > Fix update-cache to compare the blob of a symlink against the link-target
> > > and not the file it points to. Also ignore all permissions applied to
> > > links.
> > > Thanks to Greg for recognizing this while he added our list of symlinks
> > > back to the udev repository.
> > 
> > Hm, even with this patch applied (it's in Linus's tree right now), I
> > still get the following with a clean checked out udev tree:
> >  $ cg-diff
> >  Index: test/sys/block/cciss!c0d0/device
> >  ===================================================================
> 
> I can't reproduce this. Are you sure that the git-core binaries are
> called and not the cogito ones?
> 
>   git-update-cache --refresh
>   git-diff-cache -r HEAD
> from the core-git should print nothing.

Odd.  If I reclone the whole tree from the udev kernel.org tree, then it
works just fine.  If I create a new copy of my local tree, I still have
the same problem.  Diffing the trees shows no difference in the objects
at all...

Can you add a symlink to a local tree and see if you can duplicate this?

thanks,

greg k-h


* Re: How do I...
From: Junio C Hamano @ 2005-05-06 16:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Frank Sorenson, git
In-Reply-To: <Pine.LNX.4.58.0505060905090.2233@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Fri, 6 May 2005, Frank Sorenson wrote:
>> 
>> Okay, I've got some "How can I?" questions.  I hope I'm not the only one
>> still working to "git it".
>> 
>> How can I git a list of commits that have modified a particular file?
>> For example, I'd like to do something like this:
>> # git-file-revs Makefile
>> f7eb55878f11575281add2a5726e483aed5e45bb
>> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>> bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

LT> Guys - whoever wrote one of the scripts, can you please send out your 
LT> current version to the git list and cc me, and explain why yours is 
LT> superior to the other people's version. Please?

I think I mentioned and posted an interactive version called
jit-trackdown.  It is part of JIT found at [*1*].  You may have
an older version that does not have the command.

LT> But I might cook something up myself too. 

Yourself or not, I think it is a good idea to do something that
does exactly what Frank wants, namely, just list commits.  Even
better would be to take commits with multiple parents into
account and list <commit> <parent> pairs, like:

    $ git-file-revs Makefile
    f7eb55....... aaaaaa.......
    f7eb55....... bbbbbb.......
    aaaaaa....... dddddd.......

which shows commit f7eb55... changed it relative to both of its
parents aaaaaa... and bbbbbb...

[Footnotes]
*1* http://members.cox.net/junkio/
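
The <commit> <parent> listing proposed above can be modelled on a toy
history.  This is only a sketch under stated assumptions: struct
toy_commit and list_file_revs() are made up for illustration, each
commit carries the blob ID of the one file being tracked, and a pair is
emitted whenever the commit's blob differs from that parent's.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PARENTS 2

/* Hypothetical in-memory commit: just enough to compare one file. */
struct toy_commit {
    const char *id;
    const char *blob;   /* blob ID of the tracked file in this commit */
    const struct toy_commit *parent[MAX_PARENTS];
    int num_parents;
};

/* Append "<commit> <parent>\n" to 'out' for every commit/parent pair
 * whose blob IDs differ.  Returns the number of such pairs; output is
 * truncated if 'out' is too small. */
int list_file_revs(const struct toy_commit *const *commits, int n,
                   char *out, size_t outsz)
{
    int pairs = 0;
    size_t used = 0;

    if (outsz > 0)
        out[0] = '\0';
    for (int i = 0; i < n; i++) {
        const struct toy_commit *c = commits[i];

        for (int p = 0; p < c->num_parents; p++) {
            const struct toy_commit *par = c->parent[p];

            if (strcmp(c->blob, par->blob) == 0)
                continue;       /* file unchanged relative to parent */
            pairs++;
            if (used < outsz) {
                int w = snprintf(out + used, outsz - used, "%s %s\n",
                                 c->id, par->id);
                if (w > 0)
                    used += (size_t)w;
            }
        }
    }
    return pairs;
}
```

With f7eb55 differing from both of its parents and aaaaaa differing
from dddddd, this reproduces the three lines of the example; a commit
whose blob matches its parent's produces no line.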



