Git development


* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-11 23:40 UTC (permalink / raw)
  To: tglx; +Cc: git
In-Reply-To: <1115854733.22180.202.camel@tglx>

Thomas Gleixner wrote:
> 
> Which is completely error prone due to rsync. Some of the repositories on
> kernel.org keep identical copies of .git/description already. Why should
> they preserve a unique .git/repoid ?
> 
> There is one clean way to solve this. Managed repository id's and a lot
> of discipline.
> 
> I expect neither of those two things to happen, but a complete working
> directory path is better than nothing to make educated guesses.
> Committer names (maintainers) can be the same across repositories, but it's
> unlikely that somebody who manages more than one subsystem uses the
> same working directory for them.
> 

I can tell you what would happen in at least my case: you'll see each 
"repository" with about 23 different IDs.

	-hpa

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-11 23:44 UTC (permalink / raw)
  To: tglx; +Cc: git
In-Reply-To: <1115854419.22180.196.camel@tglx>

On Wed, May 11, 2005 7:33 pm, Thomas Gleixner said:

> He? What the hell have the sparc-2.6 and net-2.6 in common except the
> same owner/maintainer ? Should we base the heuristics on directories and
> filenames ? Cool.

What problem are you trying to solve?  Has dave or russell or anybody with
multiple repositories given you reason to think they have a problem
tracking their personal repositories?   I doubt it very much.

>> The only point would be to show chain of command, but you don't seem
>> interested in that.
>
> What is the chain of commands good for ? Does the chain of commands
> change the history information in a specific repository ?

The chain of command might be good to know in the same way that an
accurate signed-off-by chain is good to know.

> No.

Yes.  Not that I care personally very much.

> If you buy food, then it is relevant whether you get it from A directly or
> via B. The commit and the referenced tree are immutable: they neither
> change in consistency nor become uneatable.

Lol..

> True, but not with a plain rsync approach

Agreed.

Sean.

^ permalink raw reply

* Re: [PATCH] Stop git-rev-list at sha1 match
From: Petr Baudis @ 2005-05-11 23:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Junio C Hamano, git
In-Reply-To: <1115852914.22180.170.camel@tglx>

Dear diary, on Thu, May 12, 2005 at 01:08:34AM CEST, I got a letter
where Thomas Gleixner <tglx@linutronix.de> told me that...
> On Thu, 2005-05-12 at 00:50 +0200, Petr Baudis wrote:
> > > Rn
> > > ---- Stop = Rn-1
> > > Rn-1
> > > ---- Stop = Rn-2
> > 
> > Mn
> > Mn-1
> > 
> > > Rn-2
> > > ---- Stop = Rn-3
> > > 
> > > The diff between Rn and Rn-1 contains always the changes merged from M
> > 
> > Yes, but you get the merge commits again since rev-list follows all the
> > parents.
> 
> That's plain wrong. The Mn(1) change hit repository r between revision
> Rn and Rn-1 and nowhere else. 
> 
> Date is irrelevant. The only relevant thing is the parent child(s)
> relationship.

What I described is just how rev-list works (now), nothing more. This is
what you get when you use rev-list.

Please see the thread of

5730     Apr 27 H. Peter Anvin  ( 0.2K) kernel.org now has gitweb installed

for extensive discussion on how (it is impossible or very hard) to do
better.

So how would you order the list of commits?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-11 23:45 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, git
In-Reply-To: <428297DB.8030905@zytor.com>

On Wed, May 11, 2005 7:40 pm, H. Peter Anvin said:

> I can tell you what would happen in at least my case: you'll see each
> "repository" with about 23 different IDs.
>

Amongst other issues and complexity this will introduce.   This is really
a solution in search of a problem anyway.

Sean

^ permalink raw reply

* [PATCH] Test suite
From: Junio C Hamano @ 2005-05-12  0:01 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050511224044.GI22686@pasky.ji.cz>

Commit    1da683e1247046796a094c4917bc0c4591530272
Author    Junio C Hamano <junkio@cox.net>, Wed May 11 16:59:35 2005 -0700
Committer Junio C Hamano <junkio@cox.net>, Wed May 11 16:59:35 2005 -0700

Test suite: infrastructure and examples.

This adds the test suite infrastructure with two example tests.
The current git-checkout-cache fails these example tests; this will
be corrected in a separate patch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Created: t/t1000-checkout-cache.sh (mode:100755)
--- /dev/null
+++ b/t/t1000-checkout-cache.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+. ./test-lib.sh
+test_description "$@" 'git-checkout-cache test.
+
+This test registers the following filesystem structure in the
+cache:
+
+    path0       - a file
+    path1/file1 - a file in a directory
+
+And then tries to checkout in a work tree that has the following:
+
+    path0/file0 - a file in a directory
+    path1       - a file
+
+The git-checkout-cache command should fail when attempting to checkout
+path0, finding it is occupied by a directory, and path1/file1, finding
+path1 is occupied by a non-directory.  With "-f" flag, it should remove
+the conflicting paths and succeed.
+'
+
+date >path0
+mkdir path1
+date >path1/file1
+git-update-cache --add path0 path1/file1
+test_debug 'git-ls-files --stage'
+
+rm -fr path0 path1
+mkdir path0
+date >path0/file0
+date >path1
+test_debug 'git-ls-files --stage'
+test_debug 'find path*'
+
+test_expect_failure 'git-checkout-cache -a'
+test_debug 'find path*'
+
+test_expect_success 'git-checkout-cache -f -a'
+test_debug 'find path*'
+
+if test -f path0 && test -d path1 && test -f path1/file1
+then
+	test_ok "checkout successful"
+else
+	test_failure "checkout failed"
+fi
+
+test_done
+
+
Created: t/t1001-checkout-cache.sh (mode:100755)
--- /dev/null
+++ b/t/t1001-checkout-cache.sh
@@ -0,0 +1,76 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+. ./test-lib.sh
+test_description "$@" 'git-checkout-cache test.
+
+This test registers the following filesystem structure in the cache:
+
+    path0/file0	- a file in a directory
+    path1/file1 - a file in a directory
+
+and attempts to check it out when the work tree has:
+
+    path0/file0 - a file in a directory
+    path1       - a symlink pointing at "path0"
+
+Checkout cache should fail to extract path1/file1 because the leading
+path path1 is occupied by a non-directory.  With "-f" it should remove
+the symlink path1 and create directory path1 and file path1/file1.
+'
+
+show_files() {
+	# show filesystem files, just [-dl] for type and name
+	find path? -ls |
+	sed -e 's/^[0-9]* * [0-9]* * \([-bcdl]\)[^ ]* *[0-9]* *[^ ]* *[^ ]* *[0-9]* [A-Z][a-z][a-z] [0-9][0-9] [^ ]* /fs: \1 /'
+	# what's in the cache, just mode and name
+	git-ls-files --stage |
+	sed -e 's/^\([0-9]*\) [0-9a-f]* [0-3] /ca: \1 /'
+	# what's in the tree, just mode and name.
+	git-ls-tree -r "$1" |
+	sed -e 's/^\([0-9]*\)	[^ ]*	[0-9a-f]*	/tr: \1 /'
+}
+
+mkdir path0
+date >path0/file0
+git-update-cache --add path0/file0
+tree1=$(git-write-tree)
+test_debug 'show_files $tree1'
+
+mkdir path1
+date >path1/file1
+git-update-cache --add path1/file1
+tree2=$(git-write-tree)
+test_debug 'show_files $tree2'
+
+rm -fr path1
+git-read-tree -m $tree1
+git-checkout-cache -f -a
+test_debug 'show_files $tree1'
+
+ln -s path0 path1
+git-update-cache --add path1
+tree3=$(git-write-tree)
+test_debug 'show_files $tree3'
+
+# Morten says "Got that?" here.
+# Test begins.
+
+git-read-tree $tree2
+test_expect_success 'git-checkout-cache -f -a'
+test_debug show_files $tree2
+
+if test ! -h path0 && test -d path0 &&
+   test ! -h path1 && test -d path1 &&
+   test ! -h path0/file0 && test -f path0/file0 &&
+   test ! -h path1/file1 && test -f path1/file1
+then
+    test_ok "checked out correctly."
+else
+    test_failure "did not check out correctly."
+fi
+
+test_done
+
Created: t/test-lib.sh (mode:100755)
--- /dev/null
+++ b/t/test-lib.sh
@@ -0,0 +1,106 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# For repeatability, reset the environment to known value.
+LANG=C TZ=UTC
+export LANG TZ
+unset AUTHOR_DATE
+unset AUTHOR_EMAIL
+unset AUTHOR_NAME
+unset COMMIT_AUTHOR_EMAIL
+unset COMMIT_AUTHOR_NAME
+unset GIT_ALTERNATE_OBJECT_DIRECTORIES
+unset GIT_AUTHOR_DATE
+unset GIT_AUTHOR_EMAIL
+unset GIT_AUTHOR_NAME
+unset GIT_COMMITTER_EMAIL
+unset GIT_COMMITTER_NAME
+unset GIT_DIFF_OPTS
+unset GIT_DIR
+unset GIT_EXTERNAL_DIFF
+unset GIT_INDEX_FILE
+unset GIT_OBJECT_DIRECTORY
+unset SHA1_FILE_DIRECTORIES
+unset SHA1_FILE_DIRECTORY
+
+# Each test should start with something like this, after copyright notices:
+#
+# . ./test-lib.sh
+# test_description "$@" 'Description of this test...
+# This test checks if command xyzzy does the right thing...
+# '
+#
+
+test_description () {
+	while case "$#" in 0) break;; esac
+	do
+		case "$1" in
+		-d|--d|--de|--deb|--debu|--debug)
+			debug=t; shift ;;
+		-h|--h|--he|--hel|--help)
+			eval echo '"$'$#'"'
+			exit 0
+			;;
+		*)
+			break ;;
+		esac
+	done
+	test_failure=0
+}
+
+say () {
+	echo "* $*"
+}
+
+test_debug () {
+	case "$debug" in '') ;; ?*) eval "$*" ;; esac
+}
+
+test_ok () {
+	echo "* $*";
+}
+
+test_failure () {
+	echo "* $*";
+	test_failure=1;
+}
+
+test_expect_failure () {
+	say "expecting failure: $1"
+	eval "$1"
+	case $? in
+	0)	test_failure "did not fail as expected" ;;
+	*) 	test_ok "failed as expected" ;;
+	esac
+}
+
+test_expect_success () {
+	say "expecting success: $1"
+	eval "$1"
+	case $? in
+	0) 	test_ok "succeeded as expected" ;;
+	*)	test_failure "did not succeed as expected" ;;
+	esac
+}
+
+test_done () {
+	case "$test_failure" in
+	0)	exit 0 ;;
+	'')	echo "*** test script did not start with test_description";
+		exit 2 ;;
+	*)	exit 1 ;;
+	esac
+}
+
+# Test the binaries we have just built.  The tests are kept in
+# t/ subdirectory and are run in test-repo subdirectory.
+PATH=$(pwd)/..:$PATH
+
+# Test repository
+test=test-repo
+rm -fr "$test"
+mkdir "$test"
+cd "$test"
+git-init-db 2>/dev/null || { echo >&2 "* cannot run git-init-db"; exit 1; }
------------------------------------------------


^ permalink raw reply

* [PATCH] checkout-cache fix
From: Junio C Hamano @ 2005-05-12  0:02 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050511224044.GI22686@pasky.ji.cz>

Commit    cc01b05f0a3dfdf5ed114e429a7bec1ad549ab1c
Author    Junio C Hamano <junkio@cox.net>, Wed May 11 17:00:16 2005 -0700
Committer Junio C Hamano <junkio@cox.net>, Wed May 11 17:00:16 2005 -0700

Fix checkout-cache when existing work tree interferes with the checkout.

This is essentially the same one as the last one I sent to the
GIT list, except that the patch is rebased to the current tip of
the git-pb tree, and an unnecessary call to create_directories()
removed.

The checkout-cache command gets confused when checking out a
file in a subdirectory and the work tree has a symlink to the
subdirectory.  Also it fails to check things out when there is a
non-directory in the work tree when cache expects a directory
there, and vice versa.  This patch fixes the first problem by
making sure all the leading paths in the file being checked out
are indeed directories, and also fixes directory vs
non-directory conflicts when '-f' is specified by removing the
offending paths.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -32,6 +32,8 @@
  * of "-a" causing problems (not possible in the above example,
  * but get used to it in scripting!).
  */
+#include <sys/types.h>
+#include <dirent.h>
 #include "cache.h"
 
 static int force = 0, quiet = 0, not_new = 0;
@@ -46,20 +48,61 @@ static void create_directories(const cha
 		len = slash - path;
 		memcpy(buf, path, len);
 		buf[len] = 0;
-		mkdir(buf, 0755);
+		if (mkdir(buf, 0755)) {
+			if (errno == EEXIST) {
+				struct stat st;
+				if (!lstat(buf, &st) && S_ISDIR(st.st_mode))
+					continue; /* ok */
+				if (force && !unlink(buf) && !mkdir(buf, 0755))
+					continue;
+			}
+			die("cannot create directory at %s", buf);
+		}
 	}
 	free(buf);
 }
 
+static void remove_subtree(const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *de;
+	char pathbuf[PATH_MAX];
+	char *name;
+	
+	if (!dir)
+		die("cannot opendir %s", path);
+	strcpy(pathbuf, path);
+	name = pathbuf + strlen(path);
+	*name++ = '/';
+	while ((de = readdir(dir)) != NULL) {
+		struct stat st;
+		if ((de->d_name[0] == '.') &&
+		    ((de->d_name[1] == 0) ||
+		     ((de->d_name[1] == '.') && de->d_name[2] == 0)))
+			continue;
+		strcpy(name, de->d_name);
+		if (lstat(pathbuf, &st))
+			die("cannot lstat %s", pathbuf);
+		if (S_ISDIR(st.st_mode))
+			remove_subtree(pathbuf);
+		else if (unlink(pathbuf))
+			die("cannot unlink %s", pathbuf);
+	}
+	closedir(dir);
+	if (rmdir(path))
+		die("cannot rmdir %s", path);
+}
+
 static int create_file(const char *path, unsigned int mode)
 {
 	int fd;
 
 	mode = (mode & 0100) ? 0777 : 0666;
+	create_directories(path);
 	fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
 	if (fd < 0) {
-		if (errno == ENOENT) {
-			create_directories(path);
+		if (errno == EISDIR && force) {
+			remove_subtree(path);
 			fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
 		}
 	}
------------------------------------------------


^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-12  0:04 UTC (permalink / raw)
  To: Sean; +Cc: tglx, git
In-Reply-To: <3004.10.10.10.24.1115855130.squirrel@linux1>

Sean wrote:
> 
> Amongst other issues and complexity this will introduce.   This is really
> a solution in search of a problem anyway.
> 

You mean repoid?

	-hpa

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-12  0:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, git
In-Reply-To: <42829D9F.3010403@zytor.com>

On Wed, May 11, 2005 8:04 pm, H. Peter Anvin said:
> Sean wrote:
>>
>> Amongst other issues and complexity this will introduce.   This is
>> really a solution in search of a problem anyway.
>>
> You mean repoid?

Hey Peter,

   Yes, it will create just as many problems as it sets out to solve. 
Actually, I still don't know what problem is being addressed by the
current proposal.

Sean

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12  0:30 UTC (permalink / raw)
  To: Sean; +Cc: git
In-Reply-To: <2997.10.10.10.24.1115855049.squirrel@linux1>

On Wed, 2005-05-11 at 19:44 -0400, Sean wrote:
> What problem are you trying to solve?  

The problem of explaining the obvious facts to an agnostic.

> Has dave or russell or anybody with
> multiple repositories given you reason to think they have a problem
> tracking their personal repositories?   I doubt it very much.

Aarg. Did you ever get in contact with QA departments ?

Assume you have:  bugfix - stable - devel repositories.

You have to track down a problem in bugfix and the source of it.
It does not matter whether the maintainer of "bugfix" pulled it from
devel or from stable. It's his fault anyway. 

But we are not talking about faults and guiltiness. We want to identify
the location and the context _where_ and _why_ this change was created.

The current solution of git makes it impossible to retrieve this
information in a consistent way. 

So you have no quick way to figure out what happened. Quite the
contrary, you have to dissect inconsistent information.

See also the thread about "Stop git-rev-list at sha1 match".

> The chain of command might be good to know in the same way that an
> accurate signed-off-by chain is good to know.

This sentence makes me guess that you actually are working in a QA
department and therefore trying to maximize the amount of irrelevant
information.

tglx

^ permalink raw reply

* Re: [PATCH] Stop git-rev-list at sha1 match
From: Thomas Gleixner @ 2005-05-12  0:31 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git
In-Reply-To: <20050511234455.GL22686@pasky.ji.cz>

On Thu, 2005-05-12 at 01:44 +0200, Petr Baudis wrote:
> for extensive discussion on how (it is impossible or very hard) to do
> better.

:)

> So how would you order the list of commits?

Rn
  merged Mn
  merged Mn-1
Rn-1
....

That's the relevant information in repository R. Looking at it from
repository M after M updated to Rn

(Mn+1) == Rn	; Mn+1 is not created due to head forward
  merged Rn
  .. 
  merged Rn-3
Mn
Mn-1

That's the historically correct ordering from a repository point of view.
That's the only relevant information IMNSHO.

The dates of author and committer are retrievable in each repository,
but the order of commits is not.

tglx

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12  0:33 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: git
In-Reply-To: <428297DB.8030905@zytor.com>

On Wed, 2005-05-11 at 16:40 -0700, H. Peter Anvin wrote:
> > I expect neither of those two things to happen, but a complete working
> > directory path is better than nothing to make educated guesses.
> > Committer names (maintainers) can be the same across repositories, but it's
> > unlikely that somebody who manages more than one subsystem uses the
> > same working directory for them.
> > 
> 
> I can tell you what would happen in at least my case: you'll see each 
> "repository" with about 23 different IDs.

You won. :)

So what alternatives do we have ?

- commit history per repository
  .git/head-history               rsync and user error prone 
- .git/repoid                     rsync error prone
- GIT_REPO_ID=xyz                 user  error prone
- directory name based guessing   hpa error prone

What's your preferred error scenario ?

tglx



^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Dmitry Torokhov @ 2005-05-12  0:41 UTC (permalink / raw)
  To: git, tglx; +Cc: H. Peter Anvin
In-Reply-To: <1115854733.22180.202.camel@tglx>

On Wednesday 11 May 2005 18:38, Thomas Gleixner wrote:
> On Wed, 2005-05-11 at 16:14 -0700, H. Peter Anvin wrote:
> > I would like to suggest a few limiters are set on the repoid.  In 
> > particular, I'd like to suggest that a repoid is a UUID, that a file is 
> > used to track it (.git/repoid), and that if it doesn't exist, a new one 
> > is created from /dev/urandom.
> 
> Which is completely error prone due to rsync. Some of the repositories on
> kernel.org keep identical copies of .git/description already. Why should
> they preserve a unique .git/repoid ?

I think that a unique repoid should be created automatically every time
you clone. It is ok for it to go away when you discard a tree, it will just
identify a line (set) of changes originating from some place.

-- 
Dmitry

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12  0:44 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: git, H. Peter Anvin
In-Reply-To: <200505111941.04104.dtor_core@ameritech.net>

On Wed, 2005-05-11 at 19:41 -0500, Dmitry Torokhov wrote:
> > 
> > Which is completely error prone due to rsync. Some of the repositories on
> > kernel.org keep identical copies of .git/description already. Why should
> > they preserve a unique .git/repoid ?
> 
> I think that a unique repoid should be created automatically every time
> you clone. It is ok for it to go away when you discard a tree, it will just
> identify a line (set) of changes originating from some place.

Yes, as long as you make sure that rsync does _NOT_ pollute/populate it

tglx



^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-12  0:45 UTC (permalink / raw)
  To: tglx; +Cc: git
In-Reply-To: <1115857838.22180.250.camel@tglx>

On Wed, May 11, 2005 8:30 pm, Thomas Gleixner said:
> On Wed, 2005-05-11 at 19:44 -0400, Sean wrote:
>> What problem are you trying to solve?
>
> The problem of explaining the obvious facts to an agnostic.

No the problem is you're seeing dragons.

> Aarg. Did you ever get in contact with QA departments ?

Can we please not _invent_ problems where there are none?  Can you show a
specific case today where repoid would make one ounce of difference in the
life of anyone?

> Assume you have:  bugfix - stable - devel repositories.

Why does this imaginary QA department use the same committer and author
for all of them?  And why is it you switch from imaginary problems of
dave, greg and russell to imaginary problems of a fictitious QA
department?

> You have to track down a problem in bugfix and the source of it.
> It does not matter whether the maintainer of "bugfix" pulled it from
> devel or from stable. It's his fault anyway.
>
> But we are not talking about faults and guiltiness. We want to identify
> the location and the context _where_ and _why_ this change was created.
>
> The current solution of git makes it impossible to retrieve this
> information in a consistent way.

Wrong.  When a commit is pulled from a repository, all the surrounding
context of every commit that came before it and after it on that branch is
pulled right along with it.

> So you have no quick way to figure out what happened. Quite the
> contrary, you have to dissect inconsistent information.
>
> See also the thread about "Stop git-rev-list at sha1 match".

Sorry, this one is entertaining enough <g>

>> The chain of command might be good to know in the same way that an
>> accurate signed-off-by chain is good to know.
>
> This sentence makes me guess that you actually are working in a QA
> department and therefore trying to maximize the amount of irrelevant
> information.

No, you seem to want it both ways.  Sometimes it's important to you to
know where an object came from and how it got there, and sometimes it's
not.  Interesting blind spot.

Sean

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12  0:56 UTC (permalink / raw)
  To: Sean; +Cc: git
In-Reply-To: <3185.10.10.10.24.1115858739.squirrel@linux1>

On Wed, 2005-05-11 at 20:45 -0400, Sean wrote:
> Can we please not _invent_ problems where there are none?  Can you show a
> specific case today where repoid would make one ounce of difference in the
> life of anyone?

Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in correct
order, using the available tools. 

Come back to me when you are done.

> No, you seem to want it both ways.  Sometimes it's important to you to
> know where an object came from and how it got there, and sometimes it's
> not.  Interesting blind spot.

He ? 

I was not aware that omitting irrelevant information creates a
blind spot. 

Period. End of thread.

tglx



^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-12  0:58 UTC (permalink / raw)
  To: tglx; +Cc: git
In-Reply-To: <1115859372.22180.266.camel@tglx>

On Wed, May 11, 2005 8:56 pm, Thomas Gleixner said:

> Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in correct
> order, using the available tools.
>
> Come back to me when you are done.

Ask me any question that matters and I'll answer it with available tools.

> I was not aware that omitting irrelevant information creates a
> blind spot.

Sorry, your assessment that it is irrelevant is incorrect and overlooks
the fact that information is lost.

> Period. End of thread.

Fair enough.

Sean

^ permalink raw reply

* New version of gitk
From: Paul Mackerras @ 2005-05-12  1:00 UTC (permalink / raw)
  To: git

I have just put a new version of gitk at:

	http://ozlabs.org/~paulus/gitk-0.9

I'm pretty happy with the display side of it now.  When you select a
commit it displays the full diff below the commit comments in the
bottom-left pane, with the diff displayed nicely with red and green
backgrounds for the removed and added lines.  There is still plenty to
do in the areas of user preferences, menus, find facility, etc.

Paul.

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-12  1:09 UTC (permalink / raw)
  To: tglx; +Cc: Dmitry Torokhov, git
In-Reply-To: <1115858670.22180.259.camel@tglx>

Thomas Gleixner wrote:
> On Wed, 2005-05-11 at 19:41 -0500, Dmitry Torokhov wrote:
> 
>>>Which is completely error prone due to rsync. Some of the repositories on
>>>kernel.org keep identical copies of .git/description already. Why should
>>>they preserve a unique .git/repoid ?
>>
>>I think that a unique repoid should be created automatically every time
>>you clone. It is ok for it to go away when you discard a tree, it will just
>>identify a line (set) of changes originating from some place.
> 
> 
> Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
> 

You shouldn't be rsyncing the .git directory, only .git/objects anyway.
Some people seem to have merely copied Linus' entire tree, and that's
what's causing problems.

That one you can't win.

	-hpa

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-12  1:13 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, Dmitry Torokhov, git
In-Reply-To: <4282ACD3.50009@zytor.com>

H. Peter Anvin wrote:
>>
>> Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
>>
> 
> You shouldn't be rsyncing the .git directory, only .git/objects anyway.
> Some people seem to have merely copied Linus' entire tree, and that's
> what's causing problems.
> 
> That one you can't win.
> 

What I meant is that I think .git/repoid is the right thing: if the
file doesn't exist, a new ID file is generated.

If people are copying their repoid file explicitly it's up to them to 
know what they're doing.

	-hpa

^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Junio C Hamano @ 2005-05-12  1:46 UTC (permalink / raw)
  To: tglx; +Cc: H. Peter Anvin, git
In-Reply-To: <1115858022.22180.256.camel@tglx>

>>>>> "TG" == Thomas Gleixner <tglx@linutronix.de> writes:

TG> So what alternatives do we have ?

How about doing nothing of this sort, i.e. not introducing repo-id?  I do
not understand what problem repo-id is solving.

Earlier in your response to Sean <seanlkml@sympaticoca>, you
gave a QA department example.

TG> You have to track down a problem in bugfix and the source of it.
TG> It does not matter whether the maintainer of "bugfix" pulled it from
TG> devel or from stable. It's his fault anyway. 
TG> 
TG> But we are not talking about faults and guiltiness. We want
TG> to identify the location and the context _where_ and _why_
TG> this change was created.

Here is my understanding of the scenario you are describing.
Are these correct?

 - There is a problem in the source.

 - You know what lines of which file is causing the problem.
   But you cannot tell how the file got into that state and why
   by just looking at the problem revision.

 - You have the complete history (commit chain) leading to the
   revision.

 - You want to get some context to help you understand why those
   offending lines are there.

Assuming I am with you so far, I would like to know what kind of
information you are looking for ("some context to help you
understand").  Is a specific commit object (rather, one pair of
commits that is parent-child) that made those lines into the
current shape enough?

My understanding of Sean's argument is that finding such a
commit (or a commit-pair) is a good enough place to start
understanding why that change was introduced and finding who to
ask for help, and it does not matter in which repository the
change was introduced.  I tend to agree with him if that is what
is being discussed.

If the owner has multiple repositories and needs to know in
which of them the change was introduced, I assume he would be
able to run, on each of his repositories, the same procedure the
QA department ran to find the problem commit, and so find that
commit and the commits around it (its ancestors and
descendants).  So a maintainer having more than one repository
does not seem to be an issue, either.

So I am having a hard time understanding what problem repo-id
solves.

^ permalink raw reply

* Re: [PATCH] Stop git-rev-list at sha1 match
From: Junio C Hamano @ 2005-05-12  1:54 UTC (permalink / raw)
  To: Petr Baudis; +Cc: tglx, git
In-Reply-To: <20050511221719.GH22686@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> it will show the merged revisions properly, but for

PB>      o
PB>      | \
PB>      o  |
PB>     ------
PB>      |  o
PB>      |  o
PB>      o /
PB>      o

PB> it won't show the full merge. Whilst when you do

PB> 	*-log --since foo

PB> I think you mean it to show everything going into the tree since foo -
PB> that would include the whole branch you cut off now.

I use "rev-tree HEAD ^$(git-merge-base HEAD foo)" for this
kind of thing, so rev-list does not really matter.

>> --- a/checkout-cache.c
>> +++ b/checkout-cache.c
PB> I assume this is irrelevant here?

Sorry for sending a dirty patch in.  Will fix it up.

^ permalink raw reply

* Re: [PATCH] Stop git-rev-list at sha1 match
From: Junio C Hamano @ 2005-05-12  2:11 UTC (permalink / raw)
  To: Petr Baudis; +Cc: tglx, git
In-Reply-To: <7v4qd9mcp1.fsf@assigned-by-dhcp.cox.net>

>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:
>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

>>> --- a/checkout-cache.c
>>> +++ b/checkout-cache.c
PB> I assume this is irrelevant here?
JCH> Sorry for sending a dirty patch in.  Will fix it up.

------------
Introduce "rev-list --stop-at=<commit>".

An additional option, --stop-at=<commit>, is introduced.  The
git-rev-list output stops just before showing the named commit.

This is based on Thomas Gleixner's patch, slightly reworked and
with documentation updates.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Documentation/git-rev-list.txt |   18 +++++++++++++++++-
rev-list.c                     |   20 ++++++++++++++++----
2 files changed, 33 insertions(+), 5 deletions(-)

--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -9,7 +9,10 @@
 
 SYNOPSIS
 --------
-'git-rev-list' <commit>
+'git-rev-list'	[--max-count=<number>]
+		[--max-age=<unixtime>]
+		[--min-age=<unixtime>]
+		[--stop-at=<commit>] <commit>
 
 DESCRIPTION
 -----------
@@ -17,6 +20,19 @@
 given commit, taking ancestry relationship into account.  This is
 useful to produce human-readable log output.
 
+OPTIONS
+-------
+--max-count=<number>::
+	Stop after showing <number> commits.
+
+--max-age=<unixtime>::
+	Stop after showing commit made before <unixtime>.
+
+--min-age=<unixtime>::
+	Skip until commit made before <unixtime>.
+
+--stop-at=<commit>::
+	Stop just before showing <commit>.
 
 Author
 ------
--- a/rev-list.c
+++ b/rev-list.c
@@ -1,12 +1,21 @@
 #include "cache.h"
 #include "commit.h"
 
+static const char *rev_list_usage = 
+"usage: rev-list [OPTION] commit-id\n"
+"  --max-count=nr\n"
+"  --max-age=epoch\n"
+"  --min-age=epoch\n"
+"  --stop-at=commit\n";
+
 int main(int argc, char **argv)
 {
 	unsigned char sha1[20];
 	struct commit_list *list = NULL;
 	struct commit *commit;
 	char *commit_arg = NULL;
+	unsigned char stop_at[20];
+	int has_stop_at = 0;
 	int i;
 	unsigned long max_age = -1;
 	unsigned long min_age = -1;
@@ -21,16 +30,17 @@
 			max_age = atoi(arg + 10);
 		} else if (!strncmp(arg, "--min-age=", 10)) {
 			min_age = atoi(arg + 10);
+		} else if (!strncmp(arg, "--stop-at=", 10)) {
+			if (get_sha1(arg + 10, stop_at))
+				usage(rev_list_usage);
+			has_stop_at = 1;
 		} else {
 			commit_arg = arg;
 		}
 	}
 
 	if (!commit_arg || get_sha1(commit_arg, sha1))
-		usage("usage: rev-list [OPTION] commit-id\n"
-		      "  --max-count=nr\n"
-		      "  --max-age=epoch\n"
-		      "  --min-age=epoch\n");
+		usage(rev_list_usage);
 
 	commit = lookup_commit(sha1);
 	if (!commit || parse_commit(commit) < 0)
@@ -46,6 +56,8 @@
 			break;
 		if (max_count != -1 && !max_count--)
 			break;
+		if (has_stop_at && !memcmp(stop_at, commit->object.sha1, 20))
+			break;
 		printf("%s\n", sha1_to_hex(commit->object.sha1));
 	} while (list);
 	return 0;
------------------------------------------------
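
As a rough illustration of how the new check composes with --max-count in the main loop, here is a standalone sketch that walks a plain array of commit IDs already in output order, rather than a real commit_list popped by date. The helper name `walk_with_stop()` and the array representation are assumptions for illustration only, and the --max-age/--min-age checks are left out. As in the patch, the --max-count test runs before the --stop-at test, and the stop commit itself is never shown.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper modelled on the main loop of rev-list.c: walk an
 * array of commit IDs that is already in output order and collect the
 * ones that would be printed.  max_count == -1 means "no limit" and
 * stop_at == NULL means "no --stop-at".  Returns the number collected.
 */
static int walk_with_stop(const char **ids, int n, int max_count,
			  const char *stop_at, const char **shown)
{
	int i, nshown = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		/* --max-count: same test as the patch, checked first */
		if (max_count != -1 && !max_count--)
			break;
		/* --stop-at: stop just before printing the named commit */
		if (stop_at && !strcmp(ids[i], stop_at))
			break;
		shown[nshown++] = ids[i];
	}
	return nshown;
}
```

With IDs { "c4", "c3", "c2", "c1" }, no count limit and stop_at set to "c2", only c4 and c3 are collected.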


^ permalink raw reply

* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Joel Becker @ 2005-05-12  3:30 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, Dmitry Torokhov, git
In-Reply-To: <4282ADC9.2010900@zytor.com>

On Wed, May 11, 2005 at 06:13:45PM -0700, H. Peter Anvin wrote:
> What I meant with that is I think .git/repoid is the right thing, if the 
> file doesn't exist a new ID file is generated.

	Count me in the "what does repoid help?" camp.  If we create a
new UUID on each clone, imagine this typical usage:

	linux-2.6.git has repoid AAAAAA.
	I clone it locally, local-2.6-clean, repoid BBBBBB
	I clone the local one, local-2.6-working, repoid CCCCCC
	I work in the local one and commit my change.  commit abcd,
		repoid CCCCCC.
	I then rsync, copy, or clone that working repository to some
		place that Linus can pull from.
	I then throw away the copy with repoid CCCCCC, because I'm done
		with that temporary work area.
	lather, rinse, repeat.

	IOW, each of my changes, if I work like this, has a different
repoid.  And when a problem arises, the repoid tells us diddly.  I
thought one of the tenets of bk/git/codeville/whatever development is
that a clone is the way to do any temporary work area.  You work in a
clone or 10, and then clean up for submission.  Which of the 10 clones
the repoid is associated with seems, well, unimportant.
	
Joel

-- 

Life's Little Instruction Book #99

	"Think big thoughts, but relish small pleasures."

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply

* [PATCH] improved delta support for git
From: Nicolas Pitre @ 2005-05-12  3:51 UTC (permalink / raw)
  To: git


OK, here's some improved support for delta objects in a git repository. 
This patch adds the ability to create and restore delta objects with the 
git-mkdelta command.  A list of objects is provided and the 
corresponding delta chain is created.  The maximum depth of a delta 
chain can be specified with the -d argument.  If a max depth of 0 is 
provided then all given objects are undeltafied and replaced by their 
original version.  With the -v argument a lot of lovely details are 
printed out.
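
To illustrate what the "depth" of a delta chain means: each delta object names a reference object, which may itself be a delta, and the depth is the number of links followed before reaching a full object. The sketch below walks such a chain iteratively, roughly as expand_delta() in mkdelta.c does recursively, but only counts links instead of applying patch_delta() at each step. The in-memory `struct obj` table and the helper names are assumptions for illustration; real objects are looked up by SHA1 in the object store.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory object: a delta names its reference object. */
struct obj {
	const char *name;
	const char *ref;	/* NULL for a full (non-delta) object */
};

static const struct obj *find_obj(const struct obj *tab, int n,
				  const char *name)
{
	int i;

	for (i = 0; i < n; i++)
		if (!strcmp(tab[i].name, name))
			return &tab[i];
	return NULL;
}

/*
 * Follow delta links from 'name' down to a full object.  Returns the
 * chain depth (0 for a full object), or -1 for a missing reference or
 * a chain longer than n links (which can only happen with a loop).
 * '*head' receives the name of the full object at the bottom.
 */
static int resolve_depth(const struct obj *tab, int n, const char *name,
			 const char **head)
{
	int depth = 0;
	const struct obj *o = find_obj(tab, n, name);

	while (o && o->ref) {
		if (++depth > n)
			return -1;	/* loop detected */
		o = find_obj(tab, n, o->ref);
	}
	if (!o)
		return -1;
	*head = o->name;
	return depth;
}
```

For a table where v1 is a delta against v2 and v2 is a delta against the full object v3, resolving v1 gives depth 2 with head v3.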

Also included is a script to deltafy an entire repository.  Simply
execute git-deltafy-script to create deltas of objects corresponding to
successive previous versions of every file.  Running
'git-deltafy-script -d 0' reverts everything to non-deltafied form.
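
The depth bookkeeping that git-mkdelta applies to each per-file list can be modelled as follows. Versions are ordered newest first, as the script's header comment describes; each version is deltified against the one before it, and once the chain would exceed the maximum depth the object is kept whole and a new chain starts. This is a simplified sketch that assumes every object starts out whole and that every delta saves space (the real tool also keeps an object whole when its delta is not smaller); `plan_depths()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical model of mkdelta's depth bookkeeping.  Versions are
 * ordered newest first; version i is deltified against version i-1.
 * depth[i] receives 0 for an object stored whole, or its position in
 * the current delta chain.  max_depth < 0 means "no limit".
 */
static void plan_depths(int nversions, int max_depth, int *depth)
{
	int i, cur = 0;

	assert(nversions >= 0);
	for (i = 0; i < nversions; i++) {
		if (i == 0 || max_depth == 0)
			cur = 0;	/* newest version, or -d 0: keep whole */
		else if (max_depth > 0 && cur + 1 > max_depth)
			cur = 0;	/* chain too long: start a new one */
		else
			cur++;		/* delta against the previous version */
		depth[i] = cur;
	}
}
```

With five versions and a maximum depth of 2 this yields depths 0, 1, 2, 0, 1.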

I have yet to add support for delta objects to fsck-cache.  It is
advised to undeltafy your repository before running it; otherwise you'll
see lots of reported errors.  Once undeltafied, you should get good
output from fsck-cache again.

Please backup your repository before playing with this for now... just 
in case.

If you happen to have the whole kernel history in your repository, I'd be
interested to know the space figures and how it performs.  So far I have
only measured a tar of the .git/objects directory from git's own
repository, to estimate the real data size without filesystem block
rounding.  The undeltafied repository produced a 1708kb tar file while
the deltafied repository produced a 1173kb one.  For real-life usage the
chunking storage code should of course be considered as well.
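
For reference, those two tar sizes put the deltafied copy at about 68.7% of the original, i.e. roughly 31% smaller. The sketch below computes the ratio with the same truncating two-decimal integer arithmetic that git-mkdelta's -v output uses for its "size=" percentage; `format_percent()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format part/whole as a percentage with two
 * truncated decimals using only integer arithmetic, the same way the
 * "size=" figure in git-mkdelta -v is computed.  Returns the value in
 * hundredths of a percent.
 */
static unsigned long format_percent(unsigned long part, unsigned long whole,
				    char *buf, size_t len)
{
	unsigned long hundredths;

	assert(whole != 0);
	hundredths = part * 10000 / whole;
	snprintf(buf, len, "%lu.%02lu%%", part * 100 / whole,
		 hundredths % 100);
	return hundredths;
}
```

For 1173 out of 1708 this yields "68.67%", and for the 535kb saved it yields "31.32%".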

There are probably further experiments worth trying to save more space,
such as deltafying tree objects and, in the context of Linux, deltafying
files with lots of similar content across the different include/asm-*
subdirectories.

Signed-off-by: Nicolas Pitre <nico@cam.org>

Index: git/diff-delta.c
===================================================================
--- /dev/null
+++ git/diff-delta.c
@@ -0,0 +1,330 @@
+/*
+ * diff-delta.c: generate a delta between two buffers
+ *
+ *  Many parts of this file have been lifted from LibXDiff version 0.10.
+ *  http://www.xmailserver.org/xdiff-lib.html
+ *
+ *  LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>
+ *  Copyright (C) 2003	Davide Libenzi
+ *
+ *  Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.
+ *
+ *  This file is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ */
+
+#include <stdlib.h>
+#include "delta.h"
+
+
+/* block size: min = 16, max = 64k, power of 2 */
+#define BLK_SIZE 16
+
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+#define GR_PRIME 0x9e370001
+#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
+	
+/* largest prime smaller than 65536 */
+#define BASE 65521
+
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+#define NMAX 5552
+
+#define DO1(buf, i)  { s1 += buf[i]; s2 += s1; }
+#define DO2(buf, i)  DO1(buf, i); DO1(buf, i + 1);
+#define DO4(buf, i)  DO2(buf, i); DO2(buf, i + 2);
+#define DO8(buf, i)  DO4(buf, i); DO4(buf, i + 4);
+#define DO16(buf)    DO8(buf, 0); DO8(buf, 8);
+
+static unsigned int adler32(unsigned int adler, const unsigned char *buf, int len)
+{
+	int k;
+	unsigned int s1 = adler & 0xffff;
+	unsigned int s2 = adler >> 16;
+
+	while (len > 0) {
+		k = MIN(len, NMAX);
+		len -= k;
+		while (k >= 16) {
+			DO16(buf);
+			buf += 16;
+			k -= 16;
+		}
+		if (k != 0)
+			do {
+				s1 += *buf++;
+				s2 += s1;
+			} while (--k);
+		s1 %= BASE;
+		s2 %= BASE;
+	}
+
+	return (s2 << 16) | s1;
+}
+
+static unsigned int hashbits(unsigned int size)
+{
+	unsigned int val = 1, bits = 0;
+	while (val < size && bits < 32) {
+		val <<= 1;
+	       	bits++;
+	}
+	return bits ? bits: 1;
+}
+
+typedef struct s_chanode {
+	struct s_chanode *next;
+	int icurr;
+} chanode_t;
+
+typedef struct s_chastore {
+	chanode_t *head, *tail;
+	int isize, nsize;
+	chanode_t *ancur;
+	chanode_t *sncur;
+	int scurr;
+} chastore_t;
+
+static void cha_init(chastore_t *cha, int isize, int icount)
+{
+	cha->head = cha->tail = NULL;
+	cha->isize = isize;
+	cha->nsize = icount * isize;
+	cha->ancur = cha->sncur = NULL;
+	cha->scurr = 0;
+}
+
+static void *cha_alloc(chastore_t *cha)
+{
+	chanode_t *ancur;
+	void *data;
+
+	ancur = cha->ancur;
+	if (!ancur || ancur->icurr == cha->nsize) {
+		ancur = malloc(sizeof(chanode_t) + cha->nsize);
+		if (!ancur)
+			return NULL;
+		ancur->icurr = 0;
+		ancur->next = NULL;
+		if (cha->tail)
+			cha->tail->next = ancur;
+		if (!cha->head)
+			cha->head = ancur;
+		cha->tail = ancur;
+		cha->ancur = ancur;
+	}
+
+	data = (void *)ancur + sizeof(chanode_t) + ancur->icurr;
+	ancur->icurr += cha->isize;
+	return data;
+}
+
+static void cha_free(chastore_t *cha)
+{
+	chanode_t *cur = cha->head;
+	while (cur) {
+		chanode_t *tmp = cur;
+		cur = cur->next;
+		free(tmp);
+	}
+}
+
+typedef struct s_bdrecord {
+	struct s_bdrecord *next;
+	unsigned int fp;
+	const unsigned char *ptr;
+} bdrecord_t;
+
+typedef struct s_bdfile {
+	const unsigned char *data, *top;
+	chastore_t cha;
+	unsigned int fphbits;
+	bdrecord_t **fphash;
+} bdfile_t;
+
+static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
+{
+	unsigned int fphbits;
+	int i, hsize;
+	const unsigned char *base, *data, *top;
+	bdrecord_t *brec;
+	bdrecord_t **fphash;
+
+	fphbits = hashbits(bufsize / BLK_SIZE + 1);
+	hsize = 1 << fphbits;
+	fphash = malloc(hsize * sizeof(bdrecord_t *));
+	if (!fphash)
+		return -1;
+	for (i = 0; i < hsize; i++)
+		fphash[i] = NULL;
+	cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
+
+	bdf->data = data = base = buf;
+	bdf->top = top = buf + bufsize;
+	data += (bufsize / BLK_SIZE) * BLK_SIZE;
+	if (data == top)
+		data -= BLK_SIZE;
+
+	for ( ; data >= base; data -= BLK_SIZE) {
+		brec = cha_alloc(&bdf->cha);
+		if (!brec) {
+			cha_free(&bdf->cha);
+			free(fphash);
+			return -1;
+		}
+		brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
+		brec->ptr = data;
+		i = HASH(brec->fp, fphbits);
+		brec->next = fphash[i];
+		fphash[i] = brec;
+	}
+
+	bdf->fphbits = fphbits;
+	bdf->fphash = fphash;
+
+	return 0;
+}
+
+static void delta_cleanup(bdfile_t *bdf)
+{
+	free(bdf->fphash);
+	cha_free(&bdf->cha);
+}
+
+#define COPYOP_SIZE(o, s) \
+    (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
+     !!(s & 0xff) + !!(s & 0xff00) + 1)
+
+void *diff_delta(void *from_buf, unsigned long from_size,
+		 void *to_buf, unsigned long to_size,
+		 unsigned long *delta_size)
+{
+	int i, outpos, outsize, inscnt, csize, msize, moff;
+	unsigned int fp;
+	const unsigned char *data, *top, *ptr1, *ptr2;
+	unsigned char *out, *orig;
+	bdrecord_t *brec;
+	bdfile_t bdf;
+
+	if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf))
+		return NULL;
+	
+	outpos = 0;
+	outsize = 8192;
+	out = malloc(outsize);
+	if (!out) {
+		delta_cleanup(&bdf);
+		return NULL;
+	}
+
+	data = to_buf;
+	top = to_buf + to_size;
+
+	/* store reference buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (from_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = from_size;
+		}
+		i++;
+		from_size >>= 8;
+	} while (from_size);
+
+	/* store target buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (to_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = to_size;
+		}
+		i++;
+		to_size >>= 8;
+	} while (to_size);
+
+	inscnt = 0;
+	moff = 0;
+	while (data < top) {
+		msize = 0;
+		fp = adler32(0, data, MIN(top - data, BLK_SIZE));
+		i = HASH(fp, bdf.fphbits);
+		for (brec = bdf.fphash[i]; brec; brec = brec->next) {
+			if (brec->fp == fp) {
+				csize = bdf.top - brec->ptr;
+				if (csize > top - data)
+					csize = top - data;
+				for (ptr1 = brec->ptr, ptr2 = data; 
+				     csize && *ptr1 == *ptr2;
+				     csize--, ptr1++, ptr2++);
+
+				csize = ptr1 - brec->ptr;
+				if (csize > msize) {
+					moff = brec->ptr - bdf.data;
+					msize = csize;
+					if (msize >= 0x10000) {
+						msize = 0x10000;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+			if (!inscnt)
+				outpos++;
+			out[outpos++] = *data++;
+			inscnt++;
+			if (inscnt == 0x7f) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+		} else {
+			if (inscnt) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+
+			data += msize;
+			orig = out + outpos++;
+			i = 0x80;
+
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }
+
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }
+			msize >>= 8;
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
+
+			*orig = i;
+		}
+
+		/* next time around the largest possible output is 1 + 4 + 3 */
+		if (outpos > outsize - 8) {
+			void *tmp = out;
+			outsize = outsize * 3 / 2;
+			out = realloc(out, outsize);
+			if (!out) {
+				free(tmp);
+				delta_cleanup(&bdf);
+				return NULL;
+			}
+		}
+	}
+
+	if (inscnt)
+		out[outpos - inscnt - 1] = inscnt;
+
+	delta_cleanup(&bdf);
+	*delta_size = outpos;
+	return out;
+}
Index: git/delta.h
===================================================================
--- /dev/null
+++ git/delta.h
@@ -0,0 +1,6 @@
+extern void *diff_delta(void *from_buf, unsigned long from_size,
+			void *to_buf, unsigned long to_size,
+		        unsigned long *delta_size);
+extern void *patch_delta(void *src_buf, unsigned long src_size,
+			 void *delta_buf, unsigned long delta_size,
+			 unsigned long *dst_size);
Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -29,7 +29,7 @@
 	install $(PROG) $(SCRIPTS) $(HOME)/bin/
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
-	 tag.o date.o
+	 tag.o date.o diff-delta.o patch-delta.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h blob.h tree.h commit.h tag.h
 
@@ -63,6 +63,9 @@
 test-date: test-date.c date.o
 	$(CC) $(CFLAGS) -o $@ test-date.c date.o
 
+test-delta: test-delta.c diff-delta.o patch-delta.o
+	$(CC) $(CFLAGS) -o $@ $^
+
 git-%: %.c $(LIB_FILE)
 	$(CC) $(CFLAGS) -o $@ $(filter %.c,$^) $(LIBS)
 
Index: git/patch-delta.c
===================================================================
--- /dev/null
+++ git/patch-delta.c
@@ -0,0 +1,88 @@
+/*
+ * patch-delta.c:
+ * recreate a buffer from a source and the delta produced by diff-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include "delta.h"
+
+void *patch_delta(void *src_buf, unsigned long src_size,
+		  void *delta_buf, unsigned long delta_size,
+		  unsigned long *dst_size)
+{
+	const unsigned char *data, *top;
+	unsigned char *dst_buf, *out, cmd;
+	unsigned long size;
+	int i;
+
+	/* the smallest delta size possible is 6 bytes */
+	if (delta_size < 6)
+		return NULL;
+
+	data = delta_buf;
+	top = delta_buf + delta_size;
+
+	/* make sure the orig file size matches what we expect */
+	size = i = 0;
+	cmd = *data++;
+	while (cmd) {
+		if (cmd & 1)
+			size |= *data++ << i;
+		i += 8;
+		cmd >>= 1;
+	}
+	if (size != src_size)
+		return NULL;
+
+	/* now the result size */
+	size = i = 0;
+	cmd = *data++;
+	while (cmd) {
+		if (cmd & 1)
+			size |= *data++ << i;
+		i += 8;
+		cmd >>= 1;
+	}
+	dst_buf = malloc(size);
+	if (!dst_buf)
+		return NULL;
+
+	out = dst_buf;
+	while (data < top) {
+		cmd = *data++;
+		if (cmd & 0x80) {
+			unsigned long cp_off = 0, cp_size = 0;
+			const unsigned char *buf;
+			if (cmd & 0x01) cp_off = *data++;
+			if (cmd & 0x02) cp_off |= (*data++ << 8);
+			if (cmd & 0x04) cp_off |= (*data++ << 16);
+			if (cmd & 0x08) cp_off |= (*data++ << 24);
+			if (cmd & 0x10) cp_size = *data++;
+			if (cmd & 0x20) cp_size |= (*data++ << 8);
+			if (cp_size == 0) cp_size = 0x10000;
+			buf = (cmd & 0x40) ? dst_buf : src_buf;
+			memcpy(out, buf + cp_off, cp_size);
+			out += cp_size;
+		} else {
+			memcpy(out, data, cmd);
+			out += cmd;
+			data += cmd;
+		}
+	}
+
+	/* sanity check */
+	if (data != top || out - dst_buf != size) {
+		free(dst_buf);
+		return NULL;
+	}
+
+	*dst_size = size;
+	return dst_buf;
+}
Index: git/test-delta.c
===================================================================
--- /dev/null
+++ git/test-delta.c
@@ -0,0 +1,79 @@
+/*
+ * test-delta.c: test code to exercise diff-delta.c and patch-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include "delta.h"
+
+static const char *usage =
+	"test-delta (-d|-p) <from_file> <data_file> <out_file>";
+
+int main(int argc, char *argv[])
+{
+	int fd;
+	struct stat st;
+	void *from_buf, *data_buf, *out_buf;
+	unsigned long from_size, data_size, out_size;
+
+	if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) {
+		fprintf(stderr, "Usage: %s\n", usage);
+		return 1;
+	}
+
+	fd = open(argv[2], O_RDONLY);
+	if (fd < 0 || fstat(fd, &st)) {
+		perror(argv[2]);
+		return 1;
+	}
+	from_size = st.st_size;
+	from_buf = mmap(NULL, from_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (from_buf == MAP_FAILED) {
+		perror(argv[2]);
+		return 1;
+	}
+	close(fd);
+
+	fd = open(argv[3], O_RDONLY);
+	if (fd < 0 || fstat(fd, &st)) {
+		perror(argv[3]);
+		return 1;
+	}
+	data_size = st.st_size;
+	data_buf = mmap(NULL, data_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (data_buf == MAP_FAILED) {
+		perror(argv[3]);
+		return 1;
+	}
+	close(fd);
+
+	if (argv[1][1] == 'd')
+		out_buf = diff_delta(from_buf, from_size,
+				     data_buf, data_size, &out_size);
+	else
+		out_buf = patch_delta(from_buf, from_size,
+				      data_buf, data_size, &out_size);
+	if (!out_buf) {
+		fprintf(stderr, "delta operation failed (returned NULL)\n");
+		return 1;
+	}
+
+	fd = open (argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666);
+	if (fd < 0 || write(fd, out_buf, out_size) != out_size) {
+		perror(argv[4]);
+		return 1;
+	}
+
+	return 0;
+}
Index: git/sha1_file.c
===================================================================
--- git.orig/sha1_file.c
+++ git/sha1_file.c
@@ -8,6 +8,7 @@
  */
 #include <stdarg.h>
 #include "cache.h"
+#include "delta.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -224,6 +225,19 @@
 	if (map) {
 		buf = unpack_sha1_file(map, mapsize, type, size);
 		munmap(map, mapsize);
+		if (buf && !strcmp(type, "delta")) {
+			void *ref = NULL, *delta = buf;
+			unsigned long ref_size, delta_size = *size;
+			buf = NULL;
+			if (delta_size > 20)
+				ref = read_sha1_file(delta, type, &ref_size);
+			if (ref)
+				buf = patch_delta(ref, ref_size,
+						  delta+20, delta_size-20, 
+						  size);
+			free(delta);
+			free(ref);
+		}
 		return buf;
 	}
 	return NULL;
Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -13,7 +13,7 @@
 AR=ar
 
 SCRIPTS=git-apply-patch-script git-merge-one-file-script git-prune-script \
-	git-pull-script git-tag-script git-resolve-script
+	git-pull-script git-tag-script git-resolve-script git-deltafy-script
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
@@ -21,7 +21,8 @@
 	git-check-files git-ls-tree git-merge-base git-merge-cache \
 	git-unpack-file git-export git-diff-cache git-convert-cache \
 	git-http-pull git-rpush git-rpull git-rev-list git-mktag \
-	git-diff-tree-helper git-tar-tree git-local-pull git-write-blob
+	git-diff-tree-helper git-tar-tree git-local-pull git-write-blob \
+	git-mkdelta
 
 all: $(PROG)
 
@@ -95,6 +96,7 @@
 git-rpull: rsh.c pull.c
 git-rev-list: rev-list.c
 git-mktag: mktag.c
+git-mkdelta: mkdelta.c
 git-diff-tree-helper: diff-tree-helper.c
 git-tar-tree: tar-tree.c
 git-write-blob: write-blob.c
Index: git/mkdelta.c
===================================================================
--- /dev/null
+++ git/mkdelta.c
@@ -0,0 +1,283 @@
+/*
+ * Deltafication of a GIT database.
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "cache.h"
+#include "delta.h"
+
+static int replace_object(char *buf, unsigned long len, unsigned char *sha1,
+			  char *hdr, int hdrlen)
+{
+	char tmpfile[PATH_MAX];
+	int size;
+	char *compressed;
+	z_stream stream;
+	int fd;
+
+	snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory());
+	fd = mkstemp(tmpfile);
+	if (fd < 0)
+		return error("%s: %s\n", tmpfile, strerror(errno));
+	
+	/* Set it up */
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+	size = deflateBound(&stream, len+hdrlen);
+	compressed = xmalloc(size);
+
+	/* Compress it */
+	stream.next_out = compressed;
+	stream.avail_out = size;
+
+	/* First header.. */
+	stream.next_in = hdr;
+	stream.avail_in = hdrlen;
+	while (deflate(&stream, 0) == Z_OK)
+		/* nothing */;
+
+	/* Then the data itself.. */
+	stream.next_in = buf;
+	stream.avail_in = len;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		/* nothing */;
+	deflateEnd(&stream);
+	size = stream.total_out;
+
+	if (write(fd, compressed, size) != size) {
+		perror("unable to write file");
+		close(fd);
+		unlink(tmpfile);
+		return -1;
+	}
+	fchmod(fd, 0444);
+	close(fd);
+
+	if (rename(tmpfile, sha1_file_name(sha1))) {
+		perror("unable to replace original object");
+		unlink(tmpfile);
+		return -1;
+	}
+	return 0;
+}
+
+static int write_delta_file(char *buf, unsigned long len,
+			    unsigned char *sha1_ref, unsigned char *sha1_trg)
+{
+	char hdr[50];
+	int hdrlen;
+
+	/* Generate the header + sha1 of reference for delta */
+	hdrlen = sprintf(hdr, "delta %lu", len+20)+1;
+	memcpy(hdr + hdrlen, sha1_ref, 20);
+	hdrlen += 20;
+
+	return replace_object(buf, len, sha1_trg, hdr, hdrlen);
+}
+
+static int replace_sha1_file(char *buf, unsigned long len,
+			     char *type, unsigned char *sha1)
+{
+	char hdr[50];
+	int hdrlen;
+
+	hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
+	return replace_object(buf, len, sha1, hdr, hdrlen);
+}
+
+static void *get_buffer(unsigned char *sha1, char *type, unsigned long *size)
+{
+	unsigned long mapsize;
+	void *map = map_sha1_file(sha1, &mapsize);
+	if (map) {
+		void *buffer = unpack_sha1_file(map, mapsize, type, size);
+		munmap(map, mapsize);
+		if (buffer)
+			return buffer;
+	}
+	error("unable to get object %s", sha1_to_hex(sha1));
+	return NULL;
+}
+
+static void *expand_delta(void *delta, unsigned long delta_size, char *type,
+			  unsigned long *size, unsigned int *depth, char *head)
+{
+	void *buf = NULL;
+	(*depth)++;
+	if (delta_size < 20) {
+		error("delta object is bad");
+		free(delta);
+	} else {
+		unsigned long ref_size;
+		void *ref = get_buffer(delta, type, &ref_size);
+		if (ref && !strcmp(type, "delta"))
+			ref = expand_delta(ref, ref_size, type, &ref_size,
+					   depth, head);
+		else
+			memcpy(head, delta, 20);
+		if (ref)
+			buf = patch_delta(ref, ref_size, delta+20,
+					  delta_size-20, size);
+		free(ref);
+		free(delta);
+	}
+	return buf;
+}
+
+static char *mkdelta_usage =
+"mkdelta [ -v ] [ -d N | --max-depth=N ] <reference_sha1> <target_sha1> [ <next_sha1> ... ]";
+
+int main(int argc, char **argv)
+{
+	unsigned char sha1_ref[20], sha1_trg[20], head_ref[20], head_trg[20];
+	char type_ref[20], type_trg[20];
+	void *buf_ref, *buf_trg, *buf_delta;
+	unsigned long size_ref, size_trg, size_orig, size_delta;
+	unsigned int depth_ref, depth_trg, depth_max = -1;
+	int i, verbose = 0;
+
+	for (i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "-v")) {
+			verbose = 1;
+		} else if (!strcmp(argv[i], "-d") && i+1 < argc) {
+			depth_max = atoi(argv[++i]);
+		} else if (!strncmp(argv[i], "--max-depth=", 12)) {
+			depth_max = atoi(argv[i]+12);
+		} else
+			break;
+	}
+
+	if (i + (depth_max != 0) >= argc)
+		usage(mkdelta_usage);
+
+	if (get_sha1(argv[i], sha1_ref))
+		die("bad sha1 %s", argv[i]);
+	depth_ref = 0;
+	buf_ref = get_buffer(sha1_ref, type_ref, &size_ref);
+	if (buf_ref && !strcmp(type_ref, "delta"))
+		buf_ref = expand_delta(buf_ref, size_ref, type_ref,
+				       &size_ref, &depth_ref, head_ref);
+	else
+		memcpy(head_ref, sha1_ref, 20);
+	if (!buf_ref)
+		die("unable to obtain initial object %s", argv[i]);
+
+	if (depth_ref > depth_max) {
+		if (replace_sha1_file(buf_ref, size_ref, type_ref, sha1_ref))
+			die("unable to restore %s", argv[i]);
+		if (verbose)
+			printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
+		depth_ref = 0;
+	}
+
+	while (++i < argc) {
+		if (get_sha1(argv[i], sha1_trg))
+			die("bad sha1 %s", argv[i]);
+		depth_trg = 0;
+		buf_trg = get_buffer(sha1_trg, type_trg, &size_trg);
+		if (buf_trg && !size_trg) {
+			if (verbose)
+				printf("skip    %s (object is empty)\n", argv[i]);
+			continue;
+		}
+		size_orig = size_trg;
+		if (buf_trg && !strcmp(type_trg, "delta")) {
+			if (!memcmp(buf_trg, sha1_ref, 20)) {
+				/* delta already in place */
+				depth_ref++;
+				memcpy(sha1_ref, sha1_trg, 20);
+				buf_ref = patch_delta(buf_ref, size_ref,
+						      buf_trg+20, size_trg-20,
+						      &size_ref);
+				if (!buf_ref)
+					die("unable to apply delta %s", argv[i]);
+				if (depth_ref > depth_max) {
+					if (replace_sha1_file(buf_ref, size_ref,
+							      type_ref, sha1_ref))
+						die("unable to restore %s", argv[i]);
+					if (verbose)
+						printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
+					depth_ref = 0;
+					continue;
+				}
+				if (verbose)
+					printf("skip    %s (delta already in place)\n", argv[i]);
+				continue;
+			}
+			buf_trg = expand_delta(buf_trg, size_trg, type_trg,
+					       &size_trg, &depth_trg, head_trg);
+		} else
+			memcpy(head_trg, sha1_trg, 20);
+		if (!buf_trg)
+			die("unable to read target object %s", argv[i]);
+
+		if (depth_trg > depth_max) {
+			if (replace_sha1_file(buf_trg, size_trg, type_trg, sha1_trg))
+				die("unable to restore %s", argv[i]);
+			if (verbose)
+				printf("undelta %s (depth was %d)\n", argv[i], depth_trg);
+			depth_trg = 0;
+			size_orig = size_trg;
+		}
+
+		if (depth_max == 0)
+			goto skip;
+
+		if (strcmp(type_ref, type_trg))
+			die("type mismatch for object %s", argv[i]);
+
+		if (!size_ref) {
+			if (verbose)
+				printf("skip    %s (initial object is empty)\n", argv[i]);
+			goto skip;
+		}
+		
+		depth_ref++;
+		if (depth_ref > depth_max) {
+			if (verbose)
+				printf("skip    %s (exceeding max link depth)\n", argv[i]);
+			goto skip;
+		}
+
+		if (!memcmp(head_ref, sha1_trg, 20)) {
+			if (verbose)
+				printf("skip    %s (would create a loop)\n", argv[i]);
+			goto skip;
+		}
+
+		buf_delta = diff_delta(buf_ref, size_ref, buf_trg, size_trg, &size_delta);
+		if (!buf_delta)
+			die("out of memory");
+
+		if (size_delta+20 < size_orig) {
+			if (write_delta_file(buf_delta, size_delta,
+					     sha1_ref, sha1_trg))
+				die("unable to write delta for %s", argv[i]);
+			free(buf_delta);
+			if (verbose)
+				printf("delta   %s (size=%ld.%02ld%%, depth=%d)\n",
+				       argv[i], (size_delta+20)*100 / size_trg,
+				       ((size_delta+20)*10000 / size_trg)%100,
+				       depth_ref);
+		} else {
+			free(buf_delta);
+			if (verbose)
+				printf("skip    %s (original is smaller)\n", argv[i]);
+			skip:
+			depth_ref = depth_trg;
+			memcpy(head_ref, head_trg, 20);
+		}
+
+		free(buf_ref);
+		buf_ref = buf_trg;
+		size_ref = size_trg;
+		memcpy(sha1_ref, sha1_trg, 20);
+	}
+
+	return 0;
+}
Index: git/mktag.c
===================================================================
--- git.orig/mktag.c
+++ git/mktag.c
@@ -25,20 +25,14 @@
 static int verify_object(unsigned char *sha1, const char *expected_type)
 {
 	int ret = -1;
-	unsigned long mapsize;
-	void *map = map_sha1_file(sha1, &mapsize);
+	char type[100];
+	unsigned long size;
+	void *buffer = read_sha1_file(sha1, type, &size);
 
-	if (map) {
-		char type[100];
-		unsigned long size;
-		void *buffer = unpack_sha1_file(map, mapsize, type, &size);
-
-		if (buffer) {
-			if (!strcmp(type, expected_type))
-				ret = check_sha1_signature(sha1, buffer, size, type);
-			free(buffer);
-		}
-		munmap(map, mapsize);
+	if (buffer) {
+		if (!strcmp(type, expected_type))
+			ret = check_sha1_signature(sha1, buffer, size, type);
+		free(buffer);
 	}
 	return ret;
 }
Index: git/git-deltafy-script
===================================================================
--- /dev/null
+++ git/git-deltafy-script
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+# Script to deltafy an entire GIT repository based on the commit list.
+# The most recent version of a file is the reference and the previous version
+# is changed into a delta from that most recent version. And so on for
+# successive versions going back in time.
+#
+# The -d argument allows to provide a limit on the delta chain depth.
+# If 0 is passed then everything is undeltafied.
+
+set -e
+
+depth=
+[ "$1" == "-d" ] && depth="--max-depth=$2" && shift 2
+
+curr_file=""
+
+git-rev-list HEAD |
+git-diff-tree -r --stdin |
+sed -n '/^\*/ s/^.*->\(.\{41\}\)\(.*\)$/\2 \1/p' | sort | uniq |
+while read file sha1; do
+	if [ "$file" == "$curr_file" ]; then
+		list="$list $sha1"
+	else
+		if [ "$list" ]; then
+			echo "Processing $curr_file"
+			echo "$head $list" | xargs git-mkdelta $depth -v
+		fi
+		curr_file="$file"
+		list=""
+		head="$sha1"
+	fi
+done
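
One detail of diff_delta() above worth spelling out: a copy instruction costs one opcode byte plus one byte for each non-zero byte of the offset (up to four) and of the size (up to two), which is what COPYOP_SIZE() counts, and a match shorter than that cost is emitted as literal bytes instead. The standalone sketch below encodes a single copy instruction in the layout that patch_delta() decodes (bit 0x80 set, bits 0x01-0x08 for offset bytes, 0x10-0x20 for size bytes, a size of 0x10000 stored with no size bytes). `encode_copy()` is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode one copy instruction (offset into the reference buffer and
 * length) in the layout used by diff_delta()/patch_delta().  Only the
 * non-zero bytes are stored; flag bits in the opcode say which ones.
 * A size of 0x10000 ends up with no size bytes at all.
 * Returns the number of bytes written to out (at most 7).
 */
static size_t encode_copy(unsigned char *out, unsigned long off,
			  unsigned long size)
{
	unsigned char *op = out++;
	unsigned char cmd = 0x80;
	int i;

	assert(size > 0 && size <= 0x10000);
	for (i = 0; i < 4; i++, off >>= 8)
		if (off & 0xff) {
			*out++ = off & 0xff;
			cmd |= 0x01 << i;
		}
	for (i = 0; i < 2; i++, size >>= 8)
		if (size & 0xff) {
			*out++ = size & 0xff;
			cmd |= 0x10 << i;
		}
	*op = cmd;
	return out - op;
}
```

For example, copying 5 bytes from offset 0x1234 takes 4 bytes, while a full 0x10000-byte copy from offset 0 is the single byte 0x80; in both cases the count matches COPYOP_SIZE(off, size).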

^ permalink raw reply

* [RFC] Support projects including other projects
From: Daniel Barkalow @ 2005-05-12  4:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Petr Baudis, Linus Torvalds

I've come up with a way to handle projects like cogito which are based on
other projects. I think that it actually solves the real problem with such
projects, and it is actually very simple.

The problem that such projects run into, especially while both the core
and the non-core projects are in a state of substantial flux and when the 
non-core developer(s) contribute needed changes to the core, is that the
two projects not only have to be tracked, they have to be kept in 
sync. That is, a particular version of cogito requires a particular
version of git. There is a bit of convenience to having the tools
magically do the right thing when you check out the child project, but the
thing that really requires tool support is that you need to be able to
find the version of git-pb which matches the version of cogito you're
trying to build (and you might be searching the history for where a bug
was introduced, so you may not be able to use the latest of either).

The solution is to add a header to commits: "include {hash}", which simply
says that the given hash, which is from the core project, is the commit
needed to build this commit of the non-core project. This comes from an
argument to commit-tree ("-I", perhaps), and the parsing code needs to
identify the reference so that fsck-cache stays happy.
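
To make the proposed header concrete, here is a sketch of how a parser might pick an "include" line out of the header block of a raw commit object (the lines before the first empty line). The exact spelling "include <40 hex digits>" follows the proposal above but is otherwise an assumption, and `find_include()` is a hypothetical name; the patch mentioned below was not posted in this message.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Scan the header block of a commit object (everything before the first
 * empty line) for a line of the form "include <40 hex digits>".  On
 * success copy the hex string into hex[41] and return 1; else return 0.
 */
static int find_include(const char *commit, char hex[41])
{
	const char *line = commit;

	assert(commit != NULL);
	while (*line && *line != '\n') {	/* empty line ends the header */
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len == 8 + 40 && !strncmp(line, "include ", 8)) {
			int i, ok = 1;

			for (i = 0; i < 40; i++)
				if (!isxdigit((unsigned char)line[8 + i]))
					ok = 0;
			if (ok) {
				memcpy(hex, line + 8, 40);
				hex[40] = '\0';
				return 1;
			}
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```

An "include" line that appears only after the empty line (i.e. in the commit message) is ignored.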

Git doesn't do anything more; wrapping layers would be able to take care
of the rest. When the wrapping layer determines that you are checking out
a commit with an include header, it also checks out the included commit,
using a different index file. The core treats everything as if you had a
bunch of non-tracked files in the directory (those being the things in the
other project). When you commit, it first commits any includes (if
needed), identifies the resulting core head, and passes that as the
include argument to commit-tree for the final commit.

It seems to me like this should work perfectly. The one weakness is that
it's quite annoying to do by hand, since you have to simultaneously track
two index files and remember to pass the argument to commit-tree each
time. (Also, it means that you'd ideally pull git-pb from the cogito
repository with a client that ignores things not reachable from your head,
although Petr could still just copy and prune to match the current
situation).

I've written up the git changes needed, if people are interested in the
patch.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply
