Git development
 help / color / mirror / Atom feed
* Re: Careful object writing..
From: Linus Torvalds @ 2005-05-03 22:13 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Git Mailing List
In-Reply-To: <20050503205957.GA25253@delft.aura.cs.cmu.edu>



On Tue, 3 May 2005, Jan Harkes wrote:
> 
> Short summary:
> 
>     rc = link(old, new);
>     if (rc == -1 && errno == EXDEV)
> 	rc = rename(old, new);

Ok, that is safe enough. Will do.

> On Coda, the cross-directory link fails, the following cross-directory
> rename will work fine.  On a normal filesystem, if the link fails with
> EXDEV, the rename will fail with the same.

Yup. I do suspect that since you handle the rename anyway, you probably 
could handle the git link/unlink patterns too, but it's easy enough to 
just do the rename fallback in git itself.

The only reason not to use rename in the first place is literally just to 
be able to check for collisions. Which we don't actually _do_ right now, 
but I like to be able to do so in theory.

		Linus

^ permalink raw reply

* [PATCH] Optimize diff-cache -p --cached
From: Junio C Hamano @ 2005-05-03 22:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This patch optimizes "diff-cache -p --cached" by avoiding to
inflate blobs to create temporary files when either the blob
recorded in the cache or in the compared tree matches the
corresponding file in the work tree.

Here is an informal benchmark on my Duron 750.  The tests were
done by running unpatched and patched alternately number of
times, and these are the last three pairs:

 real 0m0.738s user 0m0.630s sys 0m0.100s unpatched
 real 0m0.695s user 0m0.590s sys 0m0.100s patched
 real 0m0.733s user 0m0.560s sys 0m0.170s unpatched
 real 0m0.705s user 0m0.590s sys 0m0.110s patched
 real 0m0.732s user 0m0.550s sys 0m0.180s unpatched
 real 0m0.692s user 0m0.590s sys 0m0.100s patched

The benchmark was run in a fully checked out linux-2.6 GIT
repository.  The work tree matched one commit, and comparison
was done against another commit which was 20-or-so commits
before the work tree.

$ new=a6ad57fb4b5e9d68553f4440377b99f75588fa88
$ old=cd63499cbe37e53e6cc084c8a35d911a4613c797
$ git-read-tree $new
$ git-checkout-cache -f -a
$ git-update-cache --refresh
$ git-rev-tree $new ^$old | wc -l
19
$ export GIT_EXTERNAL_DIFF=true
$ time git-diff-cache -p --cached $old

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff-tree-helper.c |    6 ++--
diff.c             |   67 ++++++++++++++++++++++++++++++++++++++++++-----------
diff.h             |   13 +++-------
3 files changed, 62 insertions(+), 24 deletions(-)

# - 2: Use GIT_EXTERNAL_DIFF exit status to terminate diff early.
# + 5: diff-cache --cached case optimization.
--- a/diff-tree-helper.c
+++ b/diff-tree-helper.c
@@ -35,7 +35,7 @@ static int parse_oneside_change(const ch
 	if (strncmp(cp, "\tblob\t", 6))
 		return -1;
 	cp += 6;
-	if (get_sha1_hex(cp, one->u.sha1))
+	if (get_sha1_hex(cp, one->blob_sha1))
 		return -1;
 	cp += 40;
 	if (*cp++ != '\t')
@@ -83,13 +83,13 @@ static int parse_diff_tree_output(const 
 		if (strncmp(cp, "\tblob\t", 6))
 			return -1;
 		cp += 6;
-		if (get_sha1_hex(cp, old.u.sha1))
+		if (get_sha1_hex(cp, old.blob_sha1))
 			return -1;
 		cp += 40;
 		if (strncmp(cp, "->", 2))
 			return -1;
 		cp += 2;
-		if (get_sha1_hex(cp, new.u.sha1))
+		if (get_sha1_hex(cp, new.blob_sha1))
 			return -1;
 		cp += 40;
 		if (*cp++ != '\t')
--- a/diff.c
+++ b/diff.c
@@ -132,11 +132,50 @@ static void builtin_diff(const char *nam
 	execlp("/bin/sh","sh", "-c", cmd, NULL);
 }
 
+/*
+ * Given a name and sha1 pair, if the dircache tells us the file in
+ * the work tree has that object contents, return true, so that
+ * prepare_temp_file() does not have to inflate and extract.
+ */
+static int work_tree_matches(const char *name, const unsigned char *sha1)
+{
+	struct cache_entry *ce;
+	struct stat st;
+	int pos, len;
+	
+	/* We do not read the cache ourselves here, because the
+	 * benchmark with my previous version that always reads cache
+	 * shows that it makes things worse for diff-tree comparing
+	 * two linux-2.6 kernel trees in an already checked out work
+	 * tree.  This is because most diff-tree comparison deals with
+	 * only a small number of files, while reading the cache is
+	 * expensive for a large project, and its cost outweighs the
+	 * savings we get by not inflating the object to a temporary
+	 * file.  Practically, this code only helps when we are used
+	 * by diff-cache --cached, which does read the cache before
+	 * calling us.
+	 */ 
+	if (!active_cache)
+		return 0;
+
+	len = strlen(name);
+	pos = cache_name_pos(name, len);
+	if (pos < 0)
+		return 0;
+	ce = active_cache[pos];
+	if ((stat(name, &st) < 0) ||
+	    cache_match_stat(ce, &st) ||
+	    memcmp(sha1, ce->sha1, 20))
+		return 0;
+	return 1;
+}
+
 static void prepare_temp_file(const char *name,
 			      struct diff_tempfile *temp,
 			      struct diff_spec *one)
 {
 	static unsigned char null_sha1[20] = { 0, };
+	int use_work_tree = 0;
 
 	if (!one->file_valid) {
 	not_a_valid_file:
@@ -150,20 +189,22 @@ static void prepare_temp_file(const char
 	}
 
 	if (one->sha1_valid &&
-	    !memcmp(one->u.sha1, null_sha1, sizeof(null_sha1))) {
-		one->sha1_valid = 0;
-		one->u.name = name;
-	}
+	    (!memcmp(one->blob_sha1, null_sha1, sizeof(null_sha1)) ||
+	     work_tree_matches(name, one->blob_sha1)))
+		use_work_tree = 1;
 
-	if (!one->sha1_valid) {
+	if (!one->sha1_valid || use_work_tree) {
 		struct stat st;
-		temp->name = one->u.name;
+		temp->name = name;
 		if (stat(temp->name, &st) < 0) {
 			if (errno == ENOENT)
 				goto not_a_valid_file;
 			die("stat(%s): %s", temp->name, strerror(errno));
 		}
-		strcpy(temp->hex, sha1_to_hex(null_sha1));
+		if (!one->sha1_valid)
+			strcpy(temp->hex, sha1_to_hex(null_sha1));
+		else
+			strcpy(temp->hex, sha1_to_hex(one->blob_sha1));
 		sprintf(temp->mode, "%06o",
 			S_IFREG |ce_permissions(st.st_mode));
 	}
@@ -173,10 +214,10 @@ static void prepare_temp_file(const char
 		char type[20];
 		unsigned long size;
 
-		blob = read_sha1_file(one->u.sha1, type, &size);
+		blob = read_sha1_file(one->blob_sha1, type, &size);
 		if (!blob || strcmp(type, "blob"))
 			die("unable to read blob object for %s (%s)",
-			    name, sha1_to_hex(one->u.sha1));
+			    name, sha1_to_hex(one->blob_sha1));
 
 		strcpy(temp->tmp_path, ".diff_XXXXXX");
 		fd = mkstemp(temp->tmp_path);
@@ -187,7 +228,7 @@ static void prepare_temp_file(const char
 		close(fd);
 		free(blob);
 		temp->name = temp->tmp_path;
-		strcpy(temp->hex, sha1_to_hex(one->u.sha1));
+		strcpy(temp->hex, sha1_to_hex(one->blob_sha1));
 		temp->hex[40] = 0;
 		sprintf(temp->mode, "%06o", one->mode);
 	}
@@ -285,7 +326,7 @@ void diff_addremove(int addremove, unsig
 	char concatpath[PATH_MAX];
 	struct diff_spec spec[2], *one, *two;
 
-	memcpy(spec[0].u.sha1, sha1, 20);
+	memcpy(spec[0].blob_sha1, sha1, 20);
 	spec[0].mode = mode;
 	spec[0].sha1_valid = spec[0].file_valid = 1;
 	spec[1].file_valid = 0;
@@ -310,9 +351,9 @@ void diff_change(unsigned old_mode, unsi
 	char concatpath[PATH_MAX];
 	struct diff_spec spec[2];
 
-	memcpy(spec[0].u.sha1, old_sha1, 20);
+	memcpy(spec[0].blob_sha1, old_sha1, 20);
 	spec[0].mode = old_mode;
-	memcpy(spec[1].u.sha1, new_sha1, 20);
+	memcpy(spec[1].blob_sha1, new_sha1, 20);
 	spec[1].mode = new_mode;
 	spec[0].sha1_valid = spec[0].file_valid = 1;
 	spec[1].sha1_valid = spec[1].file_valid = 1;
--- a/diff.h
+++ b/diff.h
@@ -20,15 +20,12 @@ extern void diff_unmerge(const char *pat
 /* These are for diff-tree-helper */
 
 struct diff_spec {
-	union {
-		const char *name;       /* path on the filesystem */
-		unsigned char sha1[20]; /* blob object ID */
-	} u;
+	unsigned char blob_sha1[20];
 	unsigned short mode;	 /* file mode */
-	unsigned sha1_valid : 1; /* if true, use u.sha1 and trust mode.
-				  * (however with a NULL SHA1, read them
-				  * from the file!).
-				  * if false, use u.name and read mode from
+	unsigned sha1_valid : 1; /* if true, use blob_sha1 and trust mode;
+				  * however with a NULL SHA1, read them
+				  * from the file system.
+				  * if false, use the name and read mode from
 				  * the filesystem.
 				  */
 	unsigned file_valid : 1; /* if false the file does not even exist */


^ permalink raw reply

* Re: git and symlinks as tracked content
From: Andreas Gal @ 2005-05-03 21:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Kay Sievers, git
In-Reply-To: <7vr7got2tz.fsf@assigned-by-dhcp.cox.net>


Whether you use an explicit "dev" type or an implicit "dev" type that 
calls itself "blob" and uses a magic mode flag to tell checkout that it 
needs special treatment doesn't make a difference (whatever you 
prefer, really). I was only trying to make the point that hashes should remain 
hashes and not become a placeholder for minors/majors. However, as 
somebody already suggested, the entire issue is probably moot. When was the last 
time you tried to version control /dev? ;)

Andreas

On Tue, 3 May 2005, Junio C Hamano wrote:

> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> On Tue, 3 May 2005, Andreas Gal wrote:
> 
> >> Yuck. Thats really ugly. Right now all files have a uniform
> >> touch to them.  For every hash you can locate the file,
> >> determine its type/tag, unpack it, and check the SHA1
> >> hash. The proposal above breaks all that. Why not just
> >> introduce a new object type "dev" and put major minor in
> >> there. It will still always hash to the same SHA1 hash value,
> >> but fits much better in the overall design.
> 
> LT> Hey, I don't personally care that much. I don't see anybody using 
> LT> character device nodes in the kernel tree, and I don't think most SCM's 
> LT> support stuff like that anyway ;)
> 
> LT> If you want to make it a blob (and have a use for it), go wild. 
> 
> Introducing "dev" type, as Andreas suggests, is wrong.  This
> this should be done in the same way as you suggested for the
> symlink case.  Store a blob object with those chrdev or blkdev
> modes whose contents are of form:
> 
>     major=14
>     minor=4
>     owner=root
>     group=audio
>     perm=0660
> 
> This would impact the diff side least, and for the cache side it
> does not matter in storing and merging.  checkout-cache still
> needs to know about this.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: PATCH: diff-cache.c:51: `return' with a value, in function returning void
From: Alex Riesen @ 2005-05-03 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vmzrct2ot.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano, Tue, May 03, 2005 23:33:22 +0200:
> >>>>> "AR" == Alex Riesen <fork0@users.sourceforge.net> writes:
> 
> AR> gcc -g -O2 -Wall  '-DSHA1_HEADER=<openssl/sha.h>' -o git-diff-cache diff-cache.c libgit.a -lz -lssl
> AR> diff-cache.c: In function `show_new_file':
> AR> diff-cache.c:51: warning: `return' with a value, in function returning void
> 
> This one was fixed in the Linus tree long time ago.
> 

Oh, sorry: it was cogito today.


^ permalink raw reply

* Re: [PATCH 0/3] cogito spec file updates
From: Petr Baudis @ 2005-05-03 21:44 UTC (permalink / raw)
  To: Chris Wright, Mark Allen; +Cc: git
In-Reply-To: <20050503213216.GF5324@shell0.pdx.osdl.net>

Dear diary, on Tue, May 03, 2005 at 11:32:16PM CEST, I got a letter
where Chris Wright <chrisw@osdl.org> told me that...
> * Petr Baudis (pasky@ucw.cz) wrote:
> > I might however have some private mkrelease.sh script which would do
> > 
> > 	echo "$1" >VERSION
> > 	update-spec-file
> > 	echo "$1" | cg-commit
> > 	cg-tag "$1"
> > 
> > or something.
> 
> If you had a Makefile target like the one Mark Allen just posted, would
> you use that?

No, I definitively won't use that:

(i)  The created tarball is Evil. It does not contain everything in a
subdirectory.

(ii) I don't want to have yet another place which I'm supposed to keep
up to date with regard to what should be packaged. That's one of the
things the SCM is for, and it already records precisely this
information, so I'd actually use it. cg-export foo.tar.gz is the perfect
tool for this.

> I'd simply add a git.spec: rule, and have git.spec built
> as a dependency for tarball.

I wouldn't accept this neither. If git.spec is already version
controlled, it should be up-to-date in the version control. Therefore,
you need to update it at the time of release, not at the time of
generating the tarball.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: 'read-tree -m head' vs 'read-tree head'
From: Petr Baudis @ 2005-05-03 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Glanzmann, GIT
In-Reply-To: <7vbr7sw2aj.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Tue, May 03, 2005 at 09:13:40PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> That said, I've been wondering if "git-read-tree -m <tree>"
> always does the same thing (but only making the operation
> afterwards faster) as "git-read-tree <tree>".  That is, if there
> is a valid use case where you would want to use it without "-m"
> because "-m" does something wrong.  If there is no such valid
> use case probably we should always do "-m" version if we are
> reading only one tree, practically deprecating "-m" flag to the
> same status as "-r" flag to git-diff-cache.
> 
> However, I have not had time to think things through and have
> not bugged Linus about it myself.

-m fails when your cache file is missing/corrupted. Not that it cannot
be fixed, just remember to fix it if you are going to do what you
described.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: PATCH: diff-cache.c:51: `return' with a value, in function returning void
From: Junio C Hamano @ 2005-05-03 21:33 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Petr Baudis, git
In-Reply-To: <20050503195601.GA4165@steel.home>

>>>>> "AR" == Alex Riesen <fork0@users.sourceforge.net> writes:

AR> gcc -g -O2 -Wall  '-DSHA1_HEADER=<openssl/sha.h>' -o git-diff-cache diff-cache.c libgit.a -lz -lssl
AR> diff-cache.c: In function `show_new_file':
AR> diff-cache.c:51: warning: `return' with a value, in function returning void

This one was fixed in the Linus tree long time ago.


^ permalink raw reply

* Re: [PATCH 0/3] cogito spec file updates
From: Chris Wright @ 2005-05-03 21:32 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Chris Wright, git
In-Reply-To: <20050503212142.GB15995@pasky.ji.cz>

* Petr Baudis (pasky@ucw.cz) wrote:
> Dear diary, on Tue, May 03, 2005 at 09:35:36PM CEST, I got a letter
> where Chris Wright <chrisw@osdl.org> told me that...
> > * Chris Wright (chrisw@osdl.org) wrote:
> > > Here's the outstanding updates for the spec file, up to 0.8-2 which is
> > > the latest on kernel.org.
> > > 
> > > 	http://www.kernel.org/pub/software/scm/cogito/RPMS/
> > 
> > What's your method for creating a release tarball?  If it were formalized
> > (i.e. Makefile rule), then it'd be simple to use VERSION to drive the
> > spec file, and it'd only need updating for real content changes (similar
> > to what Kay did).
> 
> For now, I do it so seldom that I just manually do
> 
> 	cg-log >Changelog
> 	cg-export ~/cogito-0.9
> 	cp Changelog ~/cogito-0.9
> 	cd ~
> 	tar cvvfz cogito-0.9.tar.gz cogito-0.9
> 
> OTOH, I'd like to change this all to just
> 
> 	cg-export ~/cogito-0.9.tar.gz
> 
> when I get to merge the relevant patches; I'm not sure there is so much
> of a value to bundle the Changelog; just get the git tree and do cg-log
> on your own, or use the web interface.
> 
> I might however have some private mkrelease.sh script which would do
> 
> 	echo "$1" >VERSION
> 	update-spec-file
> 	echo "$1" | cg-commit
> 	cg-tag "$1"
> 
> or something.

If you had a Makefile target like the one Mark Allen just posted, would
you use that?  I'd simply add a git.spec: rule, and have git.spec built
as a dependency for tarball.

> BTW, did you have a particular reason to split the .spec file updates to
> three parts? I think it doesn't make much sense and it'd be probably
> enough to update it at once, when we are not doing it at the right time
> anyway.

Just because that's how I had them locally, and it keeps the file
revision history intact (I don't mind how you prefer to pull them in).

thanks,
-chris

^ permalink raw reply

* Re: [PATCH] cogito: Add cg-undo command
From: Petr Baudis @ 2005-05-03 21:32 UTC (permalink / raw)
  To: Matt Porter; +Cc: git
In-Reply-To: <20050503100624.A29266@cox.net>

Dear diary, on Tue, May 03, 2005 at 07:06:25PM CEST, I got a letter
where Matt Porter <mporter@kernel.crashing.org> told me that...
> Index: cg-help
> ===================================================================
> --- a1aff2a6748c0c0d08058c7d74503e724abc5d03/cg-help  (mode:100755 sha1:1f5d2d79b67490d44ce0f575ff9a4b80134ea47f)
> +++ 023d9a7929d2f933d8e008f1679f13a58f7b1229/cg-help  (mode:100755 sha1:c7dc8f3e03895374cd0dae544570a37a459c2466)
> @@ -43,6 +43,7 @@
>  	cg-status
>  	cg-tag		TNAME [COMMIT_ID]
>  	cg-tag-ls
> +	cg-undo		[COMMIT_ID]

It doesn't seem very optional now.

>  	cg-update	[BNAME]
>  	cg-version
>  
> Index: cg-undo
> ===================================================================
> --- /dev/null  (tree:a1aff2a6748c0c0d08058c7d74503e724abc5d03)
> +++ 023d9a7929d2f933d8e008f1679f13a58f7b1229/cg-undo  (mode:100755 sha1:7fd6d89158fb5aeee42aa05a93f2c81884d9bd34)
> @@ -0,0 +1,20 @@
> +#!/usr/bin/env bash
> +#
> +# Undo a commit or a series of commits
> +# Copyright (C) Matt Porter, 2005
> +#
> +# Takes a commit ID which is the earliest commit to be
> +# removed from the repository.
> +
> +. cg-Xlib
> +
> +PARENT=`git-cat-file commit $1 | grep parent | cut -f 2 -d " "`

What's wrong with parent-id?

> +echo "Undo from $1 to current HEAD"
> +echo "Reset HEAD to $PARENT"

You talk way too much, I think. I'd just do

	echo "Rewinding $HEAD -> $PARENT" >&2

> +echo "$PARENT" > .git/HEAD
> +git-read-tree -m "$PARENT" || {
> +	echo >&2 "$PARENT: bad commit"
> +	exit 1
> +}
> +git-checkout-cache -f -a
> +git-update-cache --refresh

You really don't want to do this if the tree has any local
modifications.


Please make sure the commit you are rewinding to is an ancestor of your
current HEAD (there should be something like separate cg-branch-rm for
killing enemy branches).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: git and symlinks as tracked content
From: Junio C Hamano @ 2005-05-03 21:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Gal, Kay Sievers, git
In-Reply-To: <Pine.LNX.4.58.0505031304140.26698@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Tue, 3 May 2005, Andreas Gal wrote:

>> Yuck. Thats really ugly. Right now all files have a uniform
>> touch to them.  For every hash you can locate the file,
>> determine its type/tag, unpack it, and check the SHA1
>> hash. The proposal above breaks all that. Why not just
>> introduce a new object type "dev" and put major minor in
>> there. It will still always hash to the same SHA1 hash value,
>> but fits much better in the overall design.

LT> Hey, I don't personally care that much. I don't see anybody using 
LT> character device nodes in the kernel tree, and I don't think most SCM's 
LT> support stuff like that anyway ;)

LT> If you want to make it a blob (and have a use for it), go wild. 

Introducing "dev" type, as Andreas suggests, is wrong.  This
this should be done in the same way as you suggested for the
symlink case.  Store a blob object with those chrdev or blkdev
modes whose contents are of form:

    major=14
    minor=4
    owner=root
    group=audio
    perm=0660

This would impact the diff side least, and for the cache side it
does not matter in storing and merging.  checkout-cache still
needs to know about this.


^ permalink raw reply

* Re: [PATCH 0/3] cogito spec file updates
From: Petr Baudis @ 2005-05-03 21:21 UTC (permalink / raw)
  To: Chris Wright; +Cc: git
In-Reply-To: <20050503193536.GE5324@shell0.pdx.osdl.net>

Dear diary, on Tue, May 03, 2005 at 09:35:36PM CEST, I got a letter
where Chris Wright <chrisw@osdl.org> told me that...
> * Chris Wright (chrisw@osdl.org) wrote:
> > Here's the outstanding updates for the spec file, up to 0.8-2 which is
> > the latest on kernel.org.
> > 
> > 	http://www.kernel.org/pub/software/scm/cogito/RPMS/
> 
> What's your method for creating a release tarball?  If it were formalized
> (i.e. Makefile rule), then it'd be simple to use VERSION to drive the
> spec file, and it'd only need updating for real content changes (similar
> to what Kay did).

For now, I do it so seldom that I just manually do

	cg-log >Changelog
	cg-export ~/cogito-0.9
	cp Changelog ~/cogito-0.9
	cd ~
	tar cvvfz cogito-0.9.tar.gz cogito-0.9

OTOH, I'd like to change this all to just

	cg-export ~/cogito-0.9.tar.gz

when I get to merge the relevant patches; I'm not sure there is so much
of a value to bundle the Changelog; just get the git tree and do cg-log
on your own, or use the web interface.

I might however have some private mkrelease.sh script which would do

	echo "$1" >VERSION
	update-spec-file
	echo "$1" | cg-commit
	cg-tag "$1"

or something.


BTW, did you have a particular reason to split the .spec file updates to
three parts? I think it doesn't make much sense and it'd be probably
enough to update it at once, when we are not doing it at the right time
anyway.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: commit-id fails after cg-init
From: Joel Becker @ 2005-05-03 21:14 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git
In-Reply-To: <1115150585.28520.11.camel@dv>

On Tue, May 03, 2005 at 04:03:05PM -0400, Pavel Roskin wrote:
> So, cg-init created an empty .git/refs/heads/master and made .git/HEAD a
> symlink to it.  Now, commit-id reads that file and gets confused.
> 
> If anybody has an idea what to put to .git/refs/heads/master please
> speak up so that cg-init could be fixed.

	Well, cg-init in this case creates no objects.  I'd say,
instead, it should create an empty tree object (representing a project
with no files) and commit that.  That would be your initial commit, and
would put something valid in heads/master.

Joel

-- 

"The cynics are right nine times out of ten."  
        - H. L. Mencken

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply

* [PATCH] release tarball make targets
From: Mark Allen @ 2005-05-03 21:15 UTC (permalink / raw)
  To: chrisw, pasky; +Cc: git

This patch is to support automating RPM and spec file generation.

Cheers,

--Mark

Signed-off-by: Mark Allen <mrallen1@yahoo.com>

Index: Makefile
===================================================================
--- aa6233be6d1b8bf42797c409a7c23b50593afc99/Makefile  (mode:100644
sha1:6ae0afa0208a8f755d383281a6d049a4ef90fe63)
+++ uncommitted/Makefile  (mode:100644 sha1:181aa6d6c34fb771671a7ee8038f9cf18972713a)
@@ -55,6 +55,10 @@
 
 VERSION= VERSION
 
+DIRS=mozilla-sha1 contrib ppc
+
+DOCS=README README.reference
+
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
 	 tag.o date.o
 LIB_FILE=libgit.a
@@ -136,3 +140,11 @@
 
 backup: clean
 	cd .. ; tar czvf dircache.tar.gz dir-cache
+
+release-tar-gzip: clean
+	tar czf $(shell cat $(VERSION)).tar.gz *.c *.h git.spec Makefile $(DOCS) $(DIRS)
$(SCRIPTS) $(SCRIPT) 
+
+release-tar-bzip2: clean
+	tar cjf $(shell cat $(VERSION)).tar.bz2 *.c *.h git.spec Makefile $(DOCS) $(DIRS)
$(SCRIPTS) $(SCRIPT) 
+
+release-tarballs: release-tar-gzip release-tar-bzip2


^ permalink raw reply

* Re: commit-id fails after cg-init
From: Petr Baudis @ 2005-05-03 21:13 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git
In-Reply-To: <1115150585.28520.11.camel@dv>

Dear diary, on Tue, May 03, 2005 at 10:03:05PM CEST, I got a letter
where Pavel Roskin <proski@gnu.org> told me that...
> Hello!

Hello,

> So, cg-init created an empty .git/refs/heads/master and made .git/HEAD a
> symlink to it.  Now, commit-id reads that file and gets confused.

my plan is to make cg-init:

(i) automatically add to cache any existing content in the current directory,
    if there is any
(ii) call cg-commit right away

This will make us free of this annoying special case and if you are
importing an existing tree, you want to do this anyway. The only case
when it is not 101% right is when starting a new project from fresh, but
some first "dummy" special commit shouldn't hurt there either (that will
require to modify write-tree to be willing to write an empty tree).

Patches welcome.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: Careful object writing..
From: Jan Harkes @ 2005-05-03 20:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505031306310.26698@ppc970.osdl.org>

On Tue, May 03, 2005 at 01:11:47PM -0700, Linus Torvalds wrote:
> On Tue, 3 May 2005, Jan Harkes wrote:
> > I tried to pull in the latest version of your tree, but it doesn't look
> > like this commit has propagated to rsync.kernel.org yet. Hopefully you
> > will accept a small patch (should be < 5 lines) that makes git work
> > nicely when Coda complains about the cross-directory hardlink without
> > affecting the reliability of using link/unlink on normal filesystems.
> 
> What is it that coda wants to do, and is there some portable way to get 
> there? 

Short summary:

    rc = link(old, new);
    if (rc == -1 && errno == EXDEV)
	rc = rename(old, new);

On Coda, the cross-directory link fails, the following cross-directory
rename will work fine.  On a normal filesystem, if the link fails with
EXDEV, the rename will fail with the same.

Because our cache consistency model is fairly optimistic, we already
have to deal with potential problems with a rename removing an unwanted
target. So if we are logging write operations, and the link operation
did not return EEXISTS, then the rename will be marked as not having
removed any target file. If the target did happen to exist on the server
by the time we reintegrate the operations we end up with a reintegration
conflict.


Longer version:

When a server performs conflict resolution it happens on a per-directory
basis. So any cross-directory operation already a special case.

We cannot guarantee which directory will be resolved first, so if it is
the destination of a link or rename the object itself might not exist
yet. The advantage of a rename operation is that it contains a reference
to both the source and the destination directories. If we don't yet know
the renamed object we resolve the source first. That creates the object
and allows us to complete the rename operation.

However with a link we only have a reference to an object and the
directory where the link should be added. But again, the object might
not yet exist on all servers. At this point things get a bit more
complicated because we don't enforce access based on per object UNIX
mode bits, but rely on directory ACLs. So we can't just add a reference
to an unknown object in the destination directory because until we know
where else this object is located, we can't tell if the user actually is
allowed to access the object.

> Is it just that you want to stay within the directory? Or is it any link 
> action that is nasty?

We do allow links within the same directory, mostly because that often
happens in places like /usr/bin and we know that whenever we encounter
the link operation in the resolution log, that the object creation has
already been processed. We also know that the new link can't give a user
any rights he didn't already have.

> What makes resolving renames hard when the file contents are the same? 

Renames mostly work, there are only a few corner cases left. One is
where something is moved up in the directory tree and the source
directory is then removed. We end up screwing ourselves because we
append the childs logs on removal and the create operation ends up
behind the rename operation. But that is a dumb implementation problem.
Another issue is of course when someone (validly) hardlinks a file in
the same directory and then moves one of the links to another directory.

We definitely are not a typical filesystem with UNIX semantics, which is
why it is unusual to find an application that seems so well suited for
disconnected and weakly connected operation.

Jan


^ permalink raw reply

* Re: [PATCH] cg-init shouldn't create master branch when pulling
From: Pavel Roskin @ 2005-05-03 20:48 UTC (permalink / raw)
  To: git
In-Reply-To: <1114637617.15385.10.camel@dv>

Hello!

Replying to myself.

On Wed, 2005-04-27 at 17:33 -0400, Pavel Roskin wrote:
> Hello!
> 
> >From what I see in the current cogito sources, ".git/refs/heads/master"
> is never used once it's created by cg-init.  I believe the "master"
> branch is only useful when creating a new repository.  Then it's the
> only branch in the repository.
> 
Please ignore this patch for now.  It turn out that the repositories
with only "origin" branch don't get updated properly by cg-update.
Changes don't get merged, so cg-cancel need to be run after cg-update.

It looks like that the "master" branch is needed, although it's not
documented anywhere.  Or maybe we need some further changes to support
origin-only repositories.  In any case, changing cg-init is not
sufficient.

-- 
Regards,
Pavel Roskin


^ permalink raw reply

* Re: git and symlinks as tracked content
From: Junio C Hamano @ 2005-05-03 20:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kay Sievers, git
In-Reply-To: <Pine.LNX.4.58.0505031151240.26698@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Tue, 3 May 2005, Kay Sievers wrote:
>> Where can we store the link-target? In its own blob-object or directly
>> in the tree-object?

LT>  - directories: S_IFDIR (0040000) point to "tree" objects for contents
LT>  - symlinks: S_IFLNK (0120000) point to "blob" objects
LT>  - executables: S_IFREG | 0755 (0100755) point to "blob" objects
LT>  - regular files: S_IFREG | 0644 (0100644) point to "blob" objects

These and the device nodes you mention would work naturally on
the cache side and I generally like this idea.  You need to
update checkout-cache to (attempt to) create the right kind of
file, but that is about it.

On the diff side, things are a bit more interesting.  Both the
git-diff-tree-helper engine (diff.c) and git-apply-patch-script
need to be told about these new types of objects, or at least
tightened up to ignore them until they know how to support them.

LT> When you think of it that way, the "patch" ends up falling out very 
LT> naturally, I think. It would look like

LT> 	New file: filename (Mode: 0120000)
LT> 	--- /dev/null
LT> 	+++ filename
LT> 	@@ 0,0 1,1
LT> 	+symlink-value

LT> (or something, you get the idea).

I've always wanted to have this from the normal "diff -r"
output, but we have to be careful.  You do not want to
accidentally feed the normal patch that kind of output.  How
about doing something like this?

GIT: filename (mode:120000)
 --- /dev/null
 +++ filename
 @@ -0,0 +1 @@
 +symlink-value

GIT: filename (mode:120000)
 --- filename
 +++ /dev/null
 @@ -1 +0,0 @@
 -symlink-value

GIT: filename (mode:120000->120000)
 --- filename
 +++ filename
 @@ -1 +1 @@
 -old-symlink-value
 +new-symlink-value

That is, to indent them to keep patch from noticing them [*1*].

About the device nodes, the diffed contents would be major and
minor in decimal notation, and the real filesystem permission
bits and ownerships (e.g. changing /dev/audio from 0600 to 0660
or from root:root to root:audio).  I do not know if we would
want owner/group in symbolic or numeric yet.

GIT: filename (mode:0020000->0020000)
 --- dev/audio
 +++ dev/audio
 @@ -1,5 +1,5 @@
  major=14
  minor=4
  owner=root
 -group=root
 -perm=0600
 +group=audio
 +perm=0660

[Footnote]

*1* A careful but not careful enough reader would wonder if the use
of "--- /dev/null" or "+++ /dev/null" to represent an addition
and a deletion may hamper managing the device node "/dev/null",
but this is not a problem.  Such a device node managed by GIT
will appear as "--- dev/null" or "+++ dev/null", without the
leading slash.


^ permalink raw reply

* Re: Careful object writing..
From: Linus Torvalds @ 2005-05-03 20:11 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Git Mailing List
In-Reply-To: <20050503200034.GA16104@delft.aura.cs.cmu.edu>



On Tue, 3 May 2005, Jan Harkes wrote:
> 
> I tried to pull in the latest version of your tree, but it doesn't look
> like this commit has propagated to rsync.kernel.org yet. Hopefully you
> will accept a small patch (should be < 5 lines) that makes git work
> nicely when Coda complains about the cross-directory hardlink without
> affecting the reliability of using link/unlink on normal filesystems.

What is it that coda wants to do, and is there some portable way to get 
there? 

Is it just that you want to stay within the directory? Or is it any link 
action that is nasty?

What makes resolving renames hard when the file contents are the same? 
Maybe Coda could just do a trivial resolve of that "conflict" too?

			Linus

^ permalink raw reply

* Re: git and symlinks as tracked content
From: Kay Sievers @ 2005-05-03 20:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Gal, git
In-Reply-To: <Pine.LNX.4.58.0505031304140.26698@ppc970.osdl.org>

On Tue, 2005-05-03 at 13:05 -0700, Linus Torvalds wrote:
> 
> On Tue, 3 May 2005, Andreas Gal wrote:
> > 
> > Yuck. Thats really ugly. Right now all files have a uniform touch to them. 
> > For every hash you can locate the file, determine its type/tag, unpack it, 
> > and check the SHA1 hash. The proposal above breaks all that. Why not just 
> > introduce a new object type "dev" and put major minor in there. It 
> > will still always hash to the same SHA1 hash value, but fits much better in the 
> > overall design. 
> 
> Hey, I don't personally care that much. I don't see anybody using 
> character device nodes in the kernel tree, and I don't think most SCM's 
> support stuff like that anyway ;)

Well, you need to be root to create device nodes, that is not a usual
requirement for an SCM checkout. :)

Kay


^ permalink raw reply

* commit-id fails after cg-init
From: Pavel Roskin @ 2005-05-03 20:03 UTC (permalink / raw)
  To: git

Hello!

I tried to start a new project using cogito (current snapshot) and I was
immediately greeted by a bug (or a buglet if you want).  Let's do this
in a clean directory:

$ cg-init 
defaulting to local storage area
$ cg-diff 
cat: .git/refs/tags/: Is a directory
cat: .git/refs/heads/: Is a directory
Invalid id: 
usage: git-cat-file [-t | tagname] <sha1>
usage: git-cat-file [-t | tagname] <sha1>
Invalid id: 
usage: diff-cache [-r] [-z] [-p] [--cached] <tree sha1>
mkdir: cannot create directory `/tmp/gitdiff.k4FHLY/': File exists
$

Not nice.  Trivial debugging shows that it's commit-id that fails:

$ sh -x commit-id 
+ SHA1='[A-Za-z0-9]{40}'
+ SHA1ONLY='^[A-Za-z0-9]{40}$'
+ id=
+ '[' '!' '' ']'
++ cat .git/HEAD
+ id=
+ echo
+ egrep -vq '^[A-Za-z0-9]{40}$'
+ '[' -r .git/refs/tags/ ']'
++ cat .git/refs/tags/
cat: .git/refs/tags/: Is a directory
...

$ ls -al .git/HEAD 
lrwxrwxrwx  1 proski proski 17 2005-05-03 15:50 .git/HEAD -> refs/heads/master
$ cat .git/refs/heads/master
$

So, cg-init created an empty .git/refs/heads/master and made .git/HEAD a
symlink to it.  Now, commit-id reads that file and gets confused.

If anybody has an idea what to put to .git/refs/heads/master please
speak up so that cg-init could be fixed.

-- 
Regards,
Pavel Roskin


^ permalink raw reply

* Re: git and symlinks as tracked content
From: Linus Torvalds @ 2005-05-03 20:05 UTC (permalink / raw)
  To: Andreas Gal; +Cc: Kay Sievers, git
In-Reply-To: <Pine.LNX.4.58.0505031255000.30768@sam.ics.uci.edu>



On Tue, 3 May 2005, Andreas Gal wrote:
> 
> Yuck. Thats really ugly. Right now all files have a uniform touch to them. 
> For every hash you can locate the file, determine its type/tag, unpack it, 
> and check the SHA1 hash. The proposal above breaks all that. Why not just 
> introduce a new object type "dev" and put major minor in there. It 
> will still always hash to the same SHA1 hash value, but fits much better in the 
> overall design. 

Hey, I don't personally care that much. I don't see anybody using 
character device nodes in the kernel tree, and I don't think most SCM's 
support stuff like that anyway ;)

If you want to make it a blob (and have a use for it), go wild. 

		Linus

^ permalink raw reply

* Re: Careful object writing..
From: Daniel Barkalow @ 2005-05-03 20:02 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20050503192753.GA6435@taniwha.stupidest.org>

On Tue, 3 May 2005, Chris Wedgwood wrote:

> i thought this was all common code?  if it's not maybe now is the time
> to change that?

The other versions are getting the tagged compressed object along with the
intended hash from an external source. They both verify the result as they
go (as opposed to the normal case which takes the uncompressed data
and finds the hash, and therefore has to be right). The common part is
really "open the file" and "close (and place) the file", which weren't
previously sufficiently complex to justify sharing the code.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* PATCH: diff-cache.c:51: `return' with a value, in function returning void
From: Alex Riesen @ 2005-05-03 19:56 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

gcc -g -O2 -Wall  '-DSHA1_HEADER=<openssl/sha.h>' -o git-diff-cache diff-cache.c libgit.a -lz -lssl
diff-cache.c: In function `show_new_file':
diff-cache.c:51: warning: `return' with a value, in function returning void


--- cogito/diff-cache.c	2005-05-03 21:50:42.000000000 +0200
+++ cogito/diff-cache.c	2005-05-03 21:52:00.000000000 +0200
@@ -48,7 +48,6 @@ static void show_new_file(struct cache_e
 		return;
 
 	show_file("+", new, sha1, mode);
-	return 0;
 }
 
 static int show_modified(struct cache_entry *old, struct cache_entry *new)


^ permalink raw reply

* Re: Careful object writing..
From: Jan Harkes @ 2005-05-03 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505031204030.26698@ppc970.osdl.org>

On Tue, May 03, 2005 at 12:15:08PM -0700, Linus Torvalds wrote:
> I just pushed out the commit that tries to finally actually write the sha1
> objects the right way in a shared object directory environment.
> 
> I used to be lazy, and just do "O_CREAT | O_EXCL" on the final name, but
> that obviously is not very nice when it can result in other people seeing
> objects that haven't been fully finalized yet.
> 
> So now I do it "right", and create a temporary file in the "top" object
> directory, and then when it's all done, I do a "link()" to the final place
> and unlink the original.

Annoyingly until this commit, git has been just about the perfect SCM
system to run on top of Coda. Almost every other SCM can and will get
conflicts on it's repository files, which are pretty much impossible to
resolve (just try merging two diverging copies of an RCS archive..)

But the only conflicts we ever see with git are when two people create
the same SHA1 object. And if the contents are in fact identical this
conflict will be trivially resolved.

I tried to pull in the latest version of your tree, but it doesn't look
like this commit has propagated to rsync.kernel.org yet. Hopefully you
will accept a small patch (should be < 5 lines) that makes git work
nicely when Coda complains about the cross-directory hardlink without
affecting the reliability of using link/unlink on normal filesystems.

Jan

> 
> I also change the permission to 0444 before it gets its final name.
> 
> Two notes:
> 
>  - because the objects all get created initially in .git/objects rather 
>    than in the subdirectory they get moved to, you can't use symlinks 
>    to other filesystems for the 256 object subdirectories. The object 
>    directory has to be one filesystem (but it doesn't have to be the same 
>    one as you actually keep your working ddirectories on, of course)
> 
>  - The upside of this is that filesystem block allocators should do the 
>    right thing. Instead of spreading the objects out (because they are in 
>    different directories), they should be created together.
> 
> Anyway, somebody should double-check the thing. It _should_ now work
> correctly over NFS etc too, and everything should be nice and atomic (and
> with any half-way decent filesystem, it also means that even if you have a
> system crash in the middle, you'll never see half-created objects).
> 
> NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also write 
> objects. So things like "git-http-pull" etc will still write objects 
> directly into the object directory, and that can cause problems with 
> shared usage. Same goes for "write_sha1_from_fd()" that rpull.c uses. I 
> hope somebody will take a look at those issues..
> 
> Anyway, at least the really core operations should now really be
> "thread-safe" in a shared object directory environment.
> 
> 		Linus
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: git and symlinks as tracked content
From: Andreas Gal @ 2005-05-03 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kay Sievers, git
In-Reply-To: <Pine.LNX.4.58.0505031151240.26698@ppc970.osdl.org>


>  - S_IFCHR/S_IFBLK (0020000 or 0060000), with the 20-byte SHA1 not being a 
>    SHA1 at all, but just the major:minor numbers in some nice binary 
>    encoding. Probably: two network byte order 32-bit values, with twelve 
>    bytes of some non-zero signature (the SHA1 of all zeroes should be 
>    avoided, so the signature really should be soemthing else than just 
>    twelve bytes of zero).

Yuck. Thats really ugly. Right now all files have a uniform touch to them. 
For every hash you can locate the file, determine its type/tag, unpack it, 
and check the SHA1 hash. The proposal above breaks all that. Why not just 
introduce a new object type "dev" and put major minor in there. It 
will still always hash to the same SHA1 hash value, but fits much better in the 
overall design. 

Andreas

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox