Git development
* efficient cloning
@ 2006-03-19 21:16 James Cloos
  2006-03-19 22:31 ` Shawn Pearce
  2006-03-19 23:18 ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: James Cloos @ 2006-03-19 21:16 UTC (permalink / raw)
  To: git

Is there a way to accomplish the effect of this script w/o having to
download any unnecessary objects?

==================================================
#!/bin/bash

lt="/gits/linux-2.6/.git"

if [ $# -ne 2 ]; then
    echo >&2 "Usage: $0 <repo> <target-dir>"
    exit 1
fi

git-clone $1 $2
mkdir -p $2/objects/info
{
 test -f "$lt/objects/info/alternates" &&
 cat "$lt/objects/info/alternates";
 echo "$lt/objects"
} >"$2/objects/info/alternates"

cd $2
git-repack -a -d -s
git-prune-packed
==================================================

I tried to modify git-clone to add an alternates file before calling
fetch, but that file just gets deleted.

I presume I need to clone -s -l the local alternate, re-parent it to
the new URL and grab anything missing, but how can I assure that it
results in exactly the same repo as this script?

I'm often behind tiny straws, so efficiency is important.

-JimC
-- 
James H. Cloos, Jr. <cloos@jhcloos.com>

* Re: efficient cloning
  2006-03-19 21:16 efficient cloning James Cloos
@ 2006-03-19 22:31 ` Shawn Pearce
  2006-03-19 23:18 ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Shawn Pearce @ 2006-03-19 22:31 UTC (permalink / raw)
  To: James Cloos; +Cc: git

James Cloos <cloos@jhcloos.com> wrote:
> Is there a way to accomplish the effect of this script w/o having to
> download any unnecessary objects?
> 
> ==================================================
> #!/bin/bash
> 
> lt="/gits/linux-2.6/.git"
> 
> if [ $# -ne 2 ]; then
>     echo >&2 "Usage: $0 <repo> <target-dir>"
>     exit 1
> fi
> 
> git-clone $1 $2
> mkdir -p $2/objects/info
> {
>  test -f "$lt/objects/info/alternates" &&
>  cat "$lt/objects/info/alternates";
>  echo "$lt/objects"
> } >"$2/objects/info/alternates"
> 
> cd $2
> git-repack -a -d -s
> git-prune-packed
> ==================================================
> 
> I tried to modify git-clone to add an alternates file before calling
> fetch, but that file just gets deleted.
> 
> I presume I need to clone -s -l the local alternate, re-parent it to
> the new URL and grab anything missing, but how can I assure that it
> results in exactly the same repo as this script?

Exactly right.  There was some discussion about this perhaps just
two weeks back and it became clear that the easiest way to clone
through a thin straw is to use `git clone -s -l' from a locally
available repository which is ``close''[*1*] to the remote you are
actually trying to clone from, edit .git/remotes/origin to have
the correct URL: and Pull: lines, then `git-pull origin' to bring
down whatever you don't have yet.  This won't miss any objects so
it will result in the same repository as a clone would have[*2*].

Footnotes:

  [*1*] Here ``close'' means probably related to the same project.
  Meaning if you are cloning the Linux kernel at least start with
  another kernel repository and not say the GIT repository.  :-)
  The more your original repository has in common with the remote you
  are trying to pull from the less that will need to be downloaded.

  [*2*] This isn't entirely true.  During a normal clone everything
  is pulled down into a single pack. Using this strategy the missing
  objects that are downloaded will be loose; a git-repack after
  the pull might be a good idea to pull them into a pack.
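
  The re-parenting step above could be sketched roughly as follows.
  This is only an illustration: it assumes the .git/remotes/origin
  file format with URL: and Pull: lines mentioned above, and the
  function name write_remote_origin, its arguments and the default
  branch "master" are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): point an existing clone at a
# different upstream by rewriting its .git/remotes/origin file.
#   $1 = work tree of the clone made with "git clone -s -l"
#   $2 = URL of the remote you actually want to track
#   $3 = remote branch to pull (assumed default: master)
write_remote_origin () {
	dir=$1 url=$2 branch=${3:-master}
	mkdir -p "$dir/.git/remotes" || return 1
	{
		echo "URL: $url"
		echo "Pull: $branch:origin"
	} >"$dir/.git/remotes/origin"
}
```

  After something like `git clone -s -l /gits/linux-2.6 work', one
  would call `write_remote_origin work <url>', run `git-pull origin'
  inside work, and probably repack afterwards.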

-- 
Shawn.

* Re: efficient cloning
  2006-03-19 21:16 efficient cloning James Cloos
  2006-03-19 22:31 ` Shawn Pearce
@ 2006-03-19 23:18 ` Junio C Hamano
  2006-03-20  0:32   ` James Cloos
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-03-19 23:18 UTC (permalink / raw)
  To: James Cloos; +Cc: git

James Cloos <cloos@jhcloos.com> writes:

> I presume I need to clone -s -l the local alternate, re-parent it to
> the new URL and grab anything missing, but how can I assure that it
> results in exactly the same repo as this script?

"The same repo as this script" is a very poor way to define what
you really want.  What is "git-repack -a -d -s"?

Guessing what you perhaps are trying to do:

 - You have /gits/linux-2.6/.git on your local disk that is a
   reasonably recent copy of the upstream Linux 2.6 repository.

 - You want to clone from whatever $1 is (maybe a subsystem
   tree, but we cannot tell from your question) to a new
   directory $2.

 - Presumably you know whatever $1 is is related to Linus's
   repository and would want to take advantage of the fact that
   it shares many objects with /gits/linux-2.6/.git

It might be worth adding a --reference flag to git-clone like
this patch does.

However, this patch alone does not make the data transferred
during cloning any smaller if you are using the "$1" repository
over git native transport (including a local repository),
because the current clone-pack does not look at existing refs
(it was written assuming that there is _nothing_ in the cloned
repository at the beginning).  That needs a separate
enhancement.  Maybe it would be a good idea to deprecate
clone-pack altogether, use fetch-pack -k, and implement the
"copy upstream refs to our refs" logic in git-clone.sh.  We need
to do something like that if/when we are switching to use
$GIT_DIR/refs/remotes/ to store tracking branches outside
refs/heads anyway.

The rsync transport has been deprecated for some time, and it
does not handle alternates correctly anyway, so this patch does
not have any impact on that.

But if you are fetching from "$1" over http transport, this patch
would help because we stash away the existing refs obtained from
the reference repository under $GIT_DIR/refs/reference-tmp while
we run the fetch.
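
For readers who want to see the alternates part in isolation, here
is a minimal sketch of registering a local reference repository by
hand.  It is not the patch below, only an illustration of the same
idea; the function name add_alternate and its arguments are
assumptions made for the example, and unlike the patch it appends to
the alternates file instead of overwriting it.

```shell
#!/bin/sh
# Hypothetical helper: record a local reference repository as an
# alternate object store of the repository whose GIT_DIR is $1.
#   $1 = GIT_DIR of the new repository
#   $2 = reference repository (work tree or bare repository)
add_alternate () {
	git_dir=$1 reference=$2
	test -d "$reference" || {
		echo >&2 "$reference: not a local directory."
		return 1
	}
	# Accept a work tree as well as a bare repository.
	if test -d "$reference/.git/objects"
	then
		reference="$reference/.git"
	fi
	# Store an absolute path so the entry does not depend on
	# the directory we happen to be in.
	reference=$(cd "$reference" && pwd) || return 1
	mkdir -p "$git_dir/objects/info" &&
	echo "$reference/objects" >>"$git_dir/objects/info/alternates"
}
```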

---
diff --git a/git-clone.sh b/git-clone.sh
index 4ed861d..73fb03c 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -9,7 +9,7 @@
 unset CDPATH
 
 usage() {
-	echo >&2 "Usage: $0 [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
+	echo >&2 "Usage: $0 [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
 	exit 1
 }
 
@@ -56,6 +56,7 @@ upload_pack=
 bare=
 origin=origin
 origin_override=
+reference=
 while
 	case "$#,$1" in
 	0,*) break ;;
@@ -68,6 +69,11 @@ while
         *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared) 
           local_shared=yes; use_local=yes ;;
 	*,-q|*,--quiet) quiet=-q ;;
+	*,--reference=*)
+	  reference=`expr "$1" : '-[^=]*=\(.*\)'` ;;
+	*,--reference)
+	  case "$#" in 1) usage ;; esac
+	  shift; reference="$1" ;;
 	1,-o) usage;;
 	*,-o)
 		git-check-ref-format "$2" || {
@@ -130,6 +136,23 @@ yes)
 	GIT_DIR="$D/.git" ;;
 esac
 
+# If given a reference we would first add that one; it has to name a
+# local repository that resembles the one being cloned.
+if test -d "$reference"
+then
+	reference=$(cd "$reference" && pwd)
+	if test -d "$reference/.git/objects"
+	then
+		reference="$reference/.git"
+	fi
+	echo "$reference/objects" >"$GIT_DIR/objects/info/alternates"
+	# Pretend we know about these heads - clone-pack does not
+	# honor them currently, but that can be rectified later.
+	mkdir "$GIT_DIR/refs/reference-tmp" 
+	(cd "$reference" && tar cf - refs) |
+	(cd "$GIT_DIR/refs/reference-tmp" && tar xf -)
+fi
+
 # We do local magic only when the user tells us to.
 case "$local,$use_local" in
 yes,yes)
@@ -229,6 +252,7 @@ yes,yes)
 esac
 
 cd "$D" || exit
+test -d "$GIT_DIR/refs/reference-tmp" && rm -fr "$GIT_DIR/refs/reference-tmp"
 
 if test -f "$GIT_DIR/HEAD" && test -z "$bare"
 then

* Re: efficient cloning
  2006-03-19 23:18 ` Junio C Hamano
@ 2006-03-20  0:32   ` James Cloos
  2006-03-20  1:55     ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: James Cloos @ 2006-03-20  0:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> "The same repo as this script" is a very poor way to define what
Junio> you really want. 

I don't think so.  Getting the same values in files like FETCH_HEAD,
ORIG_HEAD, branches/*, remotes/*,  info/* et al is not obvious.
Especially, eg, all of the same Push/Pull lines.

Junio> What is "git-repack -a -d -s"?

A typo.  I of course meant -a -d -l.

Junio> It might be worth adding a --reference flag to git-clone like
Junio> this patch does.

That is essentially what I tried (except for the name of the flag; I
prefer your choice).  I didn't include the reference-tmp logic, but
otherwise it looks about the same.

Junio> However, this patch alone does not reduce the transferred data
Junio> during cloning any smaller if you are using the "$1" repository
Junio> over git native transport (including a local repository),
Junio> because the current clone-pack does not look at existing refs

Exactly the wall I ran into.  And I really only need it for git://.

Junio> Maybe it would be a good idea to deprecate
Junio> clone-pack altogether, use fetch-pack -k, and implement the
Junio> "copy upstream refs to our refs" logic in git-clone.sh.  We need
Junio> to do something like that if/when we are switching to use
Junio> $GIT_DIR/refs/remotes/ to store tracking branches outside
Junio> refs/heads anyway.

And it looks like you've shown me the door in that wall.

I'll have to read up on fetch-pack as opposed to clone-pack.

-JimC
-- 
James H. Cloos, Jr. <cloos@jhcloos.com>

* Re: efficient cloning
  2006-03-20  0:32   ` James Cloos
@ 2006-03-20  1:55     ` Junio C Hamano
  2006-03-20  8:54       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-03-20  1:55 UTC (permalink / raw)
  To: James Cloos; +Cc: git

James Cloos <cloos@jhcloos.com> writes:

> Junio> Maybe it would be a good idea to deprecate
> Junio> clone-pack altogether, use fetch-pack -k, and implement the
> Junio> "copy upstream refs to our refs" logic in git-clone.sh.  We need
> Junio> to do something like that if/when we are switching to use
> Junio> $GIT_DIR/refs/remotes/ to store tracking branches outside
> Junio> refs/heads anyway.
>
> And it looks like you've shown me the door in that wall.

I was going to write that myself, but unfortunately will be
offline for the rest of the evening -- interrupted by a surprise
visitor from India who is only visiting for a few days.

So in case you are really in a rush, and in a mood to build on
top of my WIP, here is one.

* fetch-pack.c is modified so that you can say:

	git fetch-pack --all -k $1

  to get the equivalent of the "git ls-remote $1" listing while
  fetching everything from the remote.

* Change git-clone.sh to use git-fetch-pack --all -k instead of
  git-clone-pack; the output from fetch-pack is munged further
  by a script that implements "copy the refs to the same
  location while figuring out where the HEAD is".  The latter
  part in my WIP is incomplete, so the --use-separate-remote
  option probably would not work right now.
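
The "copy the refs" step can also be pictured as plain shell.  The
following is only a sketch under assumptions, not what the patch
does: it expects "<sha1> <refname>" lines on standard input, and the
function name copy_refs as well as the REMOTE_HEAD file it writes
are choices made for this example.

```shell
#!/bin/sh
# Hypothetical shell rendering of the "copy the refs" step.
# Reads "<sha1> <refname>" lines on stdin and stores them under $1/refs.
#   $1 = GIT_DIR
#   $2 = non-empty to store branches under refs/remotes
#        instead of refs/heads
copy_refs () {
	git_dir=$1
	if test -n "$2"
	then
		branch_top=remotes
	else
		branch_top=heads
	fi
	mkdir -p "$git_dir/refs" || return 1
	while read -r sha1 name
	do
		case "$name" in
		*'^{}')
			# Peeled tag entry; nothing to store.
			continue ;;
		HEAD)
			# Remember where the remote HEAD points.
			echo "$sha1" >"$git_dir/REMOTE_HEAD"
			continue ;;
		refs/heads/*)
			dest="$git_dir/refs/$branch_top/${name#refs/heads/}" ;;
		refs/tags/*)
			dest="$git_dir/refs/tags/${name#refs/tags/}" ;;
		*)
			continue ;;
		esac
		mkdir -p "$(dirname "$dest")" &&
		echo "$sha1" >"$dest"
	done
}
```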

---
diff --git a/fetch-pack.c b/fetch-pack.c
index 535de10..2d0a626 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -7,8 +7,9 @@
 static int keep_pack;
 static int quiet;
 static int verbose;
+static int fetch_all;
 static const char fetch_pack_usage[] =
-"git-fetch-pack [-q] [-v] [-k] [--thin] [--exec=upload-pack] [host:]directory <refs>...";
+"git-fetch-pack [--all] [-q] [-v] [-k] [--thin] [--exec=upload-pack] [host:]directory <refs>...";
 static const char *exec = "git-upload-pack";
 
 #define COMPLETE	(1U << 0)
@@ -266,8 +267,9 @@ static void filter_refs(struct ref **ref
 	for (prev = NULL, current = *refs; current; current = next) {
 		next = current->next;
 		if ((!memcmp(current->name, "refs/", 5) &&
-					check_ref_format(current->name + 5)) ||
-				!path_match(current->name, nr_match, match)) {
+		     check_ref_format(current->name + 5)) ||
+		    (!fetch_all &&
+		     !path_match(current->name, nr_match, match))) {
 			if (prev == NULL)
 				*refs = next;
 			else
@@ -426,6 +428,10 @@ int main(int argc, char **argv)
 				use_thin_pack = 1;
 				continue;
 			}
+			if (!strcmp("--all", arg)) {
+				fetch_all = 1;
+				continue;
+			}
 			if (!strcmp("-v", arg)) {
 				verbose = 1;
 				continue;
diff --git a/git-clone.sh b/git-clone.sh
index 4ed861d..718029b 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -9,7 +9,7 @@
 unset CDPATH
 
 usage() {
-	echo >&2 "Usage: $0 [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
+	echo >&2 "Usage: $0 [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
 	exit 1
 }
 
@@ -40,22 +40,74 @@ Perhaps git-update-server-info needs to 
 	do
 		name=`expr "$refname" : 'refs/\(.*\)'` &&
 		case "$name" in
-		*^*)	;;
-		*)
-			git-http-fetch -v -a -w "$name" "$name" "$1/" || exit 1
+		*^*)	continue;;
 		esac
+		if test -n "$use_separate_remote" &&
+		   branch_name=`expr "$name" : 'heads/\(.*\)'`
+		then
+			tname="remotes/$branch_name"
+		else
+			tname=$name
+		fi
+		git-http-fetch -v -a -w "$tname" "$name" "$1/" || exit 1
 	done <"$clone_tmp/refs"
 	rm -fr "$clone_tmp"
 }
 
+# A Perl script to read git-fetch -k output and store the
+# remote branches.
+copy_refs='
+use File::Path qw(mkpath); use File::Basename qw(dirname);
+my $refs_file = $ARGV[0];
+my $use_separate_remote = $ARGV[1];
+my $git_dir = $ARGV[2];
+
+my $branch_top = ($use_separate_remote ? "remotes" : "heads");
+my $tag_top = "tags";
+my $head = undef;
+
+sub store {
+	my ($sha1, $name, $top) = @_;
+	$name = "$git_dir/refs/$top/$name";
+	mkpath(dirname($name));
+	open O, ">", "$name";
+	print O "$sha1\n";
+	close O;
+}
+
+open FH, "<", $refs_file;
+while (<FH>) {
+	my ($sha1, $name) = /^([0-9a-f]{40}) (.*)$/;
+	if ($name eq "HEAD") {
+		$head = $sha1;
+		next;
+	}
+	if ($name =~ s/^refs\/heads\///) {
+		if (!defined $head && $name eq "master") {
+			$head = $sha1;
+		}
+		store($sha1, $name, $branch_top);
+		next;
+	}
+	if ($name =~ s/^refs\/tags\///) {
+		store($sha1, $name, $tag_top);
+		next;
+	}
+}
+close FH;
+'
+
+
 quiet=
 use_local=no
 local_shared=no
 no_checkout=
 upload_pack=
 bare=
+reference=
 origin=origin
 origin_override=
+use_separate_remote=
 while
 	case "$#,$1" in
 	0,*) break ;;
@@ -68,7 +120,14 @@ while
         *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared) 
           local_shared=yes; use_local=yes ;;
 	*,-q|*,--quiet) quiet=-q ;;
+	*,--use-separate-remote)
+		use_separate_remote=t ;;
 	1,-o) usage;;
+	1,--reference) usage ;;
+	*,--reference)
+		shift; reference="$1" ;;
+	*,--reference=*)
+		reference=`expr "$1" : '--reference=\(.*\)'` ;;
 	*,-o)
 		git-check-ref-format "$2" || {
 		    echo >&2 "'$2' is not suitable for a branch name"
@@ -130,6 +189,26 @@ yes)
 	GIT_DIR="$D/.git" ;;
 esac
 
+if test -n "$reference"
+then
+	if test -d "$reference"
+	then
+		if test -d "$reference/.git/objects"
+		then
+			reference="$reference/.git"
+		fi
+		reference=$(cd "$reference" && pwd)
+		echo "$reference/objects" >"$GIT_DIR/objects/info/alternates"
+		(cd "$reference" && tar cf - refs) |
+		(cd "$GIT_DIR/refs" &&
+		 mkdir reference-tmp &&
+		 cd reference-tmp &&
+		 tar xf -)
+	else
+		echo >&2 "$reference: not a local directory." && usage
+	fi
+fi
+
 # We do local magic only when the user tells us to.
 case "$local,$use_local" in
 yes,yes)
@@ -217,17 +296,22 @@ yes,yes)
 		;;
 	*)
 		cd "$D" && case "$upload_pack" in
-		'') git-clone-pack $quiet "$repo" ;;
-		*) git-clone-pack $quiet "$upload_pack" "$repo" ;;
-		esac || {
+		'') git-fetch-pack --all -k $quiet "$repo" ;;
+		*) git-fetch-pack --all -k $quiet "$upload_pack" "$repo" ;;
+		esac >"$GIT_DIR/FETCH_HEAD" || {
 			echo >&2 "clone-pack from '$repo' failed."
 			exit 1
 		}
+		# Now figure out where the remote HEAD points at.
+		perl -e "$copy_refs" "$GIT_DIR/FETCH_HEAD" \
+			"$use_separate_remote" "$GIT_DIR"
 		;;
 	esac
 	;;
 esac
 
+test -d "$GIT_DIR/refs/reference-tmp" && rm -fr "$GIT_DIR/refs/reference-tmp"
+
 cd "$D" || exit
 
 if test -f "$GIT_DIR/HEAD" && test -z "$bare"

* Re: efficient cloning
  2006-03-20  1:55     ` Junio C Hamano
@ 2006-03-20  8:54       ` Junio C Hamano
  2006-03-20 15:18         ` Petr Baudis
  2006-03-20 16:30         ` Josef Weidendorfer
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2006-03-20  8:54 UTC (permalink / raw)
  To: James Cloos; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> So in case you are really in a rush, and in a mood to build on
> top of my WIP, here is one.

And this is a replacement, which actually has seen some
testing.  I'll place this in the "next" branch tonight.  Further
testing is appreciated.

-- >8 --
[PATCH] revamp git-clone.

This does two things.

 * A new flag --reference can be used to name a local repository
   that is to be used as an alternate.  This is in response to
   an inquiry by James Cloos in the message on the list
   <m3r74ykue7.fsf@lugabout.cloos.reno.nv.us>.

 * A new flag --use-separate-remote stops contaminating local
   branch namespace by upstream branch names.  The upstream
   branch heads are copied in .git/refs/remotes/ instead of
   .git/refs/heads/ and .git/remotes/origin file is set up to
   reflect this as well.  Further updating a repository cloned
   this way requires the fetch/pull update by Eric Wong that
   understands .git/refs/remotes.

For the former change, git-fetch-pack is taught a new flag --all
to fetch from all the remote heads.  Nobody uses the git-clone-pack
with this change, so we could deprecate the command, but removal
of the command will be left to a separate round.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 fetch-pack.c |   18 ++++--
 git-clone.sh |  184 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 166 insertions(+), 36 deletions(-)

dfeff66ed9a3931d60f3cd600ad8c14b5cc3d9e5
diff --git a/fetch-pack.c b/fetch-pack.c
index 535de10..a3bcad0 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -7,8 +7,9 @@
 static int keep_pack;
 static int quiet;
 static int verbose;
+static int fetch_all;
 static const char fetch_pack_usage[] =
-"git-fetch-pack [-q] [-v] [-k] [--thin] [--exec=upload-pack] [host:]directory <refs>...";
+"git-fetch-pack [--all] [-q] [-v] [-k] [--thin] [--exec=upload-pack] [host:]directory <refs>...";
 static const char *exec = "git-upload-pack";
 
 #define COMPLETE	(1U << 0)
@@ -266,8 +267,9 @@ static void filter_refs(struct ref **ref
 	for (prev = NULL, current = *refs; current; current = next) {
 		next = current->next;
 		if ((!memcmp(current->name, "refs/", 5) &&
-					check_ref_format(current->name + 5)) ||
-				!path_match(current->name, nr_match, match)) {
+		     check_ref_format(current->name + 5)) ||
+		    (!fetch_all &&
+		     !path_match(current->name, nr_match, match))) {
 			if (prev == NULL)
 				*refs = next;
 			else
@@ -376,7 +378,11 @@ static int fetch_pack(int fd[2], int nr_
 		goto all_done;
 	}
 	if (find_common(fd, sha1, ref) < 0)
-		fprintf(stderr, "warning: no common commits\n");
+		if (!keep_pack)
+			/* When cloning, it is not unusual to have
+			 * no common commit.
+			 */
+			fprintf(stderr, "warning: no common commits\n");
 
 	if (keep_pack)
 		status = receive_keep_pack(fd, "git-fetch-pack", quiet);
@@ -426,6 +432,10 @@ int main(int argc, char **argv)
 				use_thin_pack = 1;
 				continue;
 			}
+			if (!strcmp("--all", arg)) {
+				fetch_all = 1;
+				continue;
+			}
 			if (!strcmp("-v", arg)) {
 				verbose = 1;
 				continue;
diff --git a/git-clone.sh b/git-clone.sh
index 4ed861d..9db678b 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -9,7 +9,7 @@
 unset CDPATH
 
 usage() {
-	echo >&2 "Usage: $0 [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
+	echo >&2 "Usage: $0 [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
 	exit 1
 }
 
@@ -40,13 +40,61 @@ Perhaps git-update-server-info needs to 
 	do
 		name=`expr "$refname" : 'refs/\(.*\)'` &&
 		case "$name" in
-		*^*)	;;
-		*)
-			git-http-fetch -v -a -w "$name" "$name" "$1/" || exit 1
+		*^*)	continue;;
 		esac
+		if test -n "$use_separate_remote" &&
+		   branch_name=`expr "$name" : 'heads/\(.*\)'`
+		then
+			tname="remotes/$branch_name"
+		else
+			tname=$name
+		fi
+		git-http-fetch -v -a -w "$tname" "$name" "$1/" || exit 1
 	done <"$clone_tmp/refs"
 	rm -fr "$clone_tmp"
+	http_fetch "$1/HEAD" "$GIT_DIR/REMOTE_HEAD"
+}
+
+# Read git-fetch-pack -k output and store the remote branches.
+copy_refs='
+use File::Path qw(mkpath);
+use File::Basename qw(dirname);
+my $git_dir = $ARGV[0];
+my $use_separate_remote = $ARGV[1];
+
+my $branch_top = ($use_separate_remote ? "remotes" : "heads");
+my $tag_top = "tags";
+
+sub store {
+	my ($sha1, $name, $top) = @_;
+	$name = "$git_dir/refs/$top/$name";
+	mkpath(dirname($name));
+	open O, ">", "$name";
+	print O "$sha1\n";
+	close O;
+}
+
+open FH, "<", "$git_dir/CLONE_HEAD";
+while (<FH>) {
+	my ($sha1, $name) = /^([0-9a-f]{40})\s(.*)$/;
+	next if ($name =~ /\^\173/);
+	if ($name eq "HEAD") {
+		open O, ">", "$git_dir/REMOTE_HEAD";
+		print O "$sha1\n";
+		close O;
+		next;
+	}
+	if ($name =~ s/^refs\/heads\///) {
+		store($sha1, $name, $branch_top);
+		next;
+	}
+	if ($name =~ s/^refs\/tags\///) {
+		store($sha1, $name, $tag_top);
+		next;
+	}
 }
+close FH;
+'
 
 quiet=
 use_local=no
@@ -54,8 +102,10 @@ local_shared=no
 no_checkout=
 upload_pack=
 bare=
-origin=origin
+reference=
+origin=
 origin_override=
+use_separate_remote=
 while
 	case "$#,$1" in
 	0,*) break ;;
@@ -68,7 +118,14 @@ while
         *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared) 
           local_shared=yes; use_local=yes ;;
 	*,-q|*,--quiet) quiet=-q ;;
+	*,--use-separate-remote)
+		use_separate_remote=t ;;
 	1,-o) usage;;
+	1,--reference) usage ;;
+	*,--reference)
+		shift; reference="$1" ;;
+	*,--reference=*)
+		reference=`expr "$1" : '--reference=\(.*\)'` ;;
 	*,-o)
 		git-check-ref-format "$2" || {
 		    echo >&2 "'$2' is not suitable for a branch name"
@@ -100,9 +157,24 @@ then
 		echo >&2 '--bare and -o $origin options are incompatible.'
 		exit 1
 	fi
+	if test t = "$use_separate_remote"
+	then
+		echo >&2 '--bare and --use-separate-remote options are incompatible.'
+		exit 1
+	fi
 	no_checkout=yes
 fi
 
+if test -z "$origin_override$origin"
+then
+	if test -n "$use_separate_remote"
+	then
+		origin=remotes/master
+	else
+		origin=heads/origin
+	fi
+fi
+
 # Turn the source into an absolute path if
 # it is local
 repo="$1"
@@ -130,6 +202,28 @@ yes)
 	GIT_DIR="$D/.git" ;;
 esac
 
+if test -n "$reference"
+then
+	if test -d "$reference"
+	then
+		if test -d "$reference/.git/objects"
+		then
+			reference="$reference/.git"
+		fi
+		reference=$(cd "$reference" && pwd)
+		echo "$reference/objects" >"$GIT_DIR/objects/info/alternates"
+		(cd "$reference" && tar cf - refs) |
+		(cd "$GIT_DIR/refs" &&
+		 mkdir reference-tmp &&
+		 cd reference-tmp &&
+		 tar xf -)
+	else
+		echo >&2 "$reference: not a local directory." && usage
+	fi
+fi
+
+rm -f "$GIT_DIR/CLONE_HEAD"
+
 # We do local magic only when the user tells us to.
 case "$local,$use_local" in
 yes,yes)
@@ -165,24 +259,14 @@ yes,yes)
 	    } >"$GIT_DIR/objects/info/alternates"
 	    ;;
 	esac
-
-	# Make a duplicate of refs and HEAD pointer
-	HEAD=
-	if test -f "$repo/HEAD"
-	then
-		HEAD=HEAD
-	fi
-	(cd "$repo" && tar cf - refs $HEAD) |
-	(cd "$GIT_DIR" && tar xf -) || exit 1
+	git-ls-remote "$repo" >"$GIT_DIR/CLONE_HEAD"
 	;;
 *)
 	case "$repo" in
 	rsync://*)
 		rsync $quiet -av --ignore-existing  \
-			--exclude info "$repo/objects/" "$GIT_DIR/objects/" &&
-		rsync $quiet -av --ignore-existing  \
-			--exclude info "$repo/refs/" "$GIT_DIR/refs/" || exit
-
+			--exclude info "$repo/objects/" "$GIT_DIR/objects/" ||
+		exit
 		# Look at objects/info/alternates for rsync -- http will
 		# support it natively and git native ones will do it on the
 		# remote end.  Not having that file is not a crime.
@@ -205,6 +289,7 @@ yes,yes)
 		    done
 		    rm -f "$GIT_DIR/TMP_ALT"
 		fi
+		git-ls-remote "$repo" >"$GIT_DIR/CLONE_HEAD"
 		;;
 	http://*)
 		if test -z "@@NO_CURL@@"
@@ -217,37 +302,71 @@ yes,yes)
 		;;
 	*)
 		cd "$D" && case "$upload_pack" in
-		'') git-clone-pack $quiet "$repo" ;;
-		*) git-clone-pack $quiet "$upload_pack" "$repo" ;;
-		esac || {
-			echo >&2 "clone-pack from '$repo' failed."
+		'') git-fetch-pack --all -k $quiet "$repo" ;;
+		*) git-fetch-pack --all -k $quiet "$upload_pack" "$repo" ;;
+		esac >"$GIT_DIR/CLONE_HEAD" || {
+			echo >&2 "fetch-pack from '$repo' failed."
 			exit 1
 		}
 		;;
 	esac
 	;;
 esac
+test -d "$GIT_DIR/refs/reference-tmp" && rm -fr "$GIT_DIR/refs/reference-tmp"
+
+if test -f "$GIT_DIR/CLONE_HEAD"
+then
+	# Figure out where the remote HEAD points at.
+	perl -e "$copy_refs" "$GIT_DIR" "$use_separate_remote"
+fi
 
 cd "$D" || exit
 
-if test -f "$GIT_DIR/HEAD" && test -z "$bare"
+if test -z "$bare" && test -f "$GIT_DIR/REMOTE_HEAD"
 then
-	head_points_at=`git-symbolic-ref HEAD`
+	head_sha1=`cat "$GIT_DIR/REMOTE_HEAD"`
+	# Figure out which remote branch HEAD points at.
+	case "$use_separate_remote" in
+	'')	remote_top=refs/heads ;;
+	*)	remote_top=refs/remotes ;;
+	esac
+	head_points_at=$(
+		(
+			echo "master"
+			cd "$GIT_DIR/$remote_top" &&
+			find . -type f -print | sed -e 's/^\.\///'
+		) | (
+		done=f
+		while read name
+		do
+			test t = $done && continue
+			branch_tip=`cat "$GIT_DIR/$remote_top/$name"`
+			if test "$head_sha1" = "$branch_tip"
+			then
+				echo "$name"
+				done=t
+			fi
+		done
+		)
+	)
 	case "$head_points_at" in
-	refs/heads/*)
-		head_points_at=`expr "$head_points_at" : 'refs/heads/\(.*\)'`
+	?*)
 		mkdir -p "$GIT_DIR/remotes" &&
 		echo >"$GIT_DIR/remotes/origin" \
 		"URL: $repo
-Pull: $head_points_at:$origin" &&
-		git-update-ref "refs/heads/$origin" $(git-rev-parse HEAD) &&
-		(cd "$GIT_DIR" && find "refs/heads" -type f -print) |
+Pull: refs/heads/$head_points_at:refs/$origin" &&
+		case "$use_separate_remote" in
+		t) git-update-ref HEAD "$head_sha1" ;;
+		*) git-update-ref "refs/$origin" $(git-rev-parse HEAD)
+		esac &&
+		(cd "$GIT_DIR" && find "$remote_top" -type f -print) |
 		while read ref
 		do
-			head=`expr "$ref" : 'refs/heads/\(.*\)'` &&
-			test "$head_points_at" = "$head" ||
+			head=`expr "$ref" : 'refs/\(.*\)'` &&
+			name=`expr "$ref" : 'refs/[^\/]*/\(.*\)'` &&
+			test "$head_points_at" = "$name" ||
 			test "$origin" = "$head" ||
-			echo "Pull: ${head}:${head}"
+			echo "Pull: refs/heads/${name}:$remote_top/${name}"
 		done >>"$GIT_DIR/remotes/origin"
 	esac
 
@@ -256,6 +375,7 @@ Pull: $head_points_at:$origin" &&
 		git-read-tree -m -u -v HEAD HEAD
 	esac
 fi
+rm -f "$GIT_DIR/CLONE_HEAD" "$GIT_DIR/REMOTE_HEAD"
 
 trap - exit
 
-- 
1.2.4.gb7986

* Re: efficient cloning
  2006-03-20  8:54       ` Junio C Hamano
@ 2006-03-20 15:18         ` Petr Baudis
  2006-03-20 21:39           ` Junio C Hamano
  2006-03-20 16:30         ` Josef Weidendorfer
  1 sibling, 1 reply; 22+ messages in thread
From: Petr Baudis @ 2006-03-20 15:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: James Cloos, git

Dear diary, on Mon, Mar 20, 2006 at 09:54:03AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
>  * A new flag --use-separate-remote stops contaminating local
>    branch namespace by upstream branch names.  The upstream
>    branch heads are copied in .git/refs/remotes/ instead of
>    .git/refs/heads/ and .git/remotes/origin file is set up to
>    reflect this as well.  It requires to have fetch/pull update
>    to understand .git/refs/remotes by Eric Wong to further
>    update the repository cloned this way.

I think this sucks the way it is, because you still have only a single
namespace for remotes (still quite a huge improvement to the current git
situation), but you can have many upstreams. So it would be quite a
bit more reasonable to have:

	.git/refs/remotes/<remotename>/<headname>

This is also how I would like to do it for cg-clone -a (which I planned
to implement the last weekend... well... ;). Actually, I think I will
stay in .git/refs/heads/ at least for now until git versions with
.git/refs/remotes/ in the refs search path will be released.
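
To make the proposed layout concrete, here is a small sketch of the
Pull: line each tracked branch would get under a per-remote
namespace.  The function name pull_line_for is hypothetical, and the
refspec shape follows the Pull: lines written by the git-clone.sh
patch earlier in this thread, with the remote name added.

```shell
#!/bin/sh
# Hypothetical helper: print the Pull: line that tracks branch $2 of
# remote $1 under refs/remotes/<remotename>/<headname>.
pull_line_for () {
	remote=$1 head=$2
	echo "Pull: refs/heads/$head:refs/remotes/$remote/$head"
}

# Two upstreams that both have a "master" no longer clash:
pull_line_for origin master
pull_line_for pasky master
```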

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: efficient cloning
  2006-03-20  8:54       ` Junio C Hamano
  2006-03-20 15:18         ` Petr Baudis
@ 2006-03-20 16:30         ` Josef Weidendorfer
  2006-03-20 23:04           ` Junio C Hamano
  2006-03-21  8:28           ` Junio C Hamano
  1 sibling, 2 replies; 22+ messages in thread
From: Josef Weidendorfer @ 2006-03-20 16:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Monday 20 March 2006 09:54, you wrote:
>  * A new flag --use-separate-remote stops contaminating local
>    branch namespace by upstream branch names.  The upstream
>    branch heads are copied in .git/refs/remotes/ instead of

Shouldn't this be .git/refs/remotes/origin/?
Ie. different namespaces for different remotes?

Linus wanted to still be able to say "origin" which automatically
would map to "remotes/origin/master", where the name of the remote
branch (here "master") is the first mentioned in the Pull line of
the .git/remotes file in eg.

	git diff origin..master

I am not sure whether we really want "take first refspec mentioned on Pull
line", as there can be multiple tracked remote branches with their own local
developments...
Ie. I want to be able to specify: "The local development branch X is based on
the following remote branch Y , which is tracked locally as Z.".
This way, you could also warn about or prohibit accidental nonsense merges caused by wrong use
of "git-pull".
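
The "take first refspec mentioned on the Pull line" rule could be
sketched like this.  It is an illustration only; the function name
first_tracking_ref is made up, and it assumes a remotes file with
"Pull: <remote-ref>:<local-ref>" lines.

```shell
#!/bin/sh
# Hypothetical helper: print the local tracking ref named by the first
# "Pull:" line of the remotes file $1 (the part after the colon).
first_tracking_ref () {
	while read -r key spec
	do
		if test "$key" = "Pull:"
		then
			echo "${spec#*:}"
			return 0
		fi
	done <"$1"
	# No Pull: line found.
	return 1
}
```

With this rule the answer depends only on the order of the Pull:
lines, which is why a per-branch "X is based on Y, tracked as Z"
setting would be more explicit.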

Josef

* Re: efficient cloning
  2006-03-20 15:18         ` Petr Baudis
@ 2006-03-20 21:39           ` Junio C Hamano
  2006-03-20 22:41             ` Petr Baudis
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-03-20 21:39 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis <pasky@suse.cz> writes:

> I think this sucks the way it is, because you still have only a single
> namespace for remotes (still quite a huge improvement to the current git
> situation), but you can have many upstreams.

Come on, give me a break.

You are commenting on the initial 'git-clone' and specifically
on one of its optional features.  What multiple upstreams?

The whole point of what git-clone does on top of making a
straight clone of the remote is to give you a reasonable
starting point.  The traditional "master" -> "origin" mapping is
good for cloning a typical single-head repository.  If your
upstream has more branches, --use-separate-remote would help you
to start your branch namespace uncluttered.

If you want to go fancier after the initial clone to
"cg-add-branch" more upstreams, you can implement a customized
editor, even a graphical one if you want, that inspects
$GIT_DIR/{branches,remotes} _and_ $GIT_DIR/{heads,remotes},
shows the current status, and lets you edit the contents of a
$GIT_DIR/remotes/foobar _while_ making matching changes to what
are under $GIT_DIR/{heads,remotes}.

* Re: efficient cloning
  2006-03-20 21:39           ` Junio C Hamano
@ 2006-03-20 22:41             ` Petr Baudis
  2006-03-20 23:07               ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Baudis @ 2006-03-20 22:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Dear diary, on Mon, Mar 20, 2006 at 10:39:41PM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> You are commenting on the initial 'git-clone' and specifically
> on one of its optional features.  What multiple upstreams?
> 
> The whole point of what git-clone does on top of making a
> straight clone of the remote is to give you a reasonable
> starting point.  The traditional "master" -> "origin" mapping is
> good for cloning a typical single-head repository.  If your
> upstream has more branches, --use-separate-remote would help you
> to start your branch namespace uncluttered.

Yes, but I just see no connection with a "starting point" whatsoever -
why should this be inherent to initial clone? I can see no greater
chance that I will want all the branches than when I want to fetch from
another repository later (especially in a truly distributed
environment).

So, it doesn't make sense to me to limit this feature only to the
initial clone case - I want to be able to reasonably "fetch all
branches" of any repository I wish. Without massive namespace clashes,
the reasonable way is to just have a separate directory in
.git/refs/remotes/ for each repository (and it's my understanding that
this was the original proposal as well).

Then you can make a simple change that if a refname matches a directory
in refs/remotes/, you rewrite it as refs/remotes/<refname>/master. This
makes 'origin' work seamlessly in a natural way and a lot more elegantly
than if you make up an artificial rule like "if the remote's branch is
master, save it as origin, but save all the other branches verbatim".

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 16:30         ` Josef Weidendorfer
@ 2006-03-20 23:04           ` Junio C Hamano
  2006-03-20 23:21             ` Petr Baudis
  2006-03-21  0:26             ` Josef Weidendorfer
  2006-03-21  8:28           ` Junio C Hamano
  1 sibling, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2006-03-20 23:04 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

> On Monday 20 March 2006 09:54, you wrote:
>>  * A new flag --use-separate-remote stops contaminating local
>>    branch namespace by upstream branch names.  The upstream
>>    branch heads are copied in .git/refs/remotes/ instead of
>
> Shouldn't this be .git/refs/remotes/origin/?
> Ie. different namespaces for different remotes?
>
> Linus wanted to still be able to say "origin" which automatically
> would map to "remotes/origin/master", where the name of the remote

I do not remember that, but even if he said something similar to
that, I suspect it would not be "map remotes/origin/master to
origin", but "origin could mean remotes/origin when origin is
the unique tail-name anywhere under refs/".

I think what is reasonable is something like this:

 - If you start from a repository cloned in the traditional
   way, the upstream "master" is kept track of with your
   "origin", so "diff origin master" would be "my changes on top
   of the upstream".

 - If your repository was cloned with --use-separate-remote, the
   upstream "master" is refs/remotes/master, so the same diff
   can be had with "diff remotes/master master".

 - Regardless of how you started your cloned repository, with an
   $GIT_DIR/{remotes,refs/heads,refs/remotes} editor I hinted in
   a separate message, you can rearrange things to organize the
   refs/ hierarchy any way you want.

   - You could for example arrange to track my "master" as
     refs/heads/origin and all the other branch heads under
     refs/remotes/junkio/ (or not even track my other branches
     if you are not interested).  Then the same diff can be had
     with "diff origin master".

   - You could for example arrange to track all my branches in
     refs/remotes/junkio/, and if git-pasky were still alive,
     Pasky's branches in refs/remotes/pasky.  If we had a "take
     the unique tail-name anywhere under refs/" logic, the same
     diff can be had with "diff junkio/master master".

So I think two things that would be nice to have on top of what
we have are (1) the said "remotes-and-refs editor" [*1*], and
(2) a change to sha1_name.c to look for places other than
built-in tags/ and heads/ under refs/ to find a unique
tail-match.
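
To make the "unique tail-match" idea in (2) concrete, here is a minimal
shell sketch. It is not the eventual sha1_name.c change: it only covers
the fallback step after the built-in lookups under tags/ and heads/ have
failed, it assumes loose ref files under $GIT_DIR/refs, and the helper
name unique_tail_match is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the one ref under $GIT_DIR/refs whose path
# ends in "/$1"; fail if there is no match or more than one.
# Assumption: $1 contains no glob characters.
unique_tail_match () {
	name=$1
	git_dir=${GIT_DIR:-.git}
	test -d "$git_dir/refs" || {
		echo >&2 "no refs directory in '$git_dir'"
		return 1
	}
	# Collect every loose ref whose path ends with the given name.
	matches=$(cd "$git_dir" && find refs -type f -path "*/$name" -print)
	# Count the non-empty lines; grep -c exits non-zero on zero matches.
	count=$(printf '%s\n' "$matches" | grep -c . || true)
	case "$count" in
	1)	printf '%s\n' "$matches" ;;
	0)	echo >&2 "no ref matches '$name'"; return 1 ;;
	*)	echo >&2 "refname '$name' is ambiguous"; return 1 ;;
	esac
}
```

With refs/heads/master and refs/remotes/junkio/master both present,
"junkio/master" resolves uniquely in this sketch while a bare "master"
is reported as ambiguous, which is why the built-in heads/ and tags/
lookups would have to run first.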

Since I do not do Porcelain, (2) would obviously be the next
thing for me to work on on this topic.  I should also address
"Ouch I did not realize I have given the same name to a tag and
a branch" warning issue while doing so.


[Footnote]

*1* ... which currently I do not plan to do myself unless I have
absolutely nothing else to do and am really bored.  A sound of huge
hint dropping ;-).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 22:41             ` Petr Baudis
@ 2006-03-20 23:07               ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2006-03-20 23:07 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis <pasky@suse.cz> writes:

> Then you can make a simple change that if a refname matches a directory
> in refs/remotes/, you rewrite it as refs/remotes/<refname>/master. This
> makes 'origin' work seamlessly in a natural way and a lot more elegantly
> than if you make up an artificial rule like "if the remote's branch is
> master, save it as origin, but save all the other branches verbatim".

The "origin" rename applies only to the traditional one.
Separate remote stuff stores master in remotes/master.

At least that is the way I remember I designed it to work.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 23:04           ` Junio C Hamano
@ 2006-03-20 23:21             ` Petr Baudis
  2006-03-20 23:49               ` Junio C Hamano
  2006-03-21  0:26             ` Josef Weidendorfer
  1 sibling, 1 reply; 22+ messages in thread
From: Petr Baudis @ 2006-03-20 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, git

Dear diary, on Tue, Mar 21, 2006 at 12:04:34AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> I think what is reasonable is something like this:

<insert all of the arguments in my other mail here ;>

>  - If your repository was cloned with --use-separate-remote, the
>    upstream "master" is refs/remotes/master, so the same diff
>    can be had with "diff remotes/master master".

Which is ugly. There is no reason why you couldn't go on using 'origin',
which is shorter, and we can usually still decide unambiguously what
you meant (unless you have a local head/tag 'origin' _and_ a remote named
'origin' at the same time).

>  - Regardless of how you started your cloned repository, with an
>    $GIT_DIR/{remotes,refs/heads,refs/remotes} editor I hinted in
>    a separate message, you can rearrange things to organize the
>    refs/ hierarchy any way you want.

Yes, but that makes no sense to do when you usually need this only in
some very special cases (and then you ought to be able to set up the
weird thing yourself), while we can do the doubtlessly right thing in
the general case, WITHOUT confusing the user, WHILE keeping things tidy
and easy to use (and without excessive typing). In fact, you described
it yourself:

>    - You could for example arrange to track all my branches in
>      refs/remotes/junkio/, and if git-pasky were still alive,
>      Pasky's branches in refs/remotes/pasky.  If we had a "take
>      the unique tail-name anywhere under refs/" logic, the same
>      diff can be had with "diff junkio/master master".

Except that you can just name the directory under refs/remotes/ the same way
you name the remote repository identifier (be it a Git-style remote or
Cogito's remote branch).

I still don't get what's wrong with what I'm proposing. I'm not seeing the
disadvantages, if there are any.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 23:21             ` Petr Baudis
@ 2006-03-20 23:49               ` Junio C Hamano
  2006-03-21  8:19                 ` Andreas Ericsson
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-03-20 23:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Josef Weidendorfer, git

Petr Baudis <pasky@suse.cz> writes:

> I still don't get what's wrong with what I'm proposing. I'm not seeing the
> disadvantages, if there are any.

The only thing I think there is is that I do not get what you
are proposing ;-), since I am not paying full attention while at
day-job.

If you are proposing to root --use-separate-remote not at
refs/remotes but refs/remotes/origin/, I think it makes kind of
sense.  It would make tons of sense _if_ dealing with more than one
remote repository is the norm, but otherwise you would have an
extra level of directory refs/remotes which almost always has
only one subdirectory 'origin' and nothing else, which is
pointless.

I am not sure if you are also advocating to map (somehow) origin
to remotes/origin/master (or whatever branch remote's HEAD
points at), but if so I am not quite sure what its semantics
would be.  Which remote branch would you pick (that would not
necessarily be "master") and where are you going to record that
and when.  It all sounds to me complicating things
unnecessarily.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 23:04           ` Junio C Hamano
  2006-03-20 23:21             ` Petr Baudis
@ 2006-03-21  0:26             ` Josef Weidendorfer
  2006-03-21  0:57               ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Josef Weidendorfer @ 2006-03-21  0:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tuesday 21 March 2006 00:04, you wrote:
> > Linus wanted to still be able to say "origin" which automatically
> > would map to "remotes/origin/master", where the name of the remote
> 
> I do not remember that,

See
http://www.gelato.unsw.edu.au/archives/git/0603/17405.html

I found it strange at first. But I think it is the right thing.

> I think what is reasonable is something like this:
> 
>  - If you start from a repository cloned in the traditional
>    way, the upstream "master" is kept track of with your
>    "origin", so "diff origin master" would be "my changes on top
>    of the upstream".

Yes. And it would be nice if the same would work with the new layout,
assuming that there is no local "origin" branch, but a .git/remotes/origin
file and .git/refs/remotes/origin directory.

>  - Regardless of how you started your cloned repository, with an
>    $GIT_DIR/{remotes,refs/heads,refs/remotes} editor I hinted in
>    a separate message, you can rearrange things to organize the
>    refs/ hierarchy any way you want.

Yes.
Still it would be nice to have a fixed convention here.
Eg. gitk could decorate the namespace in a special way.
Even if it is most of the time "origin" only.

> [Footnote]
> 
> *1* ... which currently I do not plan to do myself unless I have
> absolutely nothing else to do and am really bored.  A sound of huge
> hint dropping ;-).

It should be as simple as (probably with quite some errors)

~/> cat git-rename-remote.sh
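#!/bin/sh
# Rename remote $1 to $2: move its tracking refs, then rewrite its remotes file.
mv ".git/refs/remotes/$1" ".git/refs/remotes/$2" &&
sed -e "s|remotes/$1|remotes/$2|" ".git/remotes/$1" >".git/remotes/$2" &&
rm ".git/remotes/$1"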

If you allow more freedom regarding the use of refs/remotes, it gets more
complicated both for the script and for the user to understand.

Josef

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-21  0:26             ` Josef Weidendorfer
@ 2006-03-21  0:57               ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2006-03-21  0:57 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

>> I think what is reasonable is something like this:
>> 
>>  - If you start from a repository cloned in the traditional
>>    way, the upstream "master" is kept track of with your
>>    "origin", so "diff origin master" would be "my changes on top
>>    of the upstream".
>
> Yes. And it would be nice if the same would work with the new layout,
> assuming that there is no local "origin" branch, but a .git/remotes/origin
> file and .git/refs/remotes/origin directory.

My primary aversion comes from that I'd rather avoid teaching
the really core stuff about .git/remotes file, and the part that
interprets refname is fairly a low-level part.

We _could_ record refs/remotes/origin/HEAD that points at
refs/remotes/origin/master (or some other branch) upon cloning,
and if Pasky wants to do something similar upon fetching, that
fetch command could do the same thing.
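
Recording such a HEAD could look roughly like the sketch below. It is an
illustration only: the helper name record_remote_head is made up, it
handles loose ref files only, and it writes the file by hand in the same
"ref: <target>" form that $GIT_DIR/HEAD uses, where real code would more
likely go through git-symbolic-ref.

```shell
#!/bin/sh
# Hypothetical helper: record which tracked branch is the primary one
# for a remote by writing refs/remotes/<remote>/HEAD as a symbolic ref.
record_remote_head () {
	remote=$1
	branch=$2
	git_dir=${GIT_DIR:-.git}
	top="refs/remotes/$remote"
	# Refuse to point HEAD at a branch we do not actually track.
	test -f "$git_dir/$top/$branch" || {
		echo >&2 "'$top/$branch' is not tracked"
		return 1
	}
	printf 'ref: %s\n' "$top/$branch" >"$git_dir/$top/HEAD"
}
```

A clone would call this once, for example as "record_remote_head origin
master", and a later fetch could call it again if the remote's primary
branch changes.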

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 23:49               ` Junio C Hamano
@ 2006-03-21  8:19                 ` Andreas Ericsson
  2006-03-21  8:42                   ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Ericsson @ 2006-03-21  8:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Josef Weidendorfer, git

Junio C Hamano wrote:
> Petr Baudis <pasky@suse.cz> writes:
> 
> 
>>I still don't get what's wrong with what I'm proposing. I'm not seeing the
>>disadvantages, if there are any.
> 
> 
> The only thing I think there is is that I do not get what you
> are proposing ;-), since I am not paying full attention while at
> day-job.
> 
> If you are proposing to root --use-separate-remote not at
> refs/remotes but refs/remotes/origin/, I think it makes kind of
> sense.  It would make tons of sense _if_ dealing with more than one
> remote repository is the norm, but otherwise you would have an
> extra level of directory refs/remotes which almost always has
> only one subdirectory 'origin' and nothing else, which is
> pointless.
> 

afaiu, this is exactly what Pasky's proposing, and I agree. We could 
then teach 'git diff origin master' to mean 'origin/master' *if* no 
other tag/branch is found in the lookup order. I think it makes sense to 
do searching like this, for a ref named foo

(current order, with .git/, .git/refs/, etc...)
.git/refs/remotes/foo
.git/refs/remotes/foo/master

That way the only extra dwimery would be to add "remotes" after "heads" 
under .git/refs and accept directory in .git/remotes/ as ref and tack on 
'/master' at the end of it as the last option to search. For a specific 
branch on an imported remote, one would have to say "jc/next". This 
means we still only handle 'master' specially so we don't introduce any 
new protected or special names.


> I am not sure if you are also advocating to map (somehow) origin
> to remotes/origin/master (or whatever branch remote's HEAD
> points at), but if so I am not quite sure what its semantics
> would be.  Which remote branch would you pick (that would not
> necessarily be "master") and where are you going to record that
> and when.  It all sounds to me complicating things
> unnecessarily.
> 

Not too much so, I think. I'll look into it tonight, although I'm not 
very familiar with the core stuff so possibly (/ hopefully) someone else 
will beat me to it.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-20 16:30         ` Josef Weidendorfer
  2006-03-20 23:04           ` Junio C Hamano
@ 2006-03-21  8:28           ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2006-03-21  8:28 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: git, Petr Baudis

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

> Shouldn't this be .git/refs/remotes/origin/?
> Ie. different namespaces for different remotes?

OK, here is a second try, that comes on top of the one from the
last night.

Together with another topic in the "next" branch, you can now do
this:

 $ git clone --use-separate-remote --reference git.git \
   git://git.kernel.org/pub/scm/git/git.git new.git
 $ cd new.git
 $ git branch
 * master
 $ git log --pretty=oneline master..origin/next | wc -l
 72

What the last one shows is that get_sha1_basic() lets you omit
refs/remotes; it does not let you say "origin" to mean
"refs/remotes/origin/master", although I agree that abbreviation
would be useful.  As I am still not yet convinced that using
what is in .git/remotes/origin is a good way to implement that
shorthand, and would rather avoid reading .git/remotes/origin
from the C level (even though get_sha1_basic() is not _that_ core
but a lot closer to the UI, compared to other C level routines),
I am leaving that part for later rounds.

-- >8 --
[PATCH] revamp git-clone (take #2).

This builds on top of the previous one.

 * --use-separate-remote uses .git/refs/remotes/$origin/
   directory to keep track of the upstream branches.

 * The $origin above defaults to "origin" as usual, but the
   existing "-o $origin" option can be used to override it.

I am not yet convinced if we should make "$origin" the synonym to
"refs/remotes/$origin/$name" where $name is the primary branch
name of $origin upstream, nor if so how we should decide which
upstream branch is the primary one, but that is more or less
orthogonal to what the clone does here.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 git-clone.sh |   50 +++++++++++++++++++++++++++++++-------------------
 1 files changed, 31 insertions(+), 19 deletions(-)

47874d6d9a7f49ade6388df049597f03365961ca
diff --git a/git-clone.sh b/git-clone.sh
index 9db678b..3b54753 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -9,7 +9,7 @@
 unset CDPATH
 
 usage() {
-	echo >&2 "Usage: $0 [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
+	echo >&2 "Usage: $0 [--use-separate-remote] [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
 	exit 1
 }
 
@@ -61,8 +61,9 @@ use File::Path qw(mkpath);
 use File::Basename qw(dirname);
 my $git_dir = $ARGV[0];
 my $use_separate_remote = $ARGV[1];
+my $origin = $ARGV[2];
 
-my $branch_top = ($use_separate_remote ? "remotes" : "heads");
+my $branch_top = ($use_separate_remote ? "remotes/$origin" : "heads");
 my $tag_top = "tags";
 
 sub store {
@@ -127,7 +128,12 @@ while
 	*,--reference=*)
 		reference=`expr "$1" : '--reference=\(.*\)'` ;;
 	*,-o)
-		git-check-ref-format "$2" || {
+		case "$2" in
+		*/*)
+		    echo >&2 "'$2' is not suitable for an origin name"
+		    exit 1
+		esac
+		git-check-ref-format "heads/$2" || {
 		    echo >&2 "'$2' is not suitable for a branch name"
 		    exit 1
 		}
@@ -165,14 +171,9 @@ then
 	no_checkout=yes
 fi
 
-if test -z "$origin_override$origin"
+if test -z "$origin"
 then
-	if test -n "$use_separate_remote"
-	then
-		origin=remotes/master
-	else
-		origin=heads/origin
-	fi
+	origin=origin
 fi
 
 # Turn the source into an absolute path if
@@ -317,7 +318,7 @@ test -d "$GIT_DIR/refs/reference-tmp" &&
 if test -f "$GIT_DIR/CLONE_HEAD"
 then
 	# Figure out where the remote HEAD points at.
-	perl -e "$copy_refs" "$GIT_DIR" "$use_separate_remote"
+	perl -e "$copy_refs" "$GIT_DIR" "$use_separate_remote" "$origin"
 fi
 
 cd "$D" || exit
@@ -328,8 +329,18 @@ then
 	# Figure out which remote branch HEAD points at.
 	case "$use_separate_remote" in
 	'')	remote_top=refs/heads ;;
-	*)	remote_top=refs/remotes ;;
+	*)	remote_top="refs/remotes/$origin" ;;
 	esac
+
+	# What to use to track the remote primary branch
+	if test -n "$use_separate_remote"
+	then
+		origin_tracking="remotes/$origin/master"
+	else
+		origin_tracking="heads/$origin"
+	fi
+
+	# The name under $remote_top the remote HEAD seems to point at
 	head_points_at=$(
 		(
 			echo "master"
@@ -349,25 +360,26 @@ then
 		done
 		)
 	)
+
+	# Write out remotes/$origin file.
 	case "$head_points_at" in
 	?*)
 		mkdir -p "$GIT_DIR/remotes" &&
-		echo >"$GIT_DIR/remotes/origin" \
+		echo >"$GIT_DIR/remotes/$origin" \
 		"URL: $repo
-Pull: refs/heads/$head_points_at:refs/$origin" &&
+Pull: refs/heads/$head_points_at:refs/$origin_tracking" &&
 		case "$use_separate_remote" in
 		t) git-update-ref HEAD "$head_sha1" ;;
 		*) git-update-ref "refs/$origin" $(git-rev-parse HEAD)
 		esac &&
-		(cd "$GIT_DIR" && find "$remote_top" -type f -print) |
-		while read ref
+		(cd "$GIT_DIR/$remote_top" && find . -type f -print) |
+		while read dotslref
 		do
-			head=`expr "$ref" : 'refs/\(.*\)'` &&
-			name=`expr "$ref" : 'refs/[^\/]*/\(.*\)'` &&
+			name=`expr "$dotslref" : './\(.*\)'` &&
 			test "$head_points_at" = "$name" ||
 			test "$origin" = "$head" ||
 			echo "Pull: refs/heads/${name}:$remote_top/${name}"
-		done >>"$GIT_DIR/remotes/origin"
+		done >>"$GIT_DIR/remotes/$origin"
 	esac
 
 	case "$no_checkout" in
-- 
1.2.4.ge2fc

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-21  8:19                 ` Andreas Ericsson
@ 2006-03-21  8:42                   ` Junio C Hamano
  2006-03-21  9:19                     ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-03-21  8:42 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Petr Baudis, Josef Weidendorfer, git

Andreas Ericsson <ae@op5.se> writes:

> That way the only extra dwimery would be to add "remotes" after
> "heads" under .git/refs and accept directory in .git/remotes/ as ref
> and tack on '/master' at the end of it as the last option to
> search. For a specific branch on an imported remote, one would have to
> say "jc/next". This means we still only handle 'master' specially so
> we don't introduce any new protected or special names.

Two things that are holding me back from doing what you suggested
are (1) "master" is just a convention and indeed non-negligible
number of kernel.org trees have "test" and "release" instead
without "master"; (2) I'd really really really want to avoid
teaching get_sha1_basic() C-level about .git/remotes/$origin
file, even though that function is more of a UI level than the
rest of the really core C-level routines.

But if I were forced to choose between the above 2, I would
probably pick defaulting to "master".

The reason I would like to avoid .git/remotes/$origin is because
it is designed to be a Porcelainish thing.  The underlying C-level
git-fetch-pack never sees it; instead the information fed to
C-level is prepared by the upper layer using that file.  As far
as I understand, Cogito does not understand it either, except
that it ships with bash completion code that reads from
filenames there.
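
As an illustration of that layering, a Porcelain script might turn a
remotes file into arguments for the fetch machinery roughly as follows.
This is a sketch, not the actual git-fetch code: the helper name
remote_fetch_args is made up, and the only thing taken from this thread
is the "URL:" / "Pull:" line format that git-clone.sh writes.

```shell
#!/bin/sh
# Hypothetical helper: read $GIT_DIR/remotes/<name> and print the URL
# followed by one refspec per line, ready to hand to a fetch command.
remote_fetch_args () {
	remotes_file=${GIT_DIR:-.git}/remotes/$1
	test -f "$remotes_file" || {
		echo >&2 "no such remote '$1'"
		return 1
	}
	# First the URL, then the refspec from each Pull: line.
	sed -n -e 's/^URL: *//p' "$remotes_file" &&
	sed -n -e 's/^Pull: *//p' "$remotes_file"
}
```

The C-level program then only ever sees a URL and a list of refspecs,
never the file itself.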

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-21  8:42                   ` Junio C Hamano
@ 2006-03-21  9:19                     ` Jeff King
  2006-03-21  9:45                       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2006-03-21  9:19 UTC (permalink / raw)
  To: git

On Tue, Mar 21, 2006 at 12:42:02AM -0800, Junio C Hamano wrote:

> The reason I would like to avoid .git/remotes/$origin is because
> it is designed to be a Porcelainish thing.  The underlying C-level
> git-fetch-pack never sees it; instead the information fed to
> C-level is prepared by the upper layer using that file.  As far
> as I understand, Cogito does not understand it either, except
> that it ships with bash completion code that reads from
> filenames there.

Then why not create .git/refs/remotes/$origin/HEAD at the time of clone
(or later)? Then the core looks for:
  (current order, .git/refs, etc)
  .git/refs/remotes/foo
  .git/refs/remotes/foo/HEAD
The porcelain can take care of managing the contents of HEAD. If there
is no HEAD in the directory, then it cannot be looked up by 'foo'
('foo/remote-branch' must be used instead).
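
The lookup order above can be sketched in shell as below. This is only
an approximation under stated assumptions: the helper name lookup_ref is
made up, it checks loose ref files only, and the first five candidates
follow the prefix list in get_sha1_basic() as it appears elsewhere in
this thread.

```shell
#!/bin/sh
# Hypothetical helper: resolve a short name by trying each candidate
# location in order and printing the first loose ref file that exists.
lookup_ref () {
	name=$1
	git_dir=${GIT_DIR:-.git}
	for candidate in \
		"$name" \
		"refs/$name" \
		"refs/tags/$name" \
		"refs/heads/$name" \
		"refs/remotes/$name" \
		"refs/remotes/$name/HEAD"
	do
		# A directory such as refs/remotes/foo is not a ref file,
		# so the loop falls through to refs/remotes/foo/HEAD.
		if test -f "$git_dir/$candidate"
		then
			printf '%s\n' "$candidate"
			return 0
		fi
	done
	echo >&2 "cannot resolve '$name'"
	return 1
}
```

A remote directory without a HEAD file simply fails to resolve by its
bare name here, matching the rule that 'foo/remote-branch' must be used
in that case.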

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-21  9:19                     ` Jeff King
@ 2006-03-21  9:45                       ` Junio C Hamano
  2006-03-21 11:29                         ` Petr Baudis
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-03-21  9:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Then why not create .git/refs/remotes/$origin/HEAD at the time of clone
> (or later)? Then the core looks for:
>   (current order, .git/refs, etc)
>   .git/refs/remotes/foo
>   .git/refs/remotes/foo/HEAD
> The porcelain can take care of managing the contents of HEAD. If there
> is no HEAD in the directory, then it cannot be looked up by 'foo'
> ('foo/remote-branch' must be used instead).

Yup, earlier I mentioned that possibility, and it does not seem
too painful.  On top of the "next", here is what is needed.

-- >8 --
[PATCH] get_sha1_basic(): try refs/... and finally refs/remotes/$foo/HEAD

This implements the suggestion by Jeff King to use
refs/remotes/$foo/HEAD to interpret a shorthand "$foo" to mean
the primary branch head of a tracked remote.  clone needs to be
told about this convention as well.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 sha1_name.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

c51d13692d4e451c755dd7da3521c5db395df192
diff --git a/sha1_name.c b/sha1_name.c
index 74c479c..3adaec3 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -235,18 +235,21 @@ static int ambiguous_path(const char *pa
 
 static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 {
-	static const char *prefix[] = {
-		"",
-		"refs",
-		"refs/tags",
-		"refs/heads",
-		"refs/remotes",
+	static const char *fmt[] = {
+		"/%.*s",
+		"refs/%.*s",
+		"refs/tags/%.*s",
+		"refs/heads/%.*s",
+		"refs/remotes/%.*s",
+		"refs/remotes/%.*s/HEAD",
 		NULL
 	};
 	const char **p;
 	const char *warning = "warning: refname '%.*s' is ambiguous.\n";
 	char *pathname;
 	int already_found = 0;
+	unsigned char *this_result;
+	unsigned char sha1_from_ref[20];
 
 	if (len == 40 && !get_sha1_hex(str, sha1))
 		return 0;
@@ -255,11 +258,9 @@ static int get_sha1_basic(const char *st
 	if (ambiguous_path(str, len))
 		return -1;
 
-	for (p = prefix; *p; p++) {
-		unsigned char sha1_from_ref[20];
-		unsigned char *this_result =
-			already_found ? sha1_from_ref : sha1;
-		pathname = git_path("%s/%.*s", *p, len, str);
+	for (p = fmt; *p; p++) {
+		this_result = already_found ? sha1_from_ref : sha1;
+		pathname = git_path(*p, len, str);
 		if (!read_ref(pathname, this_result)) {
 			if (warn_ambiguous_refs) {
 				if (already_found &&
-- 
1.2.4.gf1250

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: efficient cloning
  2006-03-21  9:45                       ` Junio C Hamano
@ 2006-03-21 11:29                         ` Petr Baudis
  0 siblings, 0 replies; 22+ messages in thread
From: Petr Baudis @ 2006-03-21 11:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

Dear diary, on Tue, Mar 21, 2006 at 10:45:12AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Jeff King <peff@peff.net> writes:
> 
> > Then why not create .git/refs/remotes/$origin/HEAD at the time of clone
> > (or later)? Then the core looks for:
> >   (current order, .git/refs, etc)
> >   .git/refs/remotes/foo
> >   .git/refs/remotes/foo/HEAD
> > The porcelain can take care of managing the contents of HEAD. If there
> > is no HEAD in the directory, then it cannot be looked up by 'foo'
> > ('foo/remote-branch' must be used instead).
> 
> Yup, earlier I mentioned that possibility, and it does not seem
> too painful.  On top of the "next", here is what is needed.
> 
> -- >8 --
> [PATCH] get_sha1_basic(): try refs/... and finally refs/remotes/$foo/HEAD
> 
> This implements the suggestion by Jeff King to use
> refs/remotes/$foo/HEAD to interpret a shorthand "$foo" to mean
> the primary branch head of a tracked remote.  clone needs to be
> told about this convention as well.
> 
> Signed-off-by: Junio C Hamano <junkio@cox.net>

Excellent, yes, that's what I meant. I'm happy now. :)

Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2006-03-21 11:29 UTC | newest]

Thread overview: 22+ messages
2006-03-19 21:16 efficient cloning James Cloos
2006-03-19 22:31 ` Shawn Pearce
2006-03-19 23:18 ` Junio C Hamano
2006-03-20  0:32   ` James Cloos
2006-03-20  1:55     ` Junio C Hamano
2006-03-20  8:54       ` Junio C Hamano
2006-03-20 15:18         ` Petr Baudis
2006-03-20 21:39           ` Junio C Hamano
2006-03-20 22:41             ` Petr Baudis
2006-03-20 23:07               ` Junio C Hamano
2006-03-20 16:30         ` Josef Weidendorfer
2006-03-20 23:04           ` Junio C Hamano
2006-03-20 23:21             ` Petr Baudis
2006-03-20 23:49               ` Junio C Hamano
2006-03-21  8:19                 ` Andreas Ericsson
2006-03-21  8:42                   ` Junio C Hamano
2006-03-21  9:19                     ` Jeff King
2006-03-21  9:45                       ` Junio C Hamano
2006-03-21 11:29                         ` Petr Baudis
2006-03-21  0:26             ` Josef Weidendorfer
2006-03-21  0:57               ` Junio C Hamano
2006-03-21  8:28           ` Junio C Hamano