[ANNOUNCE] Cogito-0.12

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [ANNOUNCE] Cogito-0.12
@ 2005-07-03 23:46 Petr Baudis
  2005-07-06 12:01 ` Brian Gerst
  2005-07-07  6:22 ` Chris Wright
  0 siblings, 2 replies; 66+ messages in thread
From: Petr Baudis @ 2005-07-03 23:46 UTC (permalink / raw)
  To: git

  Hello,

  I'm happy to announce the release of the 0.12 version of the Cogito
SCM-like layer over Linus' GIT tree history storage tool. Get it at

	http://www.kernel.org/pub/software/scm/cogito/

or cg-update if you have an older version cloned.

  I wanted to release it later with more cool features, but after all
releasing often is good and people will get to test things more, and
I wanted to make it possible for kernel.org to upgrade to newer RPM.
But it may not be as stable as I'd wish and may have some rough edges,
so be warned.

  This release contains the latest stuff from Linus, with all the
packing stuff and everything. Other things include heaps of bugfixes,
enhanced options parsing, ~/.cgrc support, cg-push, real cg-tag, and
plenty of smaller but nice stuff. And more to come in next days!

  About cg-push, it:

  (i) works only locally or over git+ssh branches

  (ii) the head updated on the other side must be 'master' too
	(high priority to fix)

  (iii) the head updated on the other side is re-created, thus losing
	all attributes (ownership, permissions)
	(high priority to fix)

  (iv) won't update the remote working tree if there is any associated
	with the repository - do cg-cancel to catch up, but that will
	lose any local changes you did (note that I plan to rename
	cg-cancel to cg-reset)

  Also, I've deprecated rsync, as I explained in another mail. Use
cg-branch-chg to change the branch URLs to some more sensible scheme -
most likely HTTP, or SSH if you want to push as well.

  Have fun,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-03 23:46 [ANNOUNCE] Cogito-0.12 Petr Baudis
@ 2005-07-06 12:01 ` Brian Gerst
  2005-07-07 14:45   ` Petr Baudis
  2005-07-07  6:22 ` Chris Wright
  1 sibling, 1 reply; 66+ messages in thread
From: Brian Gerst @ 2005-07-06 12:01 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis wrote:
>   Hello,
> 
>   I'm happy to announce the release of the 0.12 version of the Cogito
> SCM-like layer over Linus' GIT tree history storage tool. Get it at
> 
> 	http://www.kernel.org/pub/software/scm/cogito/
> 
> or cg-update if you have an older version cloned.
> 
>   I wanted to release it later with more cool features, but after all
> releasing often is good and people will get to test things more, and
> I wanted to make it possible for kernel.org to upgrade to newer RPM.
> But it may not be as stable as I'd wish and may have some rough edges,
> so be warned.
> 
>   This release contains the latest stuff from Linus, with all the
> packing stuff and everything. Other things include heaps of bugfixes,
> enhanced options parsing, ~/.cgrc support, cg-push, real cg-tag, and
> plenty of smaller but nice stuff. And more to come in next days!
> 
>   About cg-push, it:
> 
>   (i) works only locally or over git+ssh branches
> 
>   (ii) the head updated on the other side must be 'master' too
> 	(high priority to fix)
> 
>   (iii) the head updated on the other side is re-created, thus losing
> 	all attributes (ownership, permissions)
> 	(high priority to fix)
> 
>   (iv) won't update the remote working tree if there is any associated
> 	with the repository - do cg-cancel to catch up, but that will
> 	lose any local changes you did (note that I plan to rename
> 	cg-cancel to cg-reset)
> 
>   Also, I've deprecated rsync, as I explained in another mail. Use
> cg-branch-chg to change the branch URLs to some more sensible scheme -
> most likely HTTP, or SSH if you want to push as well.

I really question removing rsync before HTTP pulls become more 
effecient.  I did a complete pull of cogito from kernel.org, and http 
took over 50 minutes to pull everything, while rsync was done in just 
over 1 minute.  I dared not even try to pull the full kernel at that speed.

I suspect that part of the problem is that the pull methods are doing a 
depth first search, so we can't request the next object until the 
current object is fully received and parsed.  Changing to a breadth 
first search would allow multiple requests in flight and asynchronous 
processing which should speed things up.  I am exploring using the 
curl_multi_* functions to do this, but this will require changes to 
common code in pull.c.

--
				Brian Gerst

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-06 12:01 ` Brian Gerst
@ 2005-07-07 14:45   ` Petr Baudis
  2005-07-07 17:21     ` Junio C Hamano
  0 siblings, 1 reply; 66+ messages in thread
From: Petr Baudis @ 2005-07-07 14:45 UTC (permalink / raw)
  To: Brian Gerst; +Cc: git

Dear diary, on Wed, Jul 06, 2005 at 02:01:38PM CEST, I got a letter
where Brian Gerst <bgerst@didntduck.org> told me that...
> Petr Baudis wrote:
> >  Also, I've deprecated rsync, as I explained in another mail. Use
> >cg-branch-chg to change the branch URLs to some more sensible scheme -
> >most likely HTTP, or SSH if you want to push as well.
> 
> I really question removing rsync before HTTP pulls become more 
> effecient.

It won't happen. Or rather, I hope the HTTP pulls become more efficient
soon. Actually, perhaps Linus has something done already, my workstation
is a bit derailed now so I couldn't pull from him in the last few days
(hopefully will sort that out today).

> I did a complete pull of cogito from kernel.org, and http 
> took over 50 minutes to pull everything, while rsync was done in just 
> over 1 minute.  I dared not even try to pull the full kernel at that speed.
> 
> I suspect that part of the problem is that the pull methods are doing a 
> depth first search, so we can't request the next object until the 
> current object is fully received and parsed.  Changing to a breadth 
> first search would allow multiple requests in flight and asynchronous 
> processing which should speed things up.  I am exploring using the 
> curl_multi_* functions to do this, but this will require changes to 
> common code in pull.c.

Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
suggest you to check the last development in Linus' branch and sync with
Daniel Barkalow, who promised improving the pull tools as well.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 14:45   ` Petr Baudis
@ 2005-07-07 17:21     ` Junio C Hamano
  2005-07-07 19:04       ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 17:21 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@suse.cz> writes:

PB> It won't happen. Or rather, I hope the HTTP pulls become more efficient
PB> soon. Actually, perhaps Linus has something done already, my workstation
PB> is a bit derailed now so I couldn't pull from him in the last few days
PB> (hopefully will sort that out today).

PB> Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
PB> suggest you to check the last development in Linus' branch and sync with
PB> Daniel Barkalow, who promised improving the pull tools as well.

If this weekend is not too late, I have been brewing what is
called an "efficient pull from dumb servers" suite, which would
hopefully fill this gap.  I am still in the process of finishing
the details, but basically it already seems to work.

Linus, please drop the patch I sent you earlier, privately by
mistake not CCing the list, that implemented only the server
end.  I've changed some file formats already from that one.

The outline of how it works is like this.

 * I assume a dumb transport (read: static files only HTTP
   server) and no on-request server side processing.  All the
   smarts must go in the client.  The server side X.git being an
   ordinary GIT archive (no need for files in the work tree),
   plus:

   - X.git/objects/pack can have packed GIT archives.  I
     envision that this will be a series of 5 to 20 MB packs,
     occasionally adding a new incremental pack when
     X.git/objects/??/ directories accumulate enough standalone
     SHA1 files.  It is not necessary to have X.git/objects/??/
     files if an object is contained in one of the packs.

   - X.git/info/ has three extra files.

     - "inventory" lists all the branches stored in X.git/refs
       and looks like this (contents and path):

          ff83c8f3554ceb444b413beaeb49b4a781dae944 snap/0
          013e7c7ff498aae82d799f80da37fbd395545456 snap/10
          ff83c8f3554ceb444b413beaeb49b4a781dae944 heads/master
          dd7ba8b4949535c24e604a37709db0e3be9ccbbc heads/linus

       This is to facilitate discovery from a transport that is
       not so "ls" friendly, like HTTP.

     - "pack" lists available packs under X.git/objects/pack and
       looks like this (size and name):

          432495 pk-65fe69e9bc2e8a3e0881e008dde182522156ba7c.pack

       The file is there for discovery.  The size is used by the
       client to discover optimum set of packs to slurp.

     - "rev-cache" is a binary file that describes commit
       ancestry information in a dense format.  It lists all
       commits available from this repository along with who
       its parents are for each of the commit.  This file is
       produced append-only, so that the server side can use
       rsync based mirroring scheme.

   A new command "git-update-dumb-server" is used to prepare
   these three files.  There may need a helper script that uses
   git-pack-objects and friends to prepare packs partitioned to
   allow pulling a popular branch efficiently.

 * The client side is called "git-dumb-pull-script".  This
   downloads the above three files, and .idx files associated
   with packs described in "pack".  With the information in
   "inventory" about desired branch to pull from along with
   "rev-cache" ancestry information, it discovers the set of
   commits that is lacking from its local store.  By comparing
   that list with downloaded .idx files, along with size
   information for each pack, it comes up a list of packs to
   download to cover the most commits that it wants to obtain,
   and downloads them, verifies them and stores them in its
   .git/objects/pack/ directory.

   The above process of downloading packs would typically not
   cover all the things lacking, because some new commits may
   not be in any of the packs.  After this point, the usual
   commit-walking git-http-pull can be used to fill the rest,
   and it does not have to pull that many objects.  Dan's
   http-pull parallelism improvement would be very useful
   independently here.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 17:21     ` Junio C Hamano
@ 2005-07-07 19:04       ` Linus Torvalds
  2005-07-07 19:57         ` Junio C Hamano
                           ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-07 19:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
> 
>    - X.git/objects/pack can have packed GIT archives.  I
>      envision that this will be a series of 5 to 20 MB packs,
>      occasionally adding a new incremental pack when
>      X.git/objects/??/ directories accumulate enough standalone
>      SHA1 files.  It is not necessary to have X.git/objects/??/
>      files if an object is contained in one of the packs.

Note that I just re-packed the kernel archive on kernel.org, and removed 
_all_ unpacked files. Once that percolates to the mirrors, the http 
protocol will be useless without anything like this.

That said, I really think the dumb protocols are useless anyway. No other 
system supports pure static object pulling anyway, and as far as I'm 
concerned, I want "rsync" to kind of work (but it won't be optimal, since 
re-packing will delete all the old objects and replace it with the new 
pack that is downloaded anew). But plain http? I'm not convinced.

I'd much rather have a "stupid server" that just listens to a port, and
basically forks off and executes "git-upload-pack" when it's connected to
(perhaps reading the directory name first).  Nothing else. Then we can do 
a security analysis of upload-pack, which should be fairly easy since it's 
not actually ever _writing_ anything.

At that point, you can do

	git pull git://www.kernel.org/pub/scm/git/..

and it would just connect to some default "git port", pass off the 
directory name, and be done with it - exact same discovery protocol that 
now use for ssh. And "git clone" would also automatically work.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 19:04       ` Linus Torvalds
@ 2005-07-07 19:57         ` Junio C Hamano
  2005-07-07 21:58           ` Linus Torvalds
  2005-07-07 20:00         ` Junio C Hamano
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

I have two questions on "rev-list --objects".

(1) Would it make sense to have an extra flag to "rev-list
    --objects" to make it list all the objects reachable from
    commits listed in its output, even when some of them are
    unchanged from UNINTERESTING commits?  Right now, a pack
    produced from "rev-list --objects A ^B" does not have enough
    information to reproduce the tree associated with commit A.

(2) When "showing --objects", it lists the top-level tree node
    with no name, which makes it indistinguishable from commit
    objects by pack-objects, probably impacting the delta logic.
    Would something like the following patch make sense, to name
    such node "."; giving full-path not just the basename to
    all named nodes would be even better, though.

---
# - master: git-format-patch: Prepare patches for e-mail submission.
# + (working tree)
diff --git a/rev-list.c b/rev-list.c
--- a/rev-list.c
+++ b/rev-list.c
@@ -179,7 +179,10 @@ static void show_commit_list(struct comm
 		die("unknown pending object %s (%s)", sha1_to_hex(obj->sha1), name);
 	}
 	while (objects) {
-		printf("%s %s\n", sha1_to_hex(objects->item->sha1), objects->name);
+		const char *name = objects->name;
+		if (!*name && objects->item->type == tree_type)
+			name = ".";
+		printf("%s %s\n", sha1_to_hex(objects->item->sha1), name);
 		objects = objects->next;
 	}
 }

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 19:57         ` Junio C Hamano
@ 2005-07-07 21:58           ` Linus Torvalds
  2005-07-07 22:10             ` Junio C Hamano
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-07 21:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git



On Thu, 7 Jul 2005, Junio C Hamano wrote:
> 
> (1) Would it make sense to have an extra flag to "rev-list
>     --objects" to make it list all the objects reachable from
>     commits listed in its output, even when some of them are
>     unchanged from UNINTERESTING commits?  Right now, a pack
>     produced from "rev-list --objects A ^B" does not have enough
>     information to reproduce the tree associated with commit A.

Well, that would certainly be possible. Just having a flag that disables 
"mark_tree_uninteresting()" would do it.

> (2) When "showing --objects", it lists the top-level tree node
>     with no name, which makes it indistinguishable from commit
>     objects by pack-objects, probably impacting the delta logic.
>     Would something like the following patch make sense, to name
>     such node "."; giving full-path not just the basename to
>     all named nodes would be even better, though.

It doesn't impact the delta algorithm, because the objects are sorted by 
type first, so it never mixes up trees and commits.

But if you wanted to, something like this would be cleaner than your 
suggestion..

		Linus

diff --git a/rev-list.c b/rev-list.c
--- a/rev-list.c
+++ b/rev-list.c
@@ -154,7 +154,7 @@ static void show_commit_list(struct comm
 	while (list) {
 		struct commit *commit = pop_most_recent_commit(&list, SEEN);
 
-		p = process_tree(commit->tree, p, "");
+		p = process_tree(commit->tree, p, "tree");
 		if (process_commit(commit) == STOP)
 			break;
 	}
@@ -386,7 +386,7 @@ static struct commit *get_commit_referen
 			mark_tree_uninteresting(tree);
 			return NULL;
 		}
-		add_pending_object(object, "");
+		add_pending_object(object, "tree");
 		return NULL;
 	}
 
@@ -401,7 +401,7 @@ static struct commit *get_commit_referen
 			mark_blob_uninteresting(blob);
 			return NULL;
 		}
-		add_pending_object(object, "");
+		add_pending_object(object, "blob");
 		return NULL;
 	}
 	die("%s is unknown object", name);

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 21:58           ` Linus Torvalds
@ 2005-07-07 22:10             ` Junio C Hamano
  0 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 22:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

>> (2) When "showing --objects", it lists the top-level tree node
>> with no name, which makes it indistinguishable from commit
>> objects by pack-objects, probably impacting the delta logic.
>> Would something like the following patch make sense, to name
>> such node "."; giving full-path not just the basename to
>> all named nodes would be even better, though.

LT> It doesn't impact the delta algorithm, because the objects are sorted by 
LT> type first, so it never mixes up trees and commits.

You are correct.  I forgot that it does sorting by type.

What do you think about giving full-path so that Makefiles in
different directories would get different name hashes?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 19:04       ` Linus Torvalds
  2005-07-07 19:57         ` Junio C Hamano
@ 2005-07-07 20:00         ` Junio C Hamano
  2005-07-07 21:29         ` Eric W. Biederman
  2005-07-07 22:14         ` [ANNOUNCE] Cogito-0.12 Petr Baudis
  3 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> ... No other 
LT> system supports pure static object pulling anyway,...

That is true, but on the other hand, no other system is easier
to be deployed by mere mortals on barebone ISP accounts.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 19:04       ` Linus Torvalds
  2005-07-07 19:57         ` Junio C Hamano
  2005-07-07 20:00         ` Junio C Hamano
@ 2005-07-07 21:29         ` Eric W. Biederman
  2005-07-07 22:23           ` Linus Torvalds
  2005-07-08  1:54           ` Dumb servers (was: [ANNOUNCE] Cogito-0.12) Kevin Smith
  2005-07-07 22:14         ` [ANNOUNCE] Cogito-0.12 Petr Baudis
  3 siblings, 2 replies; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-07 21:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

Linus Torvalds <torvalds@osdl.org> writes:

> That said, I really think the dumb protocols are useless anyway. No other 
> system supports pure static object pulling anyway, and as far as I'm 
> concerned, I want "rsync" to kind of work (but it won't be optimal, since 
> re-packing will delete all the old objects and replace it with the new 
> pack that is downloaded anew). But plain http? I'm not convinced.

Have you not looked at tla/arch? tla does supports dumb servers.
It's job is a little easier as it has one file per atomic commit
I suspect once packs start working well that should not be an
issue for git either.

For small projects this is a major benefit, as they can just push
their files to a convenient http or ftp server.

> I'd much rather have a "stupid server" that just listens to a port, and
> basically forks off and executes "git-upload-pack" when it's connected to
> (perhaps reading the directory name first).  Nothing else. Then we can do 
> a security analysis of upload-pack, which should be fairly easy since it's 
> not actually ever _writing_ anything.
>
> At that point, you can do
>
> 	git pull git://www.kernel.org/pub/scm/git/..
>
> and it would just connect to some default "git port", pass off the 
> directory name, and be done with it - exact same discovery protocol that 
> now use for ssh. And "git clone" would also automatically work.

For optimizing network bandwidth that sounds like the way to go.  For
adhoc development I don't know.  For a central sever you still need
an authenticated way to push content, which makes it another dimension
of the problem.  So it is mostly a question of what is the sanest way
to mirror/publish data.  http is used a lot for publishing data and
practically everyone has access to a http server that can host
content, so I think supporting http makes git a lot more accessible
to people.  The only thing more accessible seems to be email, and
email is terrible for publish small projects.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 21:29         ` Eric W. Biederman
@ 2005-07-07 22:23           ` Linus Torvalds
  2005-07-08  2:11             ` Eric W. Biederman
  2005-07-08  1:54           ` Dumb servers (was: [ANNOUNCE] Cogito-0.12) Kevin Smith
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-07 22:23 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, Petr Baudis, git

On Thu, 7 Jul 2005, Eric W. Biederman wrote:
>
> For optimizing network bandwidth that sounds like the way to go.  For
> adhoc development I don't know.  For a central sever you still need
> an authenticated way to push content, which makes it another dimension
> of the problem.

I'm convinced that "ssh" is the only sane way for pushing. If you don't 
trust somebody enough to give him ssh access, you shouldn't trust him with 
write access to your project in the first place.

git can actually do ssh with a _very_ restricted shell, if people are 
worried about shell access. In fact, the _only_ think the shell needs to 
be able to do is execute one of two programs, so you could have something 
_really_ trivial in your /etc/passwd as the login shell that doesn't allow 
anything else. But you'd still use ssh as the authentication protocol.

So I don't worry about pushing. I think we've got that covered. It's 
really the anonymous pulling that needs something.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 22:23           ` Linus Torvalds
@ 2005-07-08  2:11             ` Eric W. Biederman
  0 siblings, 0 replies; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-08  2:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 7 Jul 2005, Eric W. Biederman wrote:
>>
>> For optimizing network bandwidth that sounds like the way to go.  For
>> adhoc development I don't know.  For a central sever you still need
>> an authenticated way to push content, which makes it another dimension
>> of the problem.
>
> I'm convinced that "ssh" is the only sane way for pushing. If you don't 
> trust somebody enough to give him ssh access, you shouldn't trust him with 
> write access to your project in the first place.

Agreed, I brought that up only so I could dismiss it :)  

> So I don't worry about pushing. I think we've got that covered. It's 
> really the anonymous pulling that needs something.

So long as we remember there is a tradeoff between efficiency and
ease of setup for anonymous access and small projects.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Dumb servers (was: [ANNOUNCE] Cogito-0.12)
  2005-07-07 21:29         ` Eric W. Biederman
  2005-07-07 22:23           ` Linus Torvalds
@ 2005-07-08  1:54           ` Kevin Smith
  2005-07-08  2:27             ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Kevin Smith @ 2005-07-08  1:54 UTC (permalink / raw)
  Cc: git

Eric W. Biederman wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> 
>>That said, I really think the dumb protocols are useless anyway. No other 
>>system supports pure static object pulling anyway, and as far as I'm 
>>concerned, I want "rsync" to kind of work (but it won't be optimal, since 
>>re-packing will delete all the old objects and replace it with the new 
>>pack that is downloaded anew). But plain http? I'm not convinced.
> 
> 
> Have you not looked at tla/arch? tla does supports dumb servers.
> It's job is a little easier as it has one file per atomic commit
> I suspect once packs start working well that should not be an
> issue for git either.

In addition to GNU arch/tla, it it also supported by baz, ArX, darcs, 
and mercurial.

> For small projects this is a major benefit, as they can just push
> their files to a convenient http or ftp server.

Absolutely. For the kernel it might not make sense, but I view it as a 
really important feature for tiny projects around the world. Even a CGI 
requirement makes it impossible to serve a project from free or really 
cheap web hosts. Plain HTTP is the only protocol available to people who 
have no extra money to spend on hosting accounts.

This happens to be a hot button issue for me, in case you can't tell. 
Sorry if I'm ranting.

Kevin

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Dumb servers (was: [ANNOUNCE] Cogito-0.12)
  2005-07-08  1:54           ` Dumb servers (was: [ANNOUNCE] Cogito-0.12) Kevin Smith
@ 2005-07-08  2:27             ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08  2:27 UTC (permalink / raw)
  To: Kevin Smith; +Cc: Git Mailing List

On Thu, 7 Jul 2005, Kevin Smith wrote:
> 
> Absolutely. For the kernel it might not make sense, but I view it as a 
> really important feature for tiny projects around the world. Even a CGI 
> requirement makes it impossible to serve a project from free or really 
> cheap web hosts. Plain HTTP is the only protocol available to people who 
> have no extra money to spend on hosting accounts.

Well, the http approach always works as well as an "rsync", ie you can 
always replace "rsync" with "wget -r -c" or similar.

But the end result will be a purely dumb mirror of what the other side 
had, ie it will have all the same problems rsync has with things like 
multiple branches etc (it will get all of them, not just the objects 
needed from the one branch you're trying to pull).

So it's not pretty. But it obviously does work: pack-files haven't changed
the fact that git is a append-only thing that lives entirely in the
filesystem space and doesn't have any "dynamic content" (ie nothing is 
hidden inside server state).

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 19:04       ` Linus Torvalds
                           ` (2 preceding siblings ...)
  2005-07-07 21:29         ` Eric W. Biederman
@ 2005-07-07 22:14         ` Petr Baudis
  2005-07-07 22:52           ` Linus Torvalds
  3 siblings, 1 reply; 66+ messages in thread
From: Petr Baudis @ 2005-07-07 22:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Let me join the sceptics camp. :-)

Dear diary, on Thu, Jul 07, 2005 at 09:04:58PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> Note that I just re-packed the kernel archive on kernel.org, and removed 
> _all_ unpacked files. Once that percolates to the mirrors, the http 
> protocol will be useless without anything like this.

*grumble*

So, what _is_ then the way to pull now, actually? If we use rsync, won't
we end up with having the objects we previous had twice now?

> That said, I really think the dumb protocols are useless anyway. No other 
> system supports pure static object pulling anyway, and as far as I'm 
> concerned, I want "rsync" to kind of work (but it won't be optimal, since 
> re-packing will delete all the old objects and replace it with the new 
> pack that is downloaded anew). But plain http? I'm not convinced.

You can always just spider the repository which will work just as well
as rsync in the git case. ;-)

I think it would be actually simplest (for the user) to have a trivial
CGI script on the other side which will do the git-upload-pack stuff.
Minimal extra administrative overhead, flexibility, works through
proxies, and stuff.  People can rewrite it in Perl or PHorridP if they
wish and use it on webhosting servers not allowing much else.

That's not to say a dedicated server wouldn't have its place too, and
that's what's now probably simplest for us. ;-)

Now we are in a situation when there's actually no way to pull from your
kernel repository without throwing own repository to mess and
duplicating data, AFAICS.

> I'd much rather have a "stupid server" that just listens to a port, and
> basically forks off and executes "git-upload-pack" when it's connected to
> (perhaps reading the directory name first).  Nothing else. Then we can do 
> a security analysis of upload-pack, which should be fairly easy since it's 
> not actually ever _writing_ anything.
> 
> At that point, you can do
> 
> 	git pull git://www.kernel.org/pub/scm/git/..
> 
> and it would just connect to some default "git port", pass off the 
> directory name, and be done with it - exact same discovery protocol that 
> now use for ssh. And "git clone" would also automatically work.

Eek. Could you please make it at least pretend to be extensible? Compare
git-upload-pack with git-ssh-pu* - the second one prepends letters to
the data it sends so that if you add a new type of stuff to send (say
for authentication or some smart tags stuff), you could extend it in a
sensible way. What about dividing the communication to "blocks"
separated by a newline? Each block would have its first word on the
first line saying what kind of block it is - "refs", "have", "want", or
"pack" (for simplicity, the pack block might have additional restriction
that it's always the last one).  If you hit unknown block, you should
respond back by something like "huh" and ignore the rest of it.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 22:14         ` [ANNOUNCE] Cogito-0.12 Petr Baudis
@ 2005-07-07 22:52           ` Linus Torvalds
  2005-07-07 23:16             ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano
  2005-07-07 23:52             ` [ANNOUNCE] Cogito-0.12 Tony Luck
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-07 22:52 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git

On Fri, 8 Jul 2005, Petr Baudis wrote:

> Let me join the sceptics camp. :-)
> 
> Dear diary, on Thu, Jul 07, 2005 at 09:04:58PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> told me that...
> > Note that I just re-packed the kernel archive on kernel.org, and removed 
> > _all_ unpacked files. Once that percolates to the mirrors, the http 
> > protocol will be useless without anything like this.
> 
> *grumble*
> 
> So, what _is_ then the way to pull now, actually? If we use rsync, won't
> we end up with having the objects we previous had twice now?

Rsync works fine. You can either unpack the pack you get, or, if you 
prefer, just run

	git-prune-packed

which will remove the stand-alone object that it finds in packs. Now 
you're no longer duplicating data, and your repository is smaller than it 
used to be anyway.

Of course, that requires that you trust the packs 100%. It seems to be 
stable, and I've packed the whole kernel repo, but I actually keep my 
private tree unpacked still just in case.

> I think it would be actually simplest (for the user) to have a trivial
> CGI script on the other side which will do the git-upload-pack stuff.

Well, git-upload-pack expects the other end to follow the proper protocol, 
but yes, you can certainly expose it through a web interface and a 
specialized client that way.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] Pull efficiently from a dumb git store.
  2005-07-07 22:52           ` Linus Torvalds
@ 2005-07-07 23:16             ` Junio C Hamano
  2005-07-07 23:50               ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano
  2005-07-07 23:50               ` [PATCH] Use --objects=self-sufficient flag to rev-list Junio C Hamano
  2005-07-07 23:52             ` [ANNOUNCE] Cogito-0.12 Tony Luck
  1 sibling, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 23:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git

The git-update-dumb-server-script command statically prepares
additional information to describe what the server side has, so
that a smart client can pull things efficiently even via a
transport such as static-file-only HTTP.

The files prepared by the command is $GIT_DIR/info/server, which
is a tar archive that contains the following files:

    rev-cache   -- commit ancestry chain, append only to help
	           rsync mirroring.
    inventory   -- list of refs and their SHA1.
    pack        -- list of available prepackaged packs.
    server.sha1 -- sha1sum output for the above three files (optional).

A smart client git-dumb-pull-script works in the following way:

 - First it slurps these files, and then .idx files that
   corresponds to the packs described in "pack".

 - Then it finds the commits that it wants from the server by
   looking at "inventory" to find various heads, and "rev-cache" to
   find commits that is missing from the client, and "pack" to
   figure out downloading which packs is the most efficient way to
   fill what is missing from its repository.  This is done with
   the help of the git-dumb-pull-resolve command.

 - Then it slurps the pack files.

 - The git-http-pull / git-local-pull command walks the commit
   chain in an old-fashioned way and downloads unpacked objects
   to fill the rest.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 Makefile                      |   10 +
 dumb-pull-resolve.c           |  239 +++++++++++++++++++++++++++++++++
 git-dumb-pull-script          |  129 ++++++++++++++++++
 git-update-dumb-server-script |   47 ++++++
 rev-cache.c                   |  300 +++++++++++++++++++++++++++++++++++++++++
 rev-cache.h                   |   31 ++++
 show-rev-cache.c              |   18 ++
 update-dumb-server.c          |  153 +++++++++++++++++++++
 8 files changed, 925 insertions(+), 2 deletions(-)
 create mode 100644 dumb-pull-resolve.c
 create mode 100755 git-dumb-pull-script
 create mode 100755 git-update-dumb-server-script
 create mode 100644 rev-cache.c
 create mode 100644 rev-cache.h
 create mode 100644 show-rev-cache.c
 create mode 100644 update-dumb-server.c

a880bc7300f070aca3a255828b48390cb9793245
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -31,7 +31,8 @@ SCRIPTS=git git-apply-patch-script git-m
 	git-fetch-script git-status-script git-commit-script \
 	git-log-script git-shortlog git-cvsimport-script git-diff-script \
 	git-reset-script git-add-script git-checkout-script git-clone-script \
-	gitk git-cherry git-rebase-script git-relink-script git-repack-script
+	gitk git-cherry git-rebase-script git-relink-script git-repack-script \
+	git-dumb-pull-script git-update-dumb-server-script
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
@@ -44,7 +45,8 @@ PROG=   git-update-cache git-diff-files 
 	git-diff-stages git-rev-parse git-patch-id git-pack-objects \
 	git-unpack-objects git-verify-pack git-receive-pack git-send-pack \
 	git-prune-packed git-fetch-pack git-upload-pack git-clone-pack \
-	git-show-index
+	git-show-index git-update-dumb-server git-show-rev-cache \
+	git-dumb-pull-resolve
 
 all: $(PROG)
 
@@ -58,6 +60,9 @@ LIB_FILE=libgit.a
 LIB_H=cache.h object.h blob.h tree.h commit.h tag.h delta.h epoch.h csum-file.h \
 	pack.h pkt-line.h refs.h
 
+LIB_H += rev-cache.h
+LIB_OBJS += rev-cache.o
+
 LIB_H += strbuf.h
 LIB_OBJS += strbuf.o
 
@@ -153,6 +158,7 @@ object.o: $(LIB_H)
 read-cache.o: $(LIB_H)
 sha1_file.o: $(LIB_H)
 usage.o: $(LIB_H)
+rev-cache.o: $(LIB_H)
 strbuf.o: $(LIB_H)
 gitenv.o: $(LIB_H)
 entry.o: $(LIB_H)
diff --git a/dumb-pull-resolve.c b/dumb-pull-resolve.c
new file mode 100644
--- /dev/null
+++ b/dumb-pull-resolve.c
@@ -0,0 +1,239 @@
+#include "cache.h"
+#include "rev-cache.h"
+
+static const char *dumb_pull_resolve_usage =
+"git-dumb_pull_resolve <tmpdir> (<remote> <local>)...";
+
+static struct inventory {
+	struct inventory *next;
+	unsigned char sha1[20];
+	char name[1]; /* more; 1 is for terminating NUL */
+} *inventory;
+
+static struct inventory *find_inventory(const char *name)
+{
+	struct inventory *e = inventory;
+	while (e && strcmp(e->name, name))
+		e = e->next;
+	return e;
+}
+
+static void read_inventory(const char *path)
+{
+	FILE *fp;
+	char buf[1024];
+
+	fp = fopen(path, "r");
+	if (!fp)
+		die("cannot open %s", path);
+	while (fgets(buf, sizeof(buf), fp)) {
+		struct inventory *e; 
+		int len = strlen(buf);
+		if (buf[len-1] != '\n')
+			die("malformed inventory file");
+		buf[--len] = 0;
+		e = xmalloc(sizeof(*e) + len - 41);
+		strcpy(e->name, buf + 41);
+		get_sha1_hex(buf, e->sha1);
+		e->next = inventory;
+		inventory = e;
+	}
+	fclose(fp);
+}
+
+#define MAX_PACKS 0
+static struct pack {
+	struct pack *next;
+	unsigned int *map;
+	unsigned long pack_size;
+	unsigned long index_size;
+	unsigned char ix;
+	unsigned long fill;
+	char name[1]; /* more; 1 is for terminating NUL */
+} *pack;
+
+static void map_pack_idx(const char *path, const char *tmpdir)
+{
+	FILE *fp;
+	char buf[1024];
+	int num_pack = 0;
+
+	fp = fopen(path, "r");
+	if (!fp)
+		die("cannot open %s", path);
+	while (fgets(buf, sizeof(buf), fp)) {
+		struct pack *e;
+		int len;
+		int fd;
+		struct stat st;
+		char path[PATH_MAX];
+		char *cp;
+
+		cp = strchr(buf, ' ');
+		if (!cp || !*++cp)
+			die("malformed pack file");
+
+		len = strlen(cp);
+		if (cp[len-1] != '\n')
+			die("malformed pack file");
+		cp[--len] = 0;
+		
+		if (MAX_PACKS && MAX_PACKS < num_pack) {
+			error("cannot handle too many packs.  ignoring %s",
+			      cp);
+			continue;
+		}
+
+		e = xmalloc(sizeof(*e) + len);
+		strcpy(e->name, cp);
+		e->pack_size = strtoul(buf, NULL, 10);
+
+		sprintf(path, "%s/%s", tmpdir, cp);
+		len = strlen(path);
+		strcpy(path + len - 5, ".idx");
+		fd = open(path, O_RDONLY);
+		if (fd < 0)
+			goto ignore_entry;
+		if (fstat(fd, &st)) {
+			close(fd);
+			goto ignore_entry;
+		}
+		e->index_size = st.st_size;
+		e->map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+		close(fd);
+		if (e->map == MAP_FAILED)
+			die("cannot map %s", path);
+		e->next = pack;
+		e->ix = num_pack++;
+		pack = e;
+		continue;
+	ignore_entry:
+		free(e);
+	}
+	fclose(fp);
+}
+
+static int find_in_pack_idx(const unsigned char *sha1, struct pack *e)
+{
+	unsigned int *level1_ofs = e->map;
+	int hi = ntohl(level1_ofs[*sha1]);
+	int lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
+	void *index = e->map + 256;
+
+	do {
+		int mi = (lo + hi) / 2;
+		int cmp = memcmp(index + 24 * mi + 4, sha1, 20);
+		if (!cmp)
+			return 1;
+		if (0 < cmp)
+			hi = mi;
+		else
+			lo = mi+1;
+	} while (lo < hi);
+	return 0;
+}
+
+static void mark_needed(const unsigned char *sha1)
+{
+	struct rev_cache *rc;
+	struct rev_list_elem *rle;
+	int pos;
+
+	if (has_sha1_file(sha1))
+		return;
+	pos = find_rev_cache(sha1);
+	if (pos < 0)
+		die("rev-cache does not match inventory");
+	rc = rev_cache[pos];
+	rc->work = 1;
+	for (rle = rc->parents; rle; rle= rle->next)
+		mark_needed(rle->ri->sha1);
+}
+
+static struct rev_cache *needed;
+static unsigned long num_needed;
+
+static void link_needed(void)
+{
+	/* Link needed ones for quick traversal */
+	int i;
+	num_needed = 0;
+	for (i = 0; i < nr_revs; i++) {
+		struct rev_cache *rc = rev_cache[i];
+		if (rc->work) {
+			rc->work_ptr = needed;
+			needed = rc;
+			num_needed++;
+		}
+	}
+}
+
+/* Currently this part is stupid, FIXME */
+static void find_optimum_packs(void)
+{
+	struct rev_cache *rc;
+	struct pack *e;
+	unsigned long hits, total;
+
+	hits = total = 0;
+	for (rc = needed; rc; rc = rc->work_ptr)
+		rc->work = 0;
+
+	for (e = pack; e; e = e->next) {
+		e->fill = 0;
+		for (rc = needed; rc; rc = rc->work_ptr)
+			if (!rc->work && find_in_pack_idx(rc->sha1, e)) {
+				rc->work = 1<<(e->ix);
+				e->fill++;
+				hits++;
+			}
+		if (e->fill) {
+			fprintf(stderr, "use %s to fill %lu\n",
+				e->name, e->fill);
+			total += e->pack_size;
+		}
+	}
+
+	fprintf(stderr, "# needed %lu, hits %lu, total %lu\n",
+		num_needed, hits, total);
+	for (e = pack; e; e = e->next)
+		if (e->fill)
+			printf("%s\n", e->name);
+}
+
+int main(int ac, char **av)
+{
+	int i;
+	char path[PATH_MAX];
+	const char *tmpdir;
+
+	if (ac < 4 || ac % 2)
+		usage(dumb_pull_resolve_usage);
+
+	tmpdir = av[1];
+	ac--; av++;
+
+	sprintf(path, "%s/inventory", tmpdir);
+	read_inventory(path);
+
+	sprintf(path, "%s/rev-cache", tmpdir);
+	read_rev_cache(path, NULL, 0);
+
+	for (i = 1; i < ac; i += 2) {
+		/* av[i] is a remote branch name */
+		struct inventory *e = find_inventory(av[i]);
+		if (!e) {
+			error("cannot find branch %s", av[i]);
+			continue;
+		}
+		mark_needed(e->sha1);
+	}
+
+	link_needed();
+
+	sprintf(path, "%s/pack", tmpdir);
+	map_pack_idx(path, tmpdir);
+
+	find_optimum_packs();
+	return 0;
+}
diff --git a/git-dumb-pull-script b/git-dumb-pull-script
new file mode 100755
--- /dev/null
+++ b/git-dumb-pull-script
@@ -0,0 +1,129 @@
+#!/bin/sh
+
+: ${GIT_DIR=.git}
+: ${GIT_OBJECT_DIRECTORY="${GIT_DIR}/objects"}
+
+usage () {
+	echo >&2 "* git dumb-pull <url> ( <remote-name> <local-name> ) ..."
+	exit 1
+}
+
+error () {
+	echo >&2 "* git-dumb-pull: $*"
+	exit 1
+}
+
+download_one() {
+	# $1 - URL
+	# $2 - Local target
+	case "$1" in
+	file://* )
+		path=/$(expr "$1" : 'file:/*\(.*\)')
+		cp "$path" "$2" || rm -f "$2"
+		;;
+	http://* | https://* )
+		wget -O "$2" "$1" || rm -f "$2"
+		;;
+	esac
+}
+
+case "$#" in
+0)
+	usage;;
+esac
+url="$1"; shift
+
+case "$url" in
+http://* | https://*)
+	use_url="$url"
+	cmd='git-http-pull -a -v'
+	;;
+file://*)
+	use_url=/$(expr "$url" : 'file:/*\(.*\)')
+	cmd='git-local-pull -a -l -v'
+	;;
+*)
+	error "Unknown url scheme $url"
+	;;
+esac
+
+# The rest of arguments are remote and local names
+case $#,$(expr "$#" % 2) in
+0,* | 1,* | *,1)
+	error "Need one or more branch name pairs." ;;
+esac
+
+tmp=.git-dumb-pull-$$
+mkdir "$tmp" || error "cannot create temporary directory"
+trap "rm -fr $tmp" 0 1 2 3 15
+
+# Failing to download is not fatal.  It just means the server is
+# dumber than we thought ;-)
+if download_one "$url/info/server" $tmp/server
+then
+	infofiles='inventory pack rev-cache'
+	(
+	  cd $tmp &&
+	  tar xvf server $infofiles || exit 1
+	  if tar xf server server.sha1
+	  then
+		sha1sum -c server.sha1 || {
+		    # did we fail because we did not have sha1sum command?
+		    case "$?" in
+		    127)
+		        : ;; # the command did not exist.
+		    *)
+		        false ;;
+		    esac
+		}
+	  else
+	  	echo >&2 "* warning: server file lacks sha1 checksum"
+	  fi &&
+	  rm -f server.sha1
+	) || exit
+fi
+
+if test -f $tmp/pack
+then
+	while read pack_size pack
+	do
+		case "$pack" in
+		*/*)
+			echo >&2 "* malformed pack $pack"
+			continue
+			;;
+		esac
+
+		idx=$(expr "$pack" : '\(.*\)\.pack$').idx
+		# It is possible, even likely, that we already have that
+		# index file and associated pack file.
+		if test -f "${GIT_OBJECT_DIRECTORY}/pack/$pack" &&
+		   test -f "${GIT_OBJECT_DIRECTORY}/pack/$idx"
+		then
+			continue
+		fi
+		download_one "$url/objects/pack/$idx" "$tmp/$idx"
+	done <$tmp/pack
+
+	git-dumb-pull-resolve $tmp "$@" |
+	while read pack
+	do
+		echo >&2 "* $pack"
+		download_one "$url/objects/pack/$pack" "$tmp/$pack"
+		if test -f "$tmp/$pack" && git-verify-pack "$tmp/$pack"
+		then
+			idx=$(expr "$pack" : '\(.*\)\.pack$').idx
+			mv "$tmp/$pack" "$tmp/$idx" \
+				"${GIT_OBJECT_DIRECTORY}/pack/"
+		fi
+	done
+fi
+
+while case "$#" in 0) break ;; esac
+do
+	remote="$1" local="$2"
+	$cmd -w "$local" "$remote" "$use_url"
+
+	shift
+	shift
+done
diff --git a/git-update-dumb-server-script b/git-update-dumb-server-script
new file mode 100755
--- /dev/null
+++ b/git-update-dumb-server-script
@@ -0,0 +1,47 @@
+#!/bin/sh
+#
+# Copyright (c) 2005, Junio C Hamano
+#
+
+: ${GIT_DIR=.git}
+: ${GIT_OBJECT_DIRECTORY="$GIT_DIR/objects"}
+export GIT_DIR GIT_OBJECT_DIRECTORY
+
+infofiles='inventory pack rev-cache'
+
+usage () {
+	echo >&2 "* git update-dumb-server"
+	exit 1
+}
+
+# Allow 10MB plain SHA1 files to be accumulated before we repack.
+max_plain_size=10240
+
+plain_size=$(
+{
+	du -sk "$GIT_OBJECT_DIRECTORY/" "$GIT_OBJECT_DIRECTORY/pack/" |
+	sed -e 's/^[ 	]*\([0-9][0-9]*\)[ 	].*/\1/'
+	echo ' - p'
+} | dc) &&
+
+if test $max_plain_size -lt $plain_size >/dev/null
+then
+	git-repack-script && git-prune-packed
+fi &&
+
+git-update-dumb-server &&
+
+files=$infofiles
+cd "$GIT_DIR/info" &&
+if sha1sum $infofiles >server.sha1
+then
+	files="$files server.sha1"
+else
+	rm -f server.sha1
+	echo >&2 "* warning: creating server file without sha1sum"
+fi &&
+tar cf server $files &&
+
+# We leave rev-cache there for later runs.
+rm -f server.sha1 inventory pack
+
diff --git a/rev-cache.c b/rev-cache.c
new file mode 100644
--- /dev/null
+++ b/rev-cache.c
@@ -0,0 +1,300 @@
+#include "refs.h"
+#include "cache.h"
+#include "rev-cache.h"
+
+struct rev_cache **rev_cache;
+int nr_revs, alloc_revs;
+
+struct rev_list_elem *rle_free;
+
+#define BATCH_SIZE 512
+
+int find_rev_cache(const unsigned char *sha1)
+{
+	int lo = 0, hi = nr_revs;
+	while (lo < hi) {
+		int mi = (lo + hi) / 2;
+		struct rev_cache *ri = rev_cache[mi];
+		int cmp = memcmp(sha1, ri->sha1, 20);
+		if (!cmp)
+			return mi;
+		if (cmp < 0)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+	return -lo - 1;
+}
+
+static struct rev_list_elem *alloc_list_elem(void)
+{
+	struct rev_list_elem *rle;
+	if (!rle_free) {
+		int i;
+
+		rle = xmalloc(sizeof(*rle) * BATCH_SIZE);
+		for (i = 0; i < BATCH_SIZE - 1; i++) {
+			rle[i].ri = NULL;
+			rle[i].next = &rle[i + 1];
+		}
+		rle[BATCH_SIZE - 1].ri = NULL; 
+		rle[BATCH_SIZE - 1].next = NULL; 
+		rle_free = rle;
+	}
+	rle = rle_free;
+	rle_free = rle->next;
+	return rle;
+}
+
+static struct rev_cache *create_rev_cache(const unsigned char *sha1)
+{
+	struct rev_cache *ri;
+	int pos = find_rev_cache(sha1);
+
+	if (0 <= pos)
+		return rev_cache[pos];
+	pos = -pos - 1;
+	if (alloc_revs <= ++nr_revs) {
+		alloc_revs = alloc_nr(alloc_revs);
+		rev_cache = xrealloc(rev_cache, sizeof(ri) * alloc_revs);
+	}
+	if (pos < nr_revs)
+		memmove(rev_cache + pos + 1, rev_cache + pos,
+			(nr_revs - pos - 1) * sizeof(ri));
+	ri = xcalloc(1, sizeof(*ri));
+	memcpy(ri->sha1, sha1, 20);
+	rev_cache[pos] = ri;
+	return ri;
+}
+
+static unsigned char last_sha1[20];
+
+static void write_one_rev_cache(FILE *rev_cache_file, struct rev_cache *ri)
+{
+	unsigned char flag;
+	struct rev_list_elem *rle;
+
+	if (ri->written)
+		return;
+
+	if (ri->parsed) {
+
+		/* We use last_sha1 compression only for the first parent;
+		 * otherwise the resulting rev-cache would lose the parent
+		 * order information.
+		 */
+		if (ri->parents &&
+		    !memcmp(ri->parents->ri->sha1, last_sha1, 20))
+			flag = (ri->num_parents - 1) | 0x80;
+		else
+			flag = ri->num_parents;
+
+		fwrite(ri->sha1, 20, 1, rev_cache_file);
+		fwrite(&flag, 1, 1, rev_cache_file);
+		for (rle = ri->parents; rle; rle = rle->next) {
+			if (flag & 0x80 && rle == ri->parents)
+				continue;
+			fwrite(rle->ri->sha1, 20, 1, rev_cache_file);
+		}
+		memcpy(last_sha1, ri->sha1, 20);
+		ri->written = 1;
+	}
+	/* recursively write children depth first */
+	for (rle = ri->children; rle; rle = rle->next)
+		write_one_rev_cache(rev_cache_file, rle->ri);
+}
+
+void write_rev_cache(const char *path)
+{
+	/* write the following commit ancestry information in
+	 * $GIT_DIR/info/rev-cache.
+	 *
+	 * The format is:
+	 * 20-byte SHA1 (commit ID)
+	 * 1-byte flag:
+	 * - bit 0-6 records "number of parent commit SHA1s to
+	 *   follow" (i.e. up to 127 children can be listed).
+	 * - when the bit 7 is on, then "the entry immediately
+	 *   before this entry is one of the parents of this
+         *   commit".
+	 * N x 20-byte SHA1 (parent commit IDs)
+	 */
+	FILE *rev_cache_file;
+	int i;
+	struct rev_cache *ri;
+
+	rev_cache_file = fopen(path, "a");
+	if (!rev_cache_file)
+		die("cannot append to rev cache file.");
+
+	memset(last_sha1, 0, 20);
+
+	/* Go through available rev_cache structures, starting from
+	 * parentless ones first, so that we would get most out of
+	 * last_sha1 optimization by the depth first behaviour of
+	 * write_one_rev_cache().
+	 */
+	for (i = 0; i < nr_revs; i++) {
+		ri = rev_cache[i];
+		if (ri->num_parents)
+			continue;
+		write_one_rev_cache(rev_cache_file, ri);
+	}
+	/* Then the rest */
+	for (i = 0; i < nr_revs; i++) {
+		ri = rev_cache[i];
+		write_one_rev_cache(rev_cache_file, ri);
+	}
+
+	fclose(rev_cache_file);
+}
+
+static void add_parent(struct rev_cache *child,
+		       const unsigned char *parent_sha1)
+{
+	struct rev_cache *parent = create_rev_cache(parent_sha1);
+	struct rev_list_elem *e = alloc_list_elem();
+
+	/* Keep the parent list ordered in the same way the commit
+	 * object records them.
+	 */
+	e->ri = parent;
+	e->next = NULL;
+	if (!child->parents_tail)
+		child->parents = e;
+	else
+		child->parents_tail->next = e;
+	child->parents_tail = e;
+	child->num_parents++;
+	
+	/* There is no inherent order of the children so we just
+	 * LIFO them together.
+	 */
+	e = alloc_list_elem();
+	e->next = parent->children;
+	parent->children = e;
+	e->ri = child;
+	parent->num_children++;
+}
+
+int read_rev_cache(const char *path, FILE *dumpfile, int dry_run)
+{
+	unsigned char *map;
+	int fd;
+	struct stat st;
+	unsigned long ofs, len;
+	struct rev_cache *ri = NULL;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+	map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (map == MAP_FAILED) {
+		close(fd);
+		return -1;
+	}
+	close(fd);
+
+	memset(last_sha1, 0, 20);
+	ofs = 0;
+	len = st.st_size;
+	while (ofs < len) {
+		unsigned char sha1[20];
+		int flag, cnt, i;
+		if (len < ofs + 21)
+			die("rev-cache too short"); 
+		memcpy(sha1, map + ofs, 20);
+		flag = map[ofs + 20];
+		ofs += 21;
+		cnt = (flag & 0x7f) + ((flag & 0x80) != 0);
+		if (len < ofs + (flag & 0x7f) * 20)
+			die("rev-cache too short to have %d more parents",
+			    (flag & 0x7f));
+		if (dumpfile)
+			fprintf(dumpfile, "%s", sha1_to_hex(sha1));
+		if (!dry_run) {
+			ri = create_rev_cache(sha1);
+			ri->written = 1;
+			ri->parsed = 1;
+			if (!ri)
+				die("cannot create rev-cache for %s",
+				    sha1_to_hex(sha1));
+		}
+		i = 0;
+		if (flag & 0x80) {
+			if (!dry_run)
+				add_parent(ri, last_sha1);
+			if (dumpfile)
+				fprintf(dumpfile, " %s",
+					sha1_to_hex(last_sha1));
+			i++;
+		}
+		while (i++ < cnt) {
+			if (!dry_run)
+				add_parent(ri, map + ofs);
+			if (dumpfile)
+				fprintf(dumpfile, " %s",
+					sha1_to_hex(last_sha1));
+			ofs += 20;
+		}
+		if (dumpfile)
+			fprintf(dumpfile, "\n");
+		memcpy(last_sha1, sha1, 20);
+	}
+	if (ofs != len)
+		die("rev-cache truncated?");
+	munmap(map, len);
+	return 0;
+}
+
+int record_rev_cache(const unsigned char *sha1)
+{
+	unsigned char parent[20];
+	char type[20];
+	unsigned long size, ofs;
+	unsigned int cnt, i;
+	void *buf;
+	struct rev_cache *ri;
+
+	buf = read_sha1_file(sha1, type, &size);
+	if (!buf)
+		return 1; /* unavailable */
+	if (strcmp(type, "commit")) {
+		/* could be a tag or tree */
+		free(buf);
+		return 1;
+	}
+	ri = create_rev_cache(sha1);
+	if (ri->parsed)
+		return 0;
+
+	cnt = 0;
+	ofs = 46; /* "tree " + hex-sha1 + "\n" */
+	while (!memcmp(buf + ofs, "parent ", 7) &&
+	       !get_sha1_hex(buf + ofs + 7, parent)) {
+		ofs += 48;
+		cnt++;
+	}
+	if (cnt * 48 + 46 != ofs) {
+		free(buf);
+		return error("internal error in record_rev_cache");
+	}
+
+	ri = create_rev_cache(sha1);
+	ri->parsed = 1;
+
+	for (i = 0; i < cnt; i++) {
+		unsigned char parent_sha1[20];
+		
+		ofs = 46 + i * 48 + 7;
+		get_sha1_hex(buf + ofs, parent_sha1);
+		add_parent(ri, parent_sha1);
+		record_rev_cache(parent_sha1);
+	}
+	free(buf);
+	return 0;
+}
diff --git a/rev-cache.h b/rev-cache.h
new file mode 100644
--- /dev/null
+++ b/rev-cache.h
@@ -0,0 +1,31 @@
+#ifndef REV_CACHE_H
+#define REV_CACHE_H
+
+#define REV_CACHE_PATH "info/rev-cache"
+
+extern struct rev_cache {
+	struct rev_cache *head_list;
+	struct rev_list_elem *children;
+	struct rev_list_elem *parents;
+	struct rev_list_elem *parents_tail;
+	unsigned short num_parents;
+	unsigned short num_children;
+	unsigned int written : 1;
+	unsigned int parsed : 1;
+	unsigned int work : 30;
+	void *work_ptr;
+	unsigned char sha1[20];
+} **rev_cache;
+extern int nr_revs, alloc_revs;
+
+struct rev_list_elem {
+	struct rev_list_elem *next;
+	struct rev_cache *ri;
+};
+
+extern int find_rev_cache(const unsigned char *);
+extern int read_rev_cache(const char *, FILE *, int);
+extern int record_rev_cache(const unsigned char *);
+extern void write_rev_cache(const char *);
+
+#endif
diff --git a/show-rev-cache.c b/show-rev-cache.c
new file mode 100644
--- /dev/null
+++ b/show-rev-cache.c
@@ -0,0 +1,18 @@
+#include "cache.h"
+#include "rev-cache.h"
+
+static char *dump_rev_cache_usage =
+"git-dump-rev-cache <rev-cache-file>";
+
+int main(int ac, char **av)
+{
+	while (1 < ac && av[0][1] == '-') {
+		/* do flags here */
+		break;
+		ac--; av++;
+	}
+	if (ac != 2)
+		usage(dump_rev_cache_usage);
+
+	return read_rev_cache(av[1], stdout, 1);
+}
diff --git a/update-dumb-server.c b/update-dumb-server.c
new file mode 100644
--- /dev/null
+++ b/update-dumb-server.c
@@ -0,0 +1,153 @@
+#include "refs.h"
+#include "cache.h"
+#include "rev-cache.h"
+
+static FILE *inventory_file;
+static int verbose = 0;
+
+static int do_refs(const char *path, const unsigned char *sha1)
+{
+	/* path is like .git/refs/heads/master */
+	int pfxlen = 10; /* strlen(".git/refs/") */
+	fprintf(inventory_file, "%s %s\n", sha1_to_hex(sha1), path + pfxlen);
+	if (verbose)
+		fprintf(stderr, "inventory %s %s\n",
+			sha1_to_hex(sha1), path + pfxlen);
+	record_rev_cache(sha1);
+	return 0;
+}
+
+static int inventory(void)
+{
+	/* write names of $GIT_DIR/refs/?*?/?* files in
+	 * $GIT_DIR/info/inventory, and find the ancestry
+	 * information.
+	 */
+	char path[PATH_MAX];
+
+	strcpy(path, git_path("info/inventory"));
+	safe_create_leading_directories(path);
+	inventory_file = fopen(path, "w");
+	if (!inventory_file)
+		die("cannot create inventory file.");
+	for_each_ref(do_refs);
+	fclose(inventory_file);
+	return 0;
+}
+
+static int compare_pack_size(const void *a_, const void *b_)
+{
+	struct packed_git *const*a = a_;
+	struct packed_git *const*b = b_;
+	if ((*a)->pack_size < (*b)->pack_size)
+		return 1;
+	else if ((*a)->pack_size == (*b)->pack_size)
+		return 0;
+	return -1;
+}
+
+static int write_packs(void)
+{
+	/* write names of pack files under $GIT_OBJECT_DIRECTORY/pack
+	 * into $GIT_DIR/info/packs.
+	 */
+	struct packed_git *p;
+	char path[PATH_MAX];
+	FILE *packs_file;
+	int pfxlen = strlen(".git/objects/pack/");
+	struct packed_git **list;
+	int cnt, i;
+
+	for (cnt = 0, p = packed_git; p; p = p->next)
+		cnt++;
+	list = xmalloc(sizeof(*list) * cnt);
+	for (i = 0, p = packed_git; p; p = p->next)
+		list[i++] = p;
+	qsort(list, cnt, sizeof(*list), compare_pack_size);
+
+	strcpy(path, git_path("info/pack"));
+	safe_create_leading_directories(path);
+	packs_file = fopen(path, "w");
+	if (!packs_file)
+		return -1;
+	for (i = 0; i < cnt; i++) {
+		p = list[i];
+		fprintf(packs_file, "%lu %s\n",
+			p->pack_size, p->pack_name + pfxlen);
+		if (verbose)
+			fprintf(stderr, "pack %lu %s\n",
+				p->pack_size,
+				p->pack_name + pfxlen);
+	}
+	free(list);
+	fclose(packs_file);
+	return 0;
+}
+
+static int inventory_packs(void)
+{
+	struct packed_git *p;
+
+	for (p = packed_git; p; p = p->next) {
+		int nth, lim;
+		lim = num_packed_objects(p);
+		for (nth = 0; nth < lim; nth++) {
+			unsigned char sha1[20];
+			char type[20];
+			if (nth_packed_object_sha1(p, nth, sha1)) {
+				error("cannot read %dth object from pack %s",
+				      nth, p->pack_name);
+				continue;
+			}
+			if (sha1_object_info(sha1, type, NULL)) {
+				error("cannot find type of %s", sha1_to_hex(sha1));
+				continue;
+			}
+			if (strcmp(type, "commit"))
+				continue;
+			record_rev_cache(sha1);
+		}
+	}
+	return 0;
+}
+
+static const char *update_dumb_server_usage =
+"git-update-dumb-server [-v] [-a]";
+
+int main(int ac, char **av)
+{
+	char path[PATH_MAX];
+	int all_commits = 0;
+
+	while (1 < ac && av[1][0] == '-') {
+		if (!strcmp(av[1], "-v"))
+			verbose = 1;
+		else if (!strcmp(av[1], "-a"))
+			all_commits = 1;
+		else
+			usage(update_dumb_server_usage);
+		ac--; av++;
+	}
+
+	/* read existing rev-cache if any */
+	strcpy(path, git_path(REV_CACHE_PATH));
+	read_rev_cache(path, verbose ? stderr : NULL, 0);
+
+	/* read refs directory and find commit ancentry information */
+	inventory();
+
+	/* 
+	 * prepare info/pack file.
+	 * Note that we do prepare_packed_git() in case we ran in
+	 * an headless repository.
+	 */
+	prepare_packed_git();
+	write_packs();
+
+	if (all_commits)
+		inventory_packs();
+
+	/* update the rev-cache database by appending newly found one to it */
+	write_rev_cache(path);
+	return 0;
+}
------------

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] rev-list: add "--objects=self-sufficient" flag.
  2005-07-07 23:16             ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano
@ 2005-07-07 23:50               ` Junio C Hamano
  2005-07-07 23:58                 ` Linus Torvalds
  2005-07-07 23:50               ` [PATCH] Use --objects=self-sufficient flag to rev-list Junio C Hamano
  1 sibling, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 23:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

When --objects=self-sufficient is specified instead of usual
"--objects", rev-list shows all objects reachable from trees
associated with the commits in its output.  This can be used to
ensure that a single pack can be used to recreate the tree
associated with every commit in it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

*** This makes things easier for the dumb puller because
*** self-sufficient pack means less falling back on traditional
*** http-pull.

 rev-list.c                 |    7 ++-
 t/t6100-rev-list-object.sh |   97 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 2 deletions(-)
 create mode 100644 t/t6100-rev-list-object.sh

60563326cea81f89098a88ab716fb4f02e326b43
diff --git a/rev-list.c b/rev-list.c
--- a/rev-list.c
+++ b/rev-list.c
@@ -27,6 +27,7 @@ static int bisect_list = 0;
 static int tag_objects = 0;
 static int tree_objects = 0;
 static int blob_objects = 0;
+static int objects_self_sufficient = 0;
 static int verbose_header = 0;
 static int show_parents = 0;
 static int hdr_termination = 0;
@@ -198,7 +199,7 @@ static void mark_tree_uninteresting(stru
 	struct object *obj = &tree->object;
 	struct tree_entry_list *entry;
 
-	if (!tree_objects)
+	if (!tree_objects || objects_self_sufficient)
 		return;
 	if (obj->flags & UNINTERESTING)
 		return;
@@ -448,7 +449,9 @@ int main(int argc, char **argv)
 			bisect_list = 1;
 			continue;
 		}
-		if (!strcmp(arg, "--objects")) {
+		if (!strncmp(arg, "--objects", 9)) {
+			if (!strcmp(arg+9, "=self-sufficient"))
+				objects_self_sufficient = 1;
 			tag_objects = 1;
 			tree_objects = 1;
 			blob_objects = 1;
diff --git a/t/t6100-rev-list-object.sh b/t/t6100-rev-list-object.sh
new file mode 100644
--- /dev/null
+++ b/t/t6100-rev-list-object.sh
@@ -0,0 +1,97 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='git-rev-list --objects test.
+
+'
+. ./test-lib.sh
+
+GIT_AUTHOR_DATE='+0000 946684801'
+GIT_AUTHOR_NAME=none
+GIT_AUTHOR_EMAIL=none@none
+GIT_COMMITTER_DATE='+0000 946684801'
+GIT_COMMITTER_NAME=none
+GIT_COMMITTER_EMAIL=none@none
+export GIT_AUTHOR_DATE GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \
+       GIT_COMMITTER_DATE GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
+
+_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
+_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
+sedScript='s/^\('"$_x40"' [^ ]*\) .*/\1/p'
+
+test_expect_success setup '
+    for i in frotz nitfol
+    do
+	    echo $i >$i &&
+	    git-update-cache --add $i || exit
+    done &&
+    tree0=$(git-write-tree) &&
+    commit0=$(git-commit-tree $tree0) &&
+    echo $tree0 &&
+    echo $commit0 &&
+    git-ls-tree -r $tree0 &&
+    echo nitfol nitfol >nitfol &&
+    git-update-cache --add nitfol &&
+    tree1=$(git-write-tree) &&
+    commit1=$(git-commit-tree $tree1 -p $commit0) &&
+    echo $tree1 &&
+    echo $commit1 &&
+    git-ls-tree -r $tree1    
+' </dev/null
+
+test_expect_success 'pack #0' '
+    name0=$(git-rev-list --objects $commit0 | \
+            git-pack-objects pk0) &&
+    ls pk0-* &&
+    git-verify-pack -v pk0-$name0.idx |
+    sed -ne "$sedScript" | sort >contents.0
+'
+
+test_expect_success 'pack #1 (commit 1 except commit 0)' '
+    name1=$(git-rev-list --objects $commit1 ^$commit0 | \
+            git-pack-objects pk1) &&
+    ls pk1-* &&
+    git-verify-pack -v pk1-$name1.idx |
+    sed -ne "$sedScript" | sort >contents.1
+'
+
+test_expect_success 'there should not be any overlaps' '
+    case $(comm -12 contents.0 contents.1 | wc -l) in
+    0) ;;
+    *) false ;;
+    esac
+'
+
+test_expect_success 'pack #2 (commit 1 unpacked only)' '
+    ln pk0-* .git/objects/pack/. &&
+    name2=$(git-rev-list --objects --unpacked $commit1 | \
+            git-pack-objects pk2) &&
+    ls pk2-* &&
+    git-verify-pack -v pk1-$name2.idx |
+    sed -ne "$sedScript" | sort >contents.2
+'
+
+test_expect_success 'pack #1 and #2 should be the same' '
+    diff contents.1 contents.2
+'
+
+test_expect_success 'pack #3 (commit 1 except commit 0, self-sufficient)' '
+    name3=$(git-rev-list --objects=self-sufficient $commit1 ^$commit0 | \
+            git-pack-objects pk3) &&
+    ls pk3-* &&
+    git-verify-pack -v pk3-$name3.idx |
+    sed -ne "$sedScript" | sort >contents.3
+'
+
+ls_tree_to_invent='s/^[0-9]* \([^ ]*\) \('"$_x40"'\)	.*/\2 \1/'
+test_expect_success 'make sure pack #3 is not missing anything from commit1' '
+    (
+	echo "$tree1 tree"
+	echo "$commit1 commit"
+	git-ls-tree "$tree1" | sed -e "$ls_tree_to_invent"
+    ) | sort >tree-contents.1 &&
+    comm -23 tree-contents.1 contents.3 >missing.3 &&
+    diff /dev/null missing.3
+'
------------

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--objects=self-sufficient" flag.
  2005-07-07 23:50               ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano
@ 2005-07-07 23:58                 ` Linus Torvalds
  2005-07-08  1:02                   ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano
  2005-07-08  1:03                   ` [PATCH] Give --full-objects flag to rev-list when preparing a dumb server Junio C Hamano
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-07 23:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
>
> -		if (!strcmp(arg, "--objects")) {
> +		if (!strncmp(arg, "--objects", 9)) {
> +			if (!strcmp(arg+9, "=self-sufficient"))
> +				objects_self_sufficient = 1;

This is nasty - if you mis-spell "self-sufficient" (easy enough to do) 
you'll never know the end result isn't what you expected. It won't warn 
you in any way, it will just make a non-self-sufficient pack..

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] rev-list: add "--full-objects" flag.
  2005-07-07 23:58                 ` Linus Torvalds
@ 2005-07-08  1:02                   ` Junio C Hamano
  2005-07-08  1:33                     ` Linus Torvalds
  2005-07-08  1:46                     ` Linus Torvalds
  2005-07-08  1:03                   ` [PATCH] Give --full-objects flag to rev-list when preparing a dumb server Junio C Hamano
  1 sibling, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-08  1:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> This is nasty - if you mis-spell "self-sufficient" (easy enough to do) 
LT> you'll never know the end result isn't what you expected. It won't warn 
LT> you in any way, it will just make a non-self-sufficient pack..

Again you are right.  How about --full-objects instead?

------------
When --full-objects is specified instead of usual "--objects",
rev-list shows all objects reachable from trees associated with
the commits in its output.  This can be used to ensure that a
single pack can be used to recreate the tree associated with
every commit in it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 rev-list.c                 |   13 +++++-
 t/t6100-rev-list-object.sh |   98 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 t/t6100-rev-list-object.sh

24c31c0417a54a6ca6dc1b86267bccbbfe87c7d8
diff --git a/rev-list.c b/rev-list.c
--- a/rev-list.c
+++ b/rev-list.c
@@ -17,6 +17,7 @@ static const char rev_list_usage[] =
 		      "  --min-age=epoch\n"
 		      "  --bisect\n"
 		      "  --objects\n"
+		      "  --full-objects\n"
 		      "  --unpacked\n"
 		      "  --header\n"
 		      "  --pretty\n"
@@ -27,6 +28,7 @@ static int bisect_list = 0;
 static int tag_objects = 0;
 static int tree_objects = 0;
 static int blob_objects = 0;
+static int objects_self_sufficient = 0;
 static int verbose_header = 0;
 static int show_parents = 0;
 static int hdr_termination = 0;
@@ -198,7 +200,7 @@ static void mark_tree_uninteresting(stru
 	struct object *obj = &tree->object;
 	struct tree_entry_list *entry;
 
-	if (!tree_objects)
+	if (!tree_objects || objects_self_sufficient)
 		return;
 	if (obj->flags & UNINTERESTING)
 		return;
@@ -448,7 +450,14 @@ int main(int argc, char **argv)
 			bisect_list = 1;
 			continue;
 		}
-		if (!strcmp(arg, "--objects")) {
+		if (!strncmp(arg, "--objects", 9)) {
+			tag_objects = 1;
+			tree_objects = 1;
+			blob_objects = 1;
+			continue;
+		}
+		if (!strncmp(arg, "--full-objects", 9)) {
+			objects_self_sufficient = 1;
 			tag_objects = 1;
 			tree_objects = 1;
 			blob_objects = 1;
diff --git a/t/t6100-rev-list-object.sh b/t/t6100-rev-list-object.sh
new file mode 100644
--- /dev/null
+++ b/t/t6100-rev-list-object.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='git-rev-list --objects test.
+
+'
+. ./test-lib.sh
+
+GIT_AUTHOR_DATE='+0000 946684801'
+GIT_AUTHOR_NAME=none
+GIT_AUTHOR_EMAIL=none@none
+GIT_COMMITTER_DATE='+0000 946684801'
+GIT_COMMITTER_NAME=none
+GIT_COMMITTER_EMAIL=none@none
+export GIT_AUTHOR_DATE GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \
+       GIT_COMMITTER_DATE GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
+
+_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
+_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
+sedScript='s/^\('"$_x40"' [^ ]*\) .*/\1/p'
+
+test_expect_success setup '
+    for i in frotz nitfol
+    do
+	    echo $i >$i &&
+	    git-update-cache --add $i || exit
+    done &&
+    tree0=$(git-write-tree) &&
+    commit0=$(git-commit-tree $tree0) &&
+    echo $tree0 &&
+    echo $commit0 &&
+    git-ls-tree -r $tree0 &&
+    echo nitfol nitfol >nitfol &&
+    rm -f frotz &&
+    git-update-cache --add nitfol --remove frotz &&
+    tree1=$(git-write-tree) &&
+    commit1=$(git-commit-tree $tree1 -p $commit0) &&
+    echo $tree1 &&
+    echo $commit1 &&
+    git-ls-tree -r $tree1    
+' </dev/null
+
+test_expect_success 'pack #0' '
+    name0=$(git-rev-list --objects $commit0 | \
+            git-pack-objects pk0) &&
+    ls pk0-* &&
+    git-verify-pack -v pk0-$name0.idx |
+    sed -ne "$sedScript" | sort >contents.0
+'
+
+test_expect_success 'pack #1 (commit 1 except commit 0)' '
+    name1=$(git-rev-list --objects $commit1 ^$commit0 | \
+            git-pack-objects pk1) &&
+    ls pk1-* &&
+    git-verify-pack -v pk1-$name1.idx |
+    sed -ne "$sedScript" | sort >contents.1
+'
+
+test_expect_success 'there should not be any overlaps' '
+    case $(comm -12 contents.0 contents.1 | wc -l) in
+    0) ;;
+    *) false ;;
+    esac
+'
+
+test_expect_success 'pack #2 (commit 1 unpacked only)' '
+    ln pk0-* .git/objects/pack/. &&
+    name2=$(git-rev-list --objects --unpacked $commit1 | \
+            git-pack-objects pk2) &&
+    ls pk2-* &&
+    git-verify-pack -v pk1-$name2.idx |
+    sed -ne "$sedScript" | sort >contents.2
+'
+
+test_expect_success 'pack #1 and #2 should be the same' '
+    diff contents.1 contents.2
+'
+
+test_expect_success 'pack #3 (commit 1 except commit 0, self-sufficient)' '
+    name3=$(git-rev-list --full-objects $commit1 ^$commit0 | \
+            git-pack-objects pk3) &&
+    ls pk3-* &&
+    git-verify-pack -v pk3-$name3.idx |
+    sed -ne "$sedScript" | sort >contents.3
+'
+
+ls_tree_to_invent='s/^[0-9]* \([^ ]*\) \('"$_x40"'\)	.*/\2 \1/'
+test_expect_success 'make sure pack #3 is not missing anything from commit1' '
+    (
+	echo "$tree1 tree"
+	echo "$commit1 commit"
+	git-ls-tree "$tree1" | sed -e "$ls_tree_to_invent"
+    ) | sort >tree-contents.1 &&
+    comm -23 tree-contents.1 contents.3 >missing.3 &&
+    diff /dev/null missing.3
+'
------------

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-08  1:02                   ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano
@ 2005-07-08  1:33                     ` Linus Torvalds
  2005-07-08  1:46                     ` Linus Torvalds
  1 sibling, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08  1:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
> 
> Again you are right.  How about --full-objects instead?

I don't mind the "--objects=xxx" format per se, but it would need to 
verify that the "=xxx" was either valid or wasn't there at all. So what I 
objected to was not that it was easy to mis-spell, but that if misspelled, 
the program wouldn't point it out as an error, but silently just do the 
wrong thing.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-08  1:02                   ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano
  2005-07-08  1:33                     ` Linus Torvalds
@ 2005-07-08  1:46                     ` Linus Torvalds
  2005-07-08  2:17                       ` Junio C Hamano
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08  1:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
>
> When --full-objects is specified instead of usual "--objects",
> rev-list shows all objects reachable from trees associated with
> the commits in its output.  This can be used to ensure that a
> single pack can be used to recreate the tree associated with
> every commit in it.

Hmm.. The more I think about it, the less I think this is about "full 
objects".

After all, we won't have all objects: the pack will still cut off any 
commits that may be reachable but not interesting.

So this is more specifically about full _trees_, not objects per se. So 
while the name of the option doesn't really matter all that much, I do 
think it would make more sense as "--whole-trees" or something like that.

However, I really don't think it's a very useful option in the first
place. Any dumb web-based thing that depends on "--whole-trees" would suck
horribly. For the kernel, it means that you'd be guaranteed 17,000+ files,
and there would be very few deltas in there, so you'd have this 40MB+
pack-file. Which is _not_ an acceptable way of getting updates.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-08  1:46                     ` Linus Torvalds
@ 2005-07-08  2:17                       ` Junio C Hamano
  2005-07-08  2:39                         ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2005-07-08  2:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> However, I really don't think it's a very useful option in
LT> the first place.  Any dumb web-based thing that depends on
LT> "--whole-trees" would suck horribly.

I agree with these two sentences now.

However it does not automatically mean that the avenue I have
been pursuing would not work; the server side preparation needs
to be a bit more careful than what I sent, which unconditionally
runs "prune-packed".  It instead should leave the files that
"--whole-trees" would have packed as plain SHA1 files, so that
the bulk is obtained by statically generated packs and the rest
can be handled in the commit-chain walker as before.

So, the server side preparation needs be tweaked to do something
like:

  (1) Repack when necessary (no --whole-trees).

  (2) For each .git/objects/pack/ pack, make a list of trees and
      blobs that are missing from the commits that contained in
      the same pack.

  (3) Run "prune-packed" but do not prune objects on the list
      produced in the previous step.

  (4) Take inventory, rev-cache, and pack, as done by the posted
      patch.

The determination of (1) is a bit problematic since "when
necessary" is not "when .git/objects/?? grew too big" anymore,
due to the fact that (3) would deliberately leave plain SHA1
files there.

A completely different way would be to prepare packs of objects
based on age, and create an inventory of such packs.  Have
client download such an inventory, which essentially says "if
you have this commit, then slurp these packs and you are done."

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-08  2:17                       ` Junio C Hamano
@ 2005-07-08  2:39                         ` Linus Torvalds
  2005-07-09 21:09                           ` Eric W. Biederman
       [not found]                           ` <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net>
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08  2:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
> 
> However it does not automatically mean that the avenue I have
> been pursuing would not work; the server side preparation needs
> to be a bit more careful than what I sent, which unconditionally
> runs "prune-packed".  It instead should leave the files that
> "--whole-trees" would have packed as plain SHA1 files, so that
> the bulk is obtained by statically generated packs and the rest
> can be handled in the commit-chain walker as before.

I really think the commit-chain walker needs to run locally (ie at the 
server end, or after fetching all the objects from the server).

I don't know how much you've tried out the git-http-pull and git-ssh-pull 
things, but their performance was quite horrid for anything half-way 
bigger, because of the totally synchronized IO.

The "fetch one object, parse it, fetch the next one, parse that.." 
approach is just horrible.

I ended up preferring the "rsync" thing even though rsync sucked badly on
big object stores too, if only because when rsync got working, it at least
nicely pipelined the transfers, and would transfer things ten times faster
than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it
really felt that way).

And the thing is, if you purely follow one tree (which is likely the
common case for a lot of users), then you are actually always likely
better off with the "mirror it" model. Which is _not_ a good model for
developers (for example, me rsync'ing from Jeff's kernel repository always
got me hundreds of useless objects), but it's fine for somebody who
actually just wants to track somebody else.

And then you really can use just rsync or wget or ncftpget or anything
else that has a "fetch recursively, optimizing existing objects" mode.

Now, re-packing ends up causing some double transmissions, but I bet the
cost of those are going to be less than the cost of the "ping-pong for
each object" approach. Especially as most of the repacked objects will be 
deltas if the repacking is done properly.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-08  2:39                         ` Linus Torvalds
@ 2005-07-09 21:09                           ` Eric W. Biederman
  2005-07-10  5:11                             ` Linus Torvalds
                                               ` (2 more replies)
       [not found]                           ` <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net>
  1 sibling, 3 replies; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-09 21:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 7 Jul 2005, Junio C Hamano wrote:
>> 
>> However it does not automatically mean that the avenue I have
>> been pursuing would not work; the server side preparation needs
>> to be a bit more careful than what I sent, which unconditionally
>> runs "prune-packed".  It instead should leave the files that
>> "--whole-trees" would have packed as plain SHA1 files, so that
>> the bulk is obtained by statically generated packs and the rest
>> can be handled in the commit-chain walker as before.

> The "fetch one object, parse it, fetch the next one, parse that.." 
> approach is just horrible.

Agreed.  That does not cover up latency at all and depending on the 
parsing cost can potentially even keep you from having anything on
your network connection for a noticeable amount of time.

> I ended up preferring the "rsync" thing even though rsync sucked badly on
> big object stores too, if only because when rsync got working, it at least
> nicely pipelined the transfers, and would transfer things ten times faster
> than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it
> really felt that way).

This feels to me like an implementation issue (no pipelining) rather
than a design issue (pipelining is impossible).

> And the thing is, if you purely follow one tree (which is likely the
> common case for a lot of users), then you are actually always likely
> better off with the "mirror it" model. Which is _not_ a good model for
> developers (for example, me rsync'ing from Jeff's kernel repository always
> got me hundreds of useless objects), but it's fine for somebody who
> actually just wants to track somebody else.

I assume the problem with the mirror it model was simply there were
to many objects?

> And then you really can use just rsync or wget or ncftpget or anything
> else that has a "fetch recursively, optimizing existing objects" mode.

Sane.  But with an intelligent fetcher and a little extra information
a dumb server should still be able to not fetch branches we care
nothing about.  I think that extra information is simply commit
object graph and which packs those commit objects are in.  I assume
the commit graph information will be fairly modest.

Once you have that extra information you can generate incremental
packs whenever you upload to the server, and you can make the
incremental packs per branch.

That should allow an dumb fetcher to look at the list of commits
and just fetch those packs it cares about, and since it only has
to look one place first it should be fairly sane.

The core idea is that if the dumb-server-preparation can anticipate
common access patterns (mirror a branch) and give enough information
so that can be done cheaply and pipelined I don't expect it to be much
worse than an intelligent fetcher.

The current intelligent fetch currently has a problem that it cannot
be used to bootstrap a repository.  If you don't have an ancestor
of what you are fetching you can't fetch it.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-09 21:09                           ` Eric W. Biederman
@ 2005-07-10  5:11                             ` Linus Torvalds
  2005-07-10  6:28                               ` Junio C Hamano
  2005-07-10 21:48                             ` Sven Verdoolaege
  2005-07-10 22:36                             ` Linus Torvalds
  2 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10  5:11 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Sat, 9 Jul 2005, Eric W. Biederman wrote:
> 
> I assume the problem with the mirror it model was simply there were
> to many objects?

Yes.

> > And then you really can use just rsync or wget or ncftpget or anything
> > else that has a "fetch recursively, optimizing existing objects" mode.
> 
> Sane.  But with an intelligent fetcher and a little extra information a
> dumb server should still be able to not fetch branches we care nothing
> about.  I think that extra information is simply commit object graph and
> which packs those commit objects are in.  I assume the commit graph
> information will be fairly modest.

Well, what I'd hope for is actually that eventually "webgit" will have 
some machine-parseable sub-tree, and then you can have this kind of thing 
generated automatically.

But a _truly_ dumb server (ie one with no CGI at all, just "raw data", you
really end up with just effectively rsyncing it. Yes, you could create a
new "commit index file" every time you push, and maybe it's worth it, but 
on the other hand, what's wrong with just rsyncing it all and parsing it 
locally instead?

People who use it for major development would all try to get the smart 
client, even if it's "just" some webgit extension thing..

Dumb servers work, they just won't do any selective stuff. Big deal. 
That's why they are dumb.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-10  5:11                             ` Linus Torvalds
@ 2005-07-10  6:28                               ` Junio C Hamano
  0 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-10  6:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric W. Biederman, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
>> On Sat, 9 Jul 2005, Eric W. Biederman wrote:
>> I assume the commit graph information will be fairly modest.

That is true.  My experience from the one I have been cooking,
Gitified 2.4.0->2.6.12-rc2 BKCVS export results in a bit shy of
600KB commit ancestry information.  The full development trail
for that repository contains 370152 objects among which 28237
are commits; when packed into one pack-idx pair, it is around
a 170MB .pack with a 9MB .idx file.

LT> But a _truly_ dumb server (ie one with no CGI at all, just "raw data", you
LT> really end up with just effectively rsyncing it. Yes, you could create a
LT> new "commit index file" every time you push, and maybe it's worth it, but 
LT> on the other hand, what's wrong with just rsyncing it all and parsing it 
LT> locally instead?

Nothing, and you convinced me to drop the one I have been
cooking.  Maybe its time to either change git-fetch-script to
use wget -r for http transport for objects part, perhaps?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-09 21:09                           ` Eric W. Biederman
  2005-07-10  5:11                             ` Linus Torvalds
@ 2005-07-10 21:48                             ` Sven Verdoolaege
  2005-07-10 22:36                             ` Linus Torvalds
  2 siblings, 0 replies; 66+ messages in thread
From: Sven Verdoolaege @ 2005-07-10 21:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Junio C Hamano, git

On Sat, Jul 09, 2005 at 03:09:02PM -0600, Eric W. Biederman wrote:
> The current intelligent fetch currently has a problem that it cannot
> be used to bootstrap a repository.  If you don't have an ancestor
> of what you are fetching you can't fetch it.
> 

Not sure if this is what you want, but you could use the
following gitweb patch (to be applied on top of my previous
patches) to get a git tree snapshot for bootstrapping.

http://www.liacs.nl/~sverdool/gitweb.cgi?p=gitweb.git;a=summary
http://www.liacs.nl/~sverdool/gitweb.git/

skimo
--
Support pack snapshots.

---
commit f76a442a0e2166b3f17db0e496545a600a33f94c
tree f8f089ab738864e69e0155b10262dbec832b4a11
parent 8392280de17a89a451c1f7db4e268f2047d4aa83
author Sven Verdoolaege <skimo@liacs.nl> Sun, 10 Jul 2005 23:56:42 +0200
committer Sven Verdoolaege <skimo@liacs.nl> Sun, 10 Jul 2005 23:56:42 +0200

 gitweb.cgi |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gitweb.cgi b/gitweb.cgi
--- a/gitweb.cgi
+++ b/gitweb.cgi
@@ -2058,8 +2058,9 @@ sub git_snapshot {
 	      "<th></th>\n" .
 	      "</tr>\n";
 	my %types = (
-		'Bzipped tar archive' => 'tar.bz2',
-		'Gzipped tar archive' => 'tar.gz',
+		'Source tree (bzipped tar archive)' => 'tar.bz2',
+		'Source tree (gzipped tar archive)' => 'tar.gz',
+		'Git tree (pack file)' => 'pack',
 	);
 	my $alternate = 0;
 	for my $type (sort keys %types) {
@@ -2094,6 +2095,7 @@ sub git_serve_snapshot {
 	my %info = (
 		'tar.bz2' => [ 'application/x-bzip2', 'bzip2' ],
 		'tar.gz' => [ 'application/x-gzip', 'gzip' ],
+		'pack' => [ 'application/x-git-pack' ],
 	);
 	if (!exists $info{$st}) {
 		die_error(undef, "Unknown snapshot type.");
@@ -2101,7 +2103,10 @@ sub git_serve_snapshot {
 	my ($type, $zip) = @{$info{$st}};
 	print $cgi->header(-type => $type, 
 			   -attachment => "$project-$hash.$st");
-	open my $fd, "-|", "$gitbin/git-tar-tree $hash '$project-$hash' | $zip" 
+	open my $fd, "-|", ($st eq 'pack' ?
+		"$gitbin/git-rev-list --max-count=1 --objects $hash | ". 
+			"$gitbin/git-pack-objects --stdout" :
+		"$gitbin/git-tar-tree $hash '$project-$hash' | $zip")
 		or return;
 	undef $/;
 	print <$fd>;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-09 21:09                           ` Eric W. Biederman
  2005-07-10  5:11                             ` Linus Torvalds
  2005-07-10 21:48                             ` Sven Verdoolaege
@ 2005-07-10 22:36                             ` Linus Torvalds
  2005-07-11 15:19                               ` Eric W. Biederman
  2 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10 22:36 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Sat, 9 Jul 2005, Eric W. Biederman wrote:
> 
> The current intelligent fetch currently has a problem that it cannot
> be used to bootstrap a repository.  If you don't have an ancestor
> of what you are fetching you can't fetch it.

Sure you can.

See the current "git clone". It's actually quite good, it's a pleasure to 
use now that it gives updates on how much it has done.

Just do

	git clone src dest

to try it out. It starts out silent (for big repositories) because it 
takes a while to get the whole rev list, but once it gets going it's quite 
nice and gives a nice progress report..

It uses the exact same server side code that "git-fetch-pack" does (ie it
just starts "git-upload-pack" on the server).

Now, one thing you cannot do is to start a totally new _project_ on the
server side. In order to do a "git-send-pack", you need to first create a
directory and do a "git-init-db" on the remote side.

So to create a new project, what you need to do is

	src$ ssh target

	target$ mkdir new-project
	target$ cd new-project
	target$ git-init-db
	target$ exit

	src$ git-send-pack target:new-project master

and you've now sent your "master" branch to the new project at 
"target:new-project".

You can even populate multiple branches at a time: just list them all (you
do have to list them, because by default "git-send-pack" will update the
_common_ branches, and since the other end is empty, there obviously are
no common branches to start with).

Ahh, you should even be able to automate the sending of all branches by
doing

	git-send-pack target:new-project $(cd .git ; find refs -type f)

I think - that will end up being equivalent to a "reverse clone".

The smart clients are doing pretty damn well, I think.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-10 22:36                             ` Linus Torvalds
@ 2005-07-11 15:19                               ` Eric W. Biederman
  2005-07-11 16:38                                 ` Linus Torvalds
  2005-07-11 17:53                                 ` Linus Torvalds
  0 siblings, 2 replies; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-11 15:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sat, 9 Jul 2005, Eric W. Biederman wrote:
>> 
>> The current intelligent fetch currently has a problem that it cannot
>> be used to bootstrap a repository.  If you don't have an ancestor
>> of what you are fetching you can't fetch it.
>
> Sure you can.
>
> See the current "git clone". It's actually quite good, it's a pleasure to 
> use now that it gives updates on how much it has done.
>
> Just do
>
> 	git clone src dest

Sorry, somehow I just missed that, and then I noticed just a little
before you sent out your email.

I'm having the worst time putting together a mental model of how git
works, and the documentation is spotty enough that it hasn't been
helpful.  So I am wading through the code.  It seems every time I turn
a corner there is another rough spot.

I guess I was expecting to pull from one tree into another unrelated
tree.  Getting a tree with two heads and then be able to merge them
together.

A couple of questions.

1) Does git-clone-script when packed copy the entire repository
   or just take a couple of slices of the tree where you have
   references?

2) Is there a way for a pack to create deltas against objects
   that are not in the tree?  For a dumb repository making incremental
   changes this is ideal.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-11 15:19                               ` Eric W. Biederman
@ 2005-07-11 16:38                                 ` Linus Torvalds
  2005-07-12  0:44                                   ` Eric W. Biederman
  2005-07-11 17:53                                 ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-11 16:38 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Mon, 11 Jul 2005, Eric W. Biederman wrote:
> 
> I guess I was expecting to pull from one tree into another unrelated
> tree.  Getting a tree with two heads and then be able to merge them
> together.

You can do it, but you have to do it by hand. It's a valid operation, but 
it's not an operation I want people to do by mistake, so it's not 
something the trivial helper scripts help with.

The way to do it by hand is to just use something stupid that doesn't
understand what it's doing anyway, and just copy the files over. "cp -a" 
or "rsync" works fine. Then just do "git resolve" by hand. It's not very 
hard at all, but it's definitely something that should be a special case.

> A couple of questions.
> 
> 1) Does git-clone-script when packed copy the entire repository
>    or just take a couple of slices of the tree where you have
>    references?

It only gets the objects needed for the references, nothing more.

So if you only get one branch, it will leave the objects that are specific 
to other branches alone.

> 2) Is there a way for a pack to create deltas against objects
>    that are not in the tree?  For a dumb repository making incremental
>    changes this is ideal.

A pack can only have deltas against objects in that pack. It caan't even 
have deltas to other objects in the same tree, it literally is only 
_within_ a pack. This is so that each pack is totally independent: you can 
always unpack (and verify) the objects in a pack _without_ having anything 
else (of course, the end result is often not a full project, and you won't 
have any references, but at least the _objects_ are valid).

I don't want to have deltas to outside the pack, because while it's 
obviously very nice from a size packing standpoint, it's totally horrid 
from an infrastructure standpoint. It would make it possible to have 
circular dependencies (ie deltas against each other) that could only be 
resolved by having a third pack (or the unpacked object).

It would also means that you may have to have two packs mapped at the same
time to unpack them, which was very much against what I was aiming for: I
think that in the long run, for truly huge projects, you'd want to have a
history of packs, each maybe a gigabyte in size, and you may be in the 
situation that you simply cannot have two packs mapped at the same time 
because you don't have enough virtual memory for it.

So then inter-pack deltas would mean that you'd have to have "partial pack 
mapping" etc horrid special case logic. Right now, because a pack is 
always self-sufficient, you know that in order to unpack an object, if you 
find it in the index file, you will be able to unpack it by just mapping 
that pack and going off..

So the rule is: don't pack too often. The unpacked objects are actually 
working really really well as long as you don't have tens of thousands of 
them. Having a few hundred (or even a few thousand) unpacked objects is 
not a problem at all. Then you do a "git repack" when it starts getting 
uncomfortable, and you you continue.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-11 16:38                                 ` Linus Torvalds
@ 2005-07-12  0:44                                   ` Eric W. Biederman
  2005-07-12  1:14                                     ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-12  0:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>> 
>> I guess I was expecting to pull from one tree into another unrelated
>> tree.  Getting a tree with two heads and then be able to merge them
>> together.
>
> You can do it, but you have to do it by hand. It's a valid operation, but 
> it's not an operation I want people to do by mistake, so it's not 
> something the trivial helper scripts help with.
>
> The way to do it by hand is to just use something stupid that doesn't
> understand what it's doing anyway, and just copy the files over. "cp -a" 
> or "rsync" works fine. Then just do "git resolve" by hand. It's not very 
> hard at all, but it's definitely something that should be a special case.

Ok.  Only the dumb methods are allowed.

>> A couple of questions.
>> 
>> 1) Does git-clone-script when packed copy the entire repository
>>    or just take a couple of slices of the tree where you have
>>    references?
>
> It only gets the objects needed for the references, nothing more.
>
> So if you only get one branch, it will leave the objects that are specific 
> to other branches alone.

Hmm.  As I recall reading the code it grabs everything that is
in .git/refs/*.  So I would actually expect it to grab all of the
branches.   My real question was different.  With a clone it
appears to just get the objects used to compose a tree object,
but none of the history available by looking at the commit
parents is obtained.  Not at all what I would expect for
an operation named clone.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-12  0:44                                   ` Eric W. Biederman
@ 2005-07-12  1:14                                     ` Linus Torvalds
  2005-07-12  2:38                                       ` Eric W. Biederman
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-12  1:14 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>
> Ok.  Only the dumb methods are allowed.

Well, no, you can actually do git-clone-pack by hand in that git archive,
and it will use the smart packing to get the other end, even if it is
totally unrelated to the current project.

But you have to do it by "hand" in the sense that none of the nice helper
scripts will help you to do this. Merging two unrelated projects really is
a very special operation. I've done it once (gitk into git), and I don't
think we'll see it done very many times again.

> > So if you only get one branch, it will leave the objects that are specific 
> > to other branches alone.
> 
> Hmm.  As I recall reading the code it grabs everything that is
> in .git/refs/*.

Only by default.

If you specify a branch (or five) git-clone-pack will grab only that
branch.

However, I don't think "git clone" (the script) even exposes that, so
right now you'd not even see it - "git clone" only exposes the "get all
the branches by default" behaviour.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-12  1:14                                     ` Linus Torvalds
@ 2005-07-12  2:38                                       ` Eric W. Biederman
  2005-07-12  3:21                                         ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-12  2:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>> > So if you only get one branch, it will leave the objects that are specific 
>> > to other branches alone.
>> 
>> Hmm.  As I recall reading the code it grabs everything that is
>> in .git/refs/*.
>
> Only by default.
>
> If you specify a branch (or five) git-clone-pack will grab only that
> branch.
>
> However, I don't think "git clone" (the script) even exposes that, so
> right now you'd not even see it - "git clone" only exposes the "get all
> the branches by default" behaviour.

Yep.

The question:
Does git-upload-pack which gets it's list of objects
with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3"
get any history beyond the top of tree of each branch.  

As I read the code it does not.  

If the code does not get the history I see some problems.
In particular merging with a branch is hard because we
may not pull the common history point.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-12  2:38                                       ` Eric W. Biederman
@ 2005-07-12  3:21                                         ` Linus Torvalds
  2005-07-12  3:39                                           ` Eric W. Biederman
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-12  3:21 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Mon, 11 Jul 2005, Eric W. Biederman wrote:
> 
> The question:
> Does git-upload-pack which gets it's list of objects
> with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3"
> get any history beyond the top of tree of each branch.  
> 
> As I read the code it does not.  

It does. It gets all the history necessary for each branch. git-rev-list
will walk the whole history until it hits commits that as been marked as
uninteresting (or the parents of commits that have been marked as
uninteresting), and those are the ones that the receiver already has, of
course.

So after you get a pack, you have all the history for all the branches you 
got.

A branch you _didn't_ get, you don't get any history for, of course, but 
that doesn't matter. You'll get that history if you ever pull the branch 
later.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-12  3:21                                         ` Linus Torvalds
@ 2005-07-12  3:39                                           ` Eric W. Biederman
  2005-07-12  4:48                                             ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2005-07-12  3:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>> 
>> The question:
>> Does git-upload-pack which gets it's list of objects
>> with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3"
>> get any history beyond the top of tree of each branch.  
>> 
>> As I read the code it does not.  
>
> It does. It gets all the history necessary for each branch. git-rev-list
> will walk the whole history until it hits commits that as been marked as
> uninteresting (or the parents of commits that have been marked as
> uninteresting), and those are the ones that the receiver already has, of
> course.

Ok.  So the intention is sane then.

Looking closer it appears that commit_list_insert is recursive
and that is what I missed.

> So after you get a pack, you have all the history for all the branches you 
> got.
>
> A branch you _didn't_ get, you don't get any history for, of course, but 
> that doesn't matter. You'll get that history if you ever pull the branch 
> later.

Right.  Things work well if you have all of the history.


Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-12  3:39                                           ` Eric W. Biederman
@ 2005-07-12  4:48                                             ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-12  4:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Mon, 11 Jul 2005, Eric W. Biederman wrote:
> 
> Looking closer it appears that commit_list_insert is recursive
> and that is what I missed.

Actually, it's "pop_most_recent_commit()" that ends up being the
"recursive" part: it will pop the top-most entry, but as it is popping it 
it will push the parents of that entry onto the same list.

So basically, you can get a list of all history by first inserting the top 
entry, and then doing "pop_most_recent_commit()" until the list is empty.

Now, git-rev-list ends up being slightly more complex than that, since it
has support for multiple starting points, and marking commits (and thus
their parents) uninteresting, and two other sorting methods in addition to 
the default "by date" thing.

And then there's all the issues about tags, trees and blobs, and their
visibility as a function of the commits that are visible and the command
line arguments..

In fact, it turns out that git-rev-list is really the real heart of "git".  
Almost everything else revolves around it. Once you grok git-rev-list, you
probably really grok git.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] rev-list: add "--full-objects" flag.
  2005-07-11 15:19                               ` Eric W. Biederman
  2005-07-11 16:38                                 ` Linus Torvalds
@ 2005-07-11 17:53                                 ` Linus Torvalds
  1 sibling, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-11 17:53 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

On Mon, 11 Jul 2005, Eric W. Biederman wrote:
> 
> I'm having the worst time putting together a mental model of how git
> works, and the documentation is spotty enough that it hasn't been
> helpful.  So I am wading through the code.  It seems every time I turn
> a corner there is another rough spot.

Btw, I know I'm bad at writing docs, but what I _do_ enjoy doing is
answering reasonably specific technical questions, and maybe somebody else
can write docs by taking advantage of me that way.

I tried to write the tutorial in a way that it also tries to explain how
git works (not just a "do this", but a "you update the index file and then
write the result out as a tree object"), but it obviously covers a fairly
limited part of what git actually can do, and at the same time it doesn't
go into a lot of detail.

And part of that is not just my inability to write documentation, it's
also that I just have the wrong "view" of the project, ie I probably just
take a lot of things for granted and consider them obvious, even though
they aren't, and then I probably occasionally explain things that aren't
worth explaining, because either they _are_ obvious, or people just don't
care and they are irrelevant.

I'd love to see somebody write up more of a "this is how you use git" kind
of tutorial, _and_ on the other hand more of a low-level explanation of
the notion of an object store where objects refer to each other by their
SHA1 names, and how that is represented in the filesystem and/or in packs. 

Something with a few pictures would be great (ie screenshots of gitk, but
also something that tries to just visually show hot tags point to commits
that point to parents and trees, and trees pointing to other trees and
then blobs).

All things that I'm a complete idiot at, but that would help users 
visualize what the heck git is actually _doing_, so that they don't just 
parrot some magic command line that they don't understand, but can 
actually reason about what they are doing.

I think a lot of people do understand this, but yes, the docs are kind of 
lacking.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

[parent not found: <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net>]

[parent not found: <Pine.LNX.4.58.0507082109140.17536@g5.osdl.org>]

[parent not found: <7vfyumj8hn.fsf_-_@assigned-by-dhcp.cox.net>]

* [PATCH] Check packs and then files.
       [not found]                               ` <7vfyumj8hn.fsf_-_@assigned-by-dhcp.cox.net>
@ 2005-07-11  7:00                                 ` Junio C Hamano
  0 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-11  7:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This reverses the order of object lookup, to check pack index
first and then go to the filesystem to find .git/objects/??/
hierarchy.  When most of the objects are packed, this saves
quite many stat() calls and negative dcache entries; while the
price this approach has to pay is negligible, even when most of
the objects are outside pack, because checking pack index file
is quite cheap.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 sha1_file.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

0394e2b0ed5b197510340f187d02ef2274b6cad2
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1035,14 +1035,17 @@ void * read_sha1_file(const unsigned cha
 {
 	unsigned long mapsize;
 	void *map, *buf;
+	struct pack_entry e;
 
+	if (find_pack_entry(sha1, &e))
+		return read_packed_sha1(sha1, type, size);
 	map = map_sha1_file_internal(sha1, &mapsize);
 	if (map) {
 		buf = unpack_sha1_file(map, mapsize, type, size);
 		munmap(map, mapsize);
 		return buf;
 	}
-	return read_packed_sha1(sha1, type, size);
+	return NULL;
 }
 
 void *read_object_with_reference(const unsigned char *sha1,
@@ -1343,9 +1346,9 @@ int has_sha1_file(const unsigned char *s
 	struct stat st;
 	struct pack_entry e;
 
-	if (find_sha1_file(sha1, &st))
+	if (find_pack_entry(sha1, &e))
 		return 1;
-	return find_pack_entry(sha1, &e);
+	return find_sha1_file(sha1, &st) ? 1 : 0;
 }
 
 int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, const char *type)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] Give --full-objects flag to rev-list when preparing a dumb server.
  2005-07-07 23:58                 ` Linus Torvalds
  2005-07-08  1:02                   ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano
@ 2005-07-08  1:03                   ` Junio C Hamano
  1 sibling, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-08  1:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> This is nasty - if you mis-spell "self-sufficient" (easy enough to do) 
LT> you'll never know the end result isn't what you expected. It won't warn 
LT> you in any way, it will just make a non-self-sufficient pack..

To match the change of flag name to --full-objects,...

------------
This adds --full flag to git-repack-script, and uses it when
preparing the dumb server material.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 git-repack-script             |   10 +++++++++-
 git-update-dumb-server-script |    2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

0617ae867e7e27a7b484827f882fe7b396bea004
diff --git a/git-repack-script b/git-repack-script
--- a/git-repack-script
+++ b/git-repack-script
@@ -1,8 +1,16 @@
 #!/bin/sh
 : ${GIT_DIR=.git}
 : ${GIT_OBJECT_DIRECTORY="$GIT_DIR/objects"}
+
+case "$1" in
+--full)
+	objects=--full-objects ;;
+*)
+	objects=--objects ;;
+esac
+
 rm -f .tmp-pack-*
-packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
+packname=$(git-rev-list --unpacked $objects $(git-rev-parse --all) |
 	git-pack-objects --non-empty --incremental .tmp-pack) ||
 	exit 1
 if [ -z "$packname" ]; then
diff --git a/git-update-dumb-server-script b/git-update-dumb-server-script
--- a/git-update-dumb-server-script
+++ b/git-update-dumb-server-script
@@ -26,7 +26,7 @@ plain_size=$(
 
 if test $max_plain_size -lt $plain_size >/dev/null
 then
-	git-repack-script && git-prune-packed
+	git-repack-script --full && git-prune-packed
 fi &&
 
 git-update-dumb-server &&
------------

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] Use --objects=self-sufficient flag to rev-list.
  2005-07-07 23:16             ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano
  2005-07-07 23:50               ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano
@ 2005-07-07 23:50               ` Junio C Hamano
  1 sibling, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 23:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This adds --self-sufficient flag to git-repack-script, and uses
it when preparing the dumb server material.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

*** This makes things easier for the dumb puller because
*** self-sufficient pack means less falling back on traditional
*** http-pull.

 git-repack-script             |   10 +++++++++-
 git-update-dumb-server-script |    2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

6b0568181ede5540706bcdf69868102f554a2f8a
diff --git a/git-repack-script b/git-repack-script
--- a/git-repack-script
+++ b/git-repack-script
@@ -1,8 +1,16 @@
 #!/bin/sh
 : ${GIT_DIR=.git}
 : ${GIT_OBJECT_DIRECTORY="$GIT_DIR/objects"}
+
+case "$1" in
+--self-sufficient)
+	objects=--objects=self-sufficient ;;
+*)
+	objects=--objects ;;
+esac
+
 rm -f .tmp-pack-*
-packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
+packname=$(git-rev-list --unpacked $objects $(git-rev-parse --all) |
 	git-pack-objects --non-empty --incremental .tmp-pack) ||
 	exit 1
 if [ -z "$packname" ]; then
diff --git a/git-update-dumb-server-script b/git-update-dumb-server-script
--- a/git-update-dumb-server-script
+++ b/git-update-dumb-server-script
@@ -26,7 +26,7 @@ plain_size=$(
 
 if test $max_plain_size -lt $plain_size >/dev/null
 then
-	git-repack-script && git-prune-packed
+	git-repack-script --self-sufficient && git-prune-packed
 fi &&
 
 git-update-dumb-server &&
------------

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 22:52           ` Linus Torvalds
  2005-07-07 23:16             ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano
@ 2005-07-07 23:52             ` Tony Luck
  2005-07-07 23:54               ` Junio C Hamano
  2005-07-07 23:59               ` Linus Torvalds
  1 sibling, 2 replies; 66+ messages in thread
From: Tony Luck @ 2005-07-07 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git

> > So, what _is_ then the way to pull now, actually? If we use rsync, won't
> > we end up with having the objects we previous had twice now?
> 
> Rsync works fine. You can either unpack the pack you get, or, if you
> prefer, just run
> 
>         git-prune-packed

cg-update from a local repo that contains packs is broken though :-(

Also "git-fsck-cache" in a repo that is fully packed complains:

   fatal: No default references

-Tony

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 23:52             ` [ANNOUNCE] Cogito-0.12 Tony Luck
@ 2005-07-07 23:54               ` Junio C Hamano
  2005-07-07 23:59               ` Linus Torvalds
  1 sibling, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2005-07-07 23:54 UTC (permalink / raw)
  To: Tony Luck; +Cc: git

>>>>> "TL" == Tony Luck <tony.luck@gmail.com> writes:

TL> Also "git-fsck-cache" in a repo that is fully packed complains:

TL>    fatal: No default references

"git-fsck-cache --full", perhaps?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 23:52             ` [ANNOUNCE] Cogito-0.12 Tony Luck
  2005-07-07 23:54               ` Junio C Hamano
@ 2005-07-07 23:59               ` Linus Torvalds
  2005-07-08  0:09                 ` Tony Luck
  2005-07-08  0:09                 ` Linus Torvalds
  1 sibling, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-07 23:59 UTC (permalink / raw)
  To: Tony Luck; +Cc: Petr Baudis, Junio C Hamano, git



On Thu, 7 Jul 2005, Tony Luck wrote:
>
> > > So, what _is_ then the way to pull now, actually? If we use rsync, won't
> > > we end up with having the objects we previous had twice now?
> > 
> > Rsync works fine. You can either unpack the pack you get, or, if you
> > prefer, just run
> > 
> >         git-prune-packed
> 
> cg-update from a local repo that contains packs is broken though :-(

Is this with cg-0.12? The most recent release should be happy with packs.

> Also "git-fsck-cache" in a repo that is fully packed complains:
> 
>    fatal: No default references

Ahh, that's true. I knew about it, and forgot. Will fix,

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 23:59               ` Linus Torvalds
@ 2005-07-08  0:09                 ` Tony Luck
  2005-07-08  0:23                   ` Linus Torvalds
  2005-07-08  0:09                 ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Tony Luck @ 2005-07-08  0:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git

> > cg-update from a local repo that contains packs is broken though :-(
> 
> Is this with cg-0.12? The most recent release should be happy with packs.

Yes ... I pulled, built and installed the latest cogito this afternoon
before trying
to touch anything involving packs.  cg-version says:

cogito-0.12 (b21855b8734ca76ea08c0c17e4a204191b6e3add)

This is what happens ("linus" is a local branch just pulled from kernel.org,
so it just contains one pack file and its index).

$ cg-update linus
`/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus'
does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69
798458e067
Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067
while processing commit 0000000000000000000000000000000000000000.
cg-pull: objects pull failed

If I try it again, it thinks things are up to date (since it
mistakenly updated the
.git/refs/heads/linus), but then fails to apply (since it doesn't have
the objects
it needs).

-Tony

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-08  0:09                 ` Tony Luck
@ 2005-07-08  0:23                   ` Linus Torvalds
  2005-07-09 21:58                     ` Russell King
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08  0:23 UTC (permalink / raw)
  To: Tony Luck; +Cc: Petr Baudis, Junio C Hamano, git



On Thu, 7 Jul 2005, Tony Luck wrote:
> 
> This is what happens ("linus" is a local branch just pulled from kernel.org,
> so it just contains one pack file and its index).
> 
> $ cg-update linus
> `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus'
> does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69
> 798458e067
> Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067
> while processing commit 0000000000000000000000000000000000000000.
> cg-pull: objects pull failed

Ok. The immediate fix is to just unpack the pack:

	mv .git/objects/pack/* .git/
	for i in .git/*.pack; do git-unpack-objects < $i; done

(or similar - the above is untested, but I think it should be obvious 
enough what I'm trying to do).

> If I try it again, it thinks things are up to date (since it mistakenly
> updated the .git/refs/heads/linus), but then fails to apply (since it
> doesn't have the objects it needs).

Ok, that's a worse bug, it really shouldn't update the head until _after_ 
the pull has succeeded.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-08  0:23                   ` Linus Torvalds
@ 2005-07-09 21:58                     ` Russell King
  2005-07-09 22:29                       ` Russell King
  2005-07-10  8:09                       ` Russell King
  0 siblings, 2 replies; 66+ messages in thread
From: Russell King @ 2005-07-09 21:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tony Luck, Petr Baudis, Junio C Hamano, git

On Thu, Jul 07, 2005 at 05:23:26PM -0700, Linus Torvalds wrote:
> On Thu, 7 Jul 2005, Tony Luck wrote:
> > This is what happens ("linus" is a local branch just pulled from kernel.org,
> > so it just contains one pack file and its index).
> > 
> > $ cg-update linus
> > `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus'
> > does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69
> > 798458e067
> > Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067
> > while processing commit 0000000000000000000000000000000000000000.
> > cg-pull: objects pull failed
> 
> Ok. The immediate fix is to just unpack the pack:
> 
> 	mv .git/objects/pack/* .git/
> 	for i in .git/*.pack; do git-unpack-objects < $i; done
> 
> (or similar - the above is untested, but I think it should be obvious 
> enough what I'm trying to do).

This is evil on the bandwidth, since you'll keep refetching the packed
object (64MB of it) over and over.

However, I've tried the above, and I get:

$ mv .git/objects/pack/* .git/
$ for i in .git/*.pack; do git-unpack-objects < $i; done
Unpacking 55435 objects
fatal: inflate returned -3

so it seems that the pack is corrupt... or something.

$ md5sum .git/*.pack
2be38f2947b99bcd088c1930122aadec  .git/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack

and git-fsck-cache produces lots and lots of:

dangling tree fae688b62db0b553aae0bf17f0f70e93819dec2b
broken link from    tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347
              to    blob 008e19210e66f01fbaef1aba30243850766b8b12
broken link from    tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347
              to    blob edae09a4b021e353ab4fbba756e31492fbb8fd2e
broken link from    tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347
              to    blob d098b3ba35384fb912989348fd6da59820711ca4
... etc ...

-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-09 21:58                     ` Russell King
@ 2005-07-09 22:29                       ` Russell King
  2005-07-09 23:46                         ` Junio C Hamano
  2005-07-10  8:09                       ` Russell King
  1 sibling, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-09 22:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tony Luck, Petr Baudis, Junio C Hamano, git

On Sat, Jul 09, 2005 at 10:58:18PM +0100, Russell King wrote:
> On Thu, Jul 07, 2005 at 05:23:26PM -0700, Linus Torvalds wrote:
> > On Thu, 7 Jul 2005, Tony Luck wrote:
> > > This is what happens ("linus" is a local branch just pulled from kernel.org,
> > > so it just contains one pack file and its index).
> > > 
> > > $ cg-update linus
> > > `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus'
> > > does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69
> > > 798458e067
> > > Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067
> > > while processing commit 0000000000000000000000000000000000000000.
> > > cg-pull: objects pull failed
> > 
> > Ok. The immediate fix is to just unpack the pack:
> > 
> > 	mv .git/objects/pack/* .git/
> > 	for i in .git/*.pack; do git-unpack-objects < $i; done
> > 
> > (or similar - the above is untested, but I think it should be obvious 
> > enough what I'm trying to do).
> 
> This is evil on the bandwidth, since you'll keep refetching the packed
> object (64MB of it) over and over.
> 
> However, I've tried the above, and I get:
> 
> $ mv .git/objects/pack/* .git/
> $ for i in .git/*.pack; do git-unpack-objects < $i; done
> Unpacking 55435 objects
> fatal: inflate returned -3
> 
> so it seems that the pack is corrupt... or something.
> 
> $ md5sum .git/*.pack
> 2be38f2947b99bcd088c1930122aadec  .git/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
> 
> and git-fsck-cache produces lots and lots of:
> 
> dangling tree fae688b62db0b553aae0bf17f0f70e93819dec2b
> broken link from    tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347
>               to    blob 008e19210e66f01fbaef1aba30243850766b8b12
> broken link from    tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347
>               to    blob edae09a4b021e353ab4fbba756e31492fbb8fd2e
> broken link from    tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347
>               to    blob d098b3ba35384fb912989348fd6da59820711ca4
> ... etc ...

Additional information: x86 box, running FC2, cogito 0.12 built from
the src.rpm on kernel.org.  Lots of disk space (blocks + inodes)
remaining.

Pretty please can we stop breaking rmk's git/cogito/repos/scripts ?

-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-09 22:29                       ` Russell King
@ 2005-07-09 23:46                         ` Junio C Hamano
  2005-07-10  5:02                           ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2005-07-09 23:46 UTC (permalink / raw)
  To: Russell King; +Cc: Linus Torvalds, Tony Luck, Petr Baudis, git

>>>>> "RK" == Russell King <rmk@arm.linux.org.uk> writes:

>> $ mv .git/objects/pack/* .git/
>> $ for i in .git/*.pack; do git-unpack-objects < $i; done
>> Unpacking 55435 objects
>> fatal: inflate returned -3
>> 
>> so it seems that the pack is corrupt... or something.
>> 
>> $ md5sum .git/*.pack
>> 2be38f2947b99bcd088c1930122aadec  .git/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack

RK> Additional information: x86 box, running FC2, cogito 0.12 built from
RK> the src.rpm on kernel.org.  Lots of disk space (blocks + inodes)
RK> remaining.

Hmph, I am worried about that inflate() failure.  An x86 box,
running Debian sarge, vanilla git without Cogito built from
Linus tip.  From here, it does not look like the pack corruption
to me; unless you broke md5sum and found a collission, that is.

: siamese; type git-unpack-objects
git-unpack-objects is /home/junio/bin/Linux/git-unpack-objects
: siamese; ldd /home/junio/bin/Linux/git-unpack-objects
        libz.so.1 => /usr/lib/libz.so.1 (0xb7f8e000)
        libcrypto.so.0.9.7 =>
        /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0xb7e8e000)
        libc.so.6 => /lib/tls/libc.so.6 (0xb7d59000)
        libdl.so.2 => /lib/tls/libdl.so.2 (0xb7d56000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fad000)
: siamese; cd /opt/packrat/playpen/public/in-place/git/linux-2.6/
: siamese; md5sum .git/objects/pack/pack-*.pack
2be38f2947b99bcd088c1930122aadec  .git/objects/pack/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
: siamese; cd ..
: siamese; mkdir junk
: siamese; cd junk
: siamese; git-init-db
defaulting to local storage area
: siamese; git-unpack-objects <../linux-2.6/.git/objects/pack/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
Unpacking 55435 objects  100% (55434/55435) done

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-09 23:46                         ` Junio C Hamano
@ 2005-07-10  5:02                           ` Linus Torvalds
  2005-07-10  5:15                             ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10  5:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Russell King, Tony Luck, Petr Baudis, git



On Sat, 9 Jul 2005, Junio C Hamano wrote:
>
> >>>>> "RK" == Russell King <rmk@arm.linux.org.uk> writes:
> 
> >> $ mv .git/objects/pack/* .git/
> >> $ for i in .git/*.pack; do git-unpack-objects < $i; done
> >> Unpacking 55435 objects
> >> fatal: inflate returned -3

Ahh, damn. 

> >> so it seems that the pack is corrupt... or something.

No, I htink you're using cogito-0.12, and I fixed this one-liner that 
didn't make it into cogito:

	diff-tree 291ec0f2d2ce65e5ccb876b46d6468af49ddb82e (from 72347a233e6f3c176059a28f0817de6654ef29c7)
	Author: Linus Torvalds <torvalds@g5.osdl.org>
	Date:   Tue Jul 5 17:06:09 2005 -0700
	
	    Don't special-case a zero-sized compression.
	
	    zlib actually writes a header for that case, and while ignoring that
	    header will get us the right data, it will also end up messing up our
	    stream position.  So we actually want zlib to "uncompress" even an empty
	    object.
	
	diff --git a/unpack-objects.c b/unpack-objects.c
	--- a/unpack-objects.c
	+++ b/unpack-objects.c
	@@ -55,8 +55,6 @@ static void *get_data(unsigned long size
	        z_stream stream;
	        void *buf = xmalloc(size);
	
	-       if (!size)
	-               return buf;
	        memset(&stream, 0, sizeof(stream));
	
	        stream.next_out = buf;

(well, I guess it's a two-liner.).

What happens is that there's one zero-sized blob in the kernel archive 
history, and when we pack it, we pack it as a 8-byte "compressed" thing 
(hey, zlib has a header, that's normal), but when we unpack it, because we 
notice that the result is zero, we'd just skip the zlib header.

Which was wrong, because now the _next_ object will try to unpack at the 
wrong offset, and that explains why you get -3 ("bad data").

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10  5:02                           ` Linus Torvalds
@ 2005-07-10  5:15                             ` Linus Torvalds
  2005-07-10  6:55                               ` Russell King
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10  5:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Russell King, Tony Luck, Petr Baudis, git

On Sat, 9 Jul 2005, Linus Torvalds wrote:
> 
> No, I htink you're using cogito-0.12, and I fixed this one-liner that 
> didn't make it into cogito:

Btw, this will only affect unpacking. The packed objects should be fine,
and you'll never see this if you keep the index file around and have the
pack in .git/objects/pack, because then git won't ever do the "streaming"  
thing, it will look up exactly where the object is using the index, and it
doesn't matter that it doesn't look at the compressed data of a zero-sized
object.

So cogito isn't terminally broken, it just can't do the streaming unpack.

And as Russell points out, unpacking the packs after downloading them is
actually the wrong thing to do, because you break the rsync'ness of your
archive, so you'll keep on downloading the pack-files over and over again.

So you can fix this by getting the current git release, but you probably 
shouldn't even care.  Just use the pack-files as pack-files instead, and 
enjoy the higher performance and lower disk use ;).

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10  5:15                             ` Linus Torvalds
@ 2005-07-10  6:55                               ` Russell King
  2005-07-10  7:15                                 ` Junio C Hamano
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10  6:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Tony Luck, Petr Baudis, git

On Sat, Jul 09, 2005 at 10:15:41PM -0700, Linus Torvalds wrote:
> So you can fix this by getting the current git release, but you probably 
> shouldn't even care.  Just use the pack-files as pack-files instead, and 
> enjoy the higher performance and lower disk use ;).

I would if I could, but my workflow involves having an untouched local
copy of your tree and several trees for each area.

This involves updates using relative paths, and as has already been
found elsewhere, this (with cogito 0.12) doesn't work with packed
objects yet.

-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10  6:55                               ` Russell King
@ 2005-07-10  7:15                                 ` Junio C Hamano
  2005-07-10 12:46                                   ` Russell King
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2005-07-10  7:15 UTC (permalink / raw)
  To: Russell King; +Cc: Linus Torvalds, Tony Luck, Petr Baudis, git

>>>>> "RK" == Russell King <rmk@arm.linux.org.uk> writes:

RK> I would if I could, but my workflow involves having an untouched local
RK> copy of your tree and several trees for each area.

RK> This involves updates using relative paths, and as has already been
RK> found elsewhere, this (with cogito 0.12) doesn't work with packed
RK> objects yet.

As a workaround until Cogito gets updated, would it help to have
the environment variable GIT_ALTERNATE_OBJECT_DIRECTORIES
pointing at the untouched copy of Linus tree's .git/objects/
directory?  All your other trees would find the objects in your
copied-Linus tree (including packed one) available to them
already and hopefully pull breakage does not even have to touch
those objects.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10  7:15                                 ` Junio C Hamano
@ 2005-07-10 12:46                                   ` Russell King
  2005-07-10 16:51                                     ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10 12:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Tony Luck, Petr Baudis, git

On Sun, Jul 10, 2005 at 12:15:48AM -0700, Junio C Hamano wrote:
> As a workaround until Cogito gets updated, would it help to have
> the environment variable GIT_ALTERNATE_OBJECT_DIRECTORIES
> pointing at the untouched copy of Linus tree's .git/objects/
> directory?  All your other trees would find the objects in your
> copied-Linus tree (including packed one) available to them
> already and hopefully pull breakage does not even have to touch
> those objects.

That seems to work, thanks.  I think this is a good idea anyway -
it seems to mean that each working tree ends up with an empty set of
.git/objects/* directories.  When new work is done in a tree, the
corresponding objects then appear, and only these objects need
transferring upstream.

It means that rsync --delete-after can (in theory) be used when
making changes available to the upstream maintainer.

-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 12:46                                   ` Russell King
@ 2005-07-10 16:51                                     ` Linus Torvalds
  2005-07-10 19:15                                       ` Russell King
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10 16:51 UTC (permalink / raw)
  To: Russell King; +Cc: Junio C Hamano, Tony Luck, Petr Baudis, git

On Sun, 10 Jul 2005, Russell King wrote:
> 
> It means that rsync --delete-after can (in theory) be used when
> making changes available to the upstream maintainer.

I'd suggest against that from a safety standpoint (no backups), but what 
you _can_ do is to upload only the objects I don't have. 

This actually works - I already synced several weeks ago with Paul 
Mackerras, who had made his ppc64 git thing contain only the objects that 
I didn't have.

In other words, if you have my tree pointed to by
GIT_ALTERNATE_OBJECT_DIRECTORIES, and you populate your tree only with new
files, you can actually upload that small "sparsely populated" tree as-is
(without any of the objects that came from my tree), and I should be able
to pull it as-is.

Well, at least with rsync. I think my git "pack" send/receive thing might
be unhappy about a partial tree, but that's something I can fix, so if
this makes it easier for people (you can create a totally new tre _really_ 
cheaply and also upload it and move it around very cheaply), then I'm ok 
with pulling from partial repositories, and I have indeed already done so 
in the past.

Btw, if people start doing this, then I really think we want a 
".git/config" file, so that you can have different alternate object 
directories for different git directories without having to remember to 
set the environment variables all the time.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 16:51                                     ` Linus Torvalds
@ 2005-07-10 19:15                                       ` Russell King
  2005-07-10 20:03                                         ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10 19:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, Jul 10, 2005 at 09:51:16AM -0700, Linus Torvalds wrote:
> On Sun, 10 Jul 2005, Russell King wrote:
> > It means that rsync --delete-after can (in theory) be used when
> > making changes available to the upstream maintainer.
> 
> I'd suggest against that from a safety standpoint (no backups), but what 
> you _can_ do is to upload only the objects I don't have. 
> 
> This actually works - I already synced several weeks ago with Paul 
> Mackerras, who had made his ppc64 git thing contain only the objects that 
> I didn't have.

Ok, let's give this a go then.  However, I'm not confident in this
working, especially after seeing the output of git-fsck-cache --full...
and I've no idea _why_ it's complaining.

I've pushed this (partial) tree out to
master.kernel.org:~rmk/linux-2.6-arm.git

Below is the usual mail.

$ export | grep GIT_
declare -x GIT_ALTERNATE_OBJECT_DIRECTORIES="/home/rmk/git/linux-2.6/.git/objects"
$ git-fsck-cache --full
error: cannot read sha1_file for 0084227438c28d26bc2d089b1facc4675310f741
bad sha1 entry '0084227438c28d26bc2d089b1facc4675310f741'
error: cannot read sha1_file for 008c1ddc1fc2854b64fcb49a40f1c933d116fb5c
bad sha1 entry '008c1ddc1fc2854b64fcb49a40f1c933d116fb5c'
...
error: cannot read sha1_file for 83c28d2c90fe720b5a315b89301cf3a519ffed88
bad sha1 entry '83c28d2c90fe720b5a315b89301cf3a519ffed88'
dangling commit 043d051615aa5da09a7e44f1edbb69798458e067
dangling commit a92b7b80579fe68fe229892815c750f6652eb6a9
$ grep . .git/refs/heads/*
.git/refs/heads/master:ec6bced6c7b92904f5ead39c9c1b8dc734e6eff0
.git/refs/heads/origin:f179bc77d09b9087bfc559d0368bba350342ac76
.git/refs/heads/smp:053a7b5b7617a72d7c61b6f84196d1c0f79b9849
$ cd $GIT_ALTERNATE_OBJECT_DIRECTORIES/../..
$ git-fsck-cache --full
$ 

Could this be because cogito doesn't know how to handle this setup
properly yet?  Have I just destroyed my git tree by trying to apply
stuff to it?

---

Linus, Andrew,

Please incorporate the latest ARM changes, which can be found at:

	master.kernel.org:/home/rmk/linux-2.6-arm.git

This will update the following files:

 arch/arm/mach-omap/Kconfig              |  221 -----
 arch/arm/mach-omap/Makefile             |   40 
 arch/arm/mach-omap/Makefile.boot        |    4 
 arch/arm/mach-omap/board-generic.c      |  100 --
 arch/arm/mach-omap/board-h2.c           |  189 ----
 arch/arm/mach-omap/board-h3.c           |  207 -----
 arch/arm/mach-omap/board-innovator.c    |  282 ------
 arch/arm/mach-omap/board-netstar.c      |  153 ---
 arch/arm/mach-omap/board-osk.c          |  171 ----
 arch/arm/mach-omap/board-perseus2.c     |  191 ----
 arch/arm/mach-omap/board-voiceblue.c    |  258 ------
 arch/arm/mach-omap/clock.c              | 1076 --------------------------
 arch/arm/mach-omap/clock.h              |  112 --
 arch/arm/mach-omap/common.c             |  549 -------------
 arch/arm/mach-omap/common.h             |   36 
 arch/arm/mach-omap/dma.c                | 1086 --------------------------
 arch/arm/mach-omap/fpga.c               |  188 ----
 arch/arm/mach-omap/gpio.c               |  762 ------------------
 arch/arm/mach-omap/irq.c                |  219 -----
 arch/arm/mach-omap/leds-h2p2-debug.c    |  144 ---
 arch/arm/mach-omap/leds-innovator.c     |  103 --
 arch/arm/mach-omap/leds-osk.c           |  198 ----
 arch/arm/mach-omap/leds.c               |   61 -
 arch/arm/mach-omap/leds.h               |    3 
 arch/arm/mach-omap/mcbsp.c              |  685 ----------------
 arch/arm/mach-omap/mux.c                |  163 ---
 arch/arm/mach-omap/ocpi.c               |  114 --
 arch/arm/mach-omap/pm.c                 |  632 ---------------
 arch/arm/mach-omap/sleep.S              |  314 -------
 arch/arm/mach-omap/time.c               |  424 ----------
 arch/arm/mach-omap/usb.c                |  593 --------------
 arch/arm/Kconfig                        |    6 
 arch/arm/Makefile                       |    6 
 arch/arm/configs/enp2611_defconfig      |   20 
 arch/arm/configs/ixdp2400_defconfig     |   20 
 arch/arm/configs/ixdp2401_defconfig     |   20 
 arch/arm/configs/ixdp2800_defconfig     |   20 
 arch/arm/configs/ixdp2801_defconfig     |   20 
 arch/arm/configs/omap_h2_1610_defconfig |  117 +-
 arch/arm/mach-ixp2000/core.c            |   55 -
 arch/arm/mach-ixp2000/enp2611.c         |    1 
 arch/arm/mach-ixp2000/ixdp2x00.c        |    1 
 arch/arm/mach-ixp2000/ixdp2x01.c        |    1 
 arch/arm/mach-omap1/Kconfig             |  144 +++
 arch/arm/mach-omap1/Makefile            |   30 
 arch/arm/mach-omap1/Makefile.boot       |    3 
 arch/arm/mach-omap1/board-generic.c     |   99 ++
 arch/arm/mach-omap1/board-h2.c          |  188 ++++
 arch/arm/mach-omap1/board-h3.c          |  206 ++++
 arch/arm/mach-omap1/board-innovator.c   |  281 ++++++
 arch/arm/mach-omap1/board-netstar.c     |  152 +++
 arch/arm/mach-omap1/board-osk.c         |  170 ++++
 arch/arm/mach-omap1/board-perseus2.c    |  190 ++++
 arch/arm/mach-omap1/board-voiceblue.c   |  257 ++++++
 arch/arm/mach-omap1/fpga.c              |  188 ++++
 arch/arm/mach-omap1/id.c                |  188 ++++
 arch/arm/mach-omap1/io.c                |  115 ++
 arch/arm/mach-omap1/irq.c               |  234 +++++
 arch/arm/mach-omap1/leds-h2p2-debug.c   |  144 +++
 arch/arm/mach-omap1/leds-innovator.c    |  103 ++
 arch/arm/mach-omap1/leds-osk.c          |  194 ++++
 arch/arm/mach-omap1/leds.c              |   61 +
 arch/arm/mach-omap1/leds.h              |    3 
 arch/arm/mach-omap1/serial.c            |  200 ++++
 arch/arm/mach-omap1/time.c              |  436 ++++++++++
 arch/arm/mm/Kconfig                     |    2 
 arch/arm/mm/mm-armv.c                   |    4 
 arch/arm/plat-omap/Kconfig              |  112 ++
 arch/arm/plat-omap/Makefile             |   17 
 arch/arm/plat-omap/clock.c              | 1323 ++++++++++++++++++++++++++++++++
 arch/arm/plat-omap/clock.h              |  120 ++
 arch/arm/plat-omap/common.c             |  135 +++
 arch/arm/plat-omap/cpu-omap.c           |  128 +++
 arch/arm/plat-omap/dma.c                | 1116 ++++++++++++++++++++++++++
 arch/arm/plat-omap/gpio.c               |  762 ++++++++++++++++++
 arch/arm/plat-omap/mcbsp.c              |  758 ++++++++++++++++++
 arch/arm/plat-omap/mux.c                |  160 +++
 arch/arm/plat-omap/ocpi.c               |  114 ++
 arch/arm/plat-omap/pm.c                 |  632 +++++++++++++++
 arch/arm/plat-omap/sleep.S              |  314 +++++++
 arch/arm/plat-omap/usb.c                |  593 ++++++++++++++
 include/asm-arm/arch-ixp2000/platform.h |    1 
 include/asm-arm/arch-omap/board-h2.h    |    5 
 include/asm-arm/arch-omap/board-h3.h    |    5 
 include/asm-arm/arch-omap/board-osk.h   |    5 
 include/asm-arm/arch-omap/board.h       |   12 
 include/asm-arm/arch-omap/common.h      |   36 
 include/asm-arm/arch-omap/dma.h         |    1 
 include/asm-arm/arch-omap/hardware.h    |   24 
 include/asm-arm/arch-omap/irqs.h        |    3 
 include/asm-arm/arch-omap/mux.h         |   28 
 include/asm-arm/arch-omap/omap16xx.h    |   32 
 include/asm-arm/arch-omap/system.h      |   21 
 93 files changed, 10164 insertions(+), 9450 deletions(-)

through these changes:

From: Tony Lindgren: Sun Jul 10 19:58:20 BST 2005
	
	[PATCH] ARM: 2803/1: OMAP update 11/11: Add cpufreq support
	
	Patch from Tony Lindgren
	
	This patch adds minimal cpufreq support for OMAP
	taking advantage of the clock framework.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:19 BST 2005
	
	[PATCH] ARM: 2805/1: OMAP update 10/11: Update H2 defconfig
	
	Patch from Tony Lindgren
	
	This patch updates H2 defconfig.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:18 BST 2005
	
	[PATCH] ARM: 2804/1: OMAP update 9/11: Update OMAP arch files
	
	Patch from Tony Lindgren
	
	This patch by various OMAP developers syncs the OMAP
	specific arch files with the linux-omap tree.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:17 BST 2005
	
	[PATCH] ARM: 2802/1: OMAP update 8/11: Update OMAP arch files
	
	Patch from Tony Lindgren
	
	This patch by various OMAP developers syncs the OMAP
	specific arch files with the linux-omap tree.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:15 BST 2005
	
	[PATCH] ARM: 2812/1: OMAP update 7c/11: Move arch-omap to plat-omap
	
	Patch from Tony Lindgren
	
	This patch move common OMAP code from arch-omap to plat-omap
	directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:14 BST 2005
	
	[PATCH] ARM: 2809/1: OMAP update 7b/11: Move arch-omap to plat-omap
	
	Patch from Tony Lindgren
	
	This patch move common OMAP code from arch-omap to plat-omap
	directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:13 BST 2005
	
	[PATCH] ARM: 2807/1: OMAP update 7a/11: Move arch-omap to plat-omap
	
	Patch from Tony Lindgren
	
	This patch move common OMAP code from arch-omap to plat-omap
	directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:12 BST 2005
	
	[PATCH] ARM: 2801/1: OMAP update 6/11: Split OMAP1 common code into id, io and serial
	
	Patch from Tony Lindgren
	
	This patch by Juha YrjÃ¶lÃ¤ and other OMAP developers splits
	OMAP1 specific common code into OMAP1 id, io, and serial
	code in mach-omap1 directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:11 BST 2005
	
	[PATCH] ARM: 2806/1: OMAP update 5/11: Move board files into mach-omap1 directory
	
	Patch from Tony Lindgren
	
	This patch by Paul Mundt and other OMAP developers
	moves OMAP1 board files into mach-omap1 directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:10 BST 2005
	
	[PATCH] ARM: 2799/1: OMAP update 4/11: Move OMAP1 LED code into mach-omap1 directory
	
	Patch from Tony Lindgren
	
	This patch by Paul Mundt and other OMAP developers
	moves OMAP1 specific LED code into mach-omap1 directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:09 BST 2005
	
	[PATCH] ARM: 2800/1: OMAP update 3/11: Move OMAP1 core code into mach-omap1 directory
	
	Patch from Tony Lindgren
	
	This patch by Paul Mundt and other OMAP developers
	moves OMAP1 specific IRQ, time, and FPGA code into
	mach-omap1 directory.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:08 BST 2005
	
	[PATCH] ARM: 2798/1: OMAP update 2/11: Change ARM Kconfig to support omap1 and omap2
	
	Patch from Tony Lindgren
	
	This patch by Paul Mundt and other OMAP developers modifies
	ARM specific Kconfig to allow sharing code between OMAP1 and
	OMAP2 architectures.
	In order to share code between OMAP1 and OMAP2, all OMAP1
	specific code is moved into mach-omap1 directory in the
	following patch. A new mach-omap2 directory will be added
	later on.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:06 BST 2005
	
	[PATCH] ARM: 2797/1: OMAP update 1/11: Update include files
	
	Patch from Tony Lindgren
	
	This patch by various OMAP developers syncs the OMAP
	specific include files with the linux-omap tree.
	
	Signed-off-by: Tony Lindgren <tony@atomide.com>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Deepak Saxena: Sun Jul 10 19:44:55 BST 2005
	
	[PATCH] ARM: 2796/1: Fix ARMv5[TEJ] check in MMU initalization
	
	Patch from Deepak Saxena
	
	The code in mm-armv.c checks for the condition (cpu_architecture()<= ARMv5)
	in a few places but should be checking for ARMv5TEJ as the MMU is shared
	across all v5 variations.
	
	Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Lennert Buytenhek: Sun Jul 10 19:44:54 BST 2005
	
	[PATCH] ARM: 2795/1: update ixp2000 defconfigs
	
	Patch from Lennert Buytenhek
	
	Update the ixp2000 defconfigs from 2.6.12-git6 to 2.6.13-rc2.
	
	Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Lennert Buytenhek: Sun Jul 10 19:44:53 BST 2005
	
	[PATCH] ARM: 2793/1: platform serial support for ixp2000
	
	Patch from Lennert Buytenhek
	
	This patch converts the ixp2000 serial port over to a platform
	serial device.
	
	Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
	Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
	Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>



-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 19:15                                       ` Russell King
@ 2005-07-10 20:03                                         ` Linus Torvalds
  2005-07-10 20:32                                           ` Russell King
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10 20:03 UTC (permalink / raw)
  To: Russell King; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, 10 Jul 2005, Russell King wrote:
> 
> Ok, let's give this a go then.  However, I'm not confident in this
> working, especially after seeing the output of git-fsck-cache --full...
> and I've no idea _why_ it's complaining.

Ok, I've downloaded your objects, and it all looks fine. Nothing is 
missing.

So something is wrong with the git-fsck-cache handling of 
GIT_ALTERNATE_OBJECT_DIRECTORIES, but I don't see what. Other programs 
happily see the objects, git-fsck-cache for some reason does not, and thus 
complains. I'll try to figure it out.

However, the more I try to make "git-pack-objects" work with a partial
repository, the less happy I am about it. It works wonderfully well with
rsync:, since rsync just doesn't know that something is missing, but
generating the object list when there are objects missing is quite hard.

I can be trivial and say "missing objects aren't interesting", and it 
would _work_, but that just doesn't make me happy. So I'm almost getting 
ready to say "let's not do this thing after all".

> Could this be because cogito doesn't know how to handle this setup
> properly yet?  Have I just destroyed my git tree by trying to apply
> stuff to it?

This is definitely not a cogito problem, that fsck thing is in git itself. 

And no, you didn't destroy your tree - I just merged it, and the merged 
results look fine and fsck correctly (and I get the same diffstat you do). 
It's just a bug in fsck somewhere that makes it look bad.

That said, my inability to check the pack for completeness for a partial 
archive makes me think this partial rsync wasn't such a good idea after 
all. It _is_ convenient, though, so I'll have to think about the send-pack 
issues some more and see if I can resolve the difficulty without too much 
problems. And clearly I need to fix git-fsck-cache.

Anyway, I pushed out the merge, so don't worry about your tree. But let's 
hold off on this partial thing for a while, ok?

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 20:03                                         ` Linus Torvalds
@ 2005-07-10 20:32                                           ` Russell King
  2005-07-10 21:40                                             ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10 20:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, Jul 10, 2005 at 01:03:30PM -0700, Linus Torvalds wrote:
> Anyway, I pushed out the merge, so don't worry about your tree. But let's 
> hold off on this partial thing for a while, ok?

Thanks, that's good news.  I was fearing having to reconstruct stuff.

Do you want me to re-populate linux-2.6-arm.git to be fully populated
or are you happy for it to just grow the new objects as they become
available?

-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 20:32                                           ` Russell King
@ 2005-07-10 21:40                                             ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10 21:40 UTC (permalink / raw)
  To: Russell King; +Cc: Junio C Hamano, Petr Baudis, git



On Sun, 10 Jul 2005, Russell King wrote:
>
> On Sun, Jul 10, 2005 at 01:03:30PM -0700, Linus Torvalds wrote:
> > Anyway, I pushed out the merge, so don't worry about your tree. But let's 
> > hold off on this partial thing for a while, ok?
> 
> Thanks, that's good news.  I was fearing having to reconstruct stuff.
> 
> Do you want me to re-populate linux-2.6-arm.git to be fully populated
> or are you happy for it to just grow the new objects as they become
> available?

We can try just letting it grow. That way I'll have more reason to try to 
make the partial-repo thing just work.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-09 21:58                     ` Russell King
  2005-07-09 22:29                       ` Russell King
@ 2005-07-10  8:09                       ` Russell King
  2005-07-10 14:59                         ` Petr Baudis
  1 sibling, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10  8:09 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

On Sat, Jul 09, 2005 at 10:58:18PM +0100, Russell King wrote:
> $ mv .git/objects/pack/* .git/
> $ for i in .git/*.pack; do git-unpack-objects < $i; done
> Unpacking 55435 objects
> fatal: inflate returned -3

This morning's cg-update gave these new errors:

receiving file list ... done

wrote 86 bytes  read 192 bytes  556.00 bytes/sec
total size is 410  speedup is 1.47
Missing object of tag v2.6.11... different source (obsolete tag?)
Missing object of tag v2.6.11-tree... different source (obsolete tag?)
Missing object of tag v2.6.12... different source (obsolete tag?)
Missing object of tag v2.6.12-rc2... different source (obsolete tag?)
Missing object of tag v2.6.12-rc3... different source (obsolete tag?)
Missing object of tag v2.6.12-rc4... different source (obsolete tag?)
Missing object of tag v2.6.12-rc5... different source (obsolete tag?)
Missing object of tag v2.6.12-rc6... different source (obsolete tag?)
Missing object of tag v2.6.13-rc1... different source (obsolete tag?)
Missing object of tag v2.6.13-rc2... different source (obsolete tag?)

-- 
Russell King

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10  8:09                       ` Russell King
@ 2005-07-10 14:59                         ` Petr Baudis
  2005-07-11 20:30                           ` Chris Wright
  0 siblings, 1 reply; 66+ messages in thread
From: Petr Baudis @ 2005-07-10 14:59 UTC (permalink / raw)
  To: Russell King; +Cc: git

Dear diary, on Sun, Jul 10, 2005 at 10:09:14AM CEST, I got a letter
where Russell King <rmk@arm.linux.org.uk> told me that...
> On Sat, Jul 09, 2005 at 10:58:18PM +0100, Russell King wrote:
> > $ mv .git/objects/pack/* .git/
> > $ for i in .git/*.pack; do git-unpack-objects < $i; done
> > Unpacking 55435 objects
> > fatal: inflate returned -3
> 
> This morning's cg-update gave these new errors:
> 
> receiving file list ... done
> 
> wrote 86 bytes  read 192 bytes  556.00 bytes/sec
> total size is 410  speedup is 1.47
> Missing object of tag v2.6.11... different source (obsolete tag?)
> Missing object of tag v2.6.11-tree... different source (obsolete tag?)
> Missing object of tag v2.6.12... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc2... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc3... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc4... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc5... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc6... different source (obsolete tag?)
> Missing object of tag v2.6.13-rc1... different source (obsolete tag?)
> Missing object of tag v2.6.13-rc2... different source (obsolete tag?)

Ok, cg-pull didn't quite handle this. I've fixed it so that it should
reasonably handle it now. Hopefully.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 14:59                         ` Petr Baudis
@ 2005-07-11 20:30                           ` Chris Wright
  0 siblings, 0 replies; 66+ messages in thread
From: Chris Wright @ 2005-07-11 20:30 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Russell King, git

* Petr Baudis (pasky@suse.cz) wrote:
> Ok, cg-pull didn't quite handle this. I've fixed it so that it should
> reasonably handle it now. Hopefully.

Is this plus the zero-sized fix worth making cogito-0.12-2 rpm release?

IOW, these two patches...

diff-tree 291ec0f2d2ce65e5ccb876b46d6468af49ddb82e (from 72347a233e6f3c176059a28f0817de6654ef29c7)
tree a1d3a4e01516f1d924c407a9e42a6df0d13b43b6
parent 72347a233e6f3c176059a28f0817de6654ef29c7
author Linus Torvalds <torvalds@g5.osdl.org> 1120608369 -0700
committer Linus Torvalds <torvalds@g5.osdl.org> 1120608369 -0700

    Don't special-case a zero-sized compression.
    
    zlib actually writes a header for that case, and while ignoring that
    header will get us the right data, it will also end up messing up our
    stream position.  So we actually want zlib to "uncompress" even an empty
    object.

diff --git a/unpack-objects.c b/unpack-objects.c
--- a/unpack-objects.c
+++ b/unpack-objects.c
@@ -55,8 +55,6 @@ static void *get_data(unsigned long size
 	z_stream stream;
 	void *buf = xmalloc(size);
 
-	if (!size)
-		return buf;
 	memset(&stream, 0, sizeof(stream));
 
 	stream.next_out = buf;
diff-tree 7b754d7f0800117cd97afa5e806e50c7fd16d8c1 (from a2503fd85e6bb7f25d134a5634a1d8efc93fee5f)
Author: Petr Baudis <pasky@suse.cz>
Date:   Sun Jul 10 16:59:28 2005 +0200

    Fix cg-pull to handle packed tags properly
    
    If the objects referenced by refs/tags/ are packed, it wouldn't detect
    them properly and instead try to refetch them, but they are likely to
    be packed on the other side as well and that makes them impossible to
    be fetched explicitly (which isn't a problem as long as they are the
    same branch).
    
    Also, the fetch failure message was confusing.
    
    Reported by Russel King.

diff --git a/cg-pull b/cg-pull
--- a/cg-pull
+++ b/cg-pull
@@ -294,13 +294,14 @@ $fetch -i -s -u -d "$uri/refs/tags" "$_g
 	for tag in *; do
 		[ "$tag" = "*" ] && break
 		tagid=$(cat $tag)
-		tagfile=objects/${tagid:0:2}/${tagid:2}
-		[ -s "../../$tagfile" ] && continue
+		GIT_DIR=../../../$_git git-cat-file -t "$tagid" >/dev/null 2>&1 && continue
 		echo -n "Missing object of tag $tag... "
+		# In case it's not in a packfile...
+		tagfile=objects/${tagid:0:2}/${tagid:2}
 		if $fetch -i -s "$uri/$tagfile" "../../$tagfile" 2>/dev/null >&2; then
 			echo "retrieved"
 		else
-			echo "different source (obsolete tag?)"
+			echo "unable to retrieve"
 		fi
 	done
 )

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 23:59               ` Linus Torvalds
  2005-07-08  0:09                 ` Tony Luck
@ 2005-07-08  0:09                 ` Linus Torvalds
  2005-07-08  8:14                   ` Petr Baudis
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08  0:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: Petr Baudis, Junio C Hamano, git

On Thu, 7 Jul 2005, Linus Torvalds wrote:
> > 
> > cg-update from a local repo that contains packs is broken though :-(
> 
> Is this with cg-0.12? The most recent release should be happy with packs.

Ahh, I see it. It's because it uses "git-local-pull", and yes, 
git-local-pull does the old filename assumption. Right?

Ho humm.. That's a bug in local-pull.c, although I'm not sure how to fix
it best. One option is to just not use it (as in "use git-fetch-pack
instead"), and another is to use GIT_ALTERNATE_OBJECT_DIRECTORIES and just
pick up the files that way. Yet another one is to actually copy over (or
link) the pack-file, but that's likely the least preferable one.

The _simplest_ fix is to use git-fetch-pack. It doesn't give you the 
convenient hard-linking, though.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-08  0:09                 ` Linus Torvalds
@ 2005-07-08  8:14                   ` Petr Baudis
  2005-07-08 15:56                     ` Daniel Barkalow
  0 siblings, 1 reply; 66+ messages in thread
From: Petr Baudis @ 2005-07-08  8:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tony Luck, Junio C Hamano, Daniel Barkalow, git

Dear diary, on Fri, Jul 08, 2005 at 02:09:48AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Thu, 7 Jul 2005, Linus Torvalds wrote:
> > > 
> > > cg-update from a local repo that contains packs is broken though :-(
> > 
> > Is this with cg-0.12? The most recent release should be happy with packs.
> 
> Ahh, I see it. It's because it uses "git-local-pull", and yes, 
> git-local-pull does the old filename assumption. Right?
> 
> Ho humm.. That's a bug in local-pull.c, although I'm not sure how to fix
> it best.

It seems like the whole pull family is totally borked now, and I'm
getting desperate. Looks like this evening will be *pull.c fixing for
me.

Jul 04 Daniel Barkalow  [PATCH 0/2] Support for transferring pack files in git-ssh-*

is what brings some hope to my life, though. Daniel? Any chance we could
get the similar fixes for local-pull? (I didn't actually look at the
patch but briefly.) I'll try to review the ssh patchset ASAP - I still
prefer it much to the fetch-pack things since its protocol is actually
extensible.

> The _simplest_ fix is to use git-fetch-pack. It doesn't give you the 
> convenient hard-linking, though.

Hard-linking is an absolute must for local repositories (well, either
that for people who want safety, or symlinking for the rest who want
speed - I want to make that one possible in Cogito ASAP but it requires
some non-trivial changes to some of its assumptions).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-08  8:14                   ` Petr Baudis
@ 2005-07-08 15:56                     ` Daniel Barkalow
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Barkalow @ 2005-07-08 15:56 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, Tony Luck, Junio C Hamano, git

On Fri, 8 Jul 2005, Petr Baudis wrote:

> It seems like the whole pull family is totally borked now, and I'm
> getting desperate. Looks like this evening will be *pull.c fixing for
> me.
> 
> Jul 04 Daniel Barkalow  [PATCH 0/2] Support for transferring pack files in git-ssh-*
> 
> is what brings some hope to my life, though. Daniel? Any chance we could
> get the similar fixes for local-pull? (I didn't actually look at the
> patch but briefly.)

This patch is not actually for transferring objects which are in pack
files in the source, but for transferring a group of objects as a pack
file. It does, however, read the source side with git-pack-objects to
generate the content to send, so it would, I guess, fix the problem for
the case where it decides to use a pack to transfer.

The real fix is to go through the pull methods (local-pull and
ssh-pull; http-pull presumably won't be encountering pack files yet) and
make them do appropriate things with pack files.

One thing that is in the patch is a change to the comment, specifying
that fetch() could also get other objects in addition to the one
specified, if there's some reason to think this is a good idea; the fix
for local-pull is probably to link/symlink/copy the pack file if the
object is in one.

For ssh-pull, serve_object in ssh-push needs to be taught how to extract
an object from a pack file and send it.

However, there's a bug in pull.c, covering up a terrible performance
issue: it doesn't actually make sure you have all the parent of a commit
that you had when it checked (due to not having a way of caching the
result of checking this, which would require you to put the entire
repository through cache each time you pull). This would mean that, if you
have a pack that references something outside of it, you won't get
everything with my proposal above.

I should be able to spend some time on these issues over the weekend.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [ANNOUNCE] Cogito-0.12
  2005-07-03 23:46 [ANNOUNCE] Cogito-0.12 Petr Baudis
  2005-07-06 12:01 ` Brian Gerst
@ 2005-07-07  6:22 ` Chris Wright
  1 sibling, 0 replies; 66+ messages in thread
From: Chris Wright @ 2005-07-07  6:22 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

* Petr Baudis (pasky@suse.cz) wrote:
>   I'm happy to announce the release of the 0.12 version of the Cogito
> SCM-like layer over Linus' GIT tree history storage tool. Get it at
> 
> 	http://www.kernel.org/pub/software/scm/cogito/

RPMs uploading to:
	http://www.kernel.org/pub/software/scm/cogito/RPMS

thanks,
-chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2005-07-12  4:48 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-03 23:46 [ANNOUNCE] Cogito-0.12 Petr Baudis
2005-07-06 12:01 ` Brian Gerst
2005-07-07 14:45   ` Petr Baudis
2005-07-07 17:21     ` Junio C Hamano
2005-07-07 19:04       ` Linus Torvalds
2005-07-07 19:57         ` Junio C Hamano
2005-07-07 21:58           ` Linus Torvalds
2005-07-07 22:10             ` Junio C Hamano
2005-07-07 20:00         ` Junio C Hamano
2005-07-07 21:29         ` Eric W. Biederman
2005-07-07 22:23           ` Linus Torvalds
2005-07-08  2:11             ` Eric W. Biederman
2005-07-08  1:54           ` Dumb servers (was: [ANNOUNCE] Cogito-0.12) Kevin Smith
2005-07-08  2:27             ` Linus Torvalds
2005-07-07 22:14         ` [ANNOUNCE] Cogito-0.12 Petr Baudis
2005-07-07 22:52           ` Linus Torvalds
2005-07-07 23:16             ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano
2005-07-07 23:50               ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano
2005-07-07 23:58                 ` Linus Torvalds
2005-07-08  1:02                   ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano
2005-07-08  1:33                     ` Linus Torvalds
2005-07-08  1:46                     ` Linus Torvalds
2005-07-08  2:17                       ` Junio C Hamano
2005-07-08  2:39                         ` Linus Torvalds
2005-07-09 21:09                           ` Eric W. Biederman
2005-07-10  5:11                             ` Linus Torvalds
2005-07-10  6:28                               ` Junio C Hamano
2005-07-10 21:48                             ` Sven Verdoolaege
2005-07-10 22:36                             ` Linus Torvalds
2005-07-11 15:19                               ` Eric W. Biederman
2005-07-11 16:38                                 ` Linus Torvalds
2005-07-12  0:44                                   ` Eric W. Biederman
2005-07-12  1:14                                     ` Linus Torvalds
2005-07-12  2:38                                       ` Eric W. Biederman
2005-07-12  3:21                                         ` Linus Torvalds
2005-07-12  3:39                                           ` Eric W. Biederman
2005-07-12  4:48                                             ` Linus Torvalds
2005-07-11 17:53                                 ` Linus Torvalds
     [not found]                           ` <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net>
     [not found]                             ` <Pine.LNX.4.58.0507082109140.17536@g5.osdl.org>
     [not found]                               ` <7vfyumj8hn.fsf_-_@assigned-by-dhcp.cox.net>
2005-07-11  7:00                                 ` [PATCH] Check packs and then files Junio C Hamano
2005-07-08  1:03                   ` [PATCH] Give --full-objects flag to rev-list when preparing a dumb server Junio C Hamano
2005-07-07 23:50               ` [PATCH] Use --objects=self-sufficient flag to rev-list Junio C Hamano
2005-07-07 23:52             ` [ANNOUNCE] Cogito-0.12 Tony Luck
2005-07-07 23:54               ` Junio C Hamano
2005-07-07 23:59               ` Linus Torvalds
2005-07-08  0:09                 ` Tony Luck
2005-07-08  0:23                   ` Linus Torvalds
2005-07-09 21:58                     ` Russell King
2005-07-09 22:29                       ` Russell King
2005-07-09 23:46                         ` Junio C Hamano
2005-07-10  5:02                           ` Linus Torvalds
2005-07-10  5:15                             ` Linus Torvalds
2005-07-10  6:55                               ` Russell King
2005-07-10  7:15                                 ` Junio C Hamano
2005-07-10 12:46                                   ` Russell King
2005-07-10 16:51                                     ` Linus Torvalds
2005-07-10 19:15                                       ` Russell King
2005-07-10 20:03                                         ` Linus Torvalds
2005-07-10 20:32                                           ` Russell King
2005-07-10 21:40                                             ` Linus Torvalds
2005-07-10  8:09                       ` Russell King
2005-07-10 14:59                         ` Petr Baudis
2005-07-11 20:30                           ` Chris Wright
2005-07-08  0:09                 ` Linus Torvalds
2005-07-08  8:14                   ` Petr Baudis
2005-07-08 15:56                     ` Daniel Barkalow
2005-07-07  6:22 ` Chris Wright

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).