git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] http clone does not checkout working tree
@ 2008-06-04 18:38 Jeff King
  2008-06-04 18:59 ` Linus Torvalds
  2008-06-04 19:02 ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2008-06-04 18:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, git

If I try cloning a repository by http with the current 'master', I end
up with a proper git directory, but nothing gets checked out in the
working tree [I see the problem with any repository; I randomly picked
the one in the example below because it is small, and http clone can be
painfully slow]:

  $ git clone http://repo.or.cz/r/bigint.git
  [...]
  $ cd bigint
  $ ls
  $

I think the problem is that the builtin-clone now does the http commit
walking and the tree unpacking in the same process, and the commit
walker leaves the in-core objects in a funny state.

Specifically, the top-level tree object has its "parsed" flag set to 1,
but the buffer and size members are set to NULL and 0, respectively.
This happens in walker.c:process_tree, where we free the tree buffer
after walking it. It seems like we are violating the meaning of "parsed"
here, which implies that those other members are valid.

The following patch fixes it for me, but I really have no idea if there
isn't something more subtle at work. Sending to Linus, since "git blame"
points the surrounding code to you, and to Daniel, since the new clone
and the commit walker are your areas.

---
diff --git a/walker.c b/walker.c
index 31de6c1..0e68ee6 100644
--- a/walker.c
+++ b/walker.c
@@ -59,6 +59,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
 	free(tree->buffer);
 	tree->buffer = NULL;
 	tree->size = 0;
+	tree->object.parsed = 0;
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC] http clone does not checkout working tree
  2008-06-04 18:38 [RFC] http clone does not checkout working tree Jeff King
@ 2008-06-04 18:59 ` Linus Torvalds
  2008-06-04 19:30   ` Linus Torvalds
  2008-06-04 20:10   ` Daniel Barkalow
  2008-06-04 19:02 ` Junio C Hamano
  1 sibling, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2008-06-04 18:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Barkalow, git



On Wed, 4 Jun 2008, Jeff King wrote:
> 
> The following patch fixes it for me, but I really have no idea if there
> isn't something more subtle at work. Sending to Linus, since "git blame"
> points the surrounding code to you, and to Daniel, since the new clone
> and the commit walker are your areas.

Ack. This is correct.

That said, a *lot* of code does this pattern (freeing the tree buffer 
after use, without marking the tree non-parsed), and I suspect the only 
reason I'm blamed is because this got copied from some other use of that 
same model. 

Normally it's fine, because the whole object use is temporary, but as you 
point out, doing things in the same process will re-use old object info. 
It's one of the subtler implications of doing built-ins without fork/exec.

It is possible that we should clean this up by adding some kind of

	static void forget_tree(struct tree *tree)
	{
		free(tree->buffer);
		tree->buffer = NULL;
		tree->size = 0;
		tree->parsed = 0;
	}

to make this more robust and obvious. That said, a lot of the users are 
basically the type

	if (parse_tree(tree) < 0)
		die(..);
	init_tree_desc(&desc, tree->buffer, tree->size);
	while (tree_entry(&desc, &entry)) {
		...
	}
	forget_tree();

and quite frankly, it's rather possible that we should get rid of the 
"void *buffer" and "unsigned long size" in the tree *entirely*, because 
the above would likely be better written as

	buffer = read_tree_desc(&desc);
	while (tree_entry(&desc, &entry)) {
		...
	}
	free(buffer);

and make "struct tree" smaller, and not ever need parsing at all!

I think that realisitcially, all tree users are of that format, and nobody 
really wants to save the buffer (because buffer re-use is fairly unlikely, 
an re-generating it isn't all that expensive).

But that would be a much bigger patch. And maybe I'm wrong, and some uses 
really do want the longer-term caching because they end up re-using the 
tree a lot. So it would need more thinking about.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] http clone does not checkout working tree
  2008-06-04 18:38 [RFC] http clone does not checkout working tree Jeff King
  2008-06-04 18:59 ` Linus Torvalds
@ 2008-06-04 19:02 ` Junio C Hamano
  2008-06-04 19:16   ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-06-04 19:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Daniel Barkalow, git

Jeff King <peff@peff.net> writes:

> The following patch fixes it for me, but I really have no idea if there
> isn't something more subtle at work. Sending to Linus, since "git blame"
> points the surrounding code to you, and to Daniel, since the new clone
> and the commit walker are your areas.
>
> ---
> diff --git a/walker.c b/walker.c
> index 31de6c1..0e68ee6 100644
> --- a/walker.c
> +++ b/walker.c
> @@ -59,6 +59,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
>  	free(tree->buffer);
>  	tree->buffer = NULL;
>  	tree->size = 0;
> +	tree->object.parsed = 0;
>  	return 0;
>  }
>  

The patch looks good to me.

And blaming Linus for this is slightly unfair, as the context the original
"process_tree()" taken out of was that tree object was used once and then
never used after this codepath is done with it, even though he _could_
have a perfect foresight to anticipate that someday somebody might want to
call the routine from elsewhere without understanding the implications.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] http clone does not checkout working tree
  2008-06-04 19:02 ` Junio C Hamano
@ 2008-06-04 19:16   ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2008-06-04 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Daniel Barkalow, git

On Wed, Jun 04, 2008 at 12:02:26PM -0700, Junio C Hamano wrote:

> And blaming Linus for this is slightly unfair, as the context the original

Right. I didn't mean to blame Linus so much as I had the notion that he
would have some clue that there wasn't something trickier going on.

Maybe I should have said "git annotate told me that..." ;)

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] http clone does not checkout working tree
  2008-06-04 18:59 ` Linus Torvalds
@ 2008-06-04 19:30   ` Linus Torvalds
  2008-06-04 19:59     ` Daniel Barkalow
  2008-06-04 20:10   ` Daniel Barkalow
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-06-04 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Barkalow, git



On Wed, 4 Jun 2008, Linus Torvalds wrote:
> 
> and quite frankly, it's rather possible that we should get rid of the 
> "void *buffer" and "unsigned long size" in the tree *entirely*, because 
> the above would likely be better written as
..

Side note: the actual historical context here is that "parse_tree()" used 
to create that "tree_entry_list" of all the entries in the tree. So we 
used to do things like

	struct tree_entry_list *list;

	if (parse_tree(tree))
		die(..)
	list = tree->entries;
	while (list) {
		...

so "parse_tree()" was something much bigger (and generated much slower and 
less dense data structures).

These days, parse_tree() basically just reads the object buffer and 
length. So it boils down to just caching the result of "read_sha1_file()", 
but we have all those legacy uses that come from the old historical thing. 
And to some degree it may have made sense to drop the buffer, but keep the 
actual list of entries in that old model.

See commit 2d9c58c69d1bab601e67b036d0546e85abcee7eb.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] http clone does not checkout working tree
  2008-06-04 19:30   ` Linus Torvalds
@ 2008-06-04 19:59     ` Daniel Barkalow
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Barkalow @ 2008-06-04 19:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, git

On Wed, 4 Jun 2008, Linus Torvalds wrote:

> On Wed, 4 Jun 2008, Linus Torvalds wrote:
> > 
> > and quite frankly, it's rather possible that we should get rid of the 
> > "void *buffer" and "unsigned long size" in the tree *entirely*, because 
> > the above would likely be better written as
> ..
> 
> Side note: the actual historical context here is that "parse_tree()" used 
> to create that "tree_entry_list" of all the entries in the tree. So we 
> used to do things like
> 
> 	struct tree_entry_list *list;
> 
> 	if (parse_tree(tree))
> 		die(..)
> 	list = tree->entries;
> 	while (list) {
> 		...
> 
> so "parse_tree()" was something much bigger (and generated much slower and 
> less dense data structures).
> 
> These days, parse_tree() basically just reads the object buffer and 
> length. So it boils down to just caching the result of "read_sha1_file()", 
> but we have all those legacy uses that come from the old historical thing. 
> And to some degree it may have made sense to drop the buffer, but keep the 
> actual list of entries in that old model.
> 
> See commit 2d9c58c69d1bab601e67b036d0546e85abcee7eb.

I think the lineage of walker.c is to the version in fetch.c, which, in 
that commit, you added the same "drop the buffer" optimization from 
rev-list. If the old style ever dropped the buffer, it would have been in 
parse_tree_buffer() and you'd have had to remove it to get the new style 
to work at all.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] http clone does not checkout working tree
  2008-06-04 18:59 ` Linus Torvalds
  2008-06-04 19:30   ` Linus Torvalds
@ 2008-06-04 20:10   ` Daniel Barkalow
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Barkalow @ 2008-06-04 20:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, git

On Wed, 4 Jun 2008, Linus Torvalds wrote:

> On Wed, 4 Jun 2008, Jeff King wrote:
> > 
> > The following patch fixes it for me, but I really have no idea if there
> > isn't something more subtle at work. Sending to Linus, since "git blame"
> > points the surrounding code to you, and to Daniel, since the new clone
> > and the commit walker are your areas.
> 
> Ack. This is correct.
> 
> That said, a *lot* of code does this pattern (freeing the tree buffer 
> after use, without marking the tree non-parsed), and I suspect the only 
> reason I'm blamed is because this got copied from some other use of that 
> same model. 
> 
> Normally it's fine, because the whole object use is temporary, but as you 
> point out, doing things in the same process will re-use old object info. 
> It's one of the subtler implications of doing built-ins without fork/exec.
> 
> It is possible that we should clean this up by adding some kind of
> 
> 	static void forget_tree(struct tree *tree)
> 	{
> 		free(tree->buffer);
> 		tree->buffer = NULL;
> 		tree->size = 0;
> 		tree->parsed = 0;
> 	}
> 
> to make this more robust and obvious. That said, a lot of the users are 
> basically the type
> 
> 	if (parse_tree(tree) < 0)
> 		die(..);
> 	init_tree_desc(&desc, tree->buffer, tree->size);
> 	while (tree_entry(&desc, &entry)) {
> 		...
> 	}
> 	forget_tree();
> 
> and quite frankly, it's rather possible that we should get rid of the 
> "void *buffer" and "unsigned long size" in the tree *entirely*, because 
> the above would likely be better written as
> 
> 	buffer = read_tree_desc(&desc);

read_tree_desc(&desc, tree), which would use the hash in the tree object, 
which wouldn't need to be parse, I assume.

> 	while (tree_entry(&desc, &entry)) {
> 		...
> 	}
> 	free(buffer);
> 
> and make "struct tree" smaller, and not ever need parsing at all!
> 
> I think that realisitcially, all tree users are of that format, and nobody 
> really wants to save the buffer (because buffer re-use is fairly unlikely, 
> an re-generating it isn't all that expensive).
>
> But that would be a much bigger patch. And maybe I'm wrong, and some uses 
> really do want the longer-term caching because they end up re-using the 
> tree a lot. So it would need more thinking about.

I think that your current unpack_trees() benefits from the fact that, if 
some but not all of the trees are the same, you only uncompress the tree 
objects once and store one copy of the resulting buffer. So, I don't think 
long-term caching is worthwhile, but sharing between concurrent users 
might be.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-06-04 20:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-04 18:38 [RFC] http clone does not checkout working tree Jeff King
2008-06-04 18:59 ` Linus Torvalds
2008-06-04 19:30   ` Linus Torvalds
2008-06-04 19:59     ` Daniel Barkalow
2008-06-04 20:10   ` Daniel Barkalow
2008-06-04 19:02 ` Junio C Hamano
2008-06-04 19:16   ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).