git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible --remove-empty bug
@ 2006-03-12 14:12 Marco Costalba
  2006-03-12 21:31 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Marco Costalba @ 2006-03-12 14:12 UTC (permalink / raw)
  To: junkio; +Cc: git

>From today git:

$ git-rev-parse HEAD
be767c91724275c4534965c0d25c452b76057602
$ git-rev-list be767c91724275c4534965c0d25c452b76057602 -- imap-send.c
f2561fda364ad984ef1441a80c90b0ee04f1a7c4
$ git-rev-list --remove-empty be767c91724275c4534965c0d25c452b76057602
-- imap-send.c
$

>From git-rev-list documentation:

--remove-empty::
	Stop when a given path disappears from the tree.

But isn't it to be intended *after* a path disapperas from the tree?
In this case I would expect to see revision
f2561fda364ad984ef1441a80c90b0ee04f1a7c4 also with --remove-empty
option.

BTW rev f2561fda364ad984ef1441a80c90b0ee04f1a7c4 is the 'Add
git-imap-send, derived from isync 1.0.1.' patch.

Thanks
Marco

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-12 14:12 Possible --remove-empty bug Marco Costalba
@ 2006-03-12 21:31 ` Junio C Hamano
  2006-03-12 22:54   ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-03-12 21:31 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git, Linus Torvalds

"Marco Costalba" <mcostalba@gmail.com> writes:

>>>From git-rev-list documentation:
>
> --remove-empty::
> 	Stop when a given path disappears from the tree.
>
> But isn't it to be intended *after* a path disapperas from the tree?

To be honest, I do not know how --remove-empty is intended to
work.  What revision traversal code does and what the above says
are different.

The traversal code goes like this:

	* Start from given commits (both interesting and
          uninteresting), look at still-to-be-procesed commit
          one by one, by calling add_parents_to_list().

          * add_parents_to_list() grows still-to-be-processed
            list; if the current commit is uninteresting, mark its
            parents also uninteresting, and if no interesting
            commit remains in the still-to-be-processed list, we
            are done.  On the other hand, if the current commit is
            interesting, place it to the list of results.

        * After the above traversal is done, the consumer calls
	  get_revision() to retrieve commits from the list of
	  results one-by-one.  We return only interesting ones.

And in add_parents_to_list()

	* if the commit is interesting, and when we are limiting
          by paths, we call try_to_simplify_commit().  This
          checks if the tree associated with the current commit
          is the same as one of its parents' with respect to
          specified paths, and if so pretend that the current
          commit has only that parent and no other.  This can
          make a merge commit to lose other parents that we do
          not inherit the specified paths from.

        * try_to_simplify_commit() looks at each parent, and:

          - if we find a parent that has the same tree (wrt the
            paths we are interested in), we pretend it is the
            sole parent of this commit.

	  - if we find a parent that does not have any of the
            specified paths, we pretend we do not have that
            parent under --remove-empty.

	  - otherwise we do not munge the list of parents.

My understanding of what the code is doing from the above
reading is to lose that empty parent, and it does not have much
to do with stop traversing the ancestry chain at such commit.  I
am not sure that is what was intended...

Maybe something like this is closer to what the documentation
says.

-- >8 --
diff --git a/revision.c b/revision.c
index c8d93ff..03085ff 100644
--- a/revision.c
+++ b/revision.c
@@ -315,9 +315,14 @@ static void try_to_simplify_commit(struc
 			return;
 
 		case TREE_NEW:
-			if (revs->remove_empty_trees && same_tree_as_empty(p->tree)) {
-				*pp = parent->next;
-				continue;
+			if (revs->remove_empty_trees &&
+			    same_tree_as_empty(p->tree)) {
+				/* We are adding all the specified paths from
+				 * this parent, so the parents of it is
+				 * not interesting, but the difference between
+				 * this parent and us still is interesting.
+				 */
+				p->object.flags |= UNINTERESTING;
 			}
 		/* fallthrough */
 		case TREE_DIFFERENT:

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-12 21:31 ` Junio C Hamano
@ 2006-03-12 22:54   ` Linus Torvalds
  2006-03-13  1:08     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2006-03-12 22:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git



On Sun, 12 Mar 2006, Junio C Hamano wrote:
> 
> To be honest, I do not know how --remove-empty is intended to
> work.

It's supposed to stop traversing the tree once a pathname disappears.

> Maybe something like this is closer to what the documentation
> says.

If it is, then the documentation is broken.

The fact that a pathname disappears does _not_ make the commit 
uninteresting. It just means that we should stop traversing that parent.

"uninteresting" has a big side effect: it inherits to parents. So if you 
have

	    a
	   / \
	  b   c
	   \ /
	    d

where the pathname disappeared in "b", you must NOT mark it uninteresting, 
because that would mean that "d" is also uninteresting.

There's a huge difference between saying "I will not traverse down this 
line any more" and "I mark this commit uninteresting". The first one just 
stops adding commits to the commit list (but parents deeper down might 
still be interesting because they are also reached through another 
pathway). The second says "this commit and all of its ancestors are 
deemed worthless".

The "path goes away" case is meant to just stop traversal, not mark all 
parents worthless.

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-12 22:54   ` Linus Torvalds
@ 2006-03-13  1:08     ` Junio C Hamano
  2006-03-13  5:41       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-03-13  1:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 12 Mar 2006, Junio C Hamano wrote:
>> 
>> To be honest, I do not know how --remove-empty is intended to
>> work.
>
> It's supposed to stop traversing the tree once a pathname disappears.

Then what we should simplify is the parent commit that does not
have those pathnames (i.e. remove parents from that parent
commit).  In other words, currently the code removes the parent
commit that makes the tree disappear, but we would want to keep
that parent, remove the grandparents, and then mark the parent
uninteresting.

-- >8 --
[PATCH] revision traversal: --remove-empty fix (take #2).

Marco Costalba reports that --remove-empty omits the commit that
created paths we are interested in.  try_to_simplify_commit()
logic was dropping a parent we introduced those paths against,
which I think is not what we meant.  Instead, this makes such
parent parentless.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

diff --git a/revision.c b/revision.c
index c8d93ff..73fba5d 100644
--- a/revision.c
+++ b/revision.c
@@ -315,9 +315,18 @@ static void try_to_simplify_commit(struc
 			return;
 
 		case TREE_NEW:
-			if (revs->remove_empty_trees && same_tree_as_empty(p->tree)) {
-				*pp = parent->next;
-				continue;
+			if (revs->remove_empty_trees &&
+			    same_tree_as_empty(p->tree)) {
+				/* We are adding all the specified
+				 * paths from this parent, so the
+				 * history beyond this parent is not
+				 * interesting.  Remove its parents
+				 * (they are grandparents for us).
+				 * IOW, we pretend this parent is a
+				 * "root" commit.
+				 */
+				parse_commit(p);
+				p->parents = NULL;
 			}
 		/* fallthrough */
 		case TREE_DIFFERENT:

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-13  1:08     ` Junio C Hamano
@ 2006-03-13  5:41       ` Junio C Hamano
  2006-03-13 19:03         ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-03-13  5:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marco Costalba, git

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@osdl.org> writes:
>
>> It's supposed to stop traversing the tree once a pathname disappears.
>
> Then what we should simplify is the parent commit that does not
> have those pathnames (i.e. remove parents from that parent
> commit).  In other words, currently the code removes the parent
> commit that makes the tree disappear, but we would want to keep
> that parent, remove the grandparents, and then mark the parent
> uninteresting.

Sorry, the last clause in the above comment is wrong, and does
not describe what the code attached does.

It removes the grandparents from the parent, and leaves the
parent still interesting.  As a result, in your example:

    ... if you have

                a
               / \
              b   c
               \ /
                d

    where the pathname disappeared in "b"...

we would get this world view:

                a
               / \
              b   c
                 /
                d

because when inspecting a and finding that "b" does not have any
paths we are interested in, b loses all of its parents.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-13  5:41       ` Junio C Hamano
@ 2006-03-13 19:03         ` Linus Torvalds
  2006-03-17 10:57           ` Marco Costalba
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2006-03-13 19:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git



On Sun, 12 Mar 2006, Junio C Hamano wrote:
> 
> It removes the grandparents from the parent, and leaves the
> parent still interesting.  As a result, in your example:
> 
>     ... if you have
> 
>                 a
>                / \
>               b   c
>                \ /
>                 d
> 
>     where the pathname disappeared in "b"...
> 
> we would get this world view:
> 
>                 a
>                / \
>               b   c
>                  /
>                 d

Yeah, that's correct. That way you still see all the history that is 
relevant to the tree that became empty.

However, to be honest, the only reason to ever use --remove-empty is for 
rename detection, and Frederik's approach of doing that through the 
library interface directly is actually a much superior option. So we might 
as well drop the compilcation of --remove-empty entirely, unless somebody 
has already started using it.

The _real_ optimization would be to make the pathname based pruning be 
done incrementally instead of having to build up the whole tree. That 
would be much more important than the --remove-empty stuff from a 
usability standpoint. I'm absolutely sure it's possible, and I simplified 
the code earlier so that it should be simpler to do, but every time I 
actually look at the code I get confused again.

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-13 19:03         ` Linus Torvalds
@ 2006-03-17 10:57           ` Marco Costalba
  2006-03-18  6:40             ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Marco Costalba @ 2006-03-17 10:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On 3/13/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> However, to be honest, the only reason to ever use --remove-empty is for
> rename detection, and Frederik's approach of doing that through the
> library interface directly is actually a much superior option. So we might
> as well drop the compilcation of --remove-empty entirely, unless somebody
> has already started using it.
>

In case of a rather recent file --remove-empty option gives a good
speed up in history loading with git-rev-list. So qgit uses that
option.

Annotation in qgit it is little different from blame algorithm because
it works from oldest revision to latest instead of the contrary. This
is due because it calculates in one pass _all_ the annotations of  all
the different revisions of a given file, starting from file history,
i.e. 'git-rev-list file_path' output.

The GUI interface of qgit let's the user quickly jump between two file
revisions. So the corresponding annotations should be already prepared
to make it snappy.


> The _real_ optimization would be to make the pathname based pruning be
> done incrementally instead of having to build up the whole tree.

Anything that speeds-up file git-rev-list history loading surely gives
a big boost to all annotation/blame stuff. I have made some profiling
on qgit (the public git repo version that is much faster, not the
qgit1.1 released one) and I found that 50% to 70% of the time is spent
on git-rev-list, of the remaining the 80% is spent on git-diff-tree -p
patch loading and only a small amount is spent on actually calculate
the annotation.

Of course these numbers can vary a little according to file/repo, but
the big picture stays the same.


Marco

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-17 10:57           ` Marco Costalba
@ 2006-03-18  6:40             ` Junio C Hamano
  2006-03-18  7:36               ` Marco Costalba
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-03-18  6:40 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git, Linus Torvalds

"Marco Costalba" <mcostalba@gmail.com> writes:

> On 3/13/06, Linus Torvalds <torvalds@osdl.org> wrote:
>>
>> However, to be honest, the only reason to ever use --remove-empty is for
>> rename detection, and Frederik's approach of doing that through the
>> library interface directly is actually a much superior option. So we might
>> as well drop the compilcation of --remove-empty entirely, unless somebody
>> has already started using it.
>
> In case of a rather recent file --remove-empty option gives a good
> speed up in history loading with git-rev-list. So qgit uses that
> option.

So you _do_ use it, and I think I still have that remove-empty
stuff held back in "next" branch.  Should I unleash it?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible --remove-empty bug
  2006-03-18  6:40             ` Junio C Hamano
@ 2006-03-18  7:36               ` Marco Costalba
  0 siblings, 0 replies; 9+ messages in thread
From: Marco Costalba @ 2006-03-18  7:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On 3/18/06, Junio C Hamano <junkio@cox.net> wrote:
> "Marco Costalba" <mcostalba@gmail.com> writes:
>
> >
> > In case of a rather recent file --remove-empty option gives a good
> > speed up in history loading with git-rev-list. So qgit uses that
> > option.
>
> So you _do_ use it, and I think I still have that remove-empty
> stuff held back in "next" branch.  Should I unleash it?
>
>

Yes please.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-03-18  7:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-12 14:12 Possible --remove-empty bug Marco Costalba
2006-03-12 21:31 ` Junio C Hamano
2006-03-12 22:54   ` Linus Torvalds
2006-03-13  1:08     ` Junio C Hamano
2006-03-13  5:41       ` Junio C Hamano
2006-03-13 19:03         ` Linus Torvalds
2006-03-17 10:57           ` Marco Costalba
2006-03-18  6:40             ` Junio C Hamano
2006-03-18  7:36               ` Marco Costalba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).