git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG?] git log picks up bad commit
@ 2008-02-02 12:21 Tilman Sauerbeck
  2008-02-03  3:00 ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Tilman Sauerbeck @ 2008-02-02 12:21 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

Hi,
I think I either found a bug in git log, or I'm working with a broken
repository. I can reproduce this with current git master.

I'm trying to list the last n commits since a given commit on a given
branch like this:
  git log -n N commit.. branch
The problem is, if there are less than N commits that match the
criteria, git log also prints the very first commit of the repository.

I'm operating on a bare repository here. When I actually check out the
branch I'm interested in, git log behaves as expected.

I've uploaded the .git directory to
http://crux.nu/~tilman/broken_repo.tar.bz2 (use -C to extract!)

Reproduce like this:
mkdir /tmp/blah
cd /tmp/blah
tar xjf broken_repo.tar.bz2

git log -n 3 --abbrev-commit --pretty=oneline \
1dd567d596b072e3ce44ea5ad8c373871686b078.. 2.4

The output I'm getting is:

47f585a... syslinux: Updated 3.54 -> 3.60
b3444e1... lzma: 4.32.4 -> 4.32.5
d5d6fa1... Created repository

When I check out the "2.4" branch and run the git log command again, I
get the expected output:

47f585a... syslinux: Updated 3.54 -> 3.60
b3444e1... lzma: 4.32.4 -> 4.32.5

Any idea on what's going on there?

Thanks,
Tilman

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [BUG?] git log picks up bad commit
  2008-02-02 12:21 [BUG?] git log picks up bad commit Tilman Sauerbeck
@ 2008-02-03  3:00 ` Jeff King
  2008-02-03  4:33   ` [RFH] revision limiting sometimes ignored Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2008-02-03  3:00 UTC (permalink / raw)
  To: git

On Sat, Feb 02, 2008 at 01:21:36PM +0100, Tilman Sauerbeck wrote:

> I think I either found a bug in git log, or I'm working with a broken
> repository. I can reproduce this with current git master.

I think it is a bug in your command line.

> git log -n 3 --abbrev-commit --pretty=oneline \
> 1dd567d596b072e3ce44ea5ad8c373871686b078.. 2.4

The space betwen ".." and "2.4" means that they are two separate
arguments. Thus the second part of your ".." operator is blank, which is
treated as HEAD. Thus it is equivalent to:

  1dd567d596b072e3ce44ea5ad8c373871686b078..HEAD 2.4

When you switch to branch 2.4, then 2.4 becomes your HEAD.

That being said, the commit in your 'master' branch _is_ part of
1dd567d5, and should be culled. So I'm not clear on why it shows up only
when you ask to see both branches, and that may be a bug.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFH] revision limiting sometimes ignored
  2008-02-03  3:00 ` Jeff King
@ 2008-02-03  4:33   ` Jeff King
  2008-02-03  6:24     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Jeff King @ 2008-02-03  4:33 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

On Sat, Feb 02, 2008 at 10:00:54PM -0500, Jeff King wrote:

> That being said, the commit in your 'master' branch _is_ part of
> 1dd567d5, and should be culled. So I'm not clear on why it shows up only
> when you ask to see both branches, and that may be a bug.

OK, there is definitely a bug here, but I'm having some trouble figuring
out the correct fix. It's in the revision walker, so I have cc'd those
who are more clueful than I.

You can recreate a problematic repo using this script:

-- >8 --
mkdir repo && cd repo
git init

touch file && git add file
commit() {
  echo $1 >file && git commit -a -m $1 && git tag $1
}

commit one
commit two
commit three
git checkout -b other two
commit alt-three
git checkout master
git merge other || true
commit merged
commit four
-- 8< --

So a fairly simple repo, but with the key element that it contains a
merge. Now try this:

  git log one --not four

You get the 'one' commit, even though it should be removed by "--not
four". But if you try this:

  git log one --not two

you correctly get no output.

It seems that in limit_list, we do two things:
  - first add the 'one' commit to the new list (since we process it
    before it gets marked uninteresting)
  - then traverse from 'four', marking commits and their parents as
    uninteresting as we go

However, the traversal seems to have trouble going over the merge. We
add the parents, but we end up marking them all as uninteresting, and
the everybody_uninteresting() optimization triggers, quitting the limit
before we have a chance to reach back to 'one' and mark it. The patch
below fixes it, but I'm very uncertain whether there is something else
going on that I'm missing that should be handling this case.

---
diff --git a/revision.c b/revision.c
index 6e85aaa..7d91ca1 100644
--- a/revision.c
+++ b/revision.c
@@ -579,8 +579,6 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			if (everybody_uninteresting(list))
-				break;
 			continue;
 		}
 		if (revs->min_age != -1 && (commit->date > revs->min_age))

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  4:33   ` [RFH] revision limiting sometimes ignored Jeff King
@ 2008-02-03  6:24     ` Junio C Hamano
  2008-02-03  6:39     ` Junio C Hamano
  2008-02-04 17:32     ` Linus Torvalds
  2 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2008-02-03  6:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> On Sat, Feb 02, 2008 at 10:00:54PM -0500, Jeff King wrote:
>
>> That being said, the commit in your 'master' branch _is_ part of
>> 1dd567d5, and should be culled. So I'm not clear on why it shows up only
>> when you ask to see both branches, and that may be a bug.
>
> OK, there is definitely a bug here, but I'm having some trouble figuring
> out the correct fix. It's in the revision walker, so I have cc'd those
> who are more clueful than I.

I am officially in semi-vacation-post-release mode, but I
noticed this is reproducible even with a prehistoric git
(v1.0.0).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  4:33   ` [RFH] revision limiting sometimes ignored Jeff King
  2008-02-03  6:24     ` Junio C Hamano
@ 2008-02-03  6:39     ` Junio C Hamano
  2008-02-03  7:13       ` Jeff King
  2008-02-04 17:32     ` Linus Torvalds
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-03  6:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> On Sat, Feb 02, 2008 at 10:00:54PM -0500, Jeff King wrote:
>
>> That being said, the commit in your 'master' branch _is_ part of
>> 1dd567d5, and should be culled. So I'm not clear on why it shows up only
>> when you ask to see both branches, and that may be a bug.
>
> OK, there is definitely a bug here, but I'm having some trouble figuring
> out the correct fix. It's in the revision walker, so I have cc'd those
> who are more clueful than I.
>
> You can recreate a problematic repo using this script:
>
> -- >8 --
> mkdir repo && cd repo
> git init
>
> touch file && git add file
> commit() {
>   echo $1 >file && git commit -a -m $1 && git tag $1
> }
>
> commit one
> commit two
> commit three
> git checkout -b other two
> commit alt-three
> git checkout master
> git merge other || true
> commit merged
> commit four
> -- 8< --
>
> So a fairly simple repo, but with the key element that it contains a
> merge.

It is not so simple, it appears.  If I add for reproducibility
"test_tick" like this:

        commit () {
                test_tick &&
                echo $1 >file &&
                git commit -a -m $1 &&
                git tag $1
        }

the problem goes away.

So there is some interaction with "insert_by_date()".  Still
digging.

 t/t6009-rev-list-parent.sh |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/t/t6009-rev-list-parent.sh b/t/t6009-rev-list-parent.sh
new file mode 100755
index 0000000..66164e9
--- /dev/null
+++ b/t/t6009-rev-list-parent.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='properly cull all ancestors'
+
+. ./test-lib.sh
+
+commit () {
+	: test_tick &&
+	echo $1 >file &&
+	git commit -a -m $1 &&
+	git tag $1
+}
+
+test_expect_success setup '
+
+	touch file &&
+	git add file &&
+
+	commit one &&
+	commit two &&
+	commit three &&
+
+	git checkout -b other two &&
+	commit alt-three &&
+
+	git checkout master &&
+
+	git merge -s ours other &&
+
+	commit merged &&
+	commit four &&
+
+	git -p show-branch --more=999
+
+'
+
+test_expect_failure 'one is ancestor of others and should not be shown' '
+
+	git rev-list one --not four >result &&
+	>expect &&
+	diff -u expect result 
+
+'

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  6:39     ` Junio C Hamano
@ 2008-02-03  7:13       ` Jeff King
  2008-02-03  7:18         ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2008-02-03  7:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Sat, Feb 02, 2008 at 10:39:18PM -0800, Junio C Hamano wrote:

> It is not so simple, it appears.  If I add for reproducibility
> "test_tick" like this:
> 
>         commit () {
>                 test_tick &&
>                 echo $1 >file &&
>                 git commit -a -m $1 &&
>                 git tag $1
>         }

Ah. I think what is happening is something like this:

  - when we add 'four' as uninteresting, we mark its parents as
    uninteresting in handle_commit
  - we don't recursively follow all of its parents because we haven't
    parsed them yet
  - when we get to limit_list, we call mark_parents_uninteresting again.
    But we have already marked four^ as uninteresting, and therefore we
    do not recurse in marking
  - we add the parents to the list, but they are not interesting, and
    therefore we quit

The reason it works with test_tick is that it changes the order we deal
with the commits in limit_list. We deal with 'one' _after_ dealing with
the uninteresting parents, so we never bail with
everybody_uninteresting.

> +test_expect_failure 'one is ancestor of others and should not be shown' '
> +
> +	git rev-list one --not four >result &&
> +	>expect &&
> +	diff -u expect result 
> +
> +'

Hooray, test_expect_failure is used properly. But you are still
missing a test_done. :)

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  7:13       ` Jeff King
@ 2008-02-03  7:18         ` Jeff King
  2008-02-03  7:40           ` Junio C Hamano
  2008-02-03  8:18           ` Junio C Hamano
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2008-02-03  7:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Sun, Feb 03, 2008 at 02:13:18AM -0500, Jeff King wrote:

> Ah. I think what is happening is something like this:
> 
>   - when we add 'four' as uninteresting, we mark its parents as
>     uninteresting in handle_commit
>   - we don't recursively follow all of its parents because we haven't
>     parsed them yet
>   - when we get to limit_list, we call mark_parents_uninteresting again.
>     But we have already marked four^ as uninteresting, and therefore we
>     do not recurse in marking
>   - we add the parents to the list, but they are not interesting, and
>     therefore we quit

So the "fix" I posted before was to stop bailing on
everybody_uninteresting; clearly it is possible that although those
commits are uninteresting, we still have work to do on their ancestors.
There is probably a performance impact since we will end up traversing
the whole commit chain just to mark them all uninteresting.

We could also always recurse in make_parents_uninteresting; I think this
has the same performance problem, since we have to parse the parents for
each commit.

We could topologically order the commits going into limit_list (it just
works most of the time because the date ordering is _mostly_ right).
This guarantees that we deal with 'four' before 'one'. But topo sorting
is expensive.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  7:18         ` Jeff King
@ 2008-02-03  7:40           ` Junio C Hamano
  2008-02-03  7:47             ` Junio C Hamano
  2008-02-03  8:18           ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-03  7:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> We could topologically order the commits going into limit_list (it just
> works most of the time because the date ordering is _mostly_ right).
> This guarantees that we deal with 'four' before 'one'. But topo sorting
> is expensive.

I recall we did a rather clever optimization in merge-base.  I
am starting to suspect that we would need a similar trick there.

The issue is:

 * We have pushed "one" out already to "newlist", but we haven't
   given UNINTERESTING bit to it yet.

 * We are responsible to mark "one" UNINTERESTING, if it can be
   reached from a commit that is UNINTERESTING.  We expect
   further looping of the "while (list)" and
   mark_parents_uninteresting() in that loop will eventually
   smudge it.

 * We can obviously prove that we marked all UNINTERESTING
   commits that matters by traversing _all_ history (i.e. make
   sure mark_parents_uninteresting() recurses, and wait until
   "list" truly becomes empty), but we would want to somehow
   optimize it.  The everybody_uninteresting() check was
   introduced for that purpose, but that is not a right
   optimization if commit timestamps are skewed like this.

The right optimization is probably:

 * Wait until everybody on "list" is UNINTERESTING.  IOW, keep
   the "everybody_uninteresting()" check with break as is.

   At that point "newlist" will contain all the commits that we
   might be interested in (e.g. "one").  The issue is reduced
   from "mark _all_ commits that can be reached from known
   UNINTERESTING ones" to "make sure the commits on the newlist
   that are reachable from UNINTERESTING ones in the "list" are
   marked as UNINTERESTING (e.g. "one" should be checked for
   reachability from the remaining UNINTERESTING commits in
   "list", we do not have to check for anything else).

 * After the loop exits, traverse from all non UNINTERESTING
   commits on the "newlist" and all remaining commits on the
   "list" (by definition, the latter are UNINTERESTING) down to
   their common merge base, propagating UNINTERESTING bit down.

   Once we do that, we have proven that "one" is reachable from
   any of the UNINTERESTING commit.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  7:40           ` Junio C Hamano
@ 2008-02-03  7:47             ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2008-02-03  7:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

By the way, the issue does not have anything to do with merges.

This is the absolute minimum (and reliable) reproduction recipe.
You need four commits.  If you remove "commit three &&", the
traversal will contaminate "one".

---

 t/t6009-rev-list-parent.sh |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/t/t6009-rev-list-parent.sh b/t/t6009-rev-list-parent.sh
new file mode 100755
index 0000000..1da5dbb
--- /dev/null
+++ b/t/t6009-rev-list-parent.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='properly cull all ancestors'
+
+. ./test-lib.sh
+
+commit () {
+	test_tick &&
+	echo $1 >file &&
+	git commit -a -m $1 &&
+	git tag $1
+}
+
+test_expect_success setup '
+
+	touch file &&
+	git add file &&
+
+	commit one &&
+
+	test_tick=$(($test_tick - 2400))
+
+	commit two &&
+	commit three &&
+	commit four &&
+
+	git log --pretty=oneline --abbrev-commit
+'
+
+test_expect_failure 'one is ancestor of others and should not be shown' '
+
+	git rev-list one --not four >result &&
+	>expect &&
+	diff -u expect result 
+
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  7:18         ` Jeff King
  2008-02-03  7:40           ` Junio C Hamano
@ 2008-02-03  8:18           ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2008-02-03  8:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> There is probably a performance impact since we will end up traversing
> the whole commit chain just to mark them all uninteresting.

Yes.  "tip..initial" obviously need to traverse almost all
history to find out that the initial is reachable from tip, but
there is no avoiding that.

However, a typical usage is "old..new" and old and new are both
far from the initial commit, and forcing traversal down to the
initial commit in such a usual case is unacceptable.

I think we only need to traverse down to their merge bases to
prove that new cannot be reachable from old, and we can find out
all of their merge bases without traversing down to root.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-03  4:33   ` [RFH] revision limiting sometimes ignored Jeff King
  2008-02-03  6:24     ` Junio C Hamano
  2008-02-03  6:39     ` Junio C Hamano
@ 2008-02-04 17:32     ` Linus Torvalds
  2008-02-04 17:37       ` Linus Torvalds
  2008-02-04 19:08       ` Junio C Hamano
  2 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-04 17:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano



On Sat, 2 Feb 2008, Jeff King wrote:
> 
> OK, there is definitely a bug here, but I'm having some trouble figuring
> out the correct fix. It's in the revision walker, so I have cc'd those
> who are more clueful than I.

Ok, I agree that there is a bug, and your two-liner fix is a "fix" in that 
it works, but I think it's absolutely the wrogn fix because it is totally 
unacceptable from a performance angle. We obviously need to break out of 
the loop before we have walked the whole commit chain.

>  		if (obj->flags & UNINTERESTING) {
>  			mark_parents_uninteresting(commit);
> -			if (everybody_uninteresting(list))
> -				break;
>  			continue;
>  		}

So I think the real problem here is not that the logic is wrong in 
general, but that there is one *special* case where the logic to break out 
is wrong.

And that special case is when we hit the root commit which isn't negative.

That case is special because *normally*, if we have a positive commit, we 
will always continue to walk the parents of that positive commit, so the 
"everybody_interesting()" check will not trigger. BUT! If we hit a root 
commit and it is positive, that won't happen (since, by definition, it has 
no parents to keep the list populated with), and now we break out early.

So I think your fix is wrong, but it's "close" to right: I suspect that we 
can fix it by marking the "we hit the root commit" case, and just 
disabling it for that case.

This patch is untested and obviously won't even compile (I didn't actually 
add the "hit_root" bitfield to the revision struct), but shows what I 
*think* should fix this issue, without the performance problem.

But maybe I haven't thought it entirely through, and there is some other 
case that can trigger this bug.

So please somebody double-check my thinking.

			Linus

---
diff --git a/revision.c b/revision.c
index 6e85aaa..0e90988 100644
--- a/revision.c
+++ b/revision.c
@@ -456,6 +456,9 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit, str
 
 	left_flag = (commit->object.flags & SYMMETRIC_LEFT);
 
+	if (!commit->parents)
+		revs->hit_root = 1;
+
 	rest = !revs->first_parent_only;
 	for (parent = commit->parents, add = 1; parent; add = rest) {
 		struct commit *p = parent->item;
@@ -579,7 +582,7 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			if (everybody_uninteresting(list))
+			if (!revs->hit_root && everybody_uninteresting(list))
 				break;
 			continue;
 		}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-04 17:32     ` Linus Torvalds
@ 2008-02-04 17:37       ` Linus Torvalds
  2008-02-04 19:08       ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-04 17:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano



On Mon, 4 Feb 2008, Linus Torvalds wrote:
>
> This patch is untested and obviously won't even compile (I didn't actually 
> add the "hit_root" bitfield to the revision struct), but shows what I 
> *think* should fix this issue, without the performance problem.

Ok, so I was lazy. Here's the updated patch that actually compiles and is 
also now verified to fix Junio's test-case.

(Same patch, just the added bitfield declaration, and the testing ;)

		Linus
---
 revision.c |    5 ++++-
 revision.h |    3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index 6e85aaa..0e90988 100644
--- a/revision.c
+++ b/revision.c
@@ -456,6 +456,9 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit, str
 
 	left_flag = (commit->object.flags & SYMMETRIC_LEFT);
 
+	if (!commit->parents)
+		revs->hit_root = 1;
+
 	rest = !revs->first_parent_only;
 	for (parent = commit->parents, add = 1; parent; add = rest) {
 		struct commit *p = parent->item;
@@ -579,7 +582,7 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			if (everybody_uninteresting(list))
+			if (!revs->hit_root && everybody_uninteresting(list))
 				break;
 			continue;
 		}
diff --git a/revision.h b/revision.h
index 8572315..5188a2f 100644
--- a/revision.h
+++ b/revision.h
@@ -48,7 +48,8 @@ struct rev_info {
 			parents:1,
 			reverse:1,
 			cherry_pick:1,
-			first_parent_only:1;
+			first_parent_only:1,
+			hit_root:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-04 17:32     ` Linus Torvalds
  2008-02-04 17:37       ` Linus Torvalds
@ 2008-02-04 19:08       ` Junio C Hamano
  2008-02-04 20:03         ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-04 19:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So I think the real problem here is not that the logic is wrong in 
> general, but that there is one *special* case where the logic to break out 
> is wrong.
>
> And that special case is when we hit the root commit which isn't negative.
>
> That case is special because *normally*, if we have a positive commit, we 
> will always continue to walk the parents of that positive commit, so the 
> "everybody_interesting()" check will not trigger. BUT! If we hit a root 
> commit and it is positive, that won't happen (since, by definition, it has 
> no parents to keep the list populated with), and now we break out early.
>
> So I think your fix is wrong, but it's "close" to right: I suspect that we 
> can fix it by marking the "we hit the root commit" case, and just 
> disabling it for that case.

Ahh, I was preparing a response that begins with "Wow, a joy of
working in a mailing list with people more clever than me!  It's
so obvious but I did not think of it."  I've written something
like that more than a few times on this list responding to
several people, I think.

However, I am afraid that is not quite enough.  It is not just
"when we hit the root".

Consider the same topology in the small test (1-2-3-4) but with
three additional commits:

         B---C
        /
    ---A---1---2---3---4

Again, 2-3-4 are in nice chronological order, but 1 has the
younguest timestamp, and A-B-C are all younger than 1.

	$ rev-list 1 ^4 ^A
        $ rev-list 1 ^4 ^B

These two would both mark A as uninteresting while processing
the command line (revision.c::handle_commit()).  When we pop 1
off, the call to add_parents_to_list() for it will not add
anything positive back.

	$ rev-list 1 ^4 ^C

This would not mark A as uninteresting immediately, but by the
time 1 gets its turn, A is marked uninteresting.

So I think the rule to notice this situation with "hit-root"
flag is something like:

    when we pop a positive commit that does not have any
    positive parent left (root is a special case of this), and
    the negative parents were contaminated either by:

        (1) being listed as negative on the command line or being a
            direct parent of a negative commit listed on the
            command line; or by

        (2) traversing the list of negative commits who are all
            younger than the positive commit in question.

---

 t/t6009-rev-list-parent.sh |   36 ++++++++++++++++++++++++++++++++++--
 1 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/t/t6009-rev-list-parent.sh b/t/t6009-rev-list-parent.sh
index be3d238..0bb5ac4 100755
--- a/t/t6009-rev-list-parent.sh
+++ b/t/t6009-rev-list-parent.sh
@@ -16,6 +16,14 @@ test_expect_success setup '
 	touch file &&
 	git add file &&
 
+	commit zero &&
+	commit A &&
+	commit B &&
+	commit C &&
+
+	git reset --hard A &&
+
+	test_tick=$(($test_tick - 1200))
 	commit one &&
 
 	test_tick=$(($test_tick - 2400))
@@ -27,9 +35,33 @@ test_expect_success setup '
 	git log --pretty=oneline --abbrev-commit
 '
 
-test_expect_failure 'one is ancestor of others and should not be shown' '
+test_expect_failure '"zero ^four" should be empty' '
+
+	git rev-list zero --not four >result &&
+	>expect &&
+	diff -u expect result
+
+'
+
+test_expect_failure '"one ^four ^A" should be empty' '
+
+	git rev-list one --not four A >result &&
+	>expect &&
+	diff -u expect result
+
+'
+
+test_expect_failure '"one ^four ^B should be empty' '
+
+	git rev-list one --not four B >result &&
+	>expect &&
+	diff -u expect result
+
+'
+
+test_expect_failure '"one ^four ^C should be empty' '
 
-	git rev-list one --not four >result &&
+	git rev-list one --not four C >result &&
 	>expect &&
 	diff -u expect result
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-04 19:08       ` Junio C Hamano
@ 2008-02-04 20:03         ` Linus Torvalds
  2008-02-04 20:06           ` Linus Torvalds
  2008-02-04 20:50           ` Linus Torvalds
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-04 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git



On Mon, 4 Feb 2008, Junio C Hamano wrote:
> 
> However, I am afraid that is not quite enough.  It is not just
> "when we hit the root".

You're right. The proper test is actually "is the list of commits 
disconnected".

Which is not an entirely trivial thing to test for efficiently.

I suspect that we can solve it by not even bothering to be efficient, and 
instead just ask that question when we hit the "are all commits negative" 
query.

Gaah. This is that stupid apporach. The smarter one might be harder, 
especially for the special cases where you truly have two totally 
unconnected trees, ie:

	a -> b -> c

	d -> e -> f

and do

	git rev-list c f ^b ^e

were you actually do not _have_ a single connected graph at all, but you 
know the result should be "c" and "f" because they are *individually* 
connected to what we already know is uninteresting.

Not really tested at all, not really thought through. And that recursion 
avoidance could be smarter.

			Linus

---
 revision.c |   37 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/revision.c b/revision.c
index 6e85aaa..26b2343 100644
--- a/revision.c
+++ b/revision.c
@@ -558,6 +558,41 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 	free_patch_ids(&ids);
 }
 
+static inline int commit_is_connected(struct commit *commit)
+{
+	struct commit_list *parents;
+	for (;;) {
+		if (commit->object.flags & UNINTERESTING)
+			return 1;
+		parents = commit->parents;
+		if (!parents)
+			return 0;
+		if (parents->next)
+			break;
+		commit = parents->item;
+	}
+
+	do {
+		if (!commit_is_connected(parents->item))
+			return 0;
+		parents = parents->next;
+	} while (parents);
+	return 1;
+}
+
+/* Check that the positive list is connected to the negative one.. */
+static int is_connected(struct commit_list *list)
+{
+	while (list) {
+		struct commit *commit = list->item;
+
+		list = list->next;
+		if (!commit_is_connected(commit))
+			return 0;
+	}
+	return 1;
+}
+
 static int limit_list(struct rev_info *revs)
 {
 	struct commit_list *list = revs->commits;
@@ -579,7 +614,7 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			if (everybody_uninteresting(list))
+			if (everybody_uninteresting(list) && is_connected(newlist))
 				break;
 			continue;
 		}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-04 20:03         ` Linus Torvalds
@ 2008-02-04 20:06           ` Linus Torvalds
  2008-02-04 20:50           ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-04 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git



On Mon, 4 Feb 2008, Linus Torvalds wrote:
> 
> Not really tested at all, not really thought through. And that recursion 
> avoidance could be smarter.

.. and by "could be smarter", I obviously mean "really *really* should be 
smarter". Because this is O(2**n) in the number of merges, which is not 
acceptable even if the constant is really small.

So I really don't mean that we should do it this way. The right thing to 
do would be to add a new object flag for that "connected to UNINTERESTING" 
property, and setting it as we traverse the graph in that 
"commit_is_connected()" logic. That should get rid of the exponential 
behaviour.

			Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-04 20:03         ` Linus Torvalds
  2008-02-04 20:06           ` Linus Torvalds
@ 2008-02-04 20:50           ` Linus Torvalds
  2008-02-05  7:14             ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2008-02-04 20:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git



On Mon, 4 Feb 2008, Linus Torvalds wrote:
> 
> Gaah. This is that stupid apporach.

.. and it won't actually solve the problem you pointed to. It's not enough 
that the positive commits should be connected to the negative ones, the 
problem is that no negative ones could possibly connect to the positives. 

So scratch that patch as broken too. 

Really annoying. It does look like we really want to check the *totally* 
connected case, and we simply cannot do the "two unconnected trees" 
decision case without traversing both trees fully (since we won't know 
that they are *really* unconnected until we do).

And that seems really quite expensive. I wonder if I've missed something 
again.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-04 20:50           ` Linus Torvalds
@ 2008-02-05  7:14             ` Junio C Hamano
  2008-02-05 21:23               ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-05  7:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, 4 Feb 2008, Linus Torvalds wrote:
>> 
>> Gaah. This is that stupid apporach.
>
> .. and it won't actually solve the problem you pointed to. It's not enough 
> that the positive commits should be connected to the negative ones, the 
> problem is that no negative ones could possibly connect to the positives. 
>
> So scratch that patch as broken too. 
>
> Really annoying. It does look like we really want to check the *totally* 
> connected case, and we simply cannot do the "two unconnected trees" 
> decision case without traversing both trees fully (since we won't know 
> that they are *really* unconnected until we do).
>
> And that seems really quite expensive. I wonder if I've missed something 
> again.

I tend to agree.  In a totally connected history, the upper
bound we would need to traverse is down to the merge base of
still positive commits in the "newlist" and negative ones still
on the "list" when everybody on list becomes uninteresting.  And
if there are two unrelated histories, that traversal will need
to traverse down to respective roots.

Which sucks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05  7:14             ` Junio C Hamano
@ 2008-02-05 21:23               ` Linus Torvalds
  2008-02-05 22:34                 ` Johannes Schindelin
  2008-02-05 23:44                 ` Junio C Hamano
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-05 21:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git



On Mon, 4 Feb 2008, Junio C Hamano wrote:
> 
> I tend to agree.  In a totally connected history, the upper
> bound we would need to traverse is down to the merge base of
> still positive commits in the "newlist" and negative ones still
> on the "list" when everybody on list becomes uninteresting.  And
> if there are two unrelated histories, that traversal will need
> to traverse down to respective roots.
> 
> Which sucks.

I really wonder if the right thing is not simply to admit that we consider 
the commit time meaningful (within some fudge factor!), and then do:

 - make commit warn if any parent commit date is in the future from the 
   current commit date (allow a *small* fudge factor here, say 5 minutes).

 - teach fsck to complain about parent commits being in the future from 
   their children (allow the same small fudge factor).

 - make the revision walking code realize that if times are too close to 
   each other, it should walk a bit further back...

because quite frankly, this bug only shows up when your time goes 
backwards (or stays the same, but the fudge-factor should take care of 
that too).

So the revision walking "fudge factor" could be as simple as the 
following..

		Linus

---
 revision.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/revision.c b/revision.c
index 6e85aaa..e32e1e3 100644
--- a/revision.c
+++ b/revision.c
@@ -558,6 +558,8 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 	free_patch_ids(&ids);
 }
 
+#define FUDGE (60*60)	/* 1 hour fudge-factor */
+
 static int limit_list(struct rev_info *revs)
 {
 	struct commit_list *list = revs->commits;
@@ -579,6 +581,15 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
+
+			/*
+			 * If we have commits on the newlist, we don't
+			 * want to do the "everybody_uninteresting()"
+			 * test until we've hit a negative commit that
+			 * is solidly in the past
+			 */
+			if (newlist && newlist->item->date < commit->date + FUDGE)
+				continue;
 			if (everybody_uninteresting(list))
 				break;
 			continue;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 21:23               ` Linus Torvalds
@ 2008-02-05 22:34                 ` Johannes Schindelin
  2008-02-05 23:59                   ` Linus Torvalds
                                     ` (2 more replies)
  2008-02-05 23:44                 ` Junio C Hamano
  1 sibling, 3 replies; 34+ messages in thread
From: Johannes Schindelin @ 2008-02-05 22:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jeff King, git

Hi,

On Tue, 5 Feb 2008, Linus Torvalds wrote:

> I really wonder if the right thing is not simply to admit that we consider 
> the commit time meaningful (within some fudge factor!), and then do:
> 
>  - make commit warn if any parent commit date is in the future from the 
>    current commit date (allow a *small* fudge factor here, say 5 minutes).

5 minutes seems a little narrow to me.  I think we can even go with 86400 
seconds.

>  - teach fsck to complain about parent commits being in the future from 
>    their children (allow the same small fudge factor).
> 
>  - make the revision walking code realize that if times are too close to 
>    each other, it should walk a bit further back...

There is one big problem, though.  Sometimes your clock is set wrong for 
all the wrong reasons.  Look at

	http://article.gmane.org/gmane.comp.version-control.git/67848/raw

for example (look at the Original-Date).  If that happens without being 
noticed (and it happened here!), and it is pushed, you have to live with 
it.

_However_, I could imagine that you can get most of such errors by doing 
the same as gmane: realise that the clock skew is into the future, and 
just take the sensible date.

In our case, this would mean that the revision walker should realise that 
a child whose date is not older than its parent commit must be wrong.  And 
just take the parent's date instead (but maybe only for the purpose of 
limiting).

Hmm?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 21:23               ` Linus Torvalds
  2008-02-05 22:34                 ` Johannes Schindelin
@ 2008-02-05 23:44                 ` Junio C Hamano
  2008-02-06  0:52                   ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-05 23:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I really wonder if the right thing is not simply to admit that we consider 
> the commit time meaningful (within some fudge factor!), and then do:
>
>  - make commit warn if any parent commit date is in the future from the 
>    current commit date (allow a *small* fudge factor here, say 5 minutes).

Hmmm.  In other words, you are punished for trying to build on
top of somebody else who screwed up.  That sucks.

>  - teach fsck to complain about parent commits being in the future from 
>    their children (allow the same small fudge factor).

And this is even worse.

But I think these are sane thing to do regardless.  To prevent
problematic commits from contaminating other people, we could
add default post-receive hook and perhaps a new pre-merge hook
to inspect the newly obtained history to warn and reject a merge
(including fast-forward) if it contains commits far into the
future.  We would also need a mechanism to force a merge with
such a clock-skewed history if we did so.

>  - make the revision walking code realize that if times are too close to 
>    each other, it should walk a bit further back...
>
> because quite frankly, this bug only shows up when your time goes 
> backwards (or stays the same, but the fudge-factor should take care of 
> that too).

> @@ -579,6 +581,15 @@ static int limit_list(struct rev_info *revs)
>  			return -1;
>  		if (obj->flags & UNINTERESTING) {
>  			mark_parents_uninteresting(commit);
> +
> +			/*
> +			 * If we have commits on the newlist, we don't
> +			 * want to do the "everybody_uninteresting()"
> +			 * test until we've hit a negative commit that
> +			 * is solidly in the past
> +			 */
> +			if (newlist && newlist->item->date < commit->date + FUDGE)
> +				continue;
>  			if (everybody_uninteresting(list))
>  				break;
>  			continue;

We picked up commit which we know is the youngest in the "list".
Because we push into newlist as we traverse, newlist is sorted
by date, and the element pointed by it is the youngest in there.
That is something we have processed earlier, so it ought to be
younger than this commit.  Otherwise we have found a problematic
clock skew.  Ok, I think it should work.

The clock skew t6009 artificially introduces needs to be
shortened for this, though.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 22:34                 ` Johannes Schindelin
@ 2008-02-05 23:59                   ` Linus Torvalds
  2008-02-06 16:43                     ` Tilman Sauerbeck
  2008-02-06  1:22                   ` Nicolas Pitre
  2008-02-06  1:51                   ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2008-02-05 23:59 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Jeff King, Git Mailing List, Tilman Sauerbeck



On Tue, 5 Feb 2008, Johannes Schindelin wrote:
> > 
> >  - make commit warn if any parent commit date is in the future from the 
> >    current commit date (allow a *small* fudge factor here, say 5 minutes).
> 
> 5 minutes seems a little narrow to me.  I think we can even go with 86400 
> seconds.

Well, notice how I said *warn*. Not abort the commit. Not stop. Just make 
people very aware of the fact that clocks are skewed.

In the case that actually triggered this whole discussion, the problem 
seems to sadly have been in the original CVS tree (or whatever it was 
imported from): the project started in 2006, had lots of regular commits 
up to October 2007, and then suddenly it had a commit that had a date in 
2002!

[ For those interested in looking at this, the broken commit in that 
  Tilman's repo was commit 3a7340af2bd57488f832d7070b0ce96c4baa6b54, which 
  is from October 2002, and which is surrounded by commits from October 
  2007, so somebody was literally off by five years ]

In other words, the repo really was pretty broken, and the git behaviour 
came from that breakage.

One way to work around this kind of thing is to flag broken dates, and 
yes, we can probably find most of these kinds of random breakages (in the 
case of the broken repo, we had the parent of the broken commit already 
parsed, we could have seen that the date was bogus).

But yeah, I have to also admit that exactly *because* the bug came from 
some import from somewhere else, the date requirement cannot work - I 
don't want to change even obviously bogus data from an external import.

I don't see a good way to find the breakage efficiently and generally, 
though. In the particular case that this hit us, it's visible because the 
breakage is entirely local (ie you can see the broken commit  by just 
looking directly at its parents), but even if you have just *two* commits 
that are broken in succession, the breakage is no longer locally obvious 
at the later one.

Nasty.

			Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 23:44                 ` Junio C Hamano
@ 2008-02-06  0:52                   ` Linus Torvalds
  2008-02-06  5:30                     ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2008-02-06  0:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git



On Tue, 5 Feb 2008, Junio C Hamano wrote:
> >
> >  - make commit warn if any parent commit date is in the future from the 
> >    current commit date (allow a *small* fudge factor here, say 5 minutes).
> 
> Hmmm.  In other words, you are punished for trying to build on
> top of somebody else who screwed up.  That sucks.

Well, I was actually thinking that the most reasonable thing to do is that 
if you pull from somebody, and you get this warning, you send an email 
saying "you suck, I will not pull your broken crap".

But the real problem is that you might be importing it from some external 
legacy SCM entity, and then you can't say "you suck, I won't pull", 
because the whole point is that external entity obviously *does* suck, and 
you want to simply stop using it. And then the "I won't pull" isn't an 
option ;)

So yeah, I don't think the warnings really work, if only because of that 
"import from crappy CVS repo" issue.

But the revision.c change might be worth it, if only as a slight band-aid 
for the current issue. It won't fix the original problem, though (because 
that broken repo had a five *year* clock skew, not an hour :)

I'll continue to think about whether I can come up with some sane 
heuristic that allows non-broken cases to not go all the way up to the 
root.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 22:34                 ` Johannes Schindelin
  2008-02-05 23:59                   ` Linus Torvalds
@ 2008-02-06  1:22                   ` Nicolas Pitre
  2008-02-06  1:51                   ` Junio C Hamano
  2 siblings, 0 replies; 34+ messages in thread
From: Nicolas Pitre @ 2008-02-06  1:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, Jeff King, git

On Tue, 5 Feb 2008, Johannes Schindelin wrote:

> In our case, this would mean that the revision walker should realise that 
> a child whose date is not older than its parent commit must be wrong.  And 
> just take the parent's date instead (but maybe only for the purpose of 
> limiting).

Only for the committer's date of course.  Author's date might be 
completely random.


Nicolas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 22:34                 ` Johannes Schindelin
  2008-02-05 23:59                   ` Linus Torvalds
  2008-02-06  1:22                   ` Nicolas Pitre
@ 2008-02-06  1:51                   ` Junio C Hamano
  2008-02-06  6:05                     ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-06  1:51 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Jeff King, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> In our case, this would mean that the revision walker should realise that 
> a child whose date is not older than its parent commit must be wrong.  And 
> just take the parent's date instead (but maybe only for the purpose of 
> limiting).

No.

	1---2---3---4

Timestamps are 2 < 3 < 4 < 1 and you ask:

	$ git rev-list 1 ^4

We push 1 and ^4 in "list".  We pick 1 and push it out to
"newlist" (possible results, but the hope is they may later be
marked as UNINTERESTING as we traverse the remaining one still
on "list").  We pick ^4, mark 3 as UNINTERESTING and push ^3
into "list", and realize there is nobody that is still positive
(i.e. without UNINTERESTING bit).  We have "clever" optimization
that stops in such a case.

Nowhere in this sequence we can notice that "A child whose date
is not older than its parent".  We do not even get to commit 2
during the traversal.

In order to notice the problem, you need to make sure we will
see the link between 1 and 2 (i.e. the fact that 1 has a child
that is older than itself).  That would take traversing "all the
way down".

The "all the way down" is not quite correct, though.  If we have
other commits, like this:

              B---C
             /
     ---0---A---1---2---3---4

where timestamps are 0 < A < B < C < 2 < 3 < 4 < 1, and if you
ask:

	$ git rev-list 1 ^4 ^A
	$ git rev-list 1 ^4 ^B
	$ git rev-list 1 ^4 ^C

we will have a similar issue.  We do not have to go down the
potentially long history beyond A.  But we at least need to
traverse down to the merge base of negatives in "list" and
positives in "newlist" when "list" becomes all UNINTERESTING (in
this case, traverse all paths as if we are trying to find out
the merge-base between 1 and 3.  That traversal will see 2 and
we will see your clock skew).

But the point is that the condition you mentioned cannot be
found out unless you traverse to 2, and at that point you have
traversed enough already.

As Linus earlier said, the question really is: for positive
commits in "newlist", have we not missed any its UNINTERESTING
descendants?

For a toy-scale graph, a parallel merge-base traversal like what
show-branch does may work, but for a real workload, newlist
would contain literally hundreds of commits, so using unaltered
"merge-base" algorithm is probably not an option either.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06  0:52                   ` Linus Torvalds
@ 2008-02-06  5:30                     ` Junio C Hamano
  2008-02-06  8:16                       ` Karl Hasselström
  2008-02-06 10:34                       ` Linus Torvalds
  0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2008-02-06  5:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But the revision.c change might be worth it, if only as a slight band-aid 
> for the current issue. It won't fix the original problem, though (because 
> that broken repo had a five *year* clock skew, not an hour :)
>
> I'll continue to think about whether I can come up with some sane 
> heuristic that allows non-broken cases to not go all the way up to the 
> root.

I really wish this was still May 2005.  Then I (actually, you)
could just decree:

	Sorry guys, but you all need to run convert-objects to
	update your repo.  What it does is to add a "generation"
	header to each and every commit object.  Then upgrade
	your git to this version, that maintains the
	"generation" number, defined as:

        (1) parentless commit gets generation #0;

        (2) otherwise the generation number of a commit is
	    max(its parents' generation number)+1.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06  1:51                   ` Junio C Hamano
@ 2008-02-06  6:05                     ` Junio C Hamano
  2008-02-06  6:17                       ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2008-02-06  6:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Jeff King, git

Junio C Hamano <gitster@pobox.com> writes:

> As Linus earlier said, the question really is: for positive
> commits in "newlist", have we not missed any its UNINTERESTING
> descendants?
>
> For a toy-scale graph, a parallel merge-base traversal like what
> show-branch does may work, but for a real workload, newlist
> would contain literally hundreds of commits, so using unaltered
> "merge-base" algorithm is probably not an option either.

After exiting the while (list) we need to prove that each
positive commit in "newlist" cannot be reached by any of the
negative commit still in "list".

Even though "newlist" may have thousands of commits, we do not
have to inspect all of them.  In order to prove that we
traversed everything that matters, we will only need to look at
the ones whose ancestors are not in "newlist" (bottom commits)
and see if each of them can be reached from the negative ones.
If a non-bottom commit is reachable from one of the negative
ones, then the bottom commit that is ancestor of that non-bottom
commit surely is reachable as well.

We can make one pass to mark everything on "newlist" with one
bit from flags, and then another pass to mark the positive ones
whose parent has that bit set, so we would need two bits in
total while finding out the set of bottom commits (we can reuse
these two bits after we know what they are).

Once we find the set of bottom commits in "newlist", we would
need to prove that none of them can be reached from any of the
negative commits still in "list".  We can do this traversal
using two bits from flags, exactly like commit.c::merge_bases()

    for each bottom commit B {
	L = empty list
	B.flags |= PARENT2
	L.append(B)
	for each negative commit C in "list from limit_list()"
            C.flags |= PARENT1
            L.append(C)
	while (L) {
	    C = shift L;
	    flag = C.flags & (PARENT1|PARENT2);
            if (flag ==  (PARENT1|PARENT2))
 		continue; /* common */
	    for each parent P of commit C:
		pflag = P.flags & (PARENT1|PARENT2);
		if (pflag == flag)
                    continue;
		P.flags |= flags;
                L.append(P)
	}
        if (B.flags & PARENT1)
            we still need to traverse -- everybody_uninteresting()
	    in limit_list() main loop was not enough!
    }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06  6:05                     ` Junio C Hamano
@ 2008-02-06  6:17                       ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2008-02-06  6:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Jeff King, git

Junio C Hamano <gitster@pobox.com> writes:

> Once we find the set of bottom commits in "newlist", we would
> need to prove that none of them can be reached from any of the
> negative commits still in "list".  We can do this traversal
> using two bits from flags, exactly like commit.c::merge_bases()
>
>     for each bottom commit B {
>       L = empty list
>       B.flags |= PARENT2
>       L.append(B)
>       for each negative commit C in "list from limit_list()"
>           C.flags |= PARENT1
>           L.append(C)
>       while (L) {
>           C = shift L;
>           flag = C.flags & (PARENT1|PARENT2);
>             if (flag ==  (PARENT1|PARENT2))
>               continue; /* common */
>           for each parent P of commit C:
>               pflag = P.flags & (PARENT1|PARENT2);
>               if (pflag == flag)
>                     continue;
>               P.flags |= flags;
>                 L.append(P)
>       }
>       if (B.flags & PARENT1)
>           we still need to traverse -- everybody_uninteresting()
>           in limit_list() main loop was not enough!
>     }

Actually when we do this traversal, we can mark the positive
commits on "newlist" as negative when it is painted with
PARENT1.  I was assuming that as soon as we find that we are in
problematic history with this procedure we exit the whole thing
and resume to limit_list(), but we may be able to use this as
the postprocessing step to clean-up the result.  I just need to
prove that this postprocessing step will smudge all the "falsely
positive but in reality negative" ones...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06  5:30                     ` Junio C Hamano
@ 2008-02-06  8:16                       ` Karl Hasselström
  2008-02-06 10:34                       ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Karl Hasselström @ 2008-02-06  8:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Jeff King, git

On 2008-02-05 21:30:38 -0800, Junio C Hamano wrote:

> I really wish this was still May 2005. Then I (actually, you) could
> just decree:
>
>       Sorry guys, but you all need to run convert-objects to update
>       your repo. What it does is to add a "generation" header to
>       each and every commit object. Then upgrade your git to this
>       version, that maintains the "generation" number, defined as:
>
>         (1) parentless commit gets generation #0;
>
>         (2) otherwise the generation number of a commit is max(its
>             parents' generation number)+1.

Would it be possible to start adding a generation header to new
commits, so that this problem (and others -- I recall hearing this
same wish a year or two ago regarding some gitk toposorting issue)
will eventually fade away?

For old commits without an embedded generation number, git could
conceivably compute their generation number once and store them in a
(local) file somewhere.

I expect that this has been considered already, and I'd be interested
in hearing why it doesn't work, if you (or someone else) have some
time to waste. :-)

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06  5:30                     ` Junio C Hamano
  2008-02-06  8:16                       ` Karl Hasselström
@ 2008-02-06 10:34                       ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-06 10:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git



On Tue, 5 Feb 2008, Junio C Hamano wrote:
> 
> I really wish this was still May 2005.  Then I (actually, you)
> could just decree:

Yeah, we really should have done that, when this came up last.

We could still decide it's a good idea to do, and simply decide that

 - within all-new ranges (that *do* have generation numbers) we can use 
   the generation number to give certain guarantees.

 - when any commits involved don't have generation numbers, we just fall 
   back on the not-strict-guarantees-use-commit-date-heuristics.

but here's something I whipped up because I woke up at 2AM and decided 
that there is a simple heuristic that works *most* of the time.. We just 
add a bit of slop, namely:

 - we always walk an extra SLOP commits from the source list even if we 
   decide that the source list is probably all done (unless the source is 
   entirely empty, of course, because then we really can't do anything at 
   all)

 - we keep track of the date of the last commit we added to the 
   destination list (this will *generally* be the oldest entry we've seen 
   so far)

 - we compare that with the youngest entry (the first one) of the source 
   list, and if the destination is older than the source, we know we want 
   to look at the source.

 - otherwise, do the "everybody_uninteresting()" test to see whether we're 
   still interested in the source.

I dunno. It's really late (or early ;), and I'm having a headache. Maybe 
it doesn't really work. But the idea is that this should be able to handle 
the few occasional incorrect timestamps. Maybe.

		Linus

---
 revision.c |   35 ++++++++++++++++++++++++++++++++++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/revision.c b/revision.c
index 6e85aaa..a50ae02 100644
--- a/revision.c
+++ b/revision.c
@@ -558,8 +558,39 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 	free_patch_ids(&ids);
 }
 
+/* How many extra uninteresting commits we want to see.. */
+#define SLOP 5
+
+static int still_interesting(struct commit_list *src, unsigned long date, int slop)
+{
+	/*
+	 * No source list at all? We're definitely done..
+	 */
+	if (!src)
+		return 0;
+
+	/*
+	 * Does the destination list contain entries with a date
+	 * before the source list? Definitely _not_ done.
+	 */
+	if (date < src->item->date)
+		return SLOP;
+
+	/*
+	 * Does the source list still have interesting commits in
+	 * it? Definitely not done..
+	 */
+	if (!everybody_uninteresting(src))
+		return SLOP;
+
+	/* Ok, we're closing in.. */
+	return slop-1;
+}
+
 static int limit_list(struct rev_info *revs)
 {
+	int slop = SLOP;
+	unsigned long date = ~0ul;
 	struct commit_list *list = revs->commits;
 	struct commit_list *newlist = NULL;
 	struct commit_list **p = &newlist;
@@ -579,12 +610,14 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			if (everybody_uninteresting(list))
+			slop = still_interesting(list, date, slop);
+			if (!slop)
 				break;
 			continue;
 		}
 		if (revs->min_age != -1 && (commit->date > revs->min_age))
 			continue;
+		date = commit->date;
 		p = &commit_list_insert(commit, p)->next;
 
 		show = show_early_output;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-05 23:59                   ` Linus Torvalds
@ 2008-02-06 16:43                     ` Tilman Sauerbeck
  2008-02-06 17:28                       ` Nicolas Pitre
  2008-02-06 19:26                       ` Linus Torvalds
  0 siblings, 2 replies; 34+ messages in thread
From: Tilman Sauerbeck @ 2008-02-06 16:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Junio C Hamano, Jeff King, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1617 bytes --]

Linus Torvalds [2008-02-05 15:59]:

Hi guys,
thanks for looking into this.

> On Tue, 5 Feb 2008, Johannes Schindelin wrote:
> > > 
> > >  - make commit warn if any parent commit date is in the future from the 
> > >    current commit date (allow a *small* fudge factor here, say 5 minutes).
> > 
> > 5 minutes seems a little narrow to me.  I think we can even go with 86400 
> > seconds.
> 
> Well, notice how I said *warn*. Not abort the commit. Not stop. Just make 
> people very aware of the fact that clocks are skewed.
> 
> In the case that actually triggered this whole discussion, the problem 
> seems to sadly have been in the original CVS tree (or whatever it was 
> imported from): the project started in 2006, had lots of regular commits 
> up to October 2007, and then suddenly it had a commit that had a date in 
> 2002!
> 
> [ For those interested in looking at this, the broken commit in that 
>   Tilman's repo was commit 3a7340af2bd57488f832d7070b0ce96c4baa6b54, which 
>   is from October 2002, and which is surrounded by commits from October 
>   2007, so somebody was literally off by five years ]

I'm not sure whether this repository was import from another SCM, but I
doubt it. I'm fairly sure that 3a7340af2bd57488f832d7070b0ce96c4baa6b54
was created using git commit though. I guess the committer's clock just
was a little late at that point.

Regards,
Tilman

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06 16:43                     ` Tilman Sauerbeck
@ 2008-02-06 17:28                       ` Nicolas Pitre
  2008-02-06 17:42                         ` Linus Torvalds
  2008-02-06 19:26                       ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: Nicolas Pitre @ 2008-02-06 17:28 UTC (permalink / raw)
  To: Tilman Sauerbeck
  Cc: Linus Torvalds, Johannes Schindelin, Junio C Hamano, Jeff King,
	Git Mailing List

On Wed, 6 Feb 2008, Tilman Sauerbeck wrote:

> Linus Torvalds [2008-02-05 15:59]:
> 
> Hi guys,
> thanks for looking into this.
> 
> > On Tue, 5 Feb 2008, Johannes Schindelin wrote:
> > > > 
> > > >  - make commit warn if any parent commit date is in the future from the 
> > > >    current commit date (allow a *small* fudge factor here, say 5 minutes).
> > > 
> > > 5 minutes seems a little narrow to me.  I think we can even go with 86400 
> > > seconds.
> > 
> > Well, notice how I said *warn*. Not abort the commit. Not stop. Just make 
> > people very aware of the fact that clocks are skewed.
> > 
> > In the case that actually triggered this whole discussion, the problem 
> > seems to sadly have been in the original CVS tree (or whatever it was 
> > imported from): the project started in 2006, had lots of regular commits 
> > up to October 2007, and then suddenly it had a commit that had a date in 
> > 2002!
> > 
> > [ For those interested in looking at this, the broken commit in that 
> >   Tilman's repo was commit 3a7340af2bd57488f832d7070b0ce96c4baa6b54, which 
> >   is from October 2002, and which is surrounded by commits from October 
> >   2007, so somebody was literally off by five years ]
> 
> I'm not sure whether this repository was import from another SCM, but I
> doubt it. I'm fairly sure that 3a7340af2bd57488f832d7070b0ce96c4baa6b54
> was created using git commit though. I guess the committer's clock just
> was a little late at that point.

Anyway, the author's date are not necessarily monotonic.

If I pick up patches on a mailing list and apply them in a different 
order than they were posted for whatever sensible reason, I expect Git 
to preserve the original date, even if that means my branch will end up 
with author dates stepping back and forth.  I might even apply a patch 
that was posted last month, or even last year -- that shouldn't matter.


Nicolas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06 17:28                       ` Nicolas Pitre
@ 2008-02-06 17:42                         ` Linus Torvalds
  2008-02-06 17:48                           ` Nicolas Pitre
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2008-02-06 17:42 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tilman Sauerbeck, Johannes Schindelin, Junio C Hamano, Jeff King,
	Git Mailing List



On Wed, 6 Feb 2008, Nicolas Pitre wrote:
> 
> Anyway, the author's date are not necessarily monotonic.

Git never even looks at the author date. Only the commit date matters.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06 17:42                         ` Linus Torvalds
@ 2008-02-06 17:48                           ` Nicolas Pitre
  0 siblings, 0 replies; 34+ messages in thread
From: Nicolas Pitre @ 2008-02-06 17:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tilman Sauerbeck, Johannes Schindelin, Junio C Hamano, Jeff King,
	Git Mailing List

On Wed, 6 Feb 2008, Linus Torvalds wrote:

> 
> 
> On Wed, 6 Feb 2008, Nicolas Pitre wrote:
> > 
> > Anyway, the author's date are not necessarily monotonic.
> 
> Git never even looks at the author date. Only the commit date matters.

OK good.


Nicolas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFH] revision limiting sometimes ignored
  2008-02-06 16:43                     ` Tilman Sauerbeck
  2008-02-06 17:28                       ` Nicolas Pitre
@ 2008-02-06 19:26                       ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2008-02-06 19:26 UTC (permalink / raw)
  To: Tilman Sauerbeck
  Cc: Johannes Schindelin, Junio C Hamano, Jeff King, Git Mailing List



On Wed, 6 Feb 2008, Tilman Sauerbeck wrote:
> 
> I'm not sure whether this repository was import from another SCM, but I
> doubt it. I'm fairly sure that 3a7340af2bd57488f832d7070b0ce96c4baa6b54
> was created using git commit though. I guess the committer's clock just
> was a little late at that point.

Heh. I thought it was imported from the outside because the commit log 
looks so damn nasty. I'm used to projects with good and readable logs, so 
I associate "native" git repos with logs that actually explain what's 
going on.

But yeah, it looks like your repo is a perfectly native git repo, it just 
has really sucky log messages. I guess you can create crap logs in any 
SCM, even if the system is written to encourage good logs ;)

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2008-02-06 19:28 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-02 12:21 [BUG?] git log picks up bad commit Tilman Sauerbeck
2008-02-03  3:00 ` Jeff King
2008-02-03  4:33   ` [RFH] revision limiting sometimes ignored Jeff King
2008-02-03  6:24     ` Junio C Hamano
2008-02-03  6:39     ` Junio C Hamano
2008-02-03  7:13       ` Jeff King
2008-02-03  7:18         ` Jeff King
2008-02-03  7:40           ` Junio C Hamano
2008-02-03  7:47             ` Junio C Hamano
2008-02-03  8:18           ` Junio C Hamano
2008-02-04 17:32     ` Linus Torvalds
2008-02-04 17:37       ` Linus Torvalds
2008-02-04 19:08       ` Junio C Hamano
2008-02-04 20:03         ` Linus Torvalds
2008-02-04 20:06           ` Linus Torvalds
2008-02-04 20:50           ` Linus Torvalds
2008-02-05  7:14             ` Junio C Hamano
2008-02-05 21:23               ` Linus Torvalds
2008-02-05 22:34                 ` Johannes Schindelin
2008-02-05 23:59                   ` Linus Torvalds
2008-02-06 16:43                     ` Tilman Sauerbeck
2008-02-06 17:28                       ` Nicolas Pitre
2008-02-06 17:42                         ` Linus Torvalds
2008-02-06 17:48                           ` Nicolas Pitre
2008-02-06 19:26                       ` Linus Torvalds
2008-02-06  1:22                   ` Nicolas Pitre
2008-02-06  1:51                   ` Junio C Hamano
2008-02-06  6:05                     ` Junio C Hamano
2008-02-06  6:17                       ` Junio C Hamano
2008-02-05 23:44                 ` Junio C Hamano
2008-02-06  0:52                   ` Linus Torvalds
2008-02-06  5:30                     ` Junio C Hamano
2008-02-06  8:16                       ` Karl Hasselström
2008-02-06 10:34                       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).