* rev-list --parents --full-history + path: something's fishy
[not found] <e1dab3980805230808s59798351r9ed702c7d0dedd2a@mail.gmail.com>
@ 2008-05-24 20:16 ` Johannes Sixt
2008-05-25 1:21 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2008-05-24 20:16 UTC (permalink / raw)
To: git; +Cc: David Tweed
This script creates this history:
C--M
/ /
A--B
Commits A and B modify file a.
git init
echo a > a
git add a
git commit -m A
echo 1 > a
git commit -m B -a
git checkout -b side master~1
echo b > b
git add b
git commit -m C
git merge master
At this point, this command returns the expected output (SHA1s rewritten to
A,B,C,M as above):
$ git rev-list --full-history HEAD -- a
B
A
but this does not:
$ git rev-list --full-history --parents HEAD -- a
M A B
B A
A
Of course, I'd expected to see this:
$ git rev-list --full-history --parents HEAD -- a
B A
A
Just as a heads-up, I've also seen a case (in David's repository) where a
git rev-list --full-history --parents --reverse HEAD -- some/path
incorrectly prints a commit somewhere in the middle *without* a parent where it
definitely should have printed a parent. I don't have a repeatable small
test-case, yet.
-- Hannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy
2008-05-24 20:16 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt
@ 2008-05-25 1:21 ` Linus Torvalds
2008-05-25 12:26 ` Johannes Sixt
2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt
0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2008-05-25 1:21 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git, David Tweed
On Sat, 24 May 2008, Johannes Sixt wrote:
>
> but this does not:
>
> $ git rev-list --full-history --parents HEAD -- a
> M A B
> B A
> A
That is the "correct" output.
That's what "--full-history" means: do not simplify merges away when all
the data comes from just one branch (in this case from "B").
So it shows you commit 'M' because you asked for full-history.
Commit 'M' has parents 'C' and 'B', but since 'C' doesn't actually modify
the file at all, the regular commit simplification will simplify 'C' away,
so now that parent 'C' will become 'A'. So 'M' has the _simplified_
parent's 'A' and 'B'.
Then it shows 'B' (parent 'A') and 'A' (no parent).
> Of course, I'd expected to see this:
>
> $ git rev-list --full-history --parents HEAD -- a
> B A
> A
Why did you ask for --full-history, if you're not interested in merges
that are irrelevant? To get what you wanted, just do
git rev-list --parents HEAD -- a
and it should give you exactly that output.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy
2008-05-25 1:21 ` Linus Torvalds
@ 2008-05-25 12:26 ` Johannes Sixt
2008-05-25 21:23 ` Linus Torvalds
2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt
1 sibling, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2008-05-25 12:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, David Tweed
On Sunday 25 May 2008 03:21, Linus Torvalds wrote:
> On Sat, 24 May 2008, Johannes Sixt wrote:
> > but this does not:
> >
> > $ git rev-list --full-history --parents HEAD -- a
> > M A B
> > B A
> > A
>
> That is the "correct" output.
>
> That's what "--full-history" means: do not simplify merges away when all
> the data comes from just one branch (in this case from "B").
>
> So it shows you commit 'M' because you asked for full-history.
>
> Commit 'M' has parents 'C' and 'B', but since 'C' doesn't actually modify
> the file at all, the regular commit simplification will simplify 'C' away,
> so now that parent 'C' will become 'A'. So 'M' has the _simplified_
> parent's 'A' and 'B'.
>
> Then it shows 'B' (parent 'A') and 'A' (no parent).
The history was this:
C--M
/ /
A--B
Now assume that both B and C change a, but so that it is identical in both B
and C. I thought that --full-history makes a difference *only* for this case,
because without --full-history the revision walk would choose either B or C
(not quite at random, but in an unspecified manner), but not both; but
with --full-history the revision walk would go both paths.
This makes a difference in git-filter-branch --subdirectory-filter: We do want
to simplify history to those commits that touch a path, but we don't want to
simplify away the case outlined in the previous paragraph.
> > Of course, I'd expected to see this:
> >
> > $ git rev-list --full-history --parents HEAD -- a
> > B A
> > A
>
> Why did you ask for --full-history, if you're not interested in merges
> that are irrelevant? To get what you wanted, just do
>
> git rev-list --parents HEAD -- a
>
> and it should give you exactly that output.
In the case at hand this would be sufficient, but in git-filter-branch we
don't want to prune branches whose modifications to a path make the path
identical.
What shall we do in git-filter-branch --subdirectory-filter?
-- Hannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy
2008-05-25 1:21 ` Linus Torvalds
2008-05-25 12:26 ` Johannes Sixt
@ 2008-05-25 19:58 ` Johannes Sixt
2008-05-25 21:30 ` Linus Torvalds
1 sibling, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2008-05-25 19:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, David Tweed
On Sunday 25 May 2008 03:21, Linus Torvalds wrote:
> On Sat, 24 May 2008, Johannes Sixt wrote:
> > but this does not:
> >
> > $ git rev-list --full-history --parents HEAD -- a
> > M A B
> > B A
> > A
>
> That is the "correct" output.
But why does this:
$ git rev-list --full-history HEAD -- a
B
A
not list M (note the lack of --parents)?
-- Hannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy
2008-05-25 12:26 ` Johannes Sixt
@ 2008-05-25 21:23 ` Linus Torvalds
2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2008-05-25 21:23 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git, David Tweed
On Sun, 25 May 2008, Johannes Sixt wrote:
>
> The history was this:
>
> C--M
> / /
> A--B
>
> Now assume that both B and C change a, but so that it is identical in both B
> and C. I thought that --full-history makes a difference *only* for this case,
> because without --full-history the revision walk would choose either B or C
> (not quite at random, but in an unspecified manner), but not both; but
> with --full-history the revision walk would go both paths.
You mis-understood what --full history does.
It literally just means "walk all paths". The fact that *also* means that
it often cannot simplify away merges at all is unfortunate, but there you
have it.
In other words, it will now leave the merge alone, even if it can see that
everything we care about (file 'a') came from just one side.
> This makes a difference in git-filter-branch --subdirectory-filter: We do want
> to simplify history to those commits that touch a path, but we don't want to
> simplify away the case outlined in the previous paragraph.
If so, you need to expand on the history simplification a *lot*.
It currently does two things:
- simplify away merges when it can see that one parent is identical in
content to the end result - it then removes the merge entirely, and
replaces it with the identical parent.
This is the thing that would normally take the merge 'M', and just
replace it with 'B', because it sees that the contents all come from B.
But if 'C' _also_ changed the file, and 'M' was actually a data merge,
then it will be left alone (because the merge 'M' really is meaningful
as far as the data is concerned)
This is what "--full-history" disables.
So when you say "--full-history", all merges are left alone.
- The *other* simplification is the non-merge case, where it simplifies
away commits that don't change the files. This is the one that then
removes 'C' in your original example.
You can disable this simplification with "--sparse". Not that anybody
ever wants to.
What you seem to want in a *third* level of simplification, which is to
remove merges that turn out to be pointless, because all parents end up
being directly related. We don't do that simplification, and we never
have.
I'd love to do it, but it's somewhat costly and very much more complicated
than the simplifications we do do.
> What shall we do in git-filter-branch --subdirectory-filter?
See above. I can tell you _where_ you'd need to add the logic. See the
file revision.c: remove_duplicate_parents(). You'd could "just" extend
that to remove not just 100% duplicates, but also remove parents that are
direct ancestors of each other. I say "just" in quotes, because that's not
trivial to do efficiently.
But it gets worse. If that removal of parents then turns the commit into a
regular one, *and* it didn't actually change the files you are interested
in, you'd need to remove it entirely, which in turn means that you'd also
need to rewrite the parent information of the children. Which means that
simplify_commit() is actually too late, because that one happens when you
print things out, and the children have already been returned (with what
now is stale parenthood!).
So the _obvious_ place to do that simplification is actually too late.
But doing it earlier is also hard, because that "simplify_commit()" is
what removes the trivial linear ones.
So you'd actually need to add a whole new phase that removes the trivial
linear cases *before* we are in the whole get_revision() phase, and then
does the commit simplification. It's nasty and quite complicated. Which is
why we don't do it.
The "revision.c" commit history simplification is already arguably some of
the most complicated and subtle code in all of git. The code needs to be
able to handle the "normal" case (which is to stream the commits without
pre-computing the whole DAG) efficiently, but then there are those
complicated cases that need the whole DAG. And they have to live
side-by-side.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy
2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt
@ 2008-05-25 21:30 ` Linus Torvalds
0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2008-05-25 21:30 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git, David Tweed
On Sun, 25 May 2008, Johannes Sixt wrote:
>
> But why does this:
>
> $ git rev-list --full-history HEAD -- a
> B
> A
>
> not list M (note the lack of --parents)?
Because when we don't ask for --parents, the whole problem is *much*
simpler. The parent rewriting means that the history has to all "fit
together". But when you don't need parenthood, then suddenly that doesn't
matter at all - who cares if it fits together or not, when you can't *see*
that it doesn't fit together anyway?
In this case, it's "simplify_commit()", and this piece of code in
particular (note how it's even commented!):
...
/* Commit without changes? */
if (commit->object.flags & TREESAME) {
/* drop merges unless we want parenthood */
if (!revs->rewrite_parents)
return commit_ignore;
...
ie if we're looking at a commit that doesn't actually introduce any
changes of its own (it took all the changes from at least _one_ of its
parents - ie it got TREESAME set because the tree was identical to one of
the parents), then if we don't have 'rewrite_parents' set, we just drop
that commit, because it is uninteresting.
IOW, we dropped 'M' because there was no point in showing it: we know
nobody refers to it (because no other commit will list it as a parent!),
and the commit itself didn't actually introduce any changes (because all
the changes came from 'B').
But we can *not* drop that merge commit when we do the parenthood
tracking, because if we did so, we'd just have an "empty spot" in history
(we have other commits that point to that emrge and list it as a parent).
Of course, in your trivial example, that didn't actually happen (because
'M' was the top commit), but try it with something more complex.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history"
2008-05-25 21:23 ` Linus Torvalds
@ 2008-05-26 19:09 ` Johannes Sixt
2008-05-27 10:55 ` David Tweed
2008-05-28 5:25 ` Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Johannes Sixt @ 2008-05-26 19:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, David Tweed, Johannes Schindelin
This reverts commit cfabd6eee1745cfec58cfcb794ce8847e43b888a. I had
implemented it without understanding what --full-history does. Consider
this history:
C--M--N
/ / /
A--B /
\ /
D-/
where B and C modify a path, X, in the same way so that the result is
identical, and D does not modify it at all. With the path limiter X and
without --full-history this is simplified to
A--B
i.e. only one of the paths via B or C is chosen. I had assumed that
--full-history would keep both paths like this
C--M
/ /
A--B
removing the path via D; but in fact it keeps the entire history.
Currently, git does not have the capability to simplify to this
intermediary case. However, the other extreme to keep the entire history
is not wanted either in usual cases. I think we can expect that histories
like the above are rare, and in the usual cases we want a simplified
history. So let's remove --full-history again.
(Concerning t7003, subsequent tests depend on what the test case sets up,
so we can't just back out the entire test case.)
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
---
On Sunday 25 May 2008 23:23, Linus Torvalds wrote:
> On Sun, 25 May 2008, Johannes Sixt wrote:
> > The history was this:
> >
> > C--M
> > / /
> > A--B
> >
> > Now assume that both B and C change a, but so that it is identical in
> > both B and C. I thought that --full-history makes a difference *only* for
> > this case, because without --full-history the revision walk would choose
> > either B or C (not quite at random, but in an unspecified manner), but
> > not both; but with --full-history the revision walk would go both paths.
>
> You mis-understood what --full history does.
Yes, indeed. Thank you for your explanations. I'm not prepared to dive into
the revision walk manchinery, so I instead propose to just remove
--full-history from git-filter-branch.
-- Hannes
git-filter-branch.sh | 2 +-
t/t7003-filter-branch.sh | 13 ++-----------
2 files changed, 3 insertions(+), 12 deletions(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 80e99e5..d04c346 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -234,7 +234,7 @@ case "$filter_subdir" in
;;
*)
git rev-list --reverse --topo-order --default HEAD \
- --parents --full-history "$@" -- "$filter_subdir"
+ --parents "$@" -- "$filter_subdir"
esac > ../revs || die "Could not get the commits"
commits=$(wc -l <../revs | tr -d " ")
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 1639c7a..3577aa6 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -97,7 +97,7 @@ test_expect_success 'subdirectory filter result looks okay' '
test_must_fail git show sub:subdir
'
-test_expect_success 'setup and filter history that requires --full-history' '
+test_expect_success 'more setup' '
git checkout master &&
mkdir subdir &&
echo A > subdir/new &&
@@ -107,16 +107,7 @@ test_expect_success 'setup and filter history that requires --full-history' '
git rm a &&
test_tick &&
git commit -m "again subdir on master" &&
- git merge branch &&
- git branch sub-master &&
- git-filter-branch -f --subdirectory-filter subdir sub-master
-'
-
-test_expect_success 'subdirectory filter result looks okay' '
- test 3 = $(git rev-list -1 --parents sub-master | wc -w) &&
- git show sub-master^:new &&
- git show sub-master^2:new &&
- test_must_fail git show sub:subdir
+ git merge branch
'
test_expect_success 'use index-filter to move into a subdirectory' '
--
1.5.6.rc0.17.gbc20
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history"
2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt
@ 2008-05-27 10:55 ` David Tweed
2008-05-28 5:25 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: David Tweed @ 2008-05-27 10:55 UTC (permalink / raw)
To: Johannes Sixt, Linus Torvalds, git, Johannes.Schindelin
On Mon, May 26, 2008 at 8:09 PM, Johannes Sixt <johannes.sixt@telecom.at> wrote:
> This reverts commit cfabd6eee1745cfec58cfcb794ce8847e43b888a. I had
FWIW, applying this patch (getting 1.5.6.rc0.29.g3beb5.dirty) it now
filters the case I was trying successfully (although I understand
Johannes' point that this fix only works because my commit graph is
only "quite messy" and it wouldn't work if it was "extremely messy":-)
).
Thanks to Johannes for his time looking at this.
--
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history"
2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt
2008-05-27 10:55 ` David Tweed
@ 2008-05-28 5:25 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2008-05-28 5:25 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Linus Torvalds, git, David Tweed, Johannes Schindelin
Johannes Sixt <johannes.sixt@telecom.at> writes:
> ... I'm not prepared to dive into
> the revision walk manchinery, so I instead propose to just remove
> --full-history from git-filter-branch.
Ok, let's punt for now and rethink this in the next cycle for 1.6.0.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-05-28 5:26 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <e1dab3980805230808s59798351r9ed702c7d0dedd2a@mail.gmail.com>
2008-05-24 20:16 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt
2008-05-25 1:21 ` Linus Torvalds
2008-05-25 12:26 ` Johannes Sixt
2008-05-25 21:23 ` Linus Torvalds
2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt
2008-05-27 10:55 ` David Tweed
2008-05-28 5:25 ` Junio C Hamano
2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt
2008-05-25 21:30 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).