* rev-list --parents --full-history + path: something's fishy [not found] <e1dab3980805230808s59798351r9ed702c7d0dedd2a@mail.gmail.com> @ 2008-05-24 20:16 ` Johannes Sixt 2008-05-25 1:21 ` Linus Torvalds 0 siblings, 1 reply; 9+ messages in thread From: Johannes Sixt @ 2008-05-24 20:16 UTC (permalink / raw) To: git; +Cc: David Tweed This script creates this history: C--M / / A--B Commits A and B modify file a. git init echo a > a git add a git commit -m A echo 1 > a git commit -m B -a git checkout -b side master~1 echo b > b git add b git commit -m C git merge master At this point, this command returns the expected output (SHA1s rewritten to A,B,C,M as above): $ git rev-list --full-history HEAD -- a B A but this does not: $ git rev-list --full-history --parents HEAD -- a M A B B A A Of course, I'd expected to see this: $ git rev-list --full-history --parents HEAD -- a B A A Just as a heads-up, I've also seen a case (in David's repository) where a git rev-list --full-history --parents --reverse HEAD -- some/path incorrectly prints a commit somewhere in the middle *without* a parent where it definitely should have printed a parent. I don't have a repeatable small test-case, yet. -- Hannes ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy 2008-05-24 20:16 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt @ 2008-05-25 1:21 ` Linus Torvalds 2008-05-25 12:26 ` Johannes Sixt 2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt 0 siblings, 2 replies; 9+ messages in thread From: Linus Torvalds @ 2008-05-25 1:21 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, David Tweed On Sat, 24 May 2008, Johannes Sixt wrote: > > but this does not: > > $ git rev-list --full-history --parents HEAD -- a > M A B > B A > A That is the "correct" output. That's what "--full-history" means: do not simplify merges away when all the data comes from just one branch (in this case from "B"). So it shows you commit 'M' because you asked for full-history. Commit 'M' has parents 'C' and 'B', but since 'C' doesn't actually modify the file at all, the regular commit simplification will simplify 'C' away, so now that parent 'C' will become 'A'. So 'M' has the _simplified_ parent's 'A' and 'B'. Then it shows 'B' (parent 'A') and 'A' (no parent). > Of course, I'd expected to see this: > > $ git rev-list --full-history --parents HEAD -- a > B A > A Why did you ask for --full-history, if you're not interested in merges that are irrelevant? To get what you wanted, just do git rev-list --parents HEAD -- a and it should give you exactly that output. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy 2008-05-25 1:21 ` Linus Torvalds @ 2008-05-25 12:26 ` Johannes Sixt 2008-05-25 21:23 ` Linus Torvalds 2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt 1 sibling, 1 reply; 9+ messages in thread From: Johannes Sixt @ 2008-05-25 12:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, David Tweed On Sunday 25 May 2008 03:21, Linus Torvalds wrote: > On Sat, 24 May 2008, Johannes Sixt wrote: > > but this does not: > > > > $ git rev-list --full-history --parents HEAD -- a > > M A B > > B A > > A > > That is the "correct" output. > > That's what "--full-history" means: do not simplify merges away when all > the data comes from just one branch (in this case from "B"). > > So it shows you commit 'M' because you asked for full-history. > > Commit 'M' has parents 'C' and 'B', but since 'C' doesn't actually modify > the file at all, the regular commit simplification will simplify 'C' away, > so now that parent 'C' will become 'A'. So 'M' has the _simplified_ > parent's 'A' and 'B'. > > Then it shows 'B' (parent 'A') and 'A' (no parent). The history was this: C--M / / A--B Now assume that both B and C change a, but so that it is identical in both B and C. I thought that --full-history makes a difference *only* for this case, because without --full-history the revision walk would choose either B or C (not quite at random, but in an unspecified manner), but not both; but with --full-history the revision walk would go both paths. This makes a difference in git-filter-branch --subdirectory-filter: We do want to simplify history to those commits that touch a path, but we don't want to simplify away the case outlined in the previous paragraph. > > Of course, I'd expected to see this: > > > > $ git rev-list --full-history --parents HEAD -- a > > B A > > A > > Why did you ask for --full-history, if you're not interested in merges > that are irrelevant? To get what you wanted, just do > > git rev-list --parents HEAD -- a > > and it should give you exactly that output. In the case at hand this would be sufficient, but in git-filter-branch we don't want to prune branches whose modifications to a path make the path identical. What shall we do in git-filter-branch --subdirectory-filter? -- Hannes ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy 2008-05-25 12:26 ` Johannes Sixt @ 2008-05-25 21:23 ` Linus Torvalds 2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt 0 siblings, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2008-05-25 21:23 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, David Tweed On Sun, 25 May 2008, Johannes Sixt wrote: > > The history was this: > > C--M > / / > A--B > > Now assume that both B and C change a, but so that it is identical in both B > and C. I thought that --full-history makes a difference *only* for this case, > because without --full-history the revision walk would choose either B or C > (not quite at random, but in an unspecified manner), but not both; but > with --full-history the revision walk would go both paths. You mis-understood what --full history does. It literally just means "walk all paths". The fact that *also* means that it often cannot simplify away merges at all is unfortunate, but there you have it. In other words, it will now leave the merge alone, even if it can see that everything we care about (file 'a') came from just one side. > This makes a difference in git-filter-branch --subdirectory-filter: We do want > to simplify history to those commits that touch a path, but we don't want to > simplify away the case outlined in the previous paragraph. If so, you need to expand on the history simplification a *lot*. It currently does two things: - simplify away merges when it can see that one parent is identical in content to the end result - it then removes the merge entirely, and replaces it with the identical parent. This is the thing that would normally take the merge 'M', and just replace it with 'B', because it sees that the contents all come from B. But if 'C' _also_ changed the file, and 'M' was actually a data merge, then it will be left alone (because the merge 'M' really is meaningful as far as the data is concerned) This is what "--full-history" disables. So when you say "--full-history", all merges are left alone. - The *other* simplification is the non-merge case, where it simplifies away commits that don't change the files. This is the one that then removes 'C' in your original example. You can disable this simplification with "--sparse". Not that anybody ever wants to. What you seem to want in a *third* level of simplification, which is to remove merges that turn out to be pointless, because all parents end up being directly related. We don't do that simplification, and we never have. I'd love to do it, but it's somewhat costly and very much more complicated than the simplifications we do do. > What shall we do in git-filter-branch --subdirectory-filter? See above. I can tell you _where_ you'd need to add the logic. See the file revision.c: remove_duplicate_parents(). You'd could "just" extend that to remove not just 100% duplicates, but also remove parents that are direct ancestors of each other. I say "just" in quotes, because that's not trivial to do efficiently. But it gets worse. If that removal of parents then turns the commit into a regular one, *and* it didn't actually change the files you are interested in, you'd need to remove it entirely, which in turn means that you'd also need to rewrite the parent information of the children. Which means that simplify_commit() is actually too late, because that one happens when you print things out, and the children have already been returned (with what now is stale parenthood!). So the _obvious_ place to do that simplification is actually too late. But doing it earlier is also hard, because that "simplify_commit()" is what removes the trivial linear ones. So you'd actually need to add a whole new phase that removes the trivial linear cases *before* we are in the whole get_revision() phase, and then does the commit simplification. It's nasty and quite complicated. Which is why we don't do it. The "revision.c" commit history simplification is already arguably some of the most complicated and subtle code in all of git. The code needs to be able to handle the "normal" case (which is to stream the commits without pre-computing the whole DAG) efficiently, but then there are those complicated cases that need the whole DAG. And they have to live side-by-side. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" 2008-05-25 21:23 ` Linus Torvalds @ 2008-05-26 19:09 ` Johannes Sixt 2008-05-27 10:55 ` David Tweed 2008-05-28 5:25 ` Junio C Hamano 0 siblings, 2 replies; 9+ messages in thread From: Johannes Sixt @ 2008-05-26 19:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, David Tweed, Johannes Schindelin This reverts commit cfabd6eee1745cfec58cfcb794ce8847e43b888a. I had implemented it without understanding what --full-history does. Consider this history: C--M--N / / / A--B / \ / D-/ where B and C modify a path, X, in the same way so that the result is identical, and D does not modify it at all. With the path limiter X and without --full-history this is simplified to A--B i.e. only one of the paths via B or C is chosen. I had assumed that --full-history would keep both paths like this C--M / / A--B removing the path via D; but in fact it keeps the entire history. Currently, git does not have the capability to simplify to this intermediary case. However, the other extreme to keep the entire history is not wanted either in usual cases. I think we can expect that histories like the above are rare, and in the usual cases we want a simplified history. So let's remove --full-history again. (Concerning t7003, subsequent tests depend on what the test case sets up, so we can't just back out the entire test case.) Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> --- On Sunday 25 May 2008 23:23, Linus Torvalds wrote: > On Sun, 25 May 2008, Johannes Sixt wrote: > > The history was this: > > > > C--M > > / / > > A--B > > > > Now assume that both B and C change a, but so that it is identical in > > both B and C. I thought that --full-history makes a difference *only* for > > this case, because without --full-history the revision walk would choose > > either B or C (not quite at random, but in an unspecified manner), but > > not both; but with --full-history the revision walk would go both paths. > > You mis-understood what --full history does. Yes, indeed. Thank you for your explanations. I'm not prepared to dive into the revision walk manchinery, so I instead propose to just remove --full-history from git-filter-branch. -- Hannes git-filter-branch.sh | 2 +- t/t7003-filter-branch.sh | 13 ++----------- 2 files changed, 3 insertions(+), 12 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 80e99e5..d04c346 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -234,7 +234,7 @@ case "$filter_subdir" in ;; *) git rev-list --reverse --topo-order --default HEAD \ - --parents --full-history "$@" -- "$filter_subdir" + --parents "$@" -- "$filter_subdir" esac > ../revs || die "Could not get the commits" commits=$(wc -l <../revs | tr -d " ") diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 1639c7a..3577aa6 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -97,7 +97,7 @@ test_expect_success 'subdirectory filter result looks okay' ' test_must_fail git show sub:subdir ' -test_expect_success 'setup and filter history that requires --full-history' ' +test_expect_success 'more setup' ' git checkout master && mkdir subdir && echo A > subdir/new && @@ -107,16 +107,7 @@ test_expect_success 'setup and filter history that requires --full-history' ' git rm a && test_tick && git commit -m "again subdir on master" && - git merge branch && - git branch sub-master && - git-filter-branch -f --subdirectory-filter subdir sub-master -' - -test_expect_success 'subdirectory filter result looks okay' ' - test 3 = $(git rev-list -1 --parents sub-master | wc -w) && - git show sub-master^:new && - git show sub-master^2:new && - test_must_fail git show sub:subdir + git merge branch ' test_expect_success 'use index-filter to move into a subdirectory' ' -- 1.5.6.rc0.17.gbc20 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" 2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt @ 2008-05-27 10:55 ` David Tweed 2008-05-28 5:25 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: David Tweed @ 2008-05-27 10:55 UTC (permalink / raw) To: Johannes Sixt, Linus Torvalds, git, Johannes.Schindelin On Mon, May 26, 2008 at 8:09 PM, Johannes Sixt <johannes.sixt@telecom.at> wrote: > This reverts commit cfabd6eee1745cfec58cfcb794ce8847e43b888a. I had FWIW, applying this patch (getting 1.5.6.rc0.29.g3beb5.dirty) it now filters the case I was trying successfully (although I understand Johannes' point that this fix only works because my commit graph is only "quite messy" and it wouldn't work if it was "extremely messy":-) ). Thanks to Johannes for his time looking at this. -- cheers, dave tweed__________________________ david.tweed@gmail.com Rm 124, School of Systems Engineering, University of Reading. "while having code so boring anyone can maintain it, use Python." -- attempted insult seen on slashdot ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" 2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt 2008-05-27 10:55 ` David Tweed @ 2008-05-28 5:25 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2008-05-28 5:25 UTC (permalink / raw) To: Johannes Sixt; +Cc: Linus Torvalds, git, David Tweed, Johannes Schindelin Johannes Sixt <johannes.sixt@telecom.at> writes: > ... I'm not prepared to dive into > the revision walk manchinery, so I instead propose to just remove > --full-history from git-filter-branch. Ok, let's punt for now and rethink this in the next cycle for 1.6.0. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy 2008-05-25 1:21 ` Linus Torvalds 2008-05-25 12:26 ` Johannes Sixt @ 2008-05-25 19:58 ` Johannes Sixt 2008-05-25 21:30 ` Linus Torvalds 1 sibling, 1 reply; 9+ messages in thread From: Johannes Sixt @ 2008-05-25 19:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, David Tweed On Sunday 25 May 2008 03:21, Linus Torvalds wrote: > On Sat, 24 May 2008, Johannes Sixt wrote: > > but this does not: > > > > $ git rev-list --full-history --parents HEAD -- a > > M A B > > B A > > A > > That is the "correct" output. But why does this: $ git rev-list --full-history HEAD -- a B A not list M (note the lack of --parents)? -- Hannes ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rev-list --parents --full-history + path: something's fishy 2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt @ 2008-05-25 21:30 ` Linus Torvalds 0 siblings, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2008-05-25 21:30 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, David Tweed On Sun, 25 May 2008, Johannes Sixt wrote: > > But why does this: > > $ git rev-list --full-history HEAD -- a > B > A > > not list M (note the lack of --parents)? Because when we don't ask for --parents, the whole problem is *much* simpler. The parent rewriting means that the history has to all "fit together". But when you don't need parenthood, then suddenly that doesn't matter at all - who cares if it fits together or not, when you can't *see* that it doesn't fit together anyway? In this case, it's "simplify_commit()", and this piece of code in particular (note how it's even commented!): ... /* Commit without changes? */ if (commit->object.flags & TREESAME) { /* drop merges unless we want parenthood */ if (!revs->rewrite_parents) return commit_ignore; ... ie if we're looking at a commit that doesn't actually introduce any changes of its own (it took all the changes from at least _one_ of its parents - ie it got TREESAME set because the tree was identical to one of the parents), then if we don't have 'rewrite_parents' set, we just drop that commit, because it is uninteresting. IOW, we dropped 'M' because there was no point in showing it: we know nobody refers to it (because no other commit will list it as a parent!), and the commit itself didn't actually introduce any changes (because all the changes came from 'B'). But we can *not* drop that merge commit when we do the parenthood tracking, because if we did so, we'd just have an "empty spot" in history (we have other commits that point to that emrge and list it as a parent). Of course, in your trivial example, that didn't actually happen (because 'M' was the top commit), but try it with something more complex. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-05-28 5:26 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <e1dab3980805230808s59798351r9ed702c7d0dedd2a@mail.gmail.com> 2008-05-24 20:16 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt 2008-05-25 1:21 ` Linus Torvalds 2008-05-25 12:26 ` Johannes Sixt 2008-05-25 21:23 ` Linus Torvalds 2008-05-26 19:09 ` [PATCH/RFC] Revert "filter-branch: subdirectory filter needs --full-history" Johannes Sixt 2008-05-27 10:55 ` David Tweed 2008-05-28 5:25 ` Junio C Hamano 2008-05-25 19:58 ` rev-list --parents --full-history + path: something's fishy Johannes Sixt 2008-05-25 21:30 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).