* [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter @ 2008-05-30 21:43 Kevin Ballard 2008-05-30 22:52 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Kevin Ballard @ 2008-05-30 21:43 UTC (permalink / raw) To: git; +Cc: Kevin Ballard, Junio C Hamano The old description was misleading and logically impossible. It claimed that the ancestors of the original commit would be re-written to have the multiple emitted ids as parents. Not only would this modify existing objects, but it would create a cycle. What this actually does is pass the multiple emitted ids to the newly-created children to use as parents. Signed-off-by: Kevin Ballard <kevin@sb.org> --- Documentation/git-filter-branch.txt | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt index 506c37a..541bf23 100644 --- a/Documentation/git-filter-branch.txt +++ b/Documentation/git-filter-branch.txt @@ -113,8 +113,8 @@ OPTIONS stdin. The commit id is expected on stdout. + As a special extension, the commit filter may emit multiple -commit ids; in that case, ancestors of the original commit will -have all of them as parents. +commit ids; in that case, the rewritten children of the original commit will +have all of them as parents. You probably don't want to do this. + You can use the 'map' convenience function in this filter, and other convenience functions, too. For example, calling 'skip_commit "$@"' -- 1.5.6.rc0.158.g7c7a1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-30 21:43 [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter Kevin Ballard @ 2008-05-30 22:52 ` Junio C Hamano 2008-05-30 23:07 ` Kevin Ballard 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2008-05-30 22:52 UTC (permalink / raw) To: Kevin Ballard; +Cc: git Kevin Ballard <kevin@sb.org> writes: > The old description was misleading and logically impossible. It claimed that > the ancestors of the original commit would be re-written to have the multiple > emitted ids as parents. Not only would this modify existing objects, but it > would create a cycle. What this actually does is pass the multiple emitted ids > to the newly-created children to use as parents. > > Signed-off-by: Kevin Ballard <kevin@sb.org> > --- > Documentation/git-filter-branch.txt | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt > index 506c37a..541bf23 100644 > --- a/Documentation/git-filter-branch.txt > +++ b/Documentation/git-filter-branch.txt > @@ -113,8 +113,8 @@ OPTIONS > stdin. The commit id is expected on stdout. > + > As a special extension, the commit filter may emit multiple > -commit ids; in that case, ancestors of the original commit will > -have all of them as parents. > +commit ids; in that case, the rewritten children of the original commit will > +have all of them as parents. You probably don't want to do this. > + Now I am _very_ confused. The original description sounds as if: In this history, when rewriting commit C, if we emit A from the filter: B \ ---A---C---D We will somehow make 'A' and 'B' have A as their parents. which is wrong as you pointed out. But I am also confused by the new description: In that history, we will make sure that rewritten D (original commit being C) have A as parent. IOW, we will have --A'--C' D' / A which is not what happens. What it does is that the commits in the output from the filter (i.e. A) are first mapped to the corresponding commits in the rewritten history (i.e. A'), and they will be used as the parents of the rewritten commit, to form this history: --A'--C' isn't it? Also you did not defend why you added "You probably don't want to do this" to the description. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-30 22:52 ` Junio C Hamano @ 2008-05-30 23:07 ` Kevin Ballard 2008-05-30 23:41 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Kevin Ballard @ 2008-05-30 23:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On May 30, 2008, at 3:52 PM, Junio C Hamano wrote: > Kevin Ballard <kevin@sb.org> writes: > >> The old description was misleading and logically impossible. It >> claimed that >> the ancestors of the original commit would be re-written to have >> the multiple >> emitted ids as parents. Not only would this modify existing >> objects, but it >> would create a cycle. What this actually does is pass the multiple >> emitted ids >> to the newly-created children to use as parents. >> >> Signed-off-by: Kevin Ballard <kevin@sb.org> >> --- >> Documentation/git-filter-branch.txt | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/git-filter-branch.txt b/Documentation/ >> git-filter-branch.txt >> index 506c37a..541bf23 100644 >> --- a/Documentation/git-filter-branch.txt >> +++ b/Documentation/git-filter-branch.txt >> @@ -113,8 +113,8 @@ OPTIONS >> stdin. The commit id is expected on stdout. >> + >> As a special extension, the commit filter may emit multiple >> -commit ids; in that case, ancestors of the original commit will >> -have all of them as parents. >> +commit ids; in that case, the rewritten children of the original >> commit will >> +have all of them as parents. You probably don't want to do this. >> + > > Now I am _very_ confused. > > The original description sounds as if: > > In this history, when rewriting commit C, if we emit A from the > filter: > > B > \ > ---A---C---D > > We will somehow make 'A' and 'B' have A as their parents. > > which is wrong as you pointed out. > > But I am also confused by the new description: > > In that history, we will make sure that rewritten D (original > commit being C) have A as parent. IOW, we will have > > --A'--C' D' > / > A > > which is not what happens. What it does is that the commits in the > output > from the filter (i.e. A) are first mapped to the corresponding > commits in > the rewritten history (i.e. A'), and they will be used as the > parents of > the rewritten commit, to form this history: > > --A'--C' > > isn't it? So basically, you think it's missing the fact that the emitted id is mapped to rewritten commits? From reading the git-filter-branch code, I don't think that's correct. When each commit is created, its original parents get mapped to new values, but the results of the commit-filter are dumped straight into the map. To give an example, let's examine your tree. A will be processed first, and A' gets put into the map for A. B gets processed next (or maybe before A, but that's irrelevant) and B' gets put into the map for B. C gets processed, and it emits A, so A goes into the map for C. Then D is processed, and its original parent C is looked up in the map and A is returned. So, as near as I can tell, that "broken" history is exactly what you'll get if the commit-filter returns A for C. This means that when you're writing a commit-filter for this, you probably want to emit $(map A), not A. Perhaps the description should be significantly expanded to include the diagrams and explanations? > Also you did not defend why you added "You probably don't want to do > this" > to the description. Because when the commit-filter emits multiple ids, it's converting the child commits into merges without even knowing what the child commits will be. I was just trying to warn people away from using this feature unless they know exactly what they're doing. Usually you want to use a parent-filter if you're converting commits into merges, because that way you know exactly what commits you're modifying. -Kevin -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-30 23:07 ` Kevin Ballard @ 2008-05-30 23:41 ` Junio C Hamano 2008-05-31 0:33 ` Kevin Ballard 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2008-05-30 23:41 UTC (permalink / raw) To: Kevin Ballard; +Cc: git Kevin Ballard <kevin@sb.org> writes: >> But I am also confused by the new description: >> >> In that history, we will make sure that rewritten D (original >> commit being C) have A as parent. IOW, we will have >> >> --A'--C' D' >> / >> A >> >> which is not what happens. What it does is that the commits in the >> output >> from the filter (i.e. A) are first mapped to the corresponding >> commits in >> the rewritten history (i.e. A'), and they will be used as the >> parents of >> the rewritten commit, to form this history: >> >> --A'--C' >> >> isn't it? > > So basically, you think it's missing the fact that the emitted id is > mapped to rewritten commits? From reading the git-filter-branch code, > I don't think that's correct. When each commit is created, its > original parents get mapped to new values, but the results of the > commit-filter are dumped straight into the map. Ah, I misread this part of the code: parentstr= for parent in $parents; do for reparent in $(map "$parent"); do parentstr="$parentstr -p $reparent" done done if [ "$filter_parent" ]; then parentstr="$(echo "$parentstr" | eval "$filter_parent")" || die "parent filter failed: $filter_parent" fi You get the commit object names from the new history, and you are supposed to give back names from the new history from the filter. So "the rewritten commit will have the output from parent-filter as its parents" is what happens, right? IOW, in the history in the previous message, when rewriting C (to create C'), the filter will get A' and B' (i.e. from the new history), and can choose to return A', and that is recorded when creating C'. "the rewritten children of the commit" in your: +commit ids; in that case, the rewritten children of the original +commit will have all of them as parents. sounded as if you are talking about D' not C', and that was what I was confused about. > To give an example, let's examine your tree. A will be processed > first, and A' gets put into the map for A. B gets processed next (or > maybe before A, but that's irrelevant) and B' gets put into the map > for B. C gets processed, and it emits A, so A goes into the map for C. Hmm? I meant C is rewritten to become C' but when it does so filter can remove B from its parent set (iow, the filter is told that unless it intervenes C' will have A' and B' as its parents, but the filter can choose to return only A'). So I do not quite get "it emits A" part. Do you mean "the filter outputs A'"? Also do you mean by "map" the mapping from <A, B, C> to <A', B', C'> commit namespace? If so, even when the filter "emits A'", I do not think it "goes into the map for C". When the filter "emits A'", it is used as _the_ single parent to create C', and it is C' that "goes into the map for C". Am I still confused? Now, I admit that I did not look at the implementation of the "map" in the code quoted above. Perhaps that thing is busted, I dunno. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-30 23:41 ` Junio C Hamano @ 2008-05-31 0:33 ` Kevin Ballard 2008-05-31 1:48 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Kevin Ballard @ 2008-05-31 0:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On May 30, 2008, at 4:41 PM, Junio C Hamano wrote: > Kevin Ballard <kevin@sb.org> writes: > >>> But I am also confused by the new description: >>> >>> In that history, we will make sure that rewritten D (original >>> commit being C) have A as parent. IOW, we will have >>> >>> --A'--C' D' >>> / >>> A >>> >>> which is not what happens. What it does is that the commits in the >>> output >>> from the filter (i.e. A) are first mapped to the corresponding >>> commits in >>> the rewritten history (i.e. A'), and they will be used as the >>> parents of >>> the rewritten commit, to form this history: >>> >>> --A'--C' >>> >>> isn't it? >> >> So basically, you think it's missing the fact that the emitted id is >> mapped to rewritten commits? From reading the git-filter-branch code, >> I don't think that's correct. When each commit is created, its >> original parents get mapped to new values, but the results of the >> commit-filter are dumped straight into the map. > > Ah, I misread this part of the code: > > parentstr= > for parent in $parents; do > for reparent in $(map "$parent"); do > parentstr="$parentstr -p $reparent" > done > done > if [ "$filter_parent" ]; then > parentstr="$(echo "$parentstr" | eval "$filter_parent")" || > die "parent filter failed: $filter_parent" > fi > > You get the commit object names from the new history, and you are > supposed > to give back names from the new history from the filter. > > So "the rewritten commit will have the output from parent-filter as > its > parents" is what happens, right? > > IOW, in the history in the previous message, when rewriting C (to > create > C'), the filter will get A' and B' (i.e. from the new history), and > can > choose to return A', and that is recorded when creating C'. "the > rewritten children of the commit" in your: > > +commit ids; in that case, the rewritten children of the original > +commit will have all of them as parents. > > sounded as if you are talking about D' not C', and that was what I was > confused about. I was talking about D'. You've switched from talking about the commit- filter to talking about the parent-filter. If a parent-filter is specified, it will be run on the parents that are handed to the commit-filter (before the commit-filter is invoked), but that's not relevant right now. When using the commit-filter to rewrite C (to create C'), when A is returned, that's dumped into the map so in the future, $(map C) will return A. This means that when D is processed, it's handed A as the parent. >> To give an example, let's examine your tree. A will be processed >> first, and A' gets put into the map for A. B gets processed next (or >> maybe before A, but that's irrelevant) and B' gets put into the map >> for B. C gets processed, and it emits A, so A goes into the map for >> C. > > Hmm? I meant C is rewritten to become C' but when it does so filter > can > remove B from its parent set (iow, the filter is told that unless it > intervenes C' will have A' and B' as its parents, but the filter can > choose to return only A'). So I do not quite get "it emits A" > part. Do > you mean "the filter outputs A'"? Also do you mean by "map" the > mapping > from <A, B, C> to <A', B', C'> commit namespace? If so, even when the > filter "emits A'", I do not think it "goes into the map for C". > When the > filter "emits A'", it is used as _the_ single parent to create C', > and it > is C' that "goes into the map for C". Am I still confused? > > Now, I admit that I did not look at the implementation of the "map" > in the > code quoted above. Perhaps that thing is busted, I dunno. You're still talking about the parent-filter here. I think you're quite confused. Given this history to start: B \ A---C---D Also given that we're specifying a commit-filter but *not* a parent- filter. When that commit-filter is given A and B, it returns A' and B' without doing any changes (A' and B' in this case come from git-commit-tree). When given C, the commit-filter returns A. When given D, it returns D' again with no changes. The resulting history will look like this: B' A' A---D' As you can see, there is no C' commit (as the commit-filter simply returned A), and the A' and B' commits are left dangling. Here is a shell script which will generate this exact repo: http://pastie.textmate.org/206265 If you inspect the result, you'll see HEAD points to D' with a parent of A, and that there are two dangling commits A' and B'. What I was talking about with the map can be demonstrated here. If you edit the commit-filter to print out some debug info while writing D', you can see that $(map $(git rev-parse C)) will output $(git rev-parse A). -Kevin -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-31 0:33 ` Kevin Ballard @ 2008-05-31 1:48 ` Junio C Hamano 2008-05-31 22:33 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2008-05-31 1:48 UTC (permalink / raw) To: Kevin Ballard; +Cc: git Kevin Ballard <kevin@sb.org> writes: > You're still talking about the parent-filter here. I think you're > quite confused. Blush. I should go to bed. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-31 1:48 ` Junio C Hamano @ 2008-05-31 22:33 ` Junio C Hamano 2008-05-31 23:50 ` Kevin Ballard 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2008-05-31 22:33 UTC (permalink / raw) To: Kevin Ballard; +Cc: git, Johannes Schindelin, Petr Baudis Junio C Hamano <gitster@pobox.com> writes: > Kevin Ballard <kevin@sb.org> writes: > >> You're still talking about the parent-filter here. I think you're >> quite confused. > > Blush. I should go to bed. Now after following the codepath, your original diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt index 506c37a..541bf23 100644 --- a/Documentation/git-filter-branch.txt +++ b/Documentation/git-filter-branch.txt @@ -113,8 +113,8 @@ OPTIONS stdin. The commit id is expected on stdout. + As a special extension, the commit filter may emit multiple -commit ids; in that case, ancestors of the original commit will -have all of them as parents. +commit ids; in that case, the rewritten children of the original commit will +have all of them as parents. You probably don't want to do this. + You can use the 'map' convenience function in this filter, and other convenience functions, too. For example, calling 'skip_commit "$@"' does make sense to me. Except for "You probably don't want to do this." part. It is just "the utility of this feature is unknown to us" ;-) I dug the code with "git blame" and the basic logic has been the same since its introduction to git with 6f6826c (Add git-filter-branch, 2007-06-03). The commit-filter itself appeared first in Cogito as d690516 (cg-admin-rewritehist --commit-filter for omitting commits, 2006-03-26), and the commit log message claims that it was primarily meant to _omit_ unwanted commits from the history, but at the same time it advertises the multiple commits case as a "feature" without telling why somebody wants to do so. Except for this gem, which may have been lost in our copy: # ... Note that this handles merges properly! In case Darl # committed a merge between P1 and P2, it will be propagated properly # and all children of the merge will become merge commits with P1,P2 # as their parents instead of the merge commit. IOW, to rewrite this history: ---A---C---D---E / B to pretend C never happened, you would give A' and B' back when you rewrite C, to end up with this history: ---A'--D'--E' / B' I'd agree with "You probably don't want to do this", but perhaps it needs a bit of clarification as to _why_ you would not: - If the history is being rewritten for the whole tree, this will make D' an evil merge that contains difference between C to D. - If the filtering of the history is done to ignore parts of the tree that is touched between C and D (iow, history simplification would leave trees C and D the same), you would want to simplify away D' not C'. IOW, you would want the resulting history to look like: ---A'--C'--E' / B' and for that you do not need to use this "feature". ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter 2008-05-31 22:33 ` Junio C Hamano @ 2008-05-31 23:50 ` Kevin Ballard 0 siblings, 0 replies; 8+ messages in thread From: Kevin Ballard @ 2008-05-31 23:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Johannes Schindelin, Petr Baudis On May 31, 2008, at 3:33 PM, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Kevin Ballard <kevin@sb.org> writes: >> >>> You're still talking about the parent-filter here. I think you're >>> quite confused. >> >> Blush. I should go to bed. > > Now after following the codepath, your original > > diff --git a/Documentation/git-filter-branch.txt b/Documentation/ > git-filter-branch.txt > index 506c37a..541bf23 100644 > --- a/Documentation/git-filter-branch.txt > +++ b/Documentation/git-filter-branch.txt > @@ -113,8 +113,8 @@ OPTIONS > stdin. The commit id is expected on stdout. > + > As a special extension, the commit filter may emit multiple > -commit ids; in that case, ancestors of the original commit will > -have all of them as parents. > +commit ids; in that case, the rewritten children of the original > commit will > +have all of them as parents. You probably don't want to do this. > + > You can use the 'map' convenience function in this filter, and > other > convenience functions, too. For example, calling 'skip_commit > "$@"' > > does make sense to me. Except for "You probably don't want to do > this." > part. It is just "the utility of this feature is unknown to us" ;-) > > I dug the code with "git blame" and the basic logic has been the same > since its introduction to git with 6f6826c (Add git-filter-branch, > 2007-06-03). The commit-filter itself appeared first in Cogito as > d690516 > (cg-admin-rewritehist --commit-filter for omitting commits, > 2006-03-26), > and the commit log message claims that it was primarily meant to > _omit_ > unwanted commits from the history, but at the same time it > advertises the > multiple commits case as a "feature" without telling why somebody > wants to > do so. > > Except for this gem, which may have been lost in our copy: > > # ... Note that this handles merges properly! In case Darl > # committed a merge between P1 and P2, it will be propagated > properly > # and all children of the merge will become merge commits with > P1,P2 > # as their parents instead of the merge commit. > > IOW, to rewrite this history: > > ---A---C---D---E > / > B > > to pretend C never happened, you would give A' and B' back when you > rewrite C, to end up with this history: > > ---A'--D'--E' > / > B' > > I'd agree with "You probably don't want to do this", but perhaps it > needs > a bit of clarification as to _why_ you would not: > > - If the history is being rewritten for the whole tree, this will > make D' an evil merge that contains difference between C to D. > > - If the filtering of the history is done to ignore parts of the tree > that is touched between C and D (iow, history simplification would > leave trees C and D the same), you would want to simplify away D' > not > C'. IOW, you would want the resulting history to look like: > > ---A'--C'--E' > / > B' > > and for that you do not need to use this "feature". Yeah, this utility of omitting commits occurred to me last night after I went to bed. It does seem pretty limited in use, but I guess someone might want to do it. For example, if C resolved merge conflicts incorrectly and D fixed it, and then later somebody said "why do I have two commits when I should just have one?" and wanted to omit C and leave D behind as the merge. I'll submit a new patch later that has better wording and perhaps a diagram or two. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-05-31 23:50 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-05-30 21:43 [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter Kevin Ballard 2008-05-30 22:52 ` Junio C Hamano 2008-05-30 23:07 ` Kevin Ballard 2008-05-30 23:41 ` Junio C Hamano 2008-05-31 0:33 ` Kevin Ballard 2008-05-31 1:48 ` Junio C Hamano 2008-05-31 22:33 ` Junio C Hamano 2008-05-31 23:50 ` Kevin Ballard
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox