* git filter-branch --subdirectory-filter, still a mistery
@ 2008-08-06 13:39 Jan Wielemaker
  2008-08-07 7:13 ` Jan Wielemaker
  2008-08-07 7:50 ` Thomas Rast
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Wielemaker @ 2008-08-06 13:39 UTC (permalink / raw)
  To: git

Hi,

I've spent most of today puzzling over something that must be simple. I've got a big repo which contains a project with several nicely related subprojects in directories. Only now, we want to share some of these subprojects with another project, i.e. they must start to live their own life. Of course, I would like to keep the history. So, I did (git --version: 1.5.6.GIT):

% git clone /home/git/pl.git
% cd pl
% git filter-branch --subdirectory-filter packages/chr HEAD

This indeed creates a nice directory holding only the contents of packages/chr. But, starting qgit, I see that all commits, including those that had absolutely nothing to do with this dir, are still there. Also, all tags are still there with exactly the same SHA1 as the original. I'd expect the tags to be rewritten such that their SHA1 refers to the state of this single directory and its contents!? Of course, these tags give me access to everything, so the repository doesn't shrink much either.

I must be missing something important ... I found similar complaints, but few decent answers, and the few answers I did find appeared outdated. The one at http://use.perl.org/~rjbs/journal/34411 comes closest, although the reset --hard is no longer needed and the copying and gc-ing doesn't help much anymore.

Should I write a tree-filter that removes all but the directory I want to keep? I.e. something like this? It feels like overkill and I fear I'll have a lot of empty commits left.

'mv packages/chr .. && rm -r * && mv ../chr/* . && rmdir ../chr'

I'll be grateful for a clue!

Cheers --- Jan

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: git filter-branch --subdirectory-filter, still a mistery
  2008-08-06 13:39 git filter-branch --subdirectory-filter, still a mistery Jan Wielemaker
@ 2008-08-07 7:13 ` Jan Wielemaker
  2008-08-07 7:50 ` Thomas Rast
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Wielemaker @ 2008-08-07 7:13 UTC (permalink / raw)
  To: git

On Wednesday 06 August 2008 15:39:50 you wrote:
> [...]

Weirdness goes on. I tried this:

git filter-branch --tree-filter '/home/jan/nobackup/tmp2/keep packages/chr'

where `keep' is a shell script:

----------------------------------------------------------------
tmp=/home/jan/nobackup/tmp2
dir="$1"
if [ -d "$dir" ]; then
  b=`basename $dir`
  mv "$dir" $tmp/$b
  rm -rf *
  mv $tmp/$b/* .
  mv $tmp/$b/.??* .
  rmdir $tmp/$b
else
  rm -rf *
fi
----------------------------------------------------------------

This kind of works, i.e. I end up (after 3 hours) with a tree that only contains files from packages/chr. qgit no longer shows the other files in the `tree' view. However, it still has *all* commits of the original project, most of which of course do not change this directory; at least their diff is now empty. I'd assume there is a command to remove these (which?).

Space-wise this isn't OK. The original project's git repository is 140M; after this action and a git gc, it is 63M: *much* too big. What's weirder: all tags still have the same SHA1. I copied using git clone --no-hardlinks pl chr, deleted all refs/tags from packed-refs and ran "git gc --prune", only to end up with a 1.1 GIGABYTE repository!?

I'm starting to feel a bit stupid that I can't get this done ... Clues?

--- Jan

^ permalink raw reply [flat|nested] 40+ messages in thread
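A slightly more defensive variant of the `keep' idea, as a minimal sketch only: it assumes the tree filter runs with the current directory at the top of filter-branch's temporary checkout (so there is no .git directory to protect), it uses a scratch directory from `mktemp -d` instead of a fixed path, and it also handles dot-files without failing when there are none. The function name keep_subdir is hypothetical, and find's -mindepth/-maxdepth are GNU/BSD extensions rather than POSIX. It still visits every commit, so it is no faster than the original and still leaves the unrelated commits behind as empty ones.

```shell
#!/bin/sh
# keep_subdir DIR: make the contents of DIR the new top level of the
# current directory and delete everything else, dot-files included.
# Only meant to run inside a filter-branch tree-filter checkout.
keep_subdir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    # This commit predates DIR: leave an empty tree behind.
    find . -mindepth 1 -maxdepth 1 -exec rm -rf {} +
    return 0
  fi
  scratch=$(mktemp -d) || return 1
  # Move the wanted directory out of the tree first.
  mv "$dir" "$scratch/keep" || return 1
  # Remove everything left at the top level, including dot-files.
  find . -mindepth 1 -maxdepth 1 -exec rm -rf {} +
  # Move the saved entries (again including dot-files) back to the top.
  find "$scratch/keep" -mindepth 1 -maxdepth 1 -exec mv {} . \;
  rm -rf "$scratch"
}
```

Used as a tree filter this would be something like --tree-filter '. /path/to/keep_subdir.sh && keep_subdir packages/chr', where the script path is again hypothetical.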
* Re: git filter-branch --subdirectory-filter, still a mistery
  2008-08-06 13:39 git filter-branch --subdirectory-filter, still a mistery Jan Wielemaker
  2008-08-07 7:13 ` Jan Wielemaker
@ 2008-08-07 7:50 ` Thomas Rast
  2008-08-07 10:14 ` Jan Wielemaker
  2008-08-07 14:04 ` [PATCH] Documentation: filter-branch: document how to filter all refs Thomas Rast
  1 sibling, 2 replies; 40+ messages in thread
From: Thomas Rast @ 2008-08-07 7:50 UTC (permalink / raw)
  To: Jan Wielemaker; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 837 bytes --]

Jan Wielemaker wrote:
[...]
> % git filter-branch --subdirectory-filter packages/chr HEAD
>
> This indeed creates a nice directory holding only the contents of
> packages/chr. But, starting qgit I see that all commits, also those
> that had absolutely nothing to do with this dir are still there.

The trick is to rewrite all refs, not just HEAD. I usually proceed as follows:

  cp -a repo repo.old  # just to keep a backup
  cd repo
  git filter-branch --subdirectory-filter somedir -- --all

The --all tells it to rewrite as many refs as possible. Note that the -- is required. Also note that refs/original/* will still point to the old commits, so they won't "just vanish". You may want to clone the repository or delete them manually once you are sure the filter-branch did the right thing.

- Thomas

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
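For the "delete them manually" step, a minimal sketch that uses only two standard plumbing commands, git for-each-ref and git update-ref -d. The function name drop_original_refs is hypothetical. Run it only after checking the rewritten history, since refs/original/ is the only backup filter-branch keeps.

```shell
#!/bin/sh
# drop_original_refs: delete every backup ref that git filter-branch
# left under refs/original/ in the current repository.
drop_original_refs() {
  refs=$(git for-each-ref --format='%(refname)' refs/original/) || return 1
  # Ref names cannot contain spaces or glob characters (see
  # git check-ref-format), so word-splitting $refs is safe here.
  for ref in $refs; do
    git update-ref -d "$ref" || return 1
  done
}
```

Even with these refs gone, reflogs may still keep the old objects reachable until they expire. That is one reason a fresh clone, as Jan ends up doing later in this thread, can come out much smaller than the filtered repository itself.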
* Re: git filter-branch --subdirectory-filter, still a mistery
  2008-08-07 7:50 ` Thomas Rast
@ 2008-08-07 10:14 ` Jan Wielemaker
  2008-08-07 23:48 ` Thomas Rast
  2008-08-07 14:04 ` [PATCH] Documentation: filter-branch: document how to filter all refs Thomas Rast
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Wielemaker @ 2008-08-07 10:14 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2880 bytes --]

Hi Thomas,

On Thursday 07 August 2008 09:50:03 am Thomas Rast wrote:
> Jan Wielemaker wrote:
> [...]
>
> > % git filter-branch --subdirectory-filter packages/chr HEAD
> >
> > This indeed creates a nice directory holding only the contents of
> > packages/chr. But, starting qgit I see that all commits, also those
> > that had absolutely nothing to do with this dir are still there.
>
> The trick is to rewrite all refs, not just HEAD. I usually proceed as
> follows:
>
> cp -a repo repo.old # just to keep a backup
> cd repo
> git filter-branch --subdirectory-filter somedir -- --all
>
> The --all tells it to rewrite as many refs as possible. Note that the
> -- is required. Also note that refs/original/* will still point to
> the old commits, so they won't "just vanish". You may want to clone
> the repository or delete them manually once you are sure the
> filter-branch did the right thing.

Thanks. That is moving in the right direction! There are some, possibly related, problems left (using 1.5.6.GIT). According to git fsck, my repo is clean. I got:

Ref 'refs/tags/V5.6.50' was rewritten
error: Ref refs/tags/V5.6.50 is at 8678b32f71178019c06aefa40e2d3fb9a2e8ef25 but expected 2e8aef64e2fed088720a19ac2ffa2481e5bc7806
fatal: Cannot lock the ref 'refs/tags/V5.6.50'.
Could not rewrite refs/tags/V5.6.50

Now, if I look in .git/packed-refs, I see this (i.e. a second line starting with ^) for all refs that cause problems:

274ec8ac671542206ba3567ff5d72b3e54c5603c refs/tags/V5.6.59
^28920c3c0a184698d9cd15a65cd643367200bbf5
faf203f9d9e350d84b6b38b7746e710b6232fc97 refs/tags/V5.6.58
^1edb1adedcc47ec15c3242234cc6b7ede94bbfba
48488c871227beabcb3ba167b737d6e33ced65bc refs/tags/V5.6.57
^766587b09e3d2f09c87b03ad0d7faf3529c9dcff

After a bit of puzzling I discovered that the SHA1 after the ^ refers to the actual commit, and I changed all these to `lightweight' tags by putting the SHA1 from the ^ line in front of the tag name, in place of the tag object's SHA1. I wrote a little sh/awk script to automate this (attached). Now it runs to the end. Unfortunately the history is completely screwed up :-(:

 * There are a lot of commits that are not related to the dir
 * Commits start long before the directory came into existence. It looks like it just shows the whole project at this point.

I think the problem is related to the fact that the directory I want to filter didn't exist at the start of the project. Looking at git-rev-list, I found --remove-empty, so I added that after the --all, but that doesn't appear to help.

I must admit I don't really know what I'm doing (though I still think the result I want is well defined, and it's hard to imagine I'm the only person who wants this). If someone wants to help: clone git://gollem.science.uva.nl/home/git/pl.git and try to filter the dir packages/chr. You can browse the git at http://gollem.science.uva.nl/git/pl.git

Thanks --- Jan

[-- Attachment #2: git-lightweight-tags --]
[-- Type: application/x-shellscript, Size: 677 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
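The attached git-lightweight-tags script is not preserved in this archive. What follows is a hypothetical sketch of the transformation described above, not Jan's script. It is written as a filter from standard input to standard output so that it never edits .git/packed-refs in place: each annotated-tag line followed by a ^<commit> peel line becomes one line pointing straight at the commit, and every other line passes through unchanged. Converting tags this way throws away the tag objects, including their messages and any signatures. The function name lightweight_tags is an assumption.

```shell
#!/bin/sh
# lightweight_tags: read packed-refs text on stdin and write it to stdout
# with every peeled (annotated) tag turned into a lightweight tag.
lightweight_tags() {
  awk '
    function flush() {
      if (pending != "") print pending
      pending = ""
    }
    # Header such as "# pack-refs with: peeled" passes through.
    /^#/ { flush(); print; next }
    # "^<sha1>" is the commit the tag on the previous line points to:
    # emit "<commit> <refname>" instead of the tag line.
    /^\^/ {
      print substr($0, 2), pending_ref
      pending = ""
      next
    }
    # "<sha1> <refname>": hold it until we know whether a peel line follows.
    { flush(); pending = $0; pending_ref = $2 }
    END { flush() }
  '
}
```

It would be used as lightweight_tags < .git/packed-refs > packed-refs.new, and the result inspected before being moved into place.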
* Re: git filter-branch --subdirectory-filter, still a mistery
  2008-08-07 10:14 ` Jan Wielemaker
@ 2008-08-07 23:48 ` Thomas Rast
  2008-08-07 23:50 ` [PATCH] filter-branch: be more helpful when an annotated tag changes Thomas Rast
  ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Thomas Rast @ 2008-08-07 23:48 UTC (permalink / raw)
  To: Jan Wielemaker; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

Jan Wielemaker wrote:
> Ref 'refs/tags/V5.6.50' was rewritten
> error: Ref refs/tags/V5.6.50 is at 8678b32f71178019c06aefa40e2d3fb9a2e8ef25
> but
> expected 2e8aef64e2fed088720a19ac2ffa2481e5bc7806
> fatal: Cannot lock the ref 'refs/tags/V5.6.50'.
> Could not rewrite refs/tags/V5.6.50
[...]
> Now, if I look in .git/packed-refs [...] and I changed all these to
> `lightweight' tags

This appears to be a bug. I've whipped up a patch that will follow and should fix the bug. It has nothing to do with packed-refs; the current filter-branch chokes on annotated tags during --subdirectory-filter, even though there is support for tag rewriting. However, to enable tag rewriting, you need to say --tag-name-filter cat.

> Now it runs to the end. Unfortunately the history is completely
> screwed up :-(:
>
> * There are a lot of commits that are not related to the dir
> * Commits start long before the directory came into existence,
> Looks like it just shows the whole project at this place.

For some reason the ancestor detection does not work right. I'm also following up with an RFH patch that significantly improves the success rate (in terms of branches and tags successfully mapped to a rewritten commit) in the case of your repository. I doubt more staring at the code would yield any more ideas at this hour, so ideas would be appreciated.

The rest is just the other commits/tags showing a lot of the history. I don't know of any built-in way to prune the branches and tags that aren't part of the new master, but

  git branch -a --no-merged master

can tell you which branches aren't ancestors of master.

- Thomas
-- 
Thomas Rast
trast@student.ethz.ch

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] filter-branch: be more helpful when an annotated tag changes
  2008-08-07 23:48 ` Thomas Rast
@ 2008-08-07 23:50 ` Thomas Rast
  2008-08-08 20:10 ` [TOY PATCH] filter-branch: add option --delete-unchanged Thomas Rast
  2008-08-07 23:54 ` [RFH] filter-branch: ancestor detection weirdness Thomas Rast
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Thomas Rast @ 2008-08-07 23:50 UTC (permalink / raw)
  To: git; +Cc: Jan Wielemaker, gitster

Previously, git-filter-branch failed if it attempted to update an annotated tag. Now we ignore this condition if --tag-name-filter is given, so that we can later rewrite the tag. If no such option was provided, we warn the user that he might want to run with --tag-name-filter cat to achieve the intended effect.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 git-filter-branch.sh |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 182822a..a324cf0 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -361,9 +361,17 @@ do
 		;;
 	$_x40)
 		echo "Ref '$ref' was rewritten"
-		git update-ref -m "filter-branch: rewrite" \
-				"$ref" $rewritten $sha1 ||
-			die "Could not rewrite $ref"
+		if ! git update-ref -m "filter-branch: rewrite" \
+					"$ref" $rewritten $sha1 2>/dev/null; then
+			if test $(git cat-file -t "$ref") = tag; then
+				if test -z "$filter_tag_name"; then
+					warn "WARNING: You said to rewrite tagged commits, but not the corresponding tag."
+					warn "WARNING: Perhaps use '--tag-name-filter cat' to rewrite the tag."
+				fi
+			else
+				die "Could not rewrite $ref"
+			fi
+		fi
 		;;
 	*)
 		# NEEDSWORK: possibly add -Werror, making this an error
-- 
1.6.0.rc2.19.g3c9ba

^ permalink raw reply related [flat|nested] 40+ messages in thread
* [TOY PATCH] filter-branch: add option --delete-unchanged
  2008-08-07 23:50 ` [PATCH] filter-branch: be more helpful when an annotated tag changes Thomas Rast
@ 2008-08-08 20:10 ` Thomas Rast
  2008-08-09 0:35 ` Johannes Schindelin
  ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Thomas Rast @ 2008-08-08 20:10 UTC (permalink / raw)
  To: git; +Cc: Jan Wielemaker

With --delete-unchanged, we nuke refs whose targets did not change during rewriting. It is intended to be used along with --subdirectory-filter to clean out old refs from before the first commit to the filtered subdirectory. (They would otherwise keep the old history alive.)

Obviously this is a rather dangerous mode of operation.

Note the "sort -u" is required: Without it, --all includes 'origin/master' twice (from 'origin/master' and via 'origin/HEAD'), and the second pass concludes it is unchanged and nukes the ref.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

This applies on top of "filter-branch: be more helpful when an annotated tag changes".

I'm not really sure if this should go in, but it might have solved Jan's problem.

 git-filter-branch.sh |   33 +++++++++++++++++++++++----------
 1 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index a140337..539b2e6 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -114,6 +114,7 @@ filter_tag_name=
 filter_subdir=
 orig_namespace=refs/original/
 force=
+delete_unchanged=
 while :
 do
 	case "$1" in
@@ -126,6 +127,11 @@ do
 		force=t
 		continue
 		;;
+	--delete-unchanged-refs)
+		shift
+		delete_unchanged=t
+		continue
+		;;
 	-*)
 		;;
 	*)
@@ -215,6 +221,7 @@ export GIT_DIR GIT_WORK_TREE
 
 # The refs should be updated if their heads were rewritten
 git rev-parse --no-flags --revs-only --symbolic-full-name --default HEAD "$@" |
+sort -u |
 sed -e '/^^/d' >"$tempdir"/heads
 
 test -s "$tempdir"/heads ||
@@ -344,7 +351,7 @@ do
 	sha1=$(git rev-parse "$ref"^0)
 	rewritten=$(map $sha1)
 
-	test $sha1 = "$rewritten" &&
+	test $sha1 = "$rewritten" -a -z "$delete_unchanged" &&
 		warn "WARNING: Ref '$ref' is unchanged" &&
 		continue
 
@@ -355,16 +362,22 @@ do
 			die "Could not delete $ref"
 		;;
 	$_x40)
-		echo "Ref '$ref' was rewritten"
-		if ! git update-ref -m "filter-branch: rewrite" \
-					"$ref" $rewritten $sha1 2>/dev/null; then
-			if test $(git cat-file -t "$ref") = tag; then
-				if test -z "$filter_tag_name"; then
-					warn "WARNING: You said to rewrite tagged commits, but not the corresponding tag."
-					warn "WARNING: Perhaps use '--tag-name-filter cat' to rewrite the tag."
+		if test "$delete_unchanged" -a $sha1 = "$rewritten"; then
+			echo "Ref '$ref' was deleted because it is unchanged"
+			git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
+				die "Could not delete $ref"
+		else
+			echo "Ref '$ref' was rewritten"
+			if ! git update-ref -m "filter-branch: rewrite" \
+						"$ref" $rewritten $sha1 2>/dev/null; then
+				if test $(git cat-file -t "$ref") = tag; then
+					if test -z "$filter_tag_name"; then
+						warn "WARNING: You said to rewrite tagged commits, but not the corresponding tag."
+						warn "WARNING: Perhaps use '--tag-name-filter cat' to rewrite the tag."
+					fi
+				else
+					die "Could not rewrite $ref"
 				fi
-			else
-				die "Could not rewrite $ref"
 			fi
 		fi
 		;;
-- 
1.6.0.rc2.24.gf1dd.dirty

^ permalink raw reply related [flat|nested] 40+ messages in thread
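To make the patch easier to read, here is a simplified, stand-alone model of the decision the patched loop makes for each ref. It is meant for reading the patch, not for running filter-branch: classify_ref and its verdict names are hypothetical, and the real script also performs the update-ref calls and the annotated-tag handling. As in the script, an empty mapping means the ref's commit was pruned, a single 40-hex SHA1 means it was rewritten, and anything else means it mapped to several commits. The 40-hex pattern below is written in the same spirit as the script's $_x40.

```shell
#!/bin/sh
# Glob pattern matching exactly 40 lowercase hex digits (one SHA1).
_x05='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
_x40="$_x05$_x05$_x05$_x05$_x05$_x05$_x05$_x05"

# classify_ref SHA1 REWRITTEN DELETE_UNCHANGED
# Prints one of: unchanged | delete-unchanged | delete | rewrite | multiple
classify_ref() {
  sha1=$1 rewritten=$2 delete_unchanged=$3
  # Without the option, an unchanged ref is only warned about and skipped.
  if [ "$sha1" = "$rewritten" ] && [ -z "$delete_unchanged" ]; then
    echo unchanged
    return
  fi
  case $rewritten in
    '')    echo delete ;;              # the commit was pruned away
    $_x40) if [ "$sha1" = "$rewritten" ]; then
             echo delete-unchanged     # only reachable with the option set
           else
             echo rewrite
           fi ;;
    *)     echo multiple ;;            # mapped to several commits
  esac
}
```

As the commit message notes, this is also why the sort -u matters: a ref listed twice is seen as unchanged on its second pass and, with the option set, would be deleted.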
* Re: [TOY PATCH] filter-branch: add option --delete-unchanged
  2008-08-08 20:10 ` [TOY PATCH] filter-branch: add option --delete-unchanged Thomas Rast
@ 2008-08-09 0:35 ` Johannes Schindelin
  2008-08-11 10:43 ` Jan Wielemaker
  2008-09-14 16:29 ` Felipe Contreras
  2 siblings, 0 replies; 40+ messages in thread
From: Johannes Schindelin @ 2008-08-09 0:35 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jan Wielemaker

Hi,

On Fri, 8 Aug 2008, Thomas Rast wrote:

> With --delete-unchanged, we nuke refs whose targets did not change
> during rewriting.

Frankly, I do not see any value in this. Not even with your explanation.

Ciao,
Dscho

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [TOY PATCH] filter-branch: add option --delete-unchanged
  2008-08-08 20:10 ` [TOY PATCH] filter-branch: add option --delete-unchanged Thomas Rast
  2008-08-09 0:35 ` Johannes Schindelin
@ 2008-08-11 10:43 ` Jan Wielemaker
  2008-09-14 16:29 ` Felipe Contreras
  2 siblings, 0 replies; 40+ messages in thread
From: Jan Wielemaker @ 2008-08-11 10:43 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git

Hi Thomas,

On Friday 08 August 2008 10:10:24 pm Thomas Rast wrote:
> With --delete-unchanged, we nuke refs whose targets did not change
> during rewriting. It is intended to be used along with
> --subdirectory-filter to clean out old refs from before the first
> commit to the filtered subdirectory. (They would otherwise keep the
> old history alive.)
>
> Obviously this is a rather dangerous mode of operation.
>
> Note the "sort -u" is required: Without it, --all includes
> 'origin/master' twice (from 'origin/master' and via 'origin/HEAD'),
> and the second pass concludes it is unchanged and nukes the ref.
>
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---
>
> This applies on top of "filter-branch: be more helpful when an
> annotated tag changes".
>
> I'm not really sure if this should go in, but it might have solved
> Jan's problem.

I would hope it isn't just `my problem' :-) I tested with this patch, and I can confirm the following produces precisely what I want:

  git clone /home/git/pl.git/
  cd pl
  git remote rm origin
  git filter-branch --subdirectory-filter packages/chr --tag-name-filter cat --delete-unchanged-refs -- --all
  rm -r .git/refs/original
  cd ..
  git clone file://pl chr

chr is now a nice clean 2 MB repository, starting in 2004, the epoch of this directory, rather than 1992 (the overall project epoch).

B.t.w. pretending a remote clone (file://) was the only way to get a nice 2 MB repo. The initial one is 140 MB. After the filtering it is 62 MB. Funny: after a git gc it grows to 1.1 GB!?

Anyway, thanks a lot, and I hope this makes it into the next git release!

Cheers --- Jan

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [TOY PATCH] filter-branch: add option --delete-unchanged
  2008-08-08 20:10 ` [TOY PATCH] filter-branch: add option --delete-unchanged Thomas Rast
  2008-08-09 0:35 ` Johannes Schindelin
  2008-08-11 10:43 ` Jan Wielemaker
@ 2008-09-14 16:29 ` Felipe Contreras
  2 siblings, 0 replies; 40+ messages in thread
From: Felipe Contreras @ 2008-09-14 16:29 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jan Wielemaker

On Fri, Aug 8, 2008 at 11:10 PM, Thomas Rast <trast@student.ethz.ch> wrote:
> With --delete-unchanged, we nuke refs whose targets did not change
> during rewriting. It is intended to be used along with
> --subdirectory-filter to clean out old refs from before the first
> commit to the filtered subdirectory. (They would otherwise keep the
> old history alive.)
>
> Obviously this is a rather dangerous mode of operation.
>
> Note the "sort -u" is required: Without it, --all includes
> 'origin/master' twice (from 'origin/master' and via 'origin/HEAD'),
> and the second pass concludes it is unchanged and nukes the ref.

This is really useful; why isn't it merged?

Personally I use filter-branch to, duh, filter a branch, so I don't want the commit objects that are not filtered, nor the refs to them.

-- 
Felipe Contreras

^ permalink raw reply [flat|nested] 40+ messages in thread
* [RFH] filter-branch: ancestor detection weirdness
  2008-08-07 23:48 ` Thomas Rast
  2008-08-07 23:50 ` [PATCH] filter-branch: be more helpful when an annotated tag changes Thomas Rast
@ 2008-08-07 23:54 ` Thomas Rast
  2008-08-08 11:42 ` Johannes Schindelin
  2008-08-08 7:44 ` git filter-branch --subdirectory-filter, still a mistery Jan Wielemaker
  2008-08-08 11:25 ` Jan Wielemaker
  3 siblings, 1 reply; 40+ messages in thread
From: Thomas Rast @ 2008-08-07 23:54 UTC (permalink / raw)
  To: git; +Cc: Jan Wielemaker

THIS WILL VERY LIKELY NOT WORK IN ALL CASES.

Use git rev-list -1 -- <subdir> to discover a random ancestor, instead of more correct boundary detection. Oddly enough, this _increases_ success rate with Jan's repository and --all. May break randomly with more complicated args.
---

Maybe someone understands what's going on and can fix the underlying bug...

 git-filter-branch.sh |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 182822a..52b2bdf 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -325,15 +325,9 @@ while read ref
 do
 	sha1=$(git rev-parse "$ref"^0)
 	test -f "$workdir"/../map/$sha1 && continue
-	# Assign the boundarie(s) in the set of rewritten commits
-	# as the replacement commit(s).
-	# (This would look a bit nicer if --not --stdin worked.)
-	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
-		git rev-list $ref --boundary --stdin |
-		sed -n "s/^-//p")
-	do
-		map $p >> "$workdir"/../map/$sha1
-	done
+	# Assign the first commit not pruned as the replacement.
+	candidate=$(git rev-list $ref -1 -- "$filter_subdir")
+	test "$candidate" && map "$candidate" > "$workdir"/../map/$sha1
 done < "$tempdir"/heads
 
 # Finally update the refs
-- 
1.6.0.rc2.19.g3c9ba

^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [RFH] filter-branch: ancestor detection weirdness
  2008-08-07 23:54 ` [RFH] filter-branch: ancestor detection weirdness Thomas Rast
@ 2008-08-08 11:42 ` Johannes Schindelin
  2008-08-08 14:14 ` Thomas Rast
  0 siblings, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2008-08-08 11:42 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jan Wielemaker

Hi,

On Fri, 8 Aug 2008, Thomas Rast wrote:

> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 182822a..52b2bdf 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -325,15 +325,9 @@ while read ref
>  do
>  	sha1=$(git rev-parse "$ref"^0)
>  	test -f "$workdir"/../map/$sha1 && continue
> -	# Assign the boundarie(s) in the set of rewritten commits
> -	# as the replacement commit(s).
> -	# (This would look a bit nicer if --not --stdin worked.)
> -	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
> -		git rev-list $ref --boundary --stdin |
> -		sed -n "s/^-//p")
> -	do
> -		map $p >> "$workdir"/../map/$sha1
> -	done
> +	# Assign the first commit not pruned as the replacement.
> +	candidate=$(git rev-list $ref -1 -- "$filter_subdir")

Is it not just a question of adding '-- "$filter_subdir"' to the rev-list call you removed?

Ciao,
Dscho

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFH] filter-branch: ancestor detection weirdness
  2008-08-08 11:42 ` Johannes Schindelin
@ 2008-08-08 14:14 ` Thomas Rast
  2008-08-08 14:16 ` [PATCH] filter-branch: fix ancestor discovery for --subdirectory-filter Thomas Rast
  2008-08-08 14:39 ` [RFH] filter-branch: ancestor detection weirdness Johannes Schindelin
  0 siblings, 2 replies; 40+ messages in thread
From: Thomas Rast @ 2008-08-08 14:14 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jan Wielemaker

[-- Attachment #1: Type: text/plain, Size: 3376 bytes --]

Johannes Schindelin wrote:
>
> On Fri, 8 Aug 2008, Thomas Rast wrote:
>
> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > index 182822a..52b2bdf 100755
> > --- a/git-filter-branch.sh
> > +++ b/git-filter-branch.sh
> > @@ -325,15 +325,9 @@ while read ref
> >  do
> >  	sha1=$(git rev-parse "$ref"^0)
> >  	test -f "$workdir"/../map/$sha1 && continue
> > -	# Assign the boundarie(s) in the set of rewritten commits
> > -	# as the replacement commit(s).
> > -	# (This would look a bit nicer if --not --stdin worked.)
> > -	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
> > -		git rev-list $ref --boundary --stdin |
> > -		sed -n "s/^-//p")
> > -	do
> > -		map $p >> "$workdir"/../map/$sha1
> > -	done
> > +	# Assign the first commit not pruned as the replacement.
> > +	candidate=$(git rev-list $ref -1 -- "$filter_subdir")

I think I see the actual problem. I made a small testing repository with history that looks like this:

  *   a6f2213... (refs/heads/master) Merge branch 'side'
  |\
  | * 311f888... (refs/heads/side) outside
  | * 472893d... inside dir
  * | 9bd52bc... (refs/heads/stale) outside
  * | d1b451a... inside dir
  |/
  * 1c48eea... initial

It is available at git://persephone.dnsalias.net/git/filtertest.git if you want to try. All commits labelled 'inside dir' do something in dir/; the others don't. (You can disregard the 'other' branch for now; I wanted to test the behaviour on completely disconnected history too, since that's the case with Jan's repo.)

Let's depict this as the following for now, where capitals stand for "interesting" commits under the subdirectory filter:

  i -- A -- b(stale) -- M(master)
   \                   /
    \- C -- d(side) --/

When saying

  $ git filter-branch --subdirectory-filter dir -- --all

I would expect the history to look like:

  A(stale) -- M(master)
             /
  C(side) --/

I think treating it this way makes a lot of sense; you get the last state that your subdirectory had on the corresponding branch or tag. (Similarly, a leaf branch that does not affect 'dir' should be backed up until it hits an ancestor that survives the filter.)

Now the problem with the above ancestor detection is the following. Consider that at this point, the 'map' directory contains the (unfiltered) SHA1 for every commit that was rewritten during the filtering process, i.e.

  $ git rev-list --all -- dir | git name-rev --stdin
  093c591b3d751ce778b4a6e5c2a0906b097b5868 (other~1)
  a6f22134f8ab8bcc762949df53f674e3410f7fc3 (master)
  d1b451a4b0657ea894fd772fc609f7863b7dfd15 (stale~1)
  472893d579383f56f006ff42c563dcbb730bc5b8 (side~1)

So 'map' has the values for M, A, and C. Now if you expand the call

  (cd "$workdir"/../map; ls | sed "s/^/^/") |
  git rev-list $ref --boundary --stdin

you'll find that during ref=refs/heads/side, it is equivalent to

  $ git rev-list side --boundary ^master ^side~1 ^stale~1 ^other~1
  [no output!]

Oops, it seems that wasn't what we wanted. The '^master', which already reaches 'side', precludes all output.

So now that I've finally understood what is going on, I think a more careful use of rev-list -1 is actually a correct and easy way to figure out an ancestor. Patch follows.

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
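The intended mapping can be made concrete with a toy model. This is not filter-branch's code, and it is not how rev-list's history simplification works internally. It walks first parents from a ref until it reaches a commit that survives the filter. The commit graph comes from a hypothetical text file of "commit parent..." lines, and the surviving commits from a file with one name per line. On Thomas's example it maps stale (b) to A and side (d) to C, which matches the expected result above.

```shell
#!/bin/sh
# nearest_surviving COMMIT PARENTS_FILE SURVIVING_FILE
# PARENTS_FILE:   one line per commit, "commit [parent1 [parent2 ...]]"
# SURVIVING_FILE: one surviving commit name per line
# Follows first parents until a surviving commit is found; fails at a root.
nearest_surviving() {
  commit=$1
  while [ -n "$commit" ]; do
    if grep -qxF "$commit" "$3"; then
      echo "$commit"
      return 0
    fi
    # Step to the first parent (empty for a root commit).
    commit=$(awk -v c="$commit" '$1 == c { print $2; exit }' "$2")
  done
  return 1
}
```

Following only first parents is a deliberate simplification. What it loses at merges is exactly the point Dscho raises in his reply.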
* [PATCH] filter-branch: fix ancestor discovery for --subdirectory-filter
  2008-08-08 14:14 ` Thomas Rast
@ 2008-08-08 14:16 ` Thomas Rast
  2008-08-08 14:39 ` [RFH] filter-branch: ancestor detection weirdness Johannes Schindelin
  1 sibling, 0 replies; 40+ messages in thread
From: Thomas Rast @ 2008-08-08 14:16 UTC (permalink / raw)
  To: git; +Cc: gitster, Johannes Schindelin, Jan Wielemaker

The previous code failed on any refs that are (pre-rewrite) ancestors of commits marked for rewriting. This means that in a situation

  A -- B(topic) -- C(master)

where B is dropped by --subdirectory-filter pruning, the 'topic' is not moved up to A as intended, but left unrewritten.

Fix this by using a more stupid approach: we let 'rev-list -1' figure out a nearby ancestor, which handles the pruning automatically.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

I guarded it with a $filter_subdir check to not cause any unintended harm. It might be useful in some border cases of rev-list arguments given to filter-branch too, but I can't figure out a safe way to handle that. Either way, this fixes the problem.

- Thomas

 git-filter-branch.sh |   27 +++++++++++----------------
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index a324cf0..7924aa1 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -317,24 +317,19 @@ done <../revs
 
 # In case of a subdirectory filter, it is possible that a specified head
 # is not in the set of rewritten commits, because it was pruned by the
-# revision walker. Fix it by mapping these heads to the next rewritten
-# ancestor(s), i.e. the boundaries in the set of rewritten commits.
+# revision walker. Fix it by mapping these heads to a (random!) nearby
+# ancestor that survived the pruning.
 
-# NEEDSWORK: we should sort the unmapped refs topologically first
-while read ref
-do
-	sha1=$(git rev-parse "$ref"^0)
-	test -f "$workdir"/../map/$sha1 && continue
-	# Assign the boundarie(s) in the set of rewritten commits
-	# as the replacement commit(s).
-	# (This would look a bit nicer if --not --stdin worked.)
-	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
-		git rev-list $ref --boundary --stdin |
-		sed -n "s/^-//p")
+if test "$filter_subdir"
+then
+	while read ref
 	do
-		map $p >> "$workdir"/../map/$sha1
-	done
-done < "$tempdir"/heads
+		sha1=$(git rev-parse "$ref"^0)
+		test -f "$workdir"/../map/$sha1 && continue
+		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
+		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
+	done < "$tempdir"/heads
+fi
 
 # Finally update the refs
 
-- 
1.6.0.rc2.22.g7d28.dirty

^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Johannes Schindelin @ 2008-08-08 14:39 UTC
To: Thomas Rast; +Cc: git, Jan Wielemaker

Hi,

On Fri, 8 Aug 2008, Thomas Rast wrote:

> I think a more careful use of rev-list -1 is actually a correct and easy
> way to figure out an ancestor.

I have not looked at your patch closely, or at your explanation, but I am really certain that every attempt to replace the --boundary with a -1 must fail.

Let me show you why I think that. Just look at this history:

  A - B - C
     /
    D

Where all commits except B touch the inside directory. Two options:

- you make C a merge (that's what I tried with --boundary), or

- you record B, and C as a commit that does not introduce changes, which is obviously wrong, or

- you record B as a merge, with identical content as A and D, which is pretty tricky (which is why I avoided it).

Anyway, I am really swamped in work, and will not have time to review big changes or explanations.

Besides, filter-branch is no fun. rewrite-commits would have been, but Sven chickened out.

Ciao,
Dscho
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Thomas Rast @ 2008-08-08 18:37 UTC
To: Johannes Schindelin; +Cc: git, Jan Wielemaker

Johannes Schindelin wrote:
> On Fri, 8 Aug 2008, Thomas Rast wrote:
>
> > I think a more careful use of rev-list -1 is actually a correct and easy
> > way to figure out an ancestor.
>
> I have not looked at your patch closely, or at your explanation, but I am
> really certain that every attempt to replace the --boundary with a -1 must
> fail.
>
> Let me show you why I think that. Just look at this history:
>
>   A - B - C
>      /
>     D
>
> Where all commits except B touch the inside directory. Two options:

'rev-list' "solves" this problem for us. At the point where we are rewriting the branch pointers, commits have already been rewritten to whatever 'git rev-list --parents -- $subdir' told us to make them. I think there are only two cases for its output:

(a) Both A and D bring the same subdirectory contents. 'rev-list --parents -- $subdir' drops one side of the merge during pruning. It does not look past the merge to see whether the contents were arrived at via different changesets. Thus the history becomes

      A' -- C'

      D'

    and even that only if D was reachable by a different ref, otherwise D' is simply dropped.

(b) A and D bring different $subdir contents. Then the merge is interesting and remains. History is now

      A' -- B' -- C'
           /
      D' -/

Neither of those cases is a problem for the -1 strategy. A branch 'topic' pointing to B will be rewritten to A' in case (a) and to B' in case (b). IOW, either the merge remains and there is no problem, or the side branches vanish too and there is no problem. rev-list never "forward simplifies" merges; it merely tries to prune away commits on the incoming side of the merge until all its parents are interesting.

Either that, or I missed something obvious. I think I'll have to come up with a better commit message...

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch
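You can observe case (a) directly with the default history simplification. This is a minimal sketch under stated assumptions: a POSIX shell and git, a hypothetical common root O (added so that A and D can be merged), and a made-up file `dir/f`. A and D give `dir/f` the same content, B merges them, and C changes `dir/f` again.

```shell
#!/bin/sh
# Sketch of case (a): A and D bring identical dir/ contents, B merges them.
# O is a hypothetical root commit added only for this demo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
mkdir dir
echo 0 >dir/f && git add dir/f && git commit -q -m O
git branch side

echo x >dir/f && git commit -q -a -m A
A=$(git rev-parse HEAD)
git checkout -q side
echo x >dir/f && git commit -q -a -m D       # same dir/ contents as A
D=$(git rev-parse HEAD)
git checkout -q master
git merge -q -m B side                        # B: merge of A and D
echo y >dir/f && git commit -q -a -m C
C=$(git rev-parse HEAD)

# Default simplification: B is TREESAME to its first parent A, so the walk
# follows only A. D is never visited and B disappears from the output.
walk=$(git rev-list --parents master -- dir)
printf '%s\n' "$walk"
```

In this setup, the first line of the walk should be C with A as its rewritten parent, and D should not appear at all. This matches case (a): B is TREESAME to its first parent, so rev-list follows only that parent.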
* [PATCH v2] filter-branch: fix ref rewriting with --subdirectory-filter
From: Thomas Rast @ 2008-08-08 18:39 UTC
To: git; +Cc: gitster, Johannes Schindelin, Jan Wielemaker

The previous ancestor discovery code failed on any refs that are (pre-rewrite) ancestors of commits marked for rewriting. This means that in a situation

  A -- B(topic) -- C(master)

where B is dropped by --subdirectory-filter pruning, the 'topic' was not moved up to A as intended, but left unrewritten because we asked about 'git rev-list ^master topic', which does not return anything.

Instead, we use the straightforward

  git rev-list -1 $ref -- $filter_subdir

to find the right ancestor. To justify this, note that the nearest ancestor is unique: We use the output of

  git rev-list --parents -- $filter_subdir

to rewrite commits in the first pass, before any ref rewriting. If B is a non-merge commit, the only candidate is its parent. If it is a merge, there are two cases:

- All sides of the merge bring the same subdirectory contents. Then rev-list already pruned away the merge in favour of just one of its parents, so there is only one candidate.

- Some merge sides, or the merge outcome, differ. Then the merge is not pruned and can be rewritten directly.

So it is always safe to use rev-list -1.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

Only comments and commit message changed since v1, to update the justification.

 git-filter-branch.sh | 27 +++++++++++----------------
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index a324cf0..a140337 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -317,24 +317,19 @@ done <../revs
 
 # In case of a subdirectory filter, it is possible that a specified head
 # is not in the set of rewritten commits, because it was pruned by the
-# revision walker. Fix it by mapping these heads to the next rewritten
-# ancestor(s), i.e. the boundaries in the set of rewritten commits.
+# revision walker. Fix it by mapping these heads to the unique nearest
+# ancestor that survived the pruning.
 
-# NEEDSWORK: we should sort the unmapped refs topologically first
-while read ref
-do
-	sha1=$(git rev-parse "$ref"^0)
-	test -f "$workdir"/../map/$sha1 && continue
-	# Assign the boundarie(s) in the set of rewritten commits
-	# as the replacement commit(s).
-	# (This would look a bit nicer if --not --stdin worked.)
-	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
-		git rev-list $ref --boundary --stdin |
-		sed -n "s/^-//p")
+if test "$filter_subdir"
+then
+	while read ref
 	do
-		map $p >> "$workdir"/../map/$sha1
-	done
-done < "$tempdir"/heads
+		sha1=$(git rev-parse "$ref"^0)
+		test -f "$workdir"/../map/$sha1 && continue
+		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
+		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
+	done < "$tempdir"/heads
+fi
 
 # Finally update the refs
 
-- 
1.6.0.rc2.23.ge69de8
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Johannes Schindelin @ 2008-08-09 0:16 UTC
To: Thomas Rast; +Cc: git, Jan Wielemaker

Hi,

On Fri, 8 Aug 2008, Thomas Rast wrote:

> Johannes Schindelin wrote:
> > On Fri, 8 Aug 2008, Thomas Rast wrote:
> >
> > > I think a more careful use of rev-list -1 is actually a correct and
> > > easy way to figure out an ancestor.
> >
> > I have not looked at your patch closely, or at your explanation, but I
> > am really certain that every attempt to replace the --boundary with a
> > -1 must fail.
> >
> > Let me show you why I think that. Just look at this history:
> >
> >   A - B - C
> >      /
> >     D
> >
> > Where all commits except B touch the inside directory. Two options:
>
> 'rev-list' "solves" this problem for us. At the point where we are
> rewriting the branch pointers, commits have already been rewritten to
> whatever 'git rev-list --parents -- $subdir' told us to make them. I
> think there are only two cases for its output:
>
> (a) Both A and D bring the same subdirectory contents. 'rev-list
>     --parents -- $subdir' drops one side of the merge during pruning. It
>     does not look past the merge to see whether the contents were
>     arrived at via different changesets. Thus the history becomes
>
>       A' -- C'
>
>       D'
>
>     and even that only if D was reachable by a different ref,
>     otherwise D' is simply dropped.

And this is what I call wrong. Simply dropping one side of the equation is not what I call "sane".

If you drop information, you are disagreeing with "content is king".

But hey, if other people agree with you, and this kind of thinking ends up in Git proper, I can still resort to other DVCSes.

Ciao,
Dscho
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Junio C Hamano @ 2008-08-09 1:25 UTC
To: Johannes Schindelin; +Cc: Thomas Rast, git, Jan Wielemaker

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> (a) Both A and D bring the same subdirectory contents. 'rev-list
>>     --parents -- $subdir' drops one side of the merge during pruning. It
>>     does not look past the merge to see whether the contents were
>>     arrived at via different changesets. Thus the history becomes
>>
>>       A' -- C'
>>
>>       D'
>>
>>     and even that only if D was reachable by a different ref,
>>     otherwise D' is simply dropped.
>
> And this is what I call wrong. Simply dropping one side of the equation
> is not what I call "sane".
>
> If you drop information, you are disagreeing with "content is king".

I think the aggressive merge simplification that gives "one simplest explanation for the contents of the paths specified" is a wrong mode of operation to use when you are filtering branches. It might be a good thing to support as an option, but I agree with you that it should not be the default.

Perhaps --full-history is needed to the rev-list call (and the recent invention --simplify-merges that will hopefully appear sometime after 1.6.0)? See recent discussion of --full-history and the default merge simplification between Linus and Roman Zippel.

I suspect that back when the original cg-rewritehistory was written, not many people understood the issues explained in that thread.
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Thomas Rast @ 2008-08-09 9:25 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, git, Jan Wielemaker

Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> (a) Both A and D bring the same subdirectory contents. 'rev-list
> >>     --parents -- $subdir' drops one side of the merge during pruning. It
> >>     does not look past the merge to see whether the contents were
> >>     arrived at via different changesets. Thus the history becomes
> >>
> >>       A' -- C'
> >>
> >>       D'
> >>
> >>     and even that only if D was reachable by a different ref,
> >>     otherwise D' is simply dropped.
> >
> > And this is what I call wrong. Simply dropping one side of the equation
> > is not what I call "sane".
> >
> > If you drop information, you are disagreeing with "content is king".

I wonder why I have to be the devil's advocate here.

Let me emphasise: _This is how filter-branch currently works._ It is not some obscure feature coming with my patch. The user _asks_ for this simplification by using --subdirectory-filter. It is also _happening long before branch rewriting_, and we are discussing a patch to said branch rewriting.

Junio has a point:

> I think the aggressive merge simplification that gives "one simplest
> explanation for the contents of the paths specified" is a wrong mode of
> operation to use when you are filtering branches. It might be a good
> thing to support as an option, but I agree with you that it should not be
> the default.
>
> Perhaps --full-history is needed to the rev-list call (and the recent

But --full-history cannot solve this problem; it would entirely defeat the point of --subdirectory-filter. (I haven't looked into what --simplify-merges does yet.)

The only thing my patch changes is the behaviour with branches _that the user asked us to rewrite to the subdirectory history_ but that don't point to a precise commit that survived the simplification. Why would rewriting the branch pointer appropriately be bad when the user specifically asked for it?

And your _existing_ branch rewriting code had the same thing in mind: move back to an ancestor that roughly fits the ticket. You just missed the problem with 'rev-list ^master ancestor', which has a high chance of breaking the mechanism with --all. And it broke in Jan's case, which is why we're having this discussion, remember?

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Thomas Rast @ 2008-08-09 9:35 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, git, Jan Wielemaker

Thomas Rast wrote:
> Junio C Hamano wrote:
> >
> > Perhaps --full-history is needed to the rev-list call (and the recent
>
> But --full-history cannot solve this problem; it would entirely defeat
> the point of --subdirectory-filter. (I haven't looked into what
> --simplify-merges does yet.)

Actually, on this point I stand corrected; in some tests it has a good effect. I'll look into it.

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch
* [PATCH] filter-branch: use --simplify-merges
From: Thomas Rast @ 2008-08-10 14:02 UTC
To: git; +Cc: gitster, Johannes Schindelin, Jan Wielemaker

Use rev-list --simplify-merges everywhere. This changes the behaviour of --subdirectory-filter in cases such as

  O -- A -\
   \       \
    \- B -- M

where A and B bring the same changes to the subdirectory: It now keeps both sides of the merge. Previously, the history would have been simplified to 'O -- A'. Merges of unrelated side histories that never touch the subdirectory are still removed.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

This obviously depends on --simplify-merges which is only in 'next'.

Junio C Hamano wrote:
>
> Perhaps --full-history is needed to the rev-list call (and the recent
> invention --simplify-merges that will hopefully appear sometime after
> 1.6.0)? See recent discussion of --full-history and the default merge
> simplification between Linus and Roman Zippel.

Following history pointers, it turns out the discussion surrounding a17171b4 (Revert "filter-branch: subdirectory filter needs --full-history") actually mentions that a simplification step on top of --full-history is needed:

Junio C Hamano wrote:
[http://kerneltrap.org/mailarchive/git/2007/6/13/249107]
> In short,
> you will end up with something like this:
>
>        .---.  (side branch)
>       /     \
>   ---A---B---C  (merge)
>
> The "merge clean-up" would conceptually be a simple operation.
> Whenever you see a merge C, you look at its parents A and B, and
> cull the ones that are reachable from other parents. You notice
> that A is an ancestor of B, drop A from the parents of C, and
> simplify the above down to:
>
>   ---A---B---C  (not-a-merge)

Well, turns out that's what you did with --simplify-merges, so let's use it.

 git-filter-branch.sh | 7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 539b2e6..60f64ac 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -239,11 +239,11 @@ mkdir ../map || die "Could not create map/ directory"
 case "$filter_subdir" in
 "")
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@"
+		--parents --simplify-merges "$@"
 	;;
 *)
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@" -- "$filter_subdir"
+		--parents --simplify-merges "$@" -- "$filter_subdir"
 esac > ../revs || die "Could not get the commits"
 commits=$(wc -l <../revs | tr -d " ")
 
@@ -333,7 +333,8 @@ then
 	do
 		sha1=$(git rev-parse "$ref"^0)
 		test -f "$workdir"/../map/$sha1 && continue
-		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
+		ancestor=$(git rev-list --simplify-merges -1 \
+			$ref -- "$filter_subdir")
 		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
 	done < "$tempdir"/heads
 fi
-- 
1.6.0.rc2.29.g7ec81
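The difference this patch makes can be sketched on the O -- A / B -- M history from the commit message. This is an illustrative script that assumes a POSIX shell and a git that supports `--simplify-merges`, which was only in 'next' at the time of this thread. The extra commits P, U and Q, the branch names and the file names are hypothetical additions. They show the "unrelated side history" case.

```shell
#!/bin/sh
# Sketch: default simplification vs. --simplify-merges for a path filter.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
mkdir dir
echo 0 >dir/f && git add dir/f && git commit -q -m O
git branch side
echo x >dir/f && git commit -q -a -m A
git checkout -q side
echo x >dir/f && git commit -q -a -m B      # same dir/ change as A
git checkout -q master
git merge -q -m M side
M=$(git rev-parse HEAD)

# Hypothetical extra history: a side branch (U) that never touches dir/,
# merged back as Q after one more dir/ change (P) on master.
git branch docs
echo y >dir/f && git commit -q -a -m P
git checkout -q docs
echo u >README && git add README && git commit -q -m U
git checkout -q master
git merge -q -m Q docs
Q=$(git rev-parse HEAD)

default=$(git rev-list master -- dir)
simplified=$(git rev-list --simplify-merges master -- dir)
echo "default:           $(printf '%s\n' "$default" | wc -l) commits"
echo "--simplify-merges: $(printf '%s\n' "$simplified" | wc -l) commits"
```

With the default simplification, only P, A and O should survive. With `--simplify-merges`, the merge M and both of its sides are kept (P, M, A, B, O). Q, the merge of the side branch that never touches `dir`, is still removed.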
* Re: [PATCH] filter-branch: use --simplify-merges
From: Junio C Hamano @ 2008-08-12 1:54 UTC
To: Thomas Rast; +Cc: git, gitster, Johannes Schindelin, Jan Wielemaker

Thomas Rast <trast@student.ethz.ch> writes:

> @@ -333,7 +333,8 @@ then
>  	do
>  		sha1=$(git rev-parse "$ref"^0)
>  		test -f "$workdir"/../map/$sha1 && continue
> -		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
> +		ancestor=$(git rev-list --simplify-merges -1 \
> +			$ref -- "$filter_subdir")
>  		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
>  	done < "$tempdir"/heads
>  fi

Hmm, where does this preimage come from?
* Re: [PATCH] filter-branch: use --simplify-merges
From: Junio C Hamano @ 2008-08-12 2:13 UTC
To: Thomas Rast; +Cc: git, gitster, Johannes Schindelin, Jan Wielemaker

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <trast@student.ethz.ch> writes:
>
>> @@ -333,7 +333,8 @@ then
>>  	do
>>  		sha1=$(git rev-parse "$ref"^0)
>>  		test -f "$workdir"/../map/$sha1 && continue
>> -		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
>> +		ancestor=$(git rev-list --simplify-merges -1 \
>> +			$ref -- "$filter_subdir")
>>  		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
>>  	done < "$tempdir"/heads
>>  fi
>
> Hmm, where does this preimage come from?

Nevermind. You based this on top of the "fix ancestor discovery" patch.

I'll squash these two and queue them in 'pu' for now.
* Re: [PATCH] filter-branch: use --simplify-merges
From: Thomas Rast @ 2008-08-12 5:47 UTC
To: Junio C Hamano; +Cc: git, Johannes Schindelin, Jan Wielemaker

Junio C Hamano wrote:
> > Thomas Rast <trast@student.ethz.ch> writes:
> >
> >> -		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
> >> +		ancestor=$(git rev-list --simplify-merges -1 \
> >> +			$ref -- "$filter_subdir")
> >
> > Hmm, where does this preimage come from?
>
> Nevermind. You based this on top of the "fix ancestor discovery" patch.
>
> I'll squash these two and queue them in 'pu' for now.

Please don't. I'm still convinced the "fix ancestor discovery" is a fix to current code that works independent of --simplify-merges. If you squash them, it cannot go into a release before --simplify-merges even if I manage to convince Dscho of this.

Thanks.

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch
* Re: [PATCH] filter-branch: use --simplify-merges
From: Junio C Hamano @ 2008-08-12 6:59 UTC
To: Thomas Rast; +Cc: git, Johannes Schindelin, Jan Wielemaker

Thomas Rast <trast@student.ethz.ch> writes:

> Junio C Hamano wrote:
>> ...
>> Nevermind. You based this on top of the "fix ancestor discovery" patch.
>>
>> I'll squash these two and queue them in 'pu' for now.
>
> Please don't. I'm still convinced the "fix ancestor discovery" is a
> fix to current code that works independent of --simplify-merges. If
> you squash them, it cannot go into a release before --simplify-merges
> even if I manage to convince Dscho of this.

Anything parked in 'pu' is a fair game for replacement later, so please send a replacement series and tell me to drop the previous ones from 'pu'.
* [PATCH 0/3] filter-branch --subdirectory-filter improvements
From: Thomas Rast @ 2008-08-12 8:45 UTC
To: git; +Cc: Junio C Hamano, Johannes Schindelin, Jan Wielemaker

Junio C Hamano wrote:
> Anything parked in 'pu' is a fair game for replacement later, so please
> send a replacement series and tell me to drop the previous ones from 'pu'.

So let's try this one. The first two do not depend on --simplify-merges.

1/3 is new, and extends the --subdirectory-filter test to prove the existence of the bug in current filter-branch. I hope it helps explain the issue.

2/3 is the same as before[*] modulo changing the test to expect success again.

The third one does depend on --simplify-merges.

3/3 introduces --simplify-merges, which improves the history that results from --subdirectory-filter. It has absolutely nothing to do with 2/3, except that it touches the same area of code. (You could s/rev-list/rev-list --simplify-merges/ in master:git-filter-branch.sh, and get the improved history without the bugfix.)

Sorry that I dispersed the patches and v2s randomly across the thread.

- Thomas

[*] http://kerneltrap.org/mailarchive/git/2008/8/8/2867244
    "[PATCH v2] filter-branch: fix ref rewriting with --subdirectory-filter"

Thomas Rast (3):
  filter-branch: Extend test to show rewriting bug
  filter-branch: fix ref rewriting with --subdirectory-filter
  filter-branch: use --simplify-merges
* Re: [PATCH 0/3] filter-branch --subdirectory-filter improvements
From: Jan Wielemaker @ 2008-08-12 12:11 UTC
To: Thomas Rast; +Cc: git, Junio C Hamano, Johannes Schindelin

On Tuesday 12 August 2008 10:45:56 am Thomas Rast wrote:
> Junio C Hamano wrote:
> > Anything parked in 'pu' is a fair game for replacement later, so please
> > send a replacement series and tell me to drop the previous ones from
> > 'pu'.
>
> So let's try this one. The first two do not depend on
> --simplify-merges.

And I can confirm that (2) is a very important fix and (3) is a necessary step to achieve what I believe --subdirectory-filter is meant for: extract a directory from a big project and turn it into a stand-alone project. In this case I want:

	% filter out dir X, creating X.git
	% git rm -r X
	% git submodule add <url> X

And someone with a totally unrelated project adds X.git to his project. That requires the history of X to become totally independent from the original project. This works great with Thomas' patches.

	Cheers --- Jan

P.s. Note that this is a common problem for people moving from some unnamed ancient SCM system, either transferring a repository or simply starting the wrong way due to historical brainwashing :-)

> 1/3 is new, and extends the --subdirectory-filter test to prove the
> existence of the bug in current filter-branch. I hope it helps
> explain the issue.
>
> 2/3 is the same as before[*] modulo changing the test to expect
> success again.
>
> The third one does depend on --simplify-merges.
>
> 3/3 introduces --simplify-merges, which improves the history that
> results from --subdirectory-filter. It has absolutely nothing to do
> with 2/3, except that it touches the same area of code. (You could
> s/rev-list/rev-list --simplify-merges/ in master:git-filter-branch.sh,
> and get the improved history without the bugfix.)
>
> Sorry that I dispersed the patches and v2s randomly across the thread.
>
> - Thomas
>
> [*] http://kerneltrap.org/mailarchive/git/2008/8/8/2867244
> "[PATCH v2] filter-branch: fix ref rewriting with --subdirectory-filter"
>
>
> Thomas Rast (3):
>   filter-branch: Extend test to show rewriting bug
>   filter-branch: fix ref rewriting with --subdirectory-filter
>   filter-branch: use --simplify-merges
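Jan's three steps can be sketched as a script. This is an illustration only, under several assumptions: a POSIX shell, a git that still ships `git filter-branch`, and made-up names (`pl`, `chr-work`, `chr.git`, `src/main.pl`, `packages/chr/chr.pl`). A local path stands in for the `<url>` that Jan leaves open. The `-c protocol.file.allow=always` option is there only because newer git refuses to clone submodules over the local file transport by default.

```shell
#!/bin/sh
# Sketch of Jan's workflow: extract packages/chr, then re-add it as a submodule.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence newer gits' filter-branch warning

top=$(mktemp -d)
cd "$top"

# Hypothetical "big" project with a subproject in packages/chr.
git init -q pl
cd pl
git symbolic-ref HEAD refs/heads/master
mkdir -p src packages/chr
echo core >src/main.pl && git add src && git commit -q -m core
echo chr >packages/chr/chr.pl && git add packages && git commit -q -m "add chr"
echo more >>src/main.pl && git commit -q -a -m "core only"
cd "$top"

# Step 1: filter out packages/chr in a throw-away clone, publish it as chr.git.
git clone -q pl chr-work
(cd chr-work && git filter-branch --subdirectory-filter packages/chr -- master >/dev/null)
git clone -q --bare chr-work chr.git

# Step 2: remove the directory from the big project.
cd pl
git rm -r -q packages/chr
git commit -q -m "move chr out"

# Step 3: add the extracted repository back as a submodule.
git -c protocol.file.allow=always submodule add "$top/chr.git" packages/chr
git commit -q -m "chr as submodule"
```

In this toy history, only the "add chr" commit touches `packages/chr`. The extracted `chr.git` should therefore contain a single commit with `chr.pl` at its top level.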
* [PATCH 1/3] filter-branch: Extend test to show rewriting bug
From: Thomas Rast @ 2008-08-12 8:45 UTC
To: git; +Cc: Junio C Hamano, Johannes Schindelin, Jan Wielemaker

This extends the --subdirectory-filter test in t7003 to demonstrate a rewriting bug: when rewriting two refs A and B such that B is an ancestor of A, it fails to rewrite B.

The underlying issue is that the rev-list invocation at git-filter-branch.sh:332 more or less boils down to

  git rev-list B --boundary ^A

which outputs nothing because B is an ancestor of A.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/t7003-filter-branch.sh | 10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index a0ab096..4382baa 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -96,13 +96,17 @@ test_expect_success 'filter subdirectory only' '
 	test_tick &&
 	git commit -m "again not subdir" &&
 	git branch sub &&
-	git-filter-branch -f --subdirectory-filter subdir refs/heads/sub
+	git branch sub-earlier HEAD~2 &&
+	git-filter-branch -f --subdirectory-filter subdir \
+		refs/heads/sub refs/heads/sub-earlier
 '
 
-test_expect_success 'subdirectory filter result looks okay' '
+test_expect_failure 'subdirectory filter result looks okay' '
 	test 2 = $(git rev-list sub | wc -l) &&
 	git show sub:new &&
-	test_must_fail git show sub:subdir
+	test_must_fail git show sub:subdir &&
+	git show sub-earlier:new &&
+	test_must_fail git show sub-earlier:subdir
 '
 
 test_expect_success 'more setup' '
-- 
1.6.0.rc2.30.gb6bda
* [PATCH 2/3] filter-branch: fix ref rewriting with --subdirectory-filter 2008-08-12 6:59 ` Junio C Hamano 2008-08-12 8:45 ` [PATCH 0/3] filter-branch --subdirectory-filter improvements Thomas Rast 2008-08-12 8:45 ` [PATCH 1/3] filter-branch: Extend test to show rewriting bug Thomas Rast @ 2008-08-12 8:45 ` Thomas Rast 2008-08-12 8:45 ` [PATCH 3/3] filter-branch: use --simplify-merges Thomas Rast 3 siblings, 0 replies; 40+ messages in thread From: Thomas Rast @ 2008-08-12 8:45 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Johannes Schindelin, Jan Wielemaker The previous ancestor discovery code failed on any refs that are (pre-rewrite) ancestors of commits marked for rewriting. This means that in a situation A -- B(topic) -- C(master) where B is dropped by --subdirectory-filter pruning, the 'topic' was not moved up to A as intended, but left unrewritten because we asked about 'git rev-list ^master topic', which does not return anything. Instead, we use the straightforward git rev-list -1 $ref -- $filter_subdir to find the right ancestor. To justify this, note that the nearest ancestor is unique: We use the output of git rev-list --parents -- $filter_subdir to rewrite commits in the first pass, before any ref rewriting. If B is a non-merge commit, the only candidate is its parent. If it is a merge, there are two cases: - All sides of the merge bring the same subdirectory contents. Then rev-list already pruned away the merge in favour for just one of its parents, so there is only one candidate. - Some merge sides, or the merge outcome, differ. Then the merge is not pruned and can be rewritten directly. So it is always safe to use rev-list -1. 
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 git-filter-branch.sh     |   27 +++++++++++----------------
 t/t7003-filter-branch.sh |    2 +-
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index a324cf0..a140337 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -317,24 +317,19 @@ done <../revs
 
 # In case of a subdirectory filter, it is possible that a specified head
 # is not in the set of rewritten commits, because it was pruned by the
-# revision walker. Fix it by mapping these heads to the next rewritten
-# ancestor(s), i.e. the boundaries in the set of rewritten commits.
+# revision walker. Fix it by mapping these heads to the unique nearest
+# ancestor that survived the pruning.
 
-# NEEDSWORK: we should sort the unmapped refs topologically first
-while read ref
-do
-	sha1=$(git rev-parse "$ref"^0)
-	test -f "$workdir"/../map/$sha1 && continue
-	# Assign the boundarie(s) in the set of rewritten commits
-	# as the replacement commit(s).
-	# (This would look a bit nicer if --not --stdin worked.)
-	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
-		git rev-list $ref --boundary --stdin |
-		sed -n "s/^-//p")
+if test "$filter_subdir"
+then
+	while read ref
 	do
-		map $p >> "$workdir"/../map/$sha1
-	done
-done < "$tempdir"/heads
+		sha1=$(git rev-parse "$ref"^0)
+		test -f "$workdir"/../map/$sha1 && continue
+		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
+		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
+	done < "$tempdir"/heads
+fi
 
 # Finally update the refs
 
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 4382baa..233254f 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -101,7 +101,7 @@ test_expect_success 'filter subdirectory only' '
 	refs/heads/sub refs/heads/sub-earlier
 '
 
-test_expect_failure 'subdirectory filter result looks okay' '
+test_expect_success 'subdirectory filter result looks okay' '
 	test 2 = $(git rev-list sub | wc -l) &&
 	git show sub:new &&
 	test_must_fail git show sub:subdir &&
-- 
1.6.0.rc2.30.gb6bda
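The difference between the old boundary query and the new 'rev-list -1' lookup can be reproduced in a throwaway repository. This is a minimal sketch, assuming a POSIX shell and a git on PATH; the files 'sub/f' and 'other' are hypothetical stand-ins that recreate the A -- B(topic) -- C(master) situation from the commit message, with B touching only 'other'.

```sh
#!/bin/sh
# Recreate A -- B(topic) -- C(master), where B does not touch sub/.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Isolate from user/system git config and give commits an identity.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

mkdir sub
echo a >sub/f && git add sub/f && git commit -q -m A   # A: touches sub/
echo b >other && git add other && git commit -q -m B   # B: touches only 'other'
git branch topic                                       # topic -> B
echo c >sub/f && git commit -q -am C                   # C: touches sub/ again

A=$(git rev-parse master~2)

# Old query: commits reachable from topic but not from master.
# topic is an ancestor of master, so this is empty and topic was left alone.
old=$(git rev-list ^master topic)

# New query: the nearest ancestor of topic (inclusive) that touches sub/.
new=$(git rev-list -1 topic -- sub)

echo "old query: '$old'"
echo "new query: $new (A = $A)"
```

The new query names A, which is the commit the patched loop looks up in the map for 'topic', so the branch follows A into the rewritten history instead of staying on the unfiltered B.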
* [PATCH 3/3] filter-branch: use --simplify-merges
From: Thomas Rast @ 2008-08-12 8:45 UTC
To: git; +Cc: Junio C Hamano, Johannes Schindelin, Jan Wielemaker

Use rev-list --simplify-merges everywhere. This changes the behaviour
of --subdirectory-filter in cases such as

  O -- A -\
   \       \
    \- B -- M

where A and B bring the same changes to the subdirectory: it now keeps
both sides of the merge. Previously, the history would have been
simplified to 'O -- A'. Merges of unrelated side histories that never
touch the subdirectory are still removed.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 git-filter-branch.sh |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index a140337..2688254 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -232,11 +232,11 @@ mkdir ../map || die "Could not create map/ directory"
 case "$filter_subdir" in
 "")
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@"
+		--parents --simplify-merges "$@"
 	;;
 *)
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@" -- "$filter_subdir"
+		--parents --simplify-merges "$@" -- "$filter_subdir"
 esac > ../revs || die "Could not get the commits"
 commits=$(wc -l <../revs | tr -d " ")
 
@@ -326,7 +326,8 @@ then
 	do
 		sha1=$(git rev-parse "$ref"^0)
 		test -f "$workdir"/../map/$sha1 && continue
-		ancestor=$(git rev-list -1 $ref -- "$filter_subdir")
+		ancestor=$(git rev-list --simplify-merges -1 \
+			$ref -- "$filter_subdir")
 		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
 	done < "$tempdir"/heads
 fi
-- 
1.6.0.rc2.30.gb6bda
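The behaviour change can be observed directly with rev-list. A minimal sketch, assuming a git that supports '--simplify-merges'; the file 'sub/f' and the branch name 'side' are hypothetical. It builds the O/A/B/M shape from the commit message and counts how many commits each simplification mode keeps for 'sub/'.

```sh
#!/bin/sh
# Build  O -- A -\
#         \       \
#          \- B -- M
# where A and B make the same change to sub/f.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

mkdir sub
echo 1 >sub/f && git add sub/f && git commit -q -m O
git branch side
echo 2 >sub/f && git commit -q -am A       # master: A
git checkout -q side
echo 2 >sub/f && git commit -q -am B       # side: B, same change as A
git checkout -q master
git merge -q --no-edit side                # M: both sides agree on sub/

# Default simplification follows only the first parent M is TREESAME to.
default=$(git rev-list master -- sub | wc -l | tr -d ' ')
# --simplify-merges keeps M and both of its sides.
simplified=$(git rev-list --simplify-merges master -- sub | wc -l | tr -d ' ')

echo "default: $default commits; --simplify-merges: $simplified commits"
```

The default walk keeps only O and A, matching the 'O -- A' history described above, while '--simplify-merges' keeps O, A, B and M.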
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Petr Baudis @ 2008-08-12 8:18 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, Thomas Rast, git, Jan Wielemaker

On Fri, Aug 08, 2008 at 06:25:05PM -0700, Junio C Hamano wrote:
> Perhaps --full-history is needed to the rev-list call (and the recent
> invention --simplify-merges that will hopefully appear sometime after
> 1.6.0)? See recent discussion of --full-history and the default merge
> simplification between Linus and Roman Zippel. I suspect that back when
> the original cg-rewritehistory was written, not many people understood the
> issues explained in that thread.

Just as a historical note, --subdirectory-filter was actually not part
of cg-admin-rewritehist.

				Petr "Pasky" Baudis
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Junio C Hamano @ 2008-08-12 18:33 UTC
To: Petr Baudis; +Cc: Johannes Schindelin, Thomas Rast, git, Jan Wielemaker

Petr Baudis <pasky@suse.cz> writes:

> On Fri, Aug 08, 2008 at 06:25:05PM -0700, Junio C Hamano wrote:
>> Perhaps --full-history is needed to the rev-list call (and the recent
>> invention --simplify-merges that will hopefully appear sometime after
>> 1.6.0)? See recent discussion of --full-history and the default merge
>> simplification between Linus and Roman Zippel. I suspect that back when
>> the original cg-rewritehistory was written, not many people understood the
>> issues explained in that thread.
>
> Just as a historical note, --subdirectory-filter was actually not part
> of cg-admin-rewritehist.

Ok, that sounds more plausible. Recent addition whose wrinkles have
not been ironed out.
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Thomas Rast @ 2008-08-09 10:00 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
> But hey, if other people agree with you, and this kind of thinking ends
> up in Git proper, I can still resort to other DVCSes.

BTW, the following is fairly ironic. (It was later rewritten in
813b473 to the current one-shot 'rev-list --parents' form.)

commit 685ef546b62d063c72b401cd38b83a879301aac4
Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Date:   Fri Jun 8 01:30:35 2007 +0100

    Teach filter-branch about subdirectory filtering

    With git-filter-branch --subdirectory-filter <subdirectory> you can
    get at the history, as seen by a certain subdirectory.

    The history of the rewritten branch will only contain commits that
    touched that subdirectory, and the subdirectory will be rewritten to
    be the new project root.

    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
[...snip...]
@@ -224,7 +228,13 @@ set_ident () {
 
 # list all parent's object names for a given commit
 get_parents () {
-	git-rev-list -1 --parents "$1" | sed "s/^[0-9a-f]*//"
+	case "$filter_subdir" in
+	"")
+		git-rev-list -1 --parents "$1"
+		;;
+	*)
+		git-rev-list -1 --parents "$1" -- "$filter_subdir"
+	esac | sed "s/^[0-9a-f]*//"
 }
 
 tempdir=.git-rewrite
[...snip...]

-- 
Thomas Rast
trast@student.ethz.ch
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Junio C Hamano @ 2008-08-12 21:33 UTC
To: Thomas Rast; +Cc: Johannes Schindelin, git

Thomas Rast <trast@student.ethz.ch> writes:

> Johannes Schindelin wrote:
>> But hey, if other people agree with you, and this kind of thinking ends
>> up in Git proper, I can still resort to other DVCSes.
>
> BTW, the following is fairly ironic. (It was later rewritten in
> 813b473 to the current one-shot 'rev-list --parents' form.)

Hmm, Dscho, perhaps we should take Thomas's patch as a "revert to 685ef54
to fix breakage introduced by 813b473", and demonstrate the breakage with
one of the new tests in his series?

I think it is Ok to use the "view --parents for all branches, instead of
looping with -1" approach when there is no path limiter, and that might be
faster, but if it complicates the logic too much, it probably is not worth
it. I also _suspect_ that if you use --simplify-merges, the optimization
made by 813b473 would still be usable even with path limiter.

By the way, I am not sure if using --simplify-merges unconditionally is
necessarily a good thing to do. The user who filters the branches may be
interested in a full history (where using --simplify-merges is the right
thing to do), or may be interested in getting one simplest possible
explanation of the end result, similar to what you get from rev-list
without the option.
* Re: [RFH] filter-branch: ancestor detection weirdness
From: Thomas Rast @ 2008-08-12 22:15 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, git

Junio C Hamano wrote:
> Hmm, Dscho, perhaps we should take Thomas's patch as a "revert to 685ef54
> to fix breakage introduced by 813b473", and demonstrate the breakage with
> one of the new tests in his series?

Now you've lost me. If you're saying 813b473 is at fault: it is not.
The code I'm trying to fix came about in dfd05e38. To see that the
change in 813b473 is ok, you can simply run the following in git.git:

  diff -u <(git rev-list --reverse --parents --topo-order HEAD -- gitk) \
    <(git rev-list --reverse --topo-order HEAD -- gitk |
      while read commit
      do
        echo $(git rev-list -1 --parents $commit -- gitk);
      done)

The one thing that breaks down is (04c6e9e:git-filter-branch.sh:331)

	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
		git rev-list $ref --boundary --stdin |
		sed -n "s/^-//p")

> I also _suspect_ that if you use --simplify-merges, the optimization
> made by 813b473 would still be usable even with path limiter.

It is always usable, if we are careful enough to use the same limiting
arguments in all rev-lists involved.

> By the way, I am not sure if using --simplify-merges unconditionally is
> necessarily a good thing to do.

I think filter-branch would need a generic mechanism to pass arguments
that affect commit selection. Passing '-- -- file' or '-- ^commit' to
filter-branch --subdirectory-filter will probably break a few things,
so it either needs to recognize those arguments itself or have a
mechanism to specify them, if we want to support it. This also goes
for the simplification mode.

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch
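Thomas's check relies on bash process substitution and on the 'gitk' path in git.git. A POSIX-sh sketch of the same comparison on a small throwaway repository, assuming only a git on PATH; 'sub/f' and 'other' are hypothetical files, and both walks use the same '-- sub' limiter, which Thomas notes is what keeps the two forms equivalent.

```sh
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

mkdir sub
# Five linear commits; only steps 1, 3 and 5 touch sub/.
for step in 1 2 3 4 5
do
	case $step in
	1|3|5) echo $step >sub/f; git add sub/f ;;
	*)     echo $step >other; git add other ;;
	esac
	git commit -q -m "step $step"
done

# One-shot form: a single walk with parent rewriting (the 813b473 approach).
one_shot=$(git rev-list --reverse --parents --topo-order HEAD -- sub)

# Per-commit form: ask for each commit's rewritten parents separately
# (the approach of get_parents in 685ef54).
looped=$(git rev-list --reverse --topo-order HEAD -- sub |
	while read commit
	do
		echo $(git rev-list -1 --parents "$commit" -- sub)
	done)

if test "$one_shot" = "$looped"
then
	echo "identical"
else
	echo "differ"
fi
```

Each output line is a commit followed by its rewritten parent, so 'step 3' is listed with 'step 1' as its parent because 'step 2' never touched 'sub/'.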
* Re: git filter-branch --subdirectory-filter, still a mistery
From: Jan Wielemaker @ 2008-08-08 7:44 UTC
To: Thomas Rast; +Cc: git

Hi Thomas,

Thanks for looking into this!

On Friday 08 August 2008 01:48:05 Thomas Rast wrote:
> Jan Wielemaker wrote:
> > Ref 'refs/tags/V5.6.50' was rewritten
> > error: Ref refs/tags/V5.6.50 is at
> > 8678b32f71178019c06aefa40e2d3fb9a2e8ef25 but
> > expected 2e8aef64e2fed088720a19ac2ffa2481e5bc7806
> > fatal: Cannot lock the ref 'refs/tags/V5.6.50'.
> > Could not rewrite refs/tags/V5.6.50
>
> [...]
>
> > Now, if I look in .git/packed-refs [...] and I changed all these to
> > `lightweight' tags
>
> This appears to be a bug. I've whipped up a patch that will follow
> and should fix the bug. It has nothing to do with packed-refs; the
> current filter-branch chokes on annotated tags during
> --subdirectory-filter, even though there is support for tag rewriting.
>
> However, to enable tag rewriting, you need to say --tag-name-filter
> cat.

Great. I knew a more fundamental approach was asked for, but I bet my
simple-minded work-around gives the same result, no?

> > Now it runs to the end. Unfortunately the history is completely
> > screwed up :-(:
> >
> >   * There are a lot of commits that are not related to the dir
> >   * Commits start long before the directory came into existence,
> >     Looks like it just shows the whole project at this place.
>
> For some reason the ancestor detection does not work right. I'm also
> following up with an RFH patch that significantly improves the success
> rate (in terms of branches and tags successfully mapped to a rewritten
> commit) in the case of your repository. I doubt more staring at the
> code would yield any more ideas at this hour, so ideas would be
> appreciated.

Thanks. As I'm using the GIT version anyway, I'll apply these patches
and see what happens.

The trouble is related to tags and possibly to branches. I get a
completely correct result if I delete all branches and tags before
filtering. That at least helps for this particular subproject (though
some of the tags are useful). I didn't investigate branches further (I
think the packages/chr directory is not involved in any branch; if you
are interested, the boot directory should show traces of the V57X
branch).

I did see that (all/some?) tags that involve changes to the
packages/chr directory nicely end up in its history, but others do not
appear on the filtered master branch and give access to the complete
project. See for example V5.6.59 (the latest release tag). Try (in the
filtered branch)

	git diff V5.6.59..

That should only show some small changes, but it diffs the entire
project against the subdir ...

> The rest is just the other commits/tags showing a lot of the history.
> I don't know of any built-in way to prune the branches and tags that
> aren't part of the new master, but
>
>   git branch -a --no-merged master
>
> can tell you which branches aren't ancestors of master.

Thanks for the tip.

	Cheers --- Jan
* Re: git filter-branch --subdirectory-filter, still a mistery
From: Jan Wielemaker @ 2008-08-08 11:25 UTC
To: Thomas Rast; +Cc: git

Hi Thomas,

On Friday 08 August 2008 01:48:05 Thomas Rast wrote:
> This appears to be a bug. I've whipped up a patch that will follow
> and should fix the bug. It has nothing to do with packed-refs; the
> current filter-branch chokes on annotated tags during
> --subdirectory-filter, even though there is support for tag rewriting.
>
> However, to enable tag rewriting, you need to say --tag-name-filter
> cat.

That works!

> > Now it runs to the end. Unfortunately the history is completely
> > screwed up :-(:
> >
> >   * There are a lot of commits that are not related to the dir
> >   * Commits start long before the directory came into existence,
> >     Looks like it just shows the whole project at this place.
>
> For some reason the ancestor detection does not work right. I'm also
> following up with an RFH patch that significantly improves the success
> rate (in terms of branches and tags successfully mapped to a rewritten
> commit) in the case of your repository. I doubt more staring at the
> code would yield any more ideas at this hour, so ideas would be
> appreciated.
>
> The rest is just the other commits/tags showing a lot of the history.
> I don't know of any built-in way to prune the branches and tags that
> aren't part of the new master, but
>
>   git branch -a --no-merged master
>
> can tell you which branches aren't ancestors of master.

I retried with your two patches. That looks a *lot* better. After
using the above and deleting the reported branches there are still
some branches left, but at least switching to them doesn't bring the
complete project back.

Now there are a few weird tags left; some of these may well be the
result of weird things in the repository. The repository was on CVS
until about a year ago and was converted (using SVN as an
intermediate).

The big problem is anything that relates to the days before the
filtered directory was part of the project. There are lots of tags
there and switching to them brings back the old project. I'd guess the
correct behaviour is that either all these tags refer to an empty tree
or (which I would prefer) all such tags are deleted. Is this a bug? Is
there a trick here? git clone --depth doesn't seem appropriate.

	Cheers --- Jan
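One way to clean up tags that still point into the unfiltered history is to delete every tag that is not reachable from the rewritten master. This is only a sketch under stated assumptions, not a remedy proposed in the thread: it uses a much newer git than the one discussed here (one that still ships filter-branch, with 'FILTER_BRANCH_SQUELCH_WARNING=1' silencing its deprecation notice and pause, and that has 'git merge-base --is-ancestor'). The tag names 'old' and 'v1' and the file 'sub/f' are hypothetical.

```sh
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip modern git's deprecation notice
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

echo x >other && git add other && git commit -q -m 'before sub/ existed'
git tag old                              # tag from the pre-subdirectory era
mkdir sub
echo y >sub/f && git add sub/f && git commit -q -m 'add sub/'
git tag v1

git filter-branch --subdirectory-filter sub --tag-name-filter cat -- --all

# 'old' has no ancestor touching sub/, so filter-branch leaves it unchanged,
# still pointing at a commit that carries the whole project.
# Delete every tag whose commit is not reachable from the rewritten master.
git for-each-ref --format='%(refname)' refs/tags |
while read ref
do
	if ! git merge-base --is-ancestor "$ref" master
	then
		echo "deleting $ref"
		git update-ref -d "$ref"
	fi
done

git tag -l
```

The backup copies under 'refs/original/' are not touched by this loop, so the old history stays reachable until those refs are removed as well.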
* [PATCH] Documentation: filter-branch: document how to filter all refs
From: Thomas Rast @ 2008-08-07 14:04 UTC
To: git

Document the '--' option that can be used to pass rev-list options
(not just arguments), and give an example usage of '-- --all'.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

[This went out to Jan and Junio already, but I forgot to CC the list.
Sorry.]

Somehow I'm imagining this is a FAQ. Either way, I remember figuring
out this exact example by accident when I first needed it.

 Documentation/git-filter-branch.txt |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index a518ba6..1f0fcec 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -13,7 +13,7 @@ SYNOPSIS
 	[--msg-filter <command>] [--commit-filter <command>]
 	[--tag-name-filter <command>] [--subdirectory-filter <directory>]
 	[--original <namespace>] [-d <directory>] [-f | --force]
-	[<rev-list options>...]
+	[--] [<rev-list options>...]
 
 DESCRIPTION
 -----------
@@ -196,6 +196,17 @@ git filter-branch --index-filter 'git rm --cached filename' HEAD
 
 Now, you will get the rewritten history saved in HEAD.
 
+To rewrite the repository to look as if 'foodir/' had been its project
+root, and discard all other history:
+
+-------------------------------------------------------
+git filter-branch --subdirectory-filter foodir -- --all
+-------------------------------------------------------
+
+Thus you can, e.g., turn a library subdirectory into a repository of
+its own. Note the '--' that separates 'filter-branch' options from
+revision options, and the '--all' to rewrite all branches and tags.
+
 To set a commit (which typically is at the tip of another history) to
 be the parent of the current initial commit, in order to paste the
 other history behind the current history:
-- 
1.6.0.rc1.106.g98a7
* [PATCH v2] Documentation: filter-branch: document how to filter all refs
From: Thomas Rast @ 2008-08-07 14:16 UTC
To: Jan Wielemaker, git; +Cc: gitster, Johannes Schindelin

Document the '--' option that can be used to pass rev-list options
(not just arguments), and give an example usage of '-- --all'.

Remove reference to "the new branch name"; filter-branch takes
arbitrary arguments to rev-list since dfd05e3.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

At second glance, it turned out the documentation was actually older
than the code. So rewrite the documentation of <rev-list options>.

 Documentation/git-filter-branch.txt |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index a518ba6..31d3cae 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -13,7 +13,7 @@ SYNOPSIS
 	[--msg-filter <command>] [--commit-filter <command>]
 	[--tag-name-filter <command>] [--subdirectory-filter <directory>]
 	[--original <namespace>] [-d <directory>] [-f | --force]
-	[<rev-list options>...]
+	[--] [<rev-list options>...]
 
 DESCRIPTION
 -----------
@@ -168,10 +168,10 @@ to other tags will be rewritten to point to the underlying commit.
 	'refs/original/', unless forced.
 
 <rev-list options>...::
-	When options are given after the new branch name, they will
-	be passed to 'git-rev-list'. Only commits in the resulting
-	output will be filtered, although the filtered commits can still
-	reference parents which are outside of that set.
+	Arguments for 'git-rev-list'. All positive refs included by
+	these options are rewritten. You may also specify options
+	such as '--all', but you must use '--' to separate them from
+	the 'git-filter-branch' options.
 
 
 Examples
@@ -196,6 +196,17 @@ git filter-branch --index-filter 'git rm --cached filename' HEAD
 
 Now, you will get the rewritten history saved in HEAD.
 
+To rewrite the repository to look as if 'foodir/' had been its project
+root, and discard all other history:
+
+-------------------------------------------------------
+git filter-branch --subdirectory-filter foodir -- --all
+-------------------------------------------------------
+
+Thus you can, e.g., turn a library subdirectory into a repository of
+its own. Note the '--' that separates 'filter-branch' options from
+revision options, and the '--all' to rewrite all branches and tags.
+
 To set a commit (which typically is at the tip of another history) to
 be the parent of the current initial commit, in order to paste the
 other history behind the current history:
-- 
1.6.0.rc1.106.g98a7
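The documented '-- --all' invocation can be tried end to end in a scratch repository. A minimal sketch, assuming a modern git that still ships filter-branch ('FILTER_BRANCH_SQUELCH_WARNING=1' skips its deprecation pause); 'foodir/lib.c', 'README' and the branch 'topic' are hypothetical. It checks that both branches end up rooted at the former 'foodir/' and that the README-only commit is pruned.

```sh
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master

mkdir foodir
echo 'int x;' >foodir/lib.c
echo readme >README
git add foodir/lib.c README
git commit -q -m 'initial'              # touches foodir/
git branch topic
echo 'more docs' >>README
git commit -q -am 'docs only'           # does not touch foodir/
echo 'int y;' >>foodir/lib.c
git commit -q -am 'extend lib'          # touches foodir/
git checkout -q topic
echo 'int z;' >>foodir/lib.c
git commit -q -am 'topic work'          # touches foodir/
git checkout -q master

# '--' separates filter-branch options from rev-list options such as --all.
git filter-branch --subdirectory-filter foodir -- --all

git ls-tree --name-only master          # former foodir/ contents are now the root
git log --oneline master topic
```

Because '--all' is passed through to rev-list, 'topic' is rewritten along with 'master' and shares the rewritten 'initial' commit with it.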