From: Matthew Ogilvie <mmogilvi_git@miniinfo.net>
To: git@vger.kernel.org
Cc: greened@obbligato.org, amdmi3@amdmi3.ru, john@keeping.me.uk,
techlivezheng@gmail.com, apenwarr@gmail.com,
Matthew Ogilvie <mmogilvi_git@miniinfo.net>
Subject: [PATCH/BAD 4/4] subtree: poor bugfix for split new commits with parents before previous split
Date: Sat, 7 Dec 2013 11:21:25 -0700 [thread overview]
Message-ID: <1386440485-3092-4-git-send-email-mmogilvi_git@miniinfo.net> (raw)
In-Reply-To: <1386440485-3092-1-git-send-email-mmogilvi_git@miniinfo.net>
Bug description: Unless you use --ignore-joins, "git subtree split"'s
optimization to avoid re-scanning all of history can trim too much.
Any new merged branches that have parents before the previous "split"
will not be re-attached properly in the split-off subtree.
In the extreme case (if all the changes occurred on such
branches), I've seen the old subtree head might not be an
ancestor of the new head at all.
This "fix" is only for illustration. It probably should not be
included as-is; it probably scales worse than using --ignore-joins
for anything much beyond trivial cases.
I'm not sure it is possible to speed it up much (while remaining
correct) without switching to a better language than shell script.
I'm not sure how to explain the constraint clearly (or even
precisely define the "minimal" constraint). One attempt:
Exclude from "unrevs" any rejoin commit X for which ANY new commit
(not just revs itself) has some other common ancestor (merge base)
besides X. ("New commit" is one that is not an ancestor of
some previous rejoin commit.)
It would probably be fairly straightforward to adapt
graph traversal algorithms to do this efficiently, but as near
as I can tell, none of the existing options in git-rev-list or
git-merge-base really does anything useful for optimizing this
in shell script...
The simplest traversal technique to fix this might
be if there was a way to make history traversal only stop
at SPECIFIC unrevs, instead of any ancestor of any unrevs.
Other workarounds besides this patch:
* Use --ignore-joins and wait for the really slow processing...
* Plan ahead and delay: After using git subtree split --rejoin,
leave the rejoin merge commit off on a side branch until such time
that all other pre-split branches have been merged. Then
"merge the merge" at that later time. Only do rejoins fairly
rarely, based on the schedule of merging other branches.
* Ignore the problem: Allow the occasional falsely-disconnected
branch roots to be generated by split. Of course, this
means --ignore-joins would no longer generate a consistent
subtree history, it is hard to examine what actually changed
in the false roots, etc.
* Perhaps manually use temporary grafts or similar to hide rejoins
that would throw it off when doing later splits.
[Intentionally not signed off: Poor performance.]
---
Testing: Below I include a script I've been using to help with
mostly-manual testing. I regularly modify it based on
what I'm testing at the moment. It basically
creates and manipulates a subtree out of git's own "contrib"
directory.
It may be convenient to use this on a copy of git's
repository instead of the copy in which you are messing with
git-subtree (avoiding changing code out from under the script).
It may also be convenient to symlink git-subtree to the top-level
directory, rather than copy it, so you don't need to keep
re-copying it.
------CUT------
#!/bin/sh
die()
{ echo "$@" 1>&2
exit 1
}
GIT=../git-sandbox/bin-wrappers/git
#GIT=git
DBG=
#DBG=-d
EDIT=
#EDIT=--edit
git branch -D br-contrib
git checkout 73bbc0796b4ce65bfb1a12b47a0edc27845ecf50 || die "checkout"
"$GIT" subtree split -P contrib \
--branch br-contrib --squash --rejoin $EDIT || die "subtree 1"
git tag -f step1
git merge -m 'MY MERGE1' 68a65f5fe54c2b21bfe16ef3a0b48956ecf5658a ||
die "merge 1"
"$GIT" subtree split -P contrib \
--branch br-contrib --squash --rejoin || die "subtree 2"
git tag -f step2
git merge -m 'MY MERGE2' 15f7221686eac053902b906c278680b485c865ce || \
die "merge 2"
"$GIT" subtree split -P contrib \
--branch br-contrib --squash --rejoin $EDIT || die "subtree 3"
#####
if [ -n "$EDIT" ]; then
exit 0
fi
git tag -f step3
git merge -m 'MY MERGE3' 26145c9c73ed51bbd8261949d31899d6507519d5 || \
die "merge 3"
"$GIT" subtree split -P contrib \
$DBG --branch br-contrib --squash --rejoin || die "subtree 4"
git tag -f step4
git merge -m 'MY MERGE4' 583736c0bcf09adaa5621b142d8e43c22354041b || \
die "merge 4"
"$GIT" subtree split -P contrib \
--branch br-contrib --squash --rejoin || die "subtree 5"
git tag -f step5
------CUT------
contrib/subtree/git-subtree.sh | 61 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 55 insertions(+), 6 deletions(-)
diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index ac82b4d..6ff4362 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -36,6 +36,8 @@ PATH=$PATH:$(git --exec-path)
require_work_tree
+nl='
+'
quiet=
branch=
debug=
@@ -224,6 +226,13 @@ try_remove_previous()
fi
}
+try_remove()
+{
+ if rev_exists "$1"; then
+ echo "$1"
+ fi
+}
+
find_latest_squash()
{
debug "Looking for latest squash ($dir)..."
@@ -293,8 +302,8 @@ find_existing_splits()
debug " Prior: $main -> $sub"
cache_set $main $sub
cache_set $sub $sub
- try_remove_previous "$main"
- try_remove_previous "$sub"
+ try_remove "$main"
+ try_remove "$sub"
fi
main=
sub=
@@ -303,6 +312,40 @@ find_existing_splits()
done
}
+filter_out_needed_existing()
+{
+ revs="$1"
+ existing="$2"
+ base=
+ git rev-list --merges --parents $revs --not $existing | \
+ sed -e 's/^[^ ]* //' -e "y/ /\\$nl/" | sort | uniq | \
+ {
+ debug "filter_out_needed_existing"
+ debug " revs: $revs"
+ debug " existing: $existing"
+ while read needed; do
+ debug "loop: $needed"
+ nextexisting=
+ for exist in $existing ; do
+ tmp="$(git merge-base $exist $needed)"
+ if [ x"$tmp" = x"$exist" -o -z "$tmp" ]; then
+ nextexisting="$nextexisting $exist"
+ fi
+ done
+ existing=$nextexisting
+ done
+ echo "$existing"
+ }
+}
+
+existing_to_unrevs()
+{
+ existing="$1"
+ for exist in $existing ; do
+ try_remove_previous $exist
+ done
+}
+
copy_commit()
{
# We're going to set some environment vars here, so
@@ -599,10 +642,16 @@ cmd_split()
done
fi
- if [ -n "$ignore_joins" ]; then
- unrevs=
- else
- unrevs="$(find_existing_splits "$dir" "$revs")"
+ unrevs=
+ if [ -z "$ignore_joins" ]; then
+ existing="$(find_existing_splits "$dir" "$revs")"
+ if [ -n "$existing" ]; then
+ debug "existing: $existing"
+ existing="$(filter_out_needed_existing "$revs" "$existing")"
+ unrevs="$(existing_to_unrevs "$existing")"
+ debug "base: $base"
+ debug "unrevs: $unrevs"
+ fi
fi
# We can't restrict rev-list to only $dir here, because some of our
--
1.8.3.2
next prev parent reply other threads:[~2013-12-07 18:30 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-23 20:18 [PATCH] subtree: add squash handling for split and push Pierre Penninckx
2013-11-28 18:23 ` Matthew Ogilvie
2013-11-28 22:58 ` Pierre Penninckx
2013-12-07 18:21 ` [PATCH 1/4] subtree: support split --rejoin --squash Matthew Ogilvie
2013-12-07 18:21 ` [PATCH 2/4] subtree: allow --squash and --message with push Matthew Ogilvie
2013-12-07 18:21 ` [PATCH 3/4] subtree: add --edit option Matthew Ogilvie
2013-12-07 18:21 ` Matthew Ogilvie [this message]
2013-12-10 22:46 ` [PATCH 1/4] subtree: support split --rejoin --squash Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1386440485-3092-4-git-send-email-mmogilvi_git@miniinfo.net \
--to=mmogilvi_git@miniinfo.net \
--cc=amdmi3@amdmi3.ru \
--cc=apenwarr@gmail.com \
--cc=git@vger.kernel.org \
--cc=greened@obbligato.org \
--cc=john@keeping.me.uk \
--cc=techlivezheng@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).