git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Ogilvie <mmogilvi_git@miniinfo.net>
To: git@vger.kernel.org
Cc: greened@obbligato.org, amdmi3@amdmi3.ru, john@keeping.me.uk,
	techlivezheng@gmail.com, apenwarr@gmail.com,
	Matthew Ogilvie <mmogilvi_git@miniinfo.net>
Subject: [PATCH/BAD 4/4] subtree: poor bugfix for split new commits with parents before previous split
Date: Sat,  7 Dec 2013 11:21:25 -0700	[thread overview]
Message-ID: <1386440485-3092-4-git-send-email-mmogilvi_git@miniinfo.net> (raw)
In-Reply-To: <1386440485-3092-1-git-send-email-mmogilvi_git@miniinfo.net>

Bug description: Unless you use --ignore-joins, "git subtree split"'s
optimization to avoid re-scanning all of history can trim too much.
Any new merged branches that have parents before the previous "split"
will not be re-attached properly in the split-off subtree.
In the extreme case (if all the changes occurred on such
branches), I've seen the old subtree head might not be an
ancestor of the new head at all.

This "fix" is only for illustration.  It probably should not be
included as-is; it probably scales worse than using --ignore-joins
for anything much beyond trivial cases.

I'm not sure it is possible to speed it up much (while remaining
correct) without switching to a better language than shell script.

I'm not sure how to explain the constraint clearly (or even
precisely define the "minimal" constraint).  One attempt:
Exclude from "unrevs" any rejoin commit X for which ANY new commit
(not just revs itself) has some other common ancestor (merge base)
besides X.  ("New commit" is one that is not an ancestor of
some previous rejoin commit.)

It would probably be fairly straightforward to adapt
graph traversal algorithms to do this efficiently, but as near
as I can tell, none of the existing options in git-rev-list or
git-merge-base really does anything useful for optimizing this
in shell script...

The simplest traversal technique to fix this might
be if there was a way to make history traversal only stop
at SPECIFIC unrevs, instead of any ancestor of any unrevs.

Other workarounds besides this patch:
   * Use --ignore-joins and wait for the really slow processing...
   * Plan ahead and delay: After using git subtree split --rejoin,
     leave the rejoin merge commit off on a side branch until such time
     that all other pre-split branches have been merged.  Then
     "merge the merge" at that later time.  Only do rejoins fairly
     rarely, based on the schedule of merging other branches.
   * Ignore the problem: Allow the occasional falsely-disconnected
     branch roots to be generated by split.  Of course, this
     means --ignore-joins would no longer generate a consistent
     subtree history, it is hard to examine what actually changed
     in the false roots, etc.
   * Perhaps manually use temporary grafts or similar to hide rejoins
     that would throw it off when doing later splits.

[Intentionally not signed off: Poor performance.]
---

Testing: Below I include a script I've been using to help with
mostly-manual testing.  I regularly modify it based on
what I'm testing at the moment.  It basically
creates and manipulates a subtree out of git's own "contrib"
directory.

It may be convenient to use this on a copy of git's
repository instead of the copy in which you are messing with
git-subtree (avoiding changing code out from under the script).
It may also be convenient to symlink git-subtree to the top-level
directory, rather than copy it, so you don't need to keep
re-copying it.

------CUT------
#!/bin/sh

die()
{ echo "$@" 1>&2
  exit 1
}

GIT=../git-sandbox/bin-wrappers/git
#GIT=git

DBG=
#DBG=-d

EDIT=
#EDIT=--edit

git branch -D br-contrib

git checkout 73bbc0796b4ce65bfb1a12b47a0edc27845ecf50 || die "checkout"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin $EDIT || die "subtree 1"

git tag -f step1

git merge -m 'MY MERGE1' 68a65f5fe54c2b21bfe16ef3a0b48956ecf5658a ||
        die "merge 1"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin || die "subtree 2"

git tag -f step2

git merge -m 'MY MERGE2' 15f7221686eac053902b906c278680b485c865ce || \
        die "merge 2"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin $EDIT || die "subtree 3"

#####
if [ -n "$EDIT" ]; then
    exit 0
fi

git tag -f step3

git merge -m 'MY MERGE3' 26145c9c73ed51bbd8261949d31899d6507519d5 || \
        die "merge 3"

"$GIT" subtree split -P contrib \
         $DBG --branch br-contrib --squash --rejoin || die "subtree 4"

git tag -f step4

git merge -m 'MY MERGE4' 583736c0bcf09adaa5621b142d8e43c22354041b || \
        die "merge 4"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin || die "subtree 5"

git tag -f step5
------CUT------



 contrib/subtree/git-subtree.sh | 61 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index ac82b4d..6ff4362 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -36,6 +36,8 @@ PATH=$PATH:$(git --exec-path)
 
 require_work_tree
 
+nl='
+'
 quiet=
 branch=
 debug=
@@ -224,6 +226,13 @@ try_remove_previous()
 	fi
 }
 
+try_remove()
+{
+	if rev_exists "$1"; then
+		echo "$1"
+	fi
+}
+
 find_latest_squash()
 {
 	debug "Looking for latest squash ($dir)..."
@@ -293,8 +302,8 @@ find_existing_splits()
 					debug "  Prior: $main -> $sub"
 					cache_set $main $sub
 					cache_set $sub $sub
-					try_remove_previous "$main"
-					try_remove_previous "$sub"
+					try_remove "$main"
+					try_remove "$sub"
 				fi
 				main=
 				sub=
@@ -303,6 +312,40 @@ find_existing_splits()
 	done
 }
 
+filter_out_needed_existing()
+{
+	revs="$1"
+	existing="$2"
+	base=
+	git rev-list --merges --parents $revs --not $existing | \
+		sed -e 's/^[^ ]* //' -e "y/ /\\$nl/" | sort | uniq | \
+	{
+		debug "filter_out_needed_existing"
+		debug "  revs: $revs"
+		debug "  existing: $existing"
+		while read needed; do
+			debug "loop: $needed"
+			nextexisting=
+			for exist in $existing ; do
+				tmp="$(git merge-base $exist $needed)"
+				if [ x"$tmp" = x"$exist" -o -z "$tmp" ]; then
+					nextexisting="$nextexisting $exist"
+				fi
+			done
+			existing=$nextexisting
+		done
+		echo "$existing"
+	}
+}
+
+existing_to_unrevs()
+{
+	existing="$1"
+	for exist in $existing ; do
+		try_remove_previous $exist
+	done
+}
+
 copy_commit()
 {
 	# We're going to set some environment vars here, so
@@ -599,10 +642,16 @@ cmd_split()
 		done
 	fi
 	
-	if [ -n "$ignore_joins" ]; then
-		unrevs=
-	else
-		unrevs="$(find_existing_splits "$dir" "$revs")"
+	unrevs=
+	if [ -z "$ignore_joins" ]; then
+		existing="$(find_existing_splits "$dir" "$revs")"
+		if [ -n "$existing" ]; then
+			debug "existing: $existing"
+			existing="$(filter_out_needed_existing "$revs" "$existing")"
+			unrevs="$(existing_to_unrevs "$existing")"
+			debug "base: $base"
+			debug "unrevs: $unrevs"
+		fi
 	fi
 	
 	# We can't restrict rev-list to only $dir here, because some of our
-- 
1.8.3.2

  parent reply	other threads:[~2013-12-07 18:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-23 20:18 [PATCH] subtree: add squash handling for split and push Pierre Penninckx
2013-11-28 18:23 ` Matthew Ogilvie
2013-11-28 22:58   ` Pierre Penninckx
2013-12-07 18:21     ` [PATCH 1/4] subtree: support split --rejoin --squash Matthew Ogilvie
2013-12-07 18:21       ` [PATCH 2/4] subtree: allow --squash and --message with push Matthew Ogilvie
2013-12-07 18:21       ` [PATCH 3/4] subtree: add --edit option Matthew Ogilvie
2013-12-07 18:21       ` Matthew Ogilvie [this message]
2013-12-10 22:46       ` [PATCH 1/4] subtree: support split --rejoin --squash Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1386440485-3092-4-git-send-email-mmogilvi_git@miniinfo.net \
    --to=mmogilvi_git@miniinfo.net \
    --cc=amdmi3@amdmi3.ru \
    --cc=apenwarr@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=greened@obbligato.org \
    --cc=john@keeping.me.uk \
    --cc=techlivezheng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).