[RFC][PATCH] Branch history

git.vger.kernel.org archive mirror

* [RFC][PATCH] Branch history
@ 2006-08-04 19:24 Eric W. Biederman
From: Eric W. Biederman @ 2006-08-04 19:24 UTC
  To: git; +Cc: Junio C Hamano

The problem:
git-rebase, stgit and the like destructively edit the commit history
on a branch, making it a challenge to go back to a known good point.

The reflog and the like help somewhat, but they capture irrelevant
points and are not git-prune safe.

With current git the best technique I have found is to always make
a new branch before calling git-rebase.

After thinking about the problem some more I believe I have found
a rather simple solution to the problem of keeping branch history.

For each branch whose history you want to keep, maintain two branches:
a normal working branch, and a second archive branch that records
the history of the branch you are editing.

The history can be kept simply by placing an additional commit on
top of the archive branch each time you archive.  The new commit
points to the same tree object as the current top commit of the
working branch, but it has 2 parent commit objects.  The first parent
is that top commit of the working branch.  The second parent is the
previous top commit of the archive branch, which recorded the previous
version of this branch.  (The very first archive commit has only the
first parent.)
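
The commit structure described above can be sketched with present-day git plumbing (the patch below uses the 2006 dash-form commands instead). This is a minimal sketch, not the patch itself: the function name `archive_snapshot` is hypothetical, it assumes HEAD points at a branch, and it uses the same parent order as the patch (working tip first, previous archive tip second).

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): record the current branch
# tip in refs/archive/<branch> as a snapshot commit.
archive_snapshot() {
    msg=$1
    branch=$(git symbolic-ref --short HEAD) || return 1
    head=$(git rev-parse --verify HEAD) || return 1
    tree=$(git rev-parse --verify "HEAD^{tree}") || return 1
    archive="refs/archive/$branch"

    # Parent 1: the current tip of the working branch.
    parents="-p $head"
    # Parent 2: the previous archive tip, if this is not the first snapshot.
    if prev=$(git rev-parse --verify -q "$archive"); then
        parents="$parents -p $prev"
    fi

    # Reuse the working tip's tree, so the "merge" introduces no changes.
    # $parents is deliberately unquoted so it splits into separate arguments.
    commit=$(git commit-tree "$tree" $parents -m "$msg") || return 1
    git update-ref "$archive" "$commit"
}
```

Usage would be something like `archive_snapshot "after splitting patch 3"` on the branch being edited; each call adds exactly one commit to the archive ref.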

The workflow is: you edit a branch to your heart's content, then, when
you get to an interesting point, you commit the branch to your archive
branch so you can keep track of things.

To gitk and friends the archive branch looks like a series of branch
merges where one input branch is always the same as the merge result.
So all of the git tools work normally.
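
Because an archive commit reuses the tree of its first parent, the "merge where one input is the same as the result" property can be checked mechanically. A minimal sketch, assuming archive commits are built as described above; the function name `is_snapshot_commit` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if REV looks like an archive snapshot, i.e.
# a commit whose tree is identical to the tree of its first parent.
is_snapshot_commit() {
    rev=$1
    tree=$(git rev-parse --verify -q "$rev^{tree}") || return 1
    base=$(git rev-parse --verify -q "$rev^1^{tree}") || return 1
    # Same result as "git diff --quiet $rev^1 $rev" succeeding.
    [ "$tree" = "$base" ]
}
```

This is also why gitk and `git log` treat such commits like any other merge: nothing about them is special except that the merge result equals the first input.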

The implementation is trivial.

The neat thing is that it gives an immutable history of a branch that
is actively being edited.  So if you export your archive branch, people
will never see time roll backward.
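
One way to see the "time never rolls backward" property: rewriting the working branch breaks ancestry, but each new archive commit keeps the previous archive tip as its second parent, so the archive ref only ever fast-forwards. A sketch, assuming git 1.8.0 or later for `git merge-base --is-ancestor`; the function name `moved_forward` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if revision NEW contains revision OLD, i.e.
# someone who already fetched OLD sees only a fast-forward, never a rewind.
# Requires git 1.8.0+ for --is-ancestor.
moved_forward() {
    git merge-base --is-ancestor "$1" "$2"
}

# Example, before and after rewriting branch "main" and archiving again:
#   moved_forward "$old_main_tip" main                   # fails after a rewrite
#   moved_forward "$old_archive_tip" refs/archive/main   # still succeeds
```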

Below is my patch to implement this idea.  Currently I am storing
the archive branch in .git/refs/archive/$branchname and calling
the command to commit a branch git-archive-branch.

I think my initial naming is most likely lacking, so suggestions
for something better would be appreciated.

Comments?

Eric

diff --git a/Makefile b/Makefile
index 700c77f..411ae95 100644
--- a/Makefile
+++ b/Makefile
@@ -150,7 +150,7 @@ SCRIPT_SH = \
 	git-applymbox.sh git-applypatch.sh git-am.sh \
 	git-merge.sh git-merge-stupid.sh git-merge-octopus.sh \
 	git-merge-resolve.sh git-merge-ours.sh \
-	git-lost-found.sh git-quiltimport.sh
+	git-lost-found.sh git-quiltimport.sh git-archive-branch.sh

 SCRIPT_PERL = \
 	git-archimport.perl git-cvsimport.perl git-relink.perl \
diff --git a/git-archive-branch.sh b/git-archive-branch.sh
new file mode 100755
index 0000000..00638de
--- /dev/null
+++ b/git-archive-branch.sh
@@ -0,0 +1,131 @@
+#!/bin/sh
+
+USAGE='[-m <message> | -F logfile] [-e]'
+
+. git-sh-setup
+
+headref=$(git-symbolic-ref HEAD | sed -e 's|^refs/heads/||')
+headsha1=$(git-rev-parse "$headref")
+archiveref="refs/archive/$headref"
+
+
+logfile=
+edit_flag=
+no_edit=
+log_given=
+log_message=
+while case "$#" in 0) break;; esac
+do
+  case "$1" in
+  -F|--F|-f|--f|--fi|--fil|--file)
+      case "$#" in 1) usage ;; esac
+      shift
+      no_edit=t
+      log_given=t$log_given
+      logfile="$1"
+      shift
+      ;;
+  -F*|-f*)
+      no_edit=t
+      log_given=t$log_given
+      logfile=`expr "z$1" : 'z-[Ff]\(.*\)'`
+      shift
+      ;;
+  --F=*|--f=*|--fi=*|--fil=*|--file=*)
+      no_edit=t
+      log_given=t$log_given
+      logfile=`expr "z$1" : 'z-[^=]*=\(.*\)'`
+      shift
+      ;;
+  -e|--e|--ed|--edi|--edit)
+      edit_flag=t
+      shift
+      ;;
+  -m|--m|--me|--mes|--mess|--messa|--messag|--message)
+      case "$#" in 1) usage ;; esac
+      shift
+      log_given=m$log_given
+      if test "$log_message" = ''
+      then
+          log_message="$1"
+      else
+          log_message="$log_message
+
+$1"
+      fi
+      no_edit=t
+      shift
+      ;;
+  -m*)
+      log_given=m$log_given
+      if test "$log_message" = ''
+      then
+          log_message=`expr "z$1" : 'z-m\(.*\)'`
+      else
+          log_message="$log_message
+
+`expr "z$1" : 'z-m\(.*\)'`"
+      fi
+      no_edit=t
+      shift
+      ;;
+  --m=*|--me=*|--mes=*|--mess=*|--messa=*|--messag=*|--message=*)
+      log_given=m$log_given
+      if test "$log_message" = ''
+      then
+          log_message=`expr "z$1" : 'z-[^=]*=\(.*\)'`
+      else
+          log_message="$log_message
+
+`expr "z$1" : 'z-[^=]*=\(.*\)'`"
+      fi
+      no_edit=t
+      shift
+      ;;
+  *) usage ;; esac
+done
+case "$edit_flag" in t) no_edit= ;; esac
+
+if test "$log_message" != ""
+then
+	echo "$log_message"
+elif test "$logfile" != ""
+then
+	if test "$logfile" = -
+	then
+		test -t 0 &&
+		echo >&2 "(read log message from standard input)"
+		cat
+	else
+		cat <"$logfile"
+	fi
+fi | git-stripspace > "$GIT_DIR"/COMMIT_EDITMSG
+
+case "$no_edit" in
+'')
+	case "${VISUAL:-$EDITOR},$TERM" in
+	,dumb)
+		echo >&2 "Terminal is dumb but no VISUAL nor EDITOR defined."
+		echo >&2 "Please supply the commit log message using either"
+		echo >&2 "-m or -F option.  A boilerplate log message has"
+		echo >&2 "been prepared in $GIT_DIR/COMMIT_EDITMSG"
+		exit 1
+		;;
+	esac
+	git-var GIT_AUTHOR_IDENT > /dev/null || die
+	git-var GIT_COMMITTER_IDENT > /dev/null || die
+	${VISUAL:-${EDITOR:-vi}} "$GIT_DIR/COMMIT_EDITMSG"
+	;;
+esac
+
+git-stripspace <"$GIT_DIR/COMMIT_EDITMSG" >"$GIT_DIR/COMMIT_MSG"
+
+parents="-p $headsha1"
+if git-rev-parse --verify $archiveref > /dev/null 2> /dev/null; then
+	parents="$parents -p $(git-rev-parse $archiveref)"
+fi
+
+tree=$(git-cat-file commit $headsha1 | sed -n -e 's/^tree \(.*\)$/\1/p') &&
+commit=$(git-commit-tree $tree $parents <"$GIT_DIR/COMMIT_MSG") &&
+git-update-ref "$archiveref" $commit &&
+rm -f "$GIT_DIR/COMMIT_MSG" "$GIT_DIR/COMMIT_EDITMSG"


* Re: [RFC][PATCH] Branch history
@ 2006-08-05  3:18 Shawn Pearce
From: Shawn Pearce @ 2006-08-05  3:18 UTC
  To: Eric W. Biederman; +Cc: git

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> The problem:
> git-rebase, stgit and the like destructively edit the commit history
> on a branch, making it a challenge to go back to a known good point.
> 
> The reflog and the like help somewhat, but they capture irrelevant
> points and are not git-prune safe.

How are the points irrelevant?  Each commit/rebase/am/update-ref
is recorded.  That's each change to the branch head.  It appears
as though you are mainly interested in tracking across rebases,
which a reflog would do, assuming you filtered the events down to
only those caused by rebase and ignored the others.

Yeah, a reflog is not prune-safe, but it wouldn't be hard to
modify git-prune to also consider the reflog associated with
a ref if it's using that ref as a root that must be preserved.
Assuming anyone really wants that as a feature...

> After thinking about the problem some more I believe I have found
> a rather simple solution to the problem of keeping branch history.
> 
> For each branch whose history you want to keep, maintain two branches:
> a normal working branch, and a second archive branch that records
> the history of the branch you are editing.

It would appear as though you are really only tracking rebase events,
as everything else done on the branch is preserved since the work
branch is itself parent #1 for the archive branch commit.  So the
archive branch shows every commit ever done along the main branch,
but also shows itself joining back quite frequently.  Further, if you
archive the work branch without doing a rebase since the last
archive, then there's really nothing happening except saving a tag
(but as a commit!) on the archive branch.

This creates a rather messy history, and is more-or-less what
pg does when patches get pushed onto a stack and they can't be
pushed by a simple fast-forward operation.  Reading this history
in gitk is "interesting" at best.  This is the main reason I've
been trying to write `tb` (a topic branch manager, fashioned after
Junio's TO script) but I can't seem to find enough time to get it
finished.

> The neat thing is that it gives an immutable history of a branch that
> is actively being edited.  So if you export your archive branch, people
> will never see time roll backward.

Right.  That's an interesting way of handling it, but that branch
is also quite messy as it's full of merge commits.  Although it may
be useful to export, it's going to carry along with it all of the bad
edits and prior rebases made on that branch.  You probably wouldn't
want to merge that branch into a mainline, which means that branch
is likely to be discarded at some point in the future.  When that
happens, nobody can track it anymore, and that immutable history
just got mutated out of existence.

I think the right way to deal with these types of branches is to
publicly declare whether or not the branch is expected to roll
backwards in time (due to a rebase type of event), then let clients
always update those branches during pulls, rather than needing to
explicitly mark them with '+' on the client side.

Further, good remerge tools (git-rerere on steroids) would help
re-resolve conflicts resulting from continuous rebasing.  This would
make it easier to maintain such a branch and carry the thing forward,
or to leave it on its original base but continuously remerge
it and the current mainline into a temporary working branch for
testing purposes.

This is largely the policy that Junio uses for the `pu` and
the `next` branches, as well as for the topic branches that he
carries for everyone else doing GIT development.  It appears to be
working rather well, but it certainly could be streamlined better.
My git-rerere2 and tb tools are an attempt to do this, but sadly
they aren't in a useful state yet.  Maybe because they are both
far more complex than what you are doing here.  :-)

Nonetheless it is an interesting contribution.  Thank you for taking
the time to send it.

-- 
Shawn.


* Re: [RFC][PATCH] Branch history
@ 2006-08-05  9:30 Eric W. Biederman
From: Eric W. Biederman @ 2006-08-05  9:30 UTC
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> The problem:
>> git-rebase, stgit and the like destructively edit the commit history
>> on a branch, making it a challenge to go back to a known good point.
>> 
>> The reflog and the like help somewhat, but they capture irrelevant
>> points and are not git-prune safe.
>
> How are the points irrelevant?  Each commit/rebase/am/update-ref
> is recorded.  That's each change to the branch head.  It appears
> as though you are mainly interested in tracking across rebases,
> which a reflog would do, assuming you filtered the events down to
> only those caused by rebase and ignored the others.

It tracks each change; it does not track the changes that humans find
interesting.  That can easily be a lot of noise.

I don't want to see every time a head is updated any more than I want
single keystroke level version control.  Way too much uninteresting
detail.

> Yeah, a reflog is not prune-safe, but it wouldn't be hard to
> modify git-prune to also consider the reflog associated with
> a ref if it's using that ref as a root that must be preserved.
> Assuming anyone really wants that as a feature...

I do.  I also want history I can clone between repositories.
There have been times I have had to look 9 months back to see where
I accidentally dropped a patch.
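
`git clone` copies only branches and tags, so archive refs kept under refs/archive/ (where the patch stores them) have to be transferred explicitly. A minimal sketch using a plain refspec; the function name and the refs/remotes/<remote>/archive/ destination are assumptions. The refspec deliberately has no leading '+', so a non-fast-forward update of an archive ref is refused, which matches the guarantee archive branches are meant to give.

```shell
#!/bin/sh
# Hypothetical helper: copy every refs/archive/* ref from REMOTE into
# refs/remotes/REMOTE/archive/*, next to the usual remote-tracking branches.
fetch_archives() {
    remote=${1:-origin}
    # No leading '+': archive refs should only ever fast-forward.
    git fetch "$remote" "refs/archive/*:refs/remotes/$remote/archive/*"
}

# Publishing works the same way in the other direction, e.g.:
#   git push origin 'refs/archive/*:refs/archive/*'
```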

>> After thinking about the problem some more I believe I have found
>> a rather simple solution to the problem of keeping branch history.
>> 
>> For each branch whose history you want to keep, maintain two branches:
>> a normal working branch, and a second archive branch that records
>> the history of the branch you are editing.
>
> It would appear as though you are really only tracking rebase events,
> as everything else done on the branch is preserved since the work
> branch is itself parent #1 for the archive branch commit.  So the
> archive branch shows every commit ever done along the main branch,
> but also shows itself joining back quite frequently.  Further, if you
> archive the work branch without doing a rebase since the last
> archive, then there's really nothing happening except saving a tag
> (but as a commit!) on the archive branch.

True.  But that is largely the wrong way to think about it.  I am
saving away a branch at times it is interesting to a human being.
There are also other tools and other methods of editing a branch
besides git-rebase.

> This creates a rather messy history, and is more-or-less what
> pg does when patches get pushed onto a stack and they can't be
> pushed by a simple fast-forward operation.  Reading this history
> in gitk is "interesting" at best.  This is the main reason I've
> been trying to write `tb` (a topic branch manager, fashioned after
> Junio's TO script) but I can't seem to find enough time to get it
> finished.

I just took a quick look at pg, and while the mechanism may be
similar, I believe the goals are fundamentally different.  I am
trying to record the history at points human beings care about;
pg seems to do something automatically behind the scenes with
the existing model.

The points at which I record the history are points at which I want a
human-written commit message, because these are points in time
meaningful to me.  The ideal companion would be something that could
just walk my branch history and pull it out, so that when writing an
overview message I could easily generate a summary of how I had been
editing my patches.
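
Such a companion could start as small as the sketch below, which assumes the parent order used by the patch (parent 1 is the archived working tip, parent 2 the previous archive commit). `git log --first-parent` would walk the working branch instead, so the loop follows second parents by hand. The name `archive_log` and the output format are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: print one line per snapshot recorded in
# refs/archive/BRANCH, newest first, using the human-written messages.
archive_log() {
    rev=$(git rev-parse --verify -q "refs/archive/$1") || return 1
    while :; do
        git --no-pager log -1 --format='%h %ad %s' --date=short "$rev"
        # Step to the previous snapshot; the first one has no second parent.
        rev=$(git rev-parse --verify -q "$rev^2") || break
    done
}
```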

>> The neat thing is that it gives an immutable history of a branch that
>> is actively being edited.  So if you export your archive branch, people
>> will never see time roll backward.
>
> Right.  That's an interesting way of handling it, but that branch
> is also quite messy as it's full of merge commits.  Although it may
> be useful to export, it's going to carry along with it all of the bad
> edits and prior rebases made on that branch.  You probably wouldn't
> want to merge that branch into a mainline, which means that branch
> is likely to be discarded at some point in the future.  When that
> happens, nobody can track it anymore, and that immutable history
> just got mutated out of existence.

Yes.  But it is interesting until it gets merged into mainline, and
worth keeping around in the developer's own archives.  Mistakes can be
interesting.  I don't expect that there will be a need for keeping
the mistakes after a branch is perfected and merged into mainline.
Until the branch is perfected though I fully expect there to be bad
branch history edits that need to be fixed.

The point at which the immutable history goes out of existence is
the point where the branch stops being interesting as an entity
in its own right.  So I think that is exactly the right behavior.

> I think the right way to deal with these types of branches is to
> publicly declare whether or not the branch is expected to roll
> backwards in time (due to a rebase type of event), then let clients
> always update those branches during pulls, rather than needing to
> explicitly mark them with '+' on the client side.

Not if part of the problem is distributing the work of coming up
with a perfect patch set.  If you don't distribute the history
it is hard to see what someone has really changed.  You can't help
me undo a branch editing mistake if you don't have the previous
version of the branch.  It is hard to verify I actually fixed what
you are concerned about if you don't have the old version to compare
against.
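
With an archive branch, the old version to compare against is one revision expression away. A sketch, assuming at least two snapshots exist and the patch's parent order: `refs/archive/<branch>^1` is the most recently archived version of the branch and `refs/archive/<branch>^2^1` the one before it. The function name `archive_compare` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show how BRANCH changed between its two most
# recent snapshots in refs/archive/BRANCH.
archive_compare() {
    a="refs/archive/$1"
    old="$a^2^1"   # working tip recorded by the previous snapshot
    new="$a^1"     # working tip recorded by the latest snapshot
    git rev-parse --verify -q "$old" >/dev/null || return 1
    git --no-pager diff "$old" "$new"
    # Per-commit comparison instead (git 2.19+):
    #   git range-diff "$old...$new"
}
```

A plain tree diff answers "what changed in the end result"; `git range-diff` is the better fit when a reviewer wants to see how each individual patch in the series was reworked.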

> Further, good remerge tools (git-rerere on steroids) would help
> re-resolve conflicts resulting from continuous rebasing.  This would
> make it easier to maintain such a branch and carry the thing forward,
> or to leave it on its original base but continuously remerge
> it and the current mainline into a temporary working branch for
> testing purposes.

Rebase is not the primary operation.  I have one basic branch that
I have 10 copies of against v2.6.18-rc3.  Refactoring, debugging,
and perfecting patches is a much more interesting event than rebasing.
Although rebasing does happen as well.

If you look at the -mm tree, it tends to have 2-3 releases before
getting rebased.

> This is largely the policy that Junio uses for the `pu` and
> the `next` branches, as well as for the topic branches that he
> carries for everyone else doing GIT development.  It appears to be
> working rather well, but it certainly could be streamlined better.
> My git-rerere2 and tb tools are an attempt to do this, but sadly
> they aren't in a useful state yet.  Maybe because they are both
> far more complex than what you are doing here.  :-)

To some extent I have a very interesting subset of kernel development.
Most of my changes are to systems that I am not a maintainer of.
Most of my changes are substantial, and scary because they touch
fundamental things.  Most of my changes involve many interdependent
patches, so topic branches cannot solve my problems.

For edits, stgit and git-rebase certainly can help, and I clearly
anticipate better tools in that vein, as well as better tools
for dealing with topic branches.

But that isn't the problem I am trying to solve here.  I am trying
to implement version control for branch edits (with maximum
compatibility with the existing git).

> Nonetheless it is an interesting contribution.  Thank you for taking
> the time to send it.

Welcome. 

I think by making branch edit history something fundamental, we
achieve some fairly substantial things.
- We don't care about how the operations to edit a branch are
  implemented, making them simpler to write.
- We begin to allow distributed branch editing.
- Branches become primary objects we can work with.

Hopefully I have stirred up the pot enough to allow some interesting
things.

Eric
