How to structure a project distributed with varyingly interdependent feature branches?

* How to structure a project distributed with varyingly interdependent feature branches?
@ 2007-12-31 22:20 Matt McCutchen
  2008-01-02 22:12 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Matt McCutchen @ 2007-12-31 22:20 UTC
  To: git

Dear git people,

The rsync source repository has recently been migrated from CVS to git
at my encouragement ( http://git.samba.org/rsync.git/ ), and I am trying
to figure out a better way to structure the repository (to propose to
the maintainer, Wayne Davison, or to use myself).

Rsync presents a unique challenge because it has a number of long-term
feature patches that are maintained and distributed with the main source
code.  Furthermore, some of the patches depend on other patches.  With
CVS, the patch files were just stored in the repository; this worked,
but browsing and developing the branched versions represented by the
patches required a lot of extra work.  My hope now is to represent these
branches and their dependencies properly in git so that the git tools
handle them with the same elegance with which they handle everything
else.

The feature branches seem to present two fundamental problems:

1. How to properly represent the history of an individual branch and
update it when the trunk (or the branch on which it depends) changes.
Right now, Wayne updates the branch by rebasing; unfortunately, if the
trunk changes in such a way that one of the intermediate commits no
longer makes sense, it is impossible to update the branch while
preserving a record that the intermediate commit once existed.  Merging
gives the right history and is very well-supported by the git tools, but
it has one fatal flaw: when a dependency on another branch is removed,
there is no way to undo the merge of the other branch.  One can revert
the tree changes, but the other branch is still present in the ancestry,
which causes problems if the dependency is added again.  It seems to me
that overcoming this flaw would require significant changes to git.  Is
there some other workflow that does what is needed?  Ideally "git blame"
would work correctly.

2. How to manage the large number of branches in a practical way.  The
branch dependency information should be pullable from the main
repository to minimize the need for each developer to configure his/her
repository manually on initial cloning and when dependencies change.
Also, for an rsync release, it should be possible to tag the trunk and
all branches as a unit so that developers can ask questions like "which
branches changed between these two pre-releases?".  My feeling is that
the solution will involve a taggable, pullable master tree object that
contains the trunk and branches as submodules and also some
metainformation (including the dependencies).  However, special scripts
will then be needed to conveniently check out and update the submodules
using a single rsync source/build tree.
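
The ancestry problem in point 1 can be illustrated with a toy model of a
commit graph.  This is only a sketch in Python, not git itself: the
commit names (A, T1, F1, M, R) and the helper functions are hypothetical,
and the model assumes that a merge has nothing to do whenever the other
branch's tip is already an ancestor of the current commit.

```python
# Toy commit graph: each commit maps to its list of parents.
# Illustration only; the names are made up for this sketch.
parents = {
    "A": [],
    "T1": ["A"],        # a trunk commit
    "F1": ["A"],        # a feature-branch commit
    "M": ["T1", "F1"],  # trunk merges the feature branch
    "R": ["M"],         # trunk reverts the tree changes made by M
}

def ancestors(commit):
    """Return the commit itself plus everything reachable through parents."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_is_noop(ours, theirs):
    """Assumed rule: nothing to merge if 'theirs' is already an ancestor."""
    return theirs in ancestors(ours)

print(merge_is_noop("T1", "F1"))  # before the merge: False
print(merge_is_noop("R", "F1"))   # after merge + revert: True
```

In this model the revert commit R undoes the tree changes but still has
F1 in its ancestry, so adding the dependency again by merging F1 would
bring nothing back.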

I'm sure the rsync project could make do without a fully general
solution for the feature branches, but it would be awesome (and a
testament to git's flexibility) if one existed.  Also, Linux
distributions maintain packages with evolving and potentially
interdependent patches, and a good solution would make it possible for
them to do all their package maintenance and patching with git.  Ubuntu
[1] and Fedora [2] are thinking about this.  From my perspective, such a
setup would make modifying distribution packages much less painful.

[1] https://wiki.ubuntu.com/NoMoreSourcePackages/GitImpl
[2] http://fcp.surfsite.org/modules/newbb/viewtopic.php?topic_id=38716&start=0#forumpost175743

I would appreciate any ideas toward a solution.

I am not subscribed to the list (I once was but the traffic overwhelmed
me), so please be sure to CC me in replies.

Matt

* Re: How to structure a project distributed with varyingly interdependent feature branches?
  2007-12-31 22:20 How to structure a project distributed with varyingly interdependent feature branches? Matt McCutchen
@ 2008-01-02 22:12 ` Junio C Hamano
  2008-01-15 19:02   ` Matt McCutchen
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2008-01-02 22:12 UTC
  To: Matt McCutchen; +Cc: git

Matt McCutchen <matt@mattmccutchen.net> writes:

> Rsync presents a unique challenge because it has a number of long-term
> feature patches that are maintained and distributed with the main source
> code.  Furthermore, some of the patches depend on other patches.
> ...
> 1. How to properly represent the history of an individual branch and
> update it when the trunk (or the branch on which it depends) changes.
> Right now, Wayne updates the branch by rebasing; unfortunately, if the
> trunk changes in such a way that one of the intermediate commits no
> longer makes sense, it is impossible to update the branch while
> preserving a record that the intermediate commit once existed.

I take this to mean a situation like this:

 * There is a series of patches X, Y, Z that implements some nicety
   not present in the mainline yet.  This set applies to older
   codebase at point A.

 * Newer codebase B does things differently from codebase A and
   patch X is no longer needed --- IOW, what X achieves on top
   of A has already been incorporated somewhere between A and B.
   Applying Y and Z suffices to obtain that nice feature on top
   of B.

               .---X---Y---Z       .---Y'--Z' 
              /                   /
      ---o---A---o---.....---o---B

This is natively expressed with git, if your history were like
this:

               .----------X------Y-------Z
              /            \
      ---o---A---o---....---M---....---B

and you publish Z.  People who would want to apply the series to
their codebase simply merge Z.  The patch dependency of X is
captured in the mainline history because the effect of X is
incorporated to the mainline as a merge at M.  If the point they
want to apply the series is older than point M (where patch X
becomes unnecessary because of the mainline change), X, Y and Z
will be brought in, while people basing on commits newer than M
will pull in only Y and Z (because they already have X).
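
Which commits a merge of Z brings in can be sketched as a set difference
of ancestors.  The following Python snippet is a toy model of the second
picture, not git itself; the name "O" (standing for one of the unnamed
"o" mainline commits between A and M) and the helper functions are
assumptions made for illustration.

```python
# Toy model of the history where M is recorded as a merge of X.
# Each commit maps to its list of parents.
parents = {
    "A": [],
    "X": ["A"],
    "Y": ["X"],
    "Z": ["Y"],
    "O": ["A"],       # hypothetical mainline commit older than M
    "M": ["O", "X"],  # mainline incorporates X as a merge
    "B": ["M"],       # mainline commit newer than M
}

def ancestors(commit):
    """Return the commit itself plus everything reachable through parents."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def new_commits(base, tip):
    """Commits that merging 'tip' would add to a history ending at 'base'."""
    return sorted(ancestors(tip) - ancestors(base))

print(new_commits("O", "Z"))  # base older than M: ['X', 'Y', 'Z']
print(new_commits("B", "Z"))  # base newer than M: ['Y', 'Z']
```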

But that works only in theory and with a developer with perfect
vision.  Nobody would be able to foresee that X would become such
a crucial change when it is developed.  Often you would just
create a regular commit M somewhere between A and B that has the
effect of X, and later regret that you had made a separate X
forked from an earlier A and recorded M as a merge.

People use quilt (or guilt) to manage patch stacks for things
like this.

Having said that, I think you can maintain a single patch series
(say the above X-Y-Z) and publish it as a few variants for
downloaders as a prefabricated set of branches.

Say you have this (the first picture) and you do not create the
Y'-Z' history (again, M is the point where patch X becomes
unnecessary, but we live in the real world and it was not
recorded as a merge of X into the mainline):

                 .---X---Y---Z
                /
               /
              /
      ---o---A---o---.....---M---...---o---B

and X-Y-Z _is_ what you would maintain and keep tweaking.  You
can publish Z and people who would want to "apply" the series
can cleanly apply X-Y-Z (or merge Z if they are git users).

But people who would want to "apply" the series to codebase
newer than point M would individually need to resolve conflicts
or know X is unneeded for them.  To alleviate this, you would
publish _another_ commit:

                 .---X---Y---Z
                /             \
               /               Z'
              /               /
      ---o---A---o---.....---M---...---o---B

That is, you merge Z to newer mainline for them to create Z',
and:

 * publish Z' for git users --- they can merge this to their
   codebase that is newer than M and they do not have to resolve
   conflicts you needed to resolve, arising from the overlap
   between X and M, when you created Z'.

 * publish the diff between M and Z', for non-git users who would
   want to apply it with "patch".

You suggested that you would keep rebasing X--Y--Z while
improving the long-lived patch series.  You would recreate Z'
whenever the series is updated (and rerere will hopefully help
you re-resolve the conflicts between X and M automatically).

* Re: How to structure a project distributed with varyingly interdependent feature branches?
  2008-01-02 22:12 ` Junio C Hamano
@ 2008-01-15 19:02   ` Matt McCutchen
  0 siblings, 0 replies; 3+ messages in thread
From: Matt McCutchen @ 2008-01-15 19:02 UTC
  To: Junio C Hamano; +Cc: git

Junio, thanks for the response.  I am finally getting around to
following up.

On Wed, 2008-01-02 at 14:12 -0800, Junio C Hamano wrote: 
> > 1. How to properly represent the history of an individual branch and
> > update it when the trunk (or the branch on which it depends) changes.
> > Right now, Wayne updates the branch by rebasing; unfortunately, if the
> > trunk changes in such a way that one of the intermediate commits no
> > longer makes sense, it is impossible to update the branch while
> > preserving a record that the intermediate commit once existed.
> 
> I take this to mean a situation like this:
> 
>  * There is a series of patches X, Y, Z that implements some nicety
>    not present in the mainline yet.  This set applies to older
>    codebase at point A.
> 
>  * Newer codebase B does things differently from codebase A and
>    patch X is no longer needed --- IOW, what X achieves on top
>    of A has already been incorporated somewhere between A and B.
>    Applying Y and Z suffices to obtain that nice feature on top
>    of B.

Actually, I was thinking of a change to the mainline that causes a
conflict with the patch series.  For example, my repository of git was
at point A when I made the first draft X of my "gitweb: snapshot
cleanups & support for offering multiple formats" change.  Then I
updated my repository and got commit B, "gitweb.perl - Optionally send
archives as .zip files", among others.  When I rebased X on top of the
new master C, there was a conflict, which I resolved to produce X':

     X                   X'
    /                   /
---A---...---B---...---C

But now, with refs only to C and X', I have lost the information that
the previous incarnation of X' was X.
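
The loss can be sketched with a toy reachability model in Python.  This
is an illustration under stated assumptions, not git itself: the ref
names "master" and "topic" and the helper function are hypothetical, and
the second graph shows, for contrast, a merge-style update in which X'
records X as one of its parents.

```python
def reachable(parents, refs):
    """Return every commit reachable from the given refs through parents."""
    seen = set()
    stack = list(refs.values())
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

# After a rebase, X' is a fresh commit whose only parent is C.
rebased = {"A": [], "B": ["A"], "C": ["B"], "X": ["A"], "X'": ["C"]}

# With a merge-style update, X' would have both X and C as parents.
merged = {"A": [], "B": ["A"], "C": ["B"], "X": ["A"], "X'": ["X", "C"]}

# Hypothetical ref names pointing at C and X'.
refs = {"master": "C", "topic": "X'"}

print("X" in reachable(rebased, refs))  # False: X is no longer reachable
print("X" in reachable(merged, refs))   # True: X stays in the history
```

In the rebased graph nothing published points at X any more, so someone
who fetches only these refs cannot learn that X' once was X.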

Essentially, my objection to rebasing is that I want to keep a history
for the patch series containing all of the patched versions that I have
released (here X and X'), especially when they differ in interesting
ways (i.e., conflict resolutions), and this history should be a
first-class object that others can pull from me via the git remote
system.

I only want to use a separate patch management tool if it is integrated
with git.  I have some familiarity with StGIT, and I assume guilt is
similar.  StGIT uses rebasing and keeps an additional "patch changelog"
viewable by "stg log" which might seem to be what I want.  The trouble
is that this changelog behaves more like a reflog than an orderly
history, and "stg refresh" does not support storing a user-entered
message describing *the change to the patch* in the patch changelog.
One option I am considering is to use StGIT and track some subset of the
StGIT area itself (.git/patches) in git.

The other approach is to maintain the feature patches/branches by
merging instead of rebasing.  This has two significant advantages: patch
history is naturally kept and the full power of git's distributed merge
is available.  However, it also has two significant disadvantages: the
complaint by Linus about "useless merges" mentioned in the git-rerere
manpage applies, and it's impossible to fully revert a merge (the
ancestry remains and will cause trouble if the merge is redone later).

Matt
