Dividing up a large merge.

* Dividing up a large merge.
@ 2009-07-14 23:32 davidb
  2009-07-15  0:16 ` Bryan Donlan
  0 siblings, 1 reply; 13+ messages in thread
From: davidb @ 2009-07-14 23:32 UTC (permalink / raw)
  To: Git Mailing List

I'm trying to figure out a better way of dividing up the effort
involved in a merge amongst a group of people.  Right now, I
basically describe the merge to each of them, and ask them to
merge their part, and then 'git checkout HEAD' the other parts.
They tell me about the commits, along with the files that they've
merged correctly.  When everybody is done, I make a real merge
commit, and pull in all of their files.  It's a lot for me to
track, and confusing for each person.

I'd like to create a branch we can all push to that we gradually
work to become the result of a resolved merge.  Not only does git
not want to help me do the merge, but seems to actively be
fighting against me doing this.

What I thought of was something like telling people to do:

  $ git merge v2.6.30
  resolve some files
  $ git checkout HEAD ...rest of files...
  $ git commit; git push

but that 'rest of files' is fairly large and complicated.  I can
think of two ideas:

  - Something that basically does a partial 'git reset --hard
    HEAD' to put many of the files back.

  - The ability to specify subpaths on the 'git merge' to do the
    merge work but limited to a directory or set of files.

Obviously, either case will require someone to still track the
overall effort and make sure the final state of the tree really
represents the total merge.
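
A minimal sketch of the first idea, assuming a merge is already in
progress and that the paths passed in are the ones this person will
resolve. The name restore_rest is made up for illustration. It leans on
git's ':(exclude)' pathspec magic, so everything outside the kept paths
goes back to its HEAD version. Files that exist only on the other side
of the merge are not removed by this and would still need a 'git rm'.

```shell
# Hypothetical helper: during a conflicted merge, put every path back to
# its HEAD version except the ones named on the command line.
# Run it from the top of the work tree, since "." is relative to the
# current directory.
restore_rest() {
    n=$#
    # Append an exclude pathspec for each path we want to keep working on.
    for p in "$@"; do
        set -- "$@" ":(exclude)$p"
    done
    # Drop the original arguments, leaving only the exclude pathspecs.
    shift "$n"
    git checkout HEAD -- . "$@"
}
```

With this, "git checkout HEAD ...rest of files..." in the steps above
would become something like "restore_rest drivers/net", where the
directory is only an example.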

Is there anything that can parse the output of 'git merge-tree'?
Even just splitting this up and then applying parts of it would
be helpful.  Would it be useful to write something that can apply
the output of 'git merge-tree'?

Thanks,
David

* Re: Dividing up a large merge.
  2009-07-14 23:32 Dividing up a large merge davidb
@ 2009-07-15  0:16 ` Bryan Donlan
  2009-07-15  0:29   ` davidb
  0 siblings, 1 reply; 13+ messages in thread
From: Bryan Donlan @ 2009-07-15  0:16 UTC (permalink / raw)
  To: davidb; +Cc: Git Mailing List

On Tue, Jul 14, 2009 at 7:32 PM, <davidb@quicinc.com> wrote:
> I'm trying to figure out a better way of dividing up the effort
> involved in a merge amongst a group of people.  Right now, I
> basically describe the merge to each of them, and ask them to
> merge their part, and then 'git checkout HEAD' the other parts.
> They tell me about the commits, along with the files that they've
> merged correctly.  When everybody is done, I make a real merge
> commit, and pull in all of their files.  It's a lot for me to
> track, and confusing for each person.

What do you mean by describing a merge? git is designed to have all
the information needed for a merge inherent in the repository history.

> I'd like to create a branch we can all push to that we gradually
> work to become the result of a resolved merge.  Not only does git
> not want to help me do the merge, but seems to actively be
> fighting against me doing this.
>
> What I thought of was something like telling people to do:
>
>  $ git merge v2.6.30
>  resolve some files
>  $ git checkout HEAD ...rest of files...
>  $ git commit; git push
>
> but that 'rest of files' is fairly large and complicated.  I can
> think of two ideas:
>
>  - Something that basically does a partial 'git reset --hard
>    HEAD' to put many of the files back.
>
>  - The ability to specify subpaths on the 'git merge' to do the
>    merge work but limited to a directory or set of files.
>
> Obviously, either case will require someone to still track the
> overall effort and make sure the final state of the tree really
> represents the total merge.
>
> Is there anything that can parse the output of 'git merge-tree'?
> Even just splitting this up and then applying parts of it would
> be helpful.  Would it be useful to write something that can apply
> the results output of 'git merge-tree'?

I'm having a hard time understanding the situation here - why can't you just:
$ git checkout -b mergebranch v2.6.30
$ git merge developer1/topic
# Fix conflicts
$ git merge developer2/topic
# Fix conflicts
# etc

Why are there so many conflicts to make this an issue?

If the commits are isolated to small changes, rebasing the developer
topic branches instead of merging may help, by allowing you to take
conflicts one commit at a time. For example, if your problems are
primarily conflicts between developer branches and upstream:

$ git checkout -b mergebranch-dev1 developer1/topic
$ git rebase v2.6.30
# Fix conflicts on a commit-by-commit basis
$ git checkout -b mergebranch-dev2 developer2/topic
$ git rebase v2.6.30
# Fix conflicts on a commit-by-commit basis
$ git checkout -b mergebranch
$ git merge mergebranch-dev1
# Fix any remaining conflicts

If your problems are because of conflicts between developer branches
and each other:
$ git checkout -b mergebranch-dev1 developer1/topic
$ git rebase v2.6.30
# Fix conflicts on a commit-by-commit basis
$ git checkout -b mergebranch-dev2 developer2/topic
$ git rebase mergebranch-dev1
# Fix conflicts on a commit-by-commit basis

These rebasing approaches will change the commit IDs, so your
developers will need to rebase any further work onto the new commits.
But if things are as bad as you say, the commit-by-commit merge that
rebase allows may be much simpler.

* Re: Dividing up a large merge.
  2009-07-15  0:16 ` Bryan Donlan
@ 2009-07-15  0:29   ` davidb
  2009-07-15  0:34     ` Avery Pennarun
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: davidb @ 2009-07-15  0:29 UTC (permalink / raw)
  To: Bryan Donlan; +Cc: Git Mailing List

On Tue, Jul 14, 2009 at 05:16:54PM -0700, Bryan Donlan wrote:

> What do you mean by describing a merge? git is designed to have all
> the information needed for a merge inherent in the repository history.

Yes, provided you can actually do the merge all at once.

> Why are there so many conflicts to make this an issue?

Because I have to work in the "real world".

> If the commits are isolated to small changes, rebasing the developer
> topic branches instead of merging may help, by allowing you to take
> conflicts one commit at a time. For example, if your problems are
> primarily conflicts between developer branches and upstream:

No real developer branches with conflicts (I make those be
fixed), but several upstreams.  We have many developers busily
doing work, and one or more other companies is also working on
the same code.  Meanwhile, the mainline kernel advances at its
own astounding rate.

Unfortunately, paying customers will always get priority of work,
even when that position is actually somewhat shortsighted and it
makes for a lot of merge effort later.

The real issue is that there isn't any single individual who
understands all of the code that conflicts.  It has to be divided
up somehow, I'm just trying to figure out a better way of doing
it.

Thanks,
David

* Re: Dividing up a large merge.
  2009-07-15  0:29   ` davidb
@ 2009-07-15  0:34     ` Avery Pennarun
  2009-07-15  1:19       ` davidb
  2009-07-15 12:28     ` Theodore Tso
  2009-07-15 18:57     ` Daniel Barkalow
  2 siblings, 1 reply; 13+ messages in thread
From: Avery Pennarun @ 2009-07-15  0:34 UTC (permalink / raw)
  To: davidb; +Cc: Bryan Donlan, Git Mailing List

On Tue, Jul 14, 2009 at 8:29 PM, <davidb@quicinc.com> wrote:
> The real issue is that there isn't any single individual who
> understands all of the code that conflicts.  It has to be divided
> up somehow, I'm just trying to figure out a better way of doing
> it.

How about having one person do the merge, then commit it (including
conflict markers), then have other people just make a series of
commits removing the conflict markers?
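
For tracking what is left under this scheme, something like the
following could list the files that still carry conflict markers. It is
only a sketch: the function name is invented, and it assumes a grep that
supports -r and --exclude-dir (GNU and BSD grep do).

```shell
# Hypothetical helper: list files under a directory (default: the
# current one) that still contain the "<<<<<<<" or ">>>>>>>" lines
# left behind by an unresolved merge.
unresolved_files() {
    grep -rlE --exclude-dir=.git '^(<<<<<<<|>>>>>>>)( |$)' "${1:-.}" | sort
}
```

Each person could run it on the subdirectory they own and commit once it
prints nothing for that directory.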

Avery

* Re: Dividing up a large merge.
  2009-07-15  0:34     ` Avery Pennarun
@ 2009-07-15  1:19       ` davidb
  2009-07-15  1:29         ` Douglas Campos
  2009-07-15  1:32         ` Avery Pennarun
  0 siblings, 2 replies; 13+ messages in thread
From: davidb @ 2009-07-15  1:19 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Bryan Donlan, Git Mailing List

On Tue, Jul 14, 2009 at 05:34:26PM -0700, Avery Pennarun wrote:

> How about having one person do the merge, then commit it (including
> conflict markers), then have other people just make a series of
> commits removing the conflict markers?

I guess this helps in some sense, but the intermediate result
isn't going to build, and things like mergetool aren't going to
work.  It's helpful for the individuals to have the full merge
conflict available, or at least the stages of the files in
question.

David

* Re: Dividing up a large merge.
  2009-07-15  1:19       ` davidb
@ 2009-07-15  1:29         ` Douglas Campos
  2009-07-15  1:32         ` Avery Pennarun
  1 sibling, 0 replies; 13+ messages in thread
From: Douglas Campos @ 2009-07-15  1:29 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Avery Pennarun, Bryan Donlan

Wouldn't merging the peer branches first help?

On Tue, Jul 14, 2009 at 10:19 PM, <davidb@quicinc.com> wrote:
> On Tue, Jul 14, 2009 at 05:34:26PM -0700, Avery Pennarun wrote:
>
>> How about having one person do the merge, then commit it (including
>> conflict markers), then have other people just make a series of
>> commits removing the conflict markers?
>
> I guess this helps in some sense, but the intermediate result
> isn't going to build, and things like mergetool aren't going to
> work.  It's helpful for the individuals to have the full merge
> conflict available, or at least the stages of the files in
> question.
>
> David

-- 
Douglas Campos
Theros Consulting
+55 11 7626 5959
+55 11 3020 8168

* Re: Dividing up a large merge.
  2009-07-15  1:19       ` davidb
  2009-07-15  1:29         ` Douglas Campos
@ 2009-07-15  1:32         ` Avery Pennarun
  1 sibling, 0 replies; 13+ messages in thread
From: Avery Pennarun @ 2009-07-15  1:32 UTC (permalink / raw)
  To: davidb; +Cc: Bryan Donlan, Git Mailing List

On Tue, Jul 14, 2009 at 9:19 PM, <davidb@quicinc.com> wrote:
> On Tue, Jul 14, 2009 at 05:34:26PM -0700, Avery Pennarun wrote:
>> How about having one person do the merge, then commit it (including
>> conflict markers), then have other people just make a series of
>> commits removing the conflict markers?
>
> I guess this helps in some sense, but the intermediate result
> isn't going to build, and things like mergetool aren't going to
> work.  It's helpful for the individuals to have the full merge
> conflict available, or at least the stages of the files in
> question.

It sounds like you're going in circles a bit here.  You want the full
merge conflict available - but you want it to be able to build.

It sounds like the "git reset the unwanted subdirs" solution suggested
earlier is the only option that will really work.  You could simplify
life for your co-workers by writing a script to automate the steps, I
suppose.

You probably want all the individuals to use merge --squash, so that
you don't mark the history as merged until you're done.  Then you
combine all their work at the end and mark the commit as done using
'git merge -s ours'.
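
Roughly, and only as a sketch, the two halves might look like this. The
function names are invented; the git options are the ones mentioned
above.

```shell
# Hypothetical wrapper for each individual: stage the result of merging
# the given upstream without recording a merge in history. Conflicts are
# left in the index and work tree to be resolved and committed as usual.
begin_piece() {
    git merge --squash "$1"
}

# Hypothetical wrapper for the integrator, run once all the pieces are
# in: record the upstream as merged while keeping the current tree
# exactly as it is.
mark_merged() {
    git merge -s ours --no-edit "$1"
}
```

So each person would run "begin_piece v2.6.30", resolve their files, put
the rest back to HEAD and commit; at the very end one person runs
"mark_merged v2.6.30" on the combined result.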

Avery

* Re: Dividing up a large merge.
  2009-07-15  0:29   ` davidb
  2009-07-15  0:34     ` Avery Pennarun
@ 2009-07-15 12:28     ` Theodore Tso
  2009-07-15 13:39       ` Jakub Narebski
  2009-07-15 14:47       ` Larry D'Anna
  2009-07-15 18:57     ` Daniel Barkalow
  2 siblings, 2 replies; 13+ messages in thread
From: Theodore Tso @ 2009-07-15 12:28 UTC (permalink / raw)
  To: davidb; +Cc: Bryan Donlan, Git Mailing List

On Tue, Jul 14, 2009 at 05:29:26PM -0700, davidb@quicinc.com wrote:
> No real developer branches with conflicts (I make those be
> fixed), but several upstreams.  We have many developers busily
> doing work, and one or more other companies is also working on
> the same code.  Meanwhile, the mainline kernel advances at its
> own astounding rate.

If you are maintaining a large number of changes over the long term
(which in the case of the kernel can be measured in a month or two),
it's often much easier to maintain things as a series of patches.

That way you can merge each patch one at a time.

If you already have everything in a git tree, I'd suggest pulling it
apart into separate patches, by using "git format-patch".  Note that
if you have multiple merges into the tree, this will go much more smoothly
if you can separate things into a single linear stream.

This is also a good reason why if you have partial work that is
complete enough to be merged into mainline, it is ***much*** better to
try pushing patches to mainline earlier rather than later.  Waiting
until you are 100% done and the work is completely certified involves
a large number of risks; for example, what if people complain about
work that was done early on?  Or if the design was fundamentally
flawed from the get-go?  At the minimum, you will save a huge amount
of effort if you post a request-for-comment version of the patches up
front.

And, if you believe your release cycle is going to run for more than,
say, 2-3 months, I suggest that you keep things in a single linear
patch stream.  You can keep the patch series under git control, and
then rebase periodically; I'd suggest rebasing once a mainline release
happens (i.e., when 2.6.X is released), and then again after most of
the major changes have been merged in and the tree has settled down
(i.e., after 2.6.X-rc2 or 2.6.X-rc3).

> The real issue is that there isn't any single individual who
> understands all of the code that conflicts.  It has to be divided
> up somehow, I'm just trying to figure out a better way of doing
> it.

Yeah, that's another prime argument for maintaining your changes as a
patch queue.  I use a combination of quilt plus git.  So the rebasing
methodology becomes:

# pop all patches
guilt pop -a			
# update the base of the patches
git pull origin			
# start trying to apply each of the patches, one at a time
# next_patch:
guilt push -a
# when you get a failure, the push will stop and tell you it can't 
# apply a patch; so force apply the patch:
guilt push -f
# 
# this will leave some patch .rej files; resolve the patch failures
# for all of the files.  Use "git add" once the patches have been resolved;
# also make sure that any files that were added by the patch that was
# force applied are manually marked as added using "git add".
# Once you are sure the patch is properly merged, do this:
guilt refresh --diffstat
# Check the changes made to the patch; I normally create a symlink from
# .git/patches/<work-branch-for-quilt> to patches in the top level, i.e.
# "ln -s .git/patches/master patches"; if you can't remember the name of the
# patch, you can get it via the command "guilt applied | tail -1"
(cd patches; git diff name-of-patch)
# now repeat with the next set of patches by going back to next_patch, above

I normally keep an indication of the version that the patch series is
based upon via a comment in the first line of the series file, like
this: "# BASE v2.6.30-rc3" or sometimes like this "# BASE 6ab2792".
This can be useful when creating automated scripts to test the patch
series, since they know what version to apply the patches against.
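
As an illustration of how a script could pick that up, here is a small
sketch. The function name is invented, and it assumes the "# BASE" line
is the very first line of the series file, as described above.

```shell
# Hypothetical helper: print the base version recorded on the first
# line of a guilt series file (default: patches/series). Prints nothing
# if the first line is not a "# BASE" comment.
series_base() {
    sed -n '1s/^# BASE //p' "${1:-patches/series}"
}
```

A test script could then do something like
"git checkout $(series_base)" before pushing the patches.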

In your case, the first person to start the rebase should change the
"# BASE" comment, and then apply those patches which he/she is most
familiar with.  When you hit a point where you need someone else's
expertise, you can do a "(cd patches; git commit -a)" to commit all of
the changes in the patch queue so far, and then let someone else take
over.  

They would then do:

# Pop all of the patches off the next developer's work directory
guilt pop -a
# Update the patch queue
(cd patches; git pull)
# Now we need to make sure we have the latest kernel patches from mainline
git fetch
# Now update the work directory to the version specified by the patch
# series file
git merge $(head -n 1 patches/series | sed -e 's/^# BASE //')
# Now resume trying to apply patches, one at a time...
# next_patch
guilt push -a
# if there is a failed patch, force apply it and resolve patch rejects
guilt push -f
# refresh the patch
guilt refresh --diffstat
# .... and so on

My biggest suggestion, though, is to try to merge partial work earlier
rather than later.  I'd try getting a partially functioning device
driver merged first, and then try to get the optimizations applied
later.  If you don't want people using it in production, that's what
the EXPERIMENTAL tag is for...

						- Ted

* Re: Dividing up a large merge.
  2009-07-15 12:28     ` Theodore Tso
@ 2009-07-15 13:39       ` Jakub Narebski
  2009-07-15 16:07         ` Theodore Tso
  2009-07-15 14:47       ` Larry D'Anna
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2009-07-15 13:39 UTC (permalink / raw)
  To: Theodore Tso; +Cc: davidb, Bryan Donlan, Git Mailing List

Theodore Tso <tytso@mit.edu> writes:

> Yeah, that's another prime argument for maintaining your changes as a
> patch queue.  I use a combination of quilt plus git.

Why not StGit, or Guilt, or TopGit?

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: Dividing up a large merge.
  2009-07-15 12:28     ` Theodore Tso
  2009-07-15 13:39       ` Jakub Narebski
@ 2009-07-15 14:47       ` Larry D'Anna
  1 sibling, 0 replies; 13+ messages in thread
From: Larry D'Anna @ 2009-07-15 14:47 UTC (permalink / raw)
  To: Theodore Tso; +Cc: davidb, Bryan Donlan, Git Mailing List

* Theodore Tso (tytso@mit.edu) [090715 08:28]:
> And, if you believe your release cycle is going to run for more than,
> say, 2-3 months, I suggest that you keep things in a single linear
> patch stream.  You can keep the patch series under git control, and
> then rebase periodically; I'd suggest rebasing once a mainline release
> happens (i.e., when 2.6.X is released), and then again after most of
> the major changes have been merged in and the tree has settled down
> (i.e., after 2.6.X-rc2 or 2.6.X-rc3).

or use TopGit

http://repo.or.cz/w/topgit.git

        --larry

* Re: Dividing up a large merge.
  2009-07-15 13:39       ` Jakub Narebski
@ 2009-07-15 16:07         ` Theodore Tso
  0 siblings, 0 replies; 13+ messages in thread
From: Theodore Tso @ 2009-07-15 16:07 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: davidb, Bryan Donlan, Git Mailing List

On Wed, Jul 15, 2009 at 06:39:46AM -0700, Jakub Narebski wrote:
> Theodore Tso <tytso@mit.edu> writes:
> 
> > Yeah, that's another prime argument for maintaining your changes as a
> > patch queue.  I use a combination of quilt plus git.
> 
> Why not StGit, or Guilt, or TopGit?

Sorry, typo; that should have read "guilt".  The example workflow I
included used guilt commands.

					- Ted

* Re: Dividing up a large merge.
  2009-07-15  0:29   ` davidb
  2009-07-15  0:34     ` Avery Pennarun
  2009-07-15 12:28     ` Theodore Tso
@ 2009-07-15 18:57     ` Daniel Barkalow
  2009-07-15 21:01       ` davidb
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel Barkalow @ 2009-07-15 18:57 UTC (permalink / raw)
  To: davidb; +Cc: Bryan Donlan, Git Mailing List

On Tue, 14 Jul 2009, davidb@quicinc.com wrote:

> On Tue, Jul 14, 2009 at 05:16:54PM -0700, Bryan Donlan wrote:
> 
> > What do you mean by describing a merge? git is designed to have all
> > the information needed for a merge inherent in the repository history.
> 
> Yes, provided you can actually do the merge all at once.
> 
> > Why are there so many conflicts to make this an issue?
> 
> Because I have to work in the "real world".
> 
> > If the commits are isolated to small changes, rebasing the developer
> > topic branches instead of merging may help, by allowing you to take
> > conflicts one commit at a time. For example, if your problems are
> > primarily conflicts between developer branches and upstream:
> 
> No real developer branches with conflicts (I make those be
> fixed), but several upstreams.  We have many developers busily
> doing work, and one or more other companies is also working on
> the same code.  Meanwhile, the mainline kernel advances at its
> own astounding rate.
> 
> Unfortunately, paying customers will always get priority of work,
> even when that position is actually somewhat shortsighted and it
> makes for a lot of merge effort later.
> 
> The real issue is that there isn't any single individual who
> understands all of the code that conflicts.  It has to be divided
> up somehow, I'm just trying to figure out a better way of doing
> it.

It sounds to me like you're maintaining an internal version that everybody 
merges their stuff into, and you periodically merge that with the mainline 
kernel (generating a lot of conflicts which have to be resolved at the 
same time). Instead of merging the branch that contains a lot of merges, 
it would probably be easier to merge into a clone of mainline each of the 
things that was merged before. That is, instead of merging less than all 
of two trees, you'd merge commits which are not the newest commit on the 
branch, choosing ones that individuals can resolve.

This also has the advantage that, if two of the changes affect an API 
that's used in various different places, one person will get the 
responsibility of resolving each of those conflicts, despite them being in 
the middle of code they don't really understand, because they understand 
what happened with the API and therefore what has to be done in that 
little spot. Dividing the merge up by parts of the content would split 
this work among people who aren't looking at the conflict in the 
definition of the API.
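
One way to enumerate "each of the things that was merged before" is to
walk the merge commits on the internal branch and take the second parent
of each. A sketch, assuming the internal branch was built by merging
topics into it, so that its first-parent chain is the integration
history; merged_topics is an invented name.

```shell
# Hypothetical helper: for every merge commit on the first-parent chain
# between <base> and <branch>, oldest first, print the commit that was
# merged in (the second parent of the merge).
merged_topics() {
    git rev-list --merges --first-parent --reverse "$1..$2" |
    while read -r m; do
        git rev-parse "$m^2"
    done
}
```

Each commit it prints could then be handed to whoever understands that
piece of work and merged into the clone of mainline on its own.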

	-Daniel
*This .sig left intentionally blank*

* Re: Dividing up a large merge.
  2009-07-15 18:57     ` Daniel Barkalow
@ 2009-07-15 21:01       ` davidb
  0 siblings, 0 replies; 13+ messages in thread
From: davidb @ 2009-07-15 21:01 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Bryan Donlan, Git Mailing List

On Wed, Jul 15, 2009 at 11:57:59AM -0700, Daniel Barkalow wrote:

> It sounds to me like you're maintaining an internal version that everybody 
> merges their stuff into, and you periodically merge that with the mainline 
> kernel (generating a lot of conflicts which have to be resolved at the 
> same time). Instead of merging the branch that contains a lot of merges, 
> it would probably be easier to merge into a clone of mainline each of the 
> things that was merged before. That is, instead of merging less than all 
> of two trees, you'd merge commits which are not the newest commit on the 
> branch, choosing ones that individuals can resolve.

That's part of it, although I have a pretty good handle on that
part.

The place where this comes up is that people at company X are
working on an internal version and people at company Y are working
on a similar internal version.  We need to share back and forth
between these more frequently than stuff gets into the mainline.

We do rebase at various points, but it takes quite a bit of work,
and it's fairly different work than the conflicts I'm concerned
with here.

Thanks,
David
