"git merge" merges too much!

git.vger.kernel.org archive mirror

* "git merge" merges too much!
@ 2009-11-29  3:21 Greg A. Woods
  2009-11-29  5:14 ` Jeff King
  2009-11-29  5:15 ` "git merge" merges too much! Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-11-29  3:21 UTC (permalink / raw)
  To: The Git Mailing List

I'm trying to learn to use Git to manage local changes.  I'm very new to
Git (hi all!), but not new at all to version tracking tools in general.

Hope I've found the right list on which to ask potentially naive
questions!  I've been doing _lots_ of reading about Git, but I can't
seem to find anything about the problems I relate below.

One task I'm working on is to try to find the best way to merge changes
made from one branch to another, e.g. to propagate local fixes from one
release to another.

However in at least one simple case "git merge" merges too much.

I have something like this started from a remote-cloned repository where
BL1.2 is a branch from the remote master HEAD, which happens to
correspond to a tag "TR1.2", the release-1.2 tag, and I've made three
local commits to my local BL1.2 branch: A, B, and C:

                                         BL1.2 - A - B - C  <- BL1.2 HEAD
                                        /
master 1 - 2 - TR1.1 - 3 - 4 - 5 - TR1.2  <- master HEAD

(there are no "release" branches in this project, just tags on the
master branch to represent release points -- is there any way to get
"git log" to show which tags are associated with a given commit and/or
branch?  The real project is freedesktop.org's xinit repo, but the real
tree is too messy to diagram here -- hopefully I've extracted the
essence of the problem correctly)

I now want to create a branch "BL1.1" and merge commits A, B, and C to it
in order to back-port my local fixes to the TR1.1 release.  "TR1.1" is
simply a tag on the origin/master trunk.

I do the following:

	git checkout -b BL1.1 TR1.1
	git merge BL1.2

However this seems to merge all of 3, 4, and 5, as well as A, B, and C.

I think I can (barely) understand why it's doing what it's doing, but
that's not what I want it to do.  However it looks like Git doesn't have
the same idea of a branch "base" point as I think I do.

Running "git log TR1.2..BL1.2" does show me exactly the changes I wish
to propagate, but "git merge TR1.2..BL1.2" says "not something we can
merge".  Sigh.

How can I get it to merge just the changes from the "base" of the BL1.2
branch to its head?

Is using either git-cherry-pick or "git log -p | git-am" the only way
to do this?  Which way best preserves Git's ability to realize if a
change has already been included on the target branch, if any?

Is this the kind of "problem" that drove the creators of Stacked-Git to
invent their tools?

Is there any way to get "git log --graph" (and/or gitk) to show me all
the branch heads, not just the current/specified one?

-- 
						Greg A. Woods

+1 416 218-0098                VE3TCP          RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>      Secrets of the Weird <woods@weird.com>

* Re: "git merge" merges too much!
  2009-11-29  3:21 "git merge" merges too much! Greg A. Woods
@ 2009-11-29  5:14 ` Jeff King
  2009-11-30 18:12   ` Greg A. Woods
  2009-11-29  5:15 ` "git merge" merges too much! Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-11-29  5:14 UTC (permalink / raw)
  To: The Git Mailing List

On Sat, Nov 28, 2009 at 10:21:25PM -0500, Greg A. Woods wrote:

> Hope I've found the right list on which to ask potentially naive
> questions!  I've been doing _lots_ of reading about Git, but I can't
> seem to find anything about the problems I relate below.

Yep, you're in the right place.

> master branch to represent release points -- is there any way to get
> "git log" to show which tags are associated with a given commit and/or

Try "git log --decorate".

>                                          BL1.2 - A - B - C  <- BL1.2 HEAD
>                                         /
> master 1 - 2 - TR1.1 - 3 - 4 - 5 - TR1.2  <- master HEAD
>
> [...]
> 
> 	git checkout -b BL1.1 TR1.1
> 	git merge BL1.2
> 
> However this seems to merge all of 3, 4, and 5, as well as A, B, and C.
> 
> I think I can (barely) understand why it's doing what it's doing, but
> that's not what I want it to do.  However it looks like Git doesn't have
> the same idea of a branch "base" point as I think I do.

Yes. Git doesn't really view history as branches in the way you are
thinking. History is simply a directed graph, and when you merge two
nodes in the graph, it takes into account _everything_ that happened
to reach those two points since the last time they diverged (which in
your case is simply TR1.1, as BL1.2 is a strict superset).

There is no way in the history graph to represent "we have these
commits, but not this subsequence". You have to create new commits A',
B', and C' which introduce more or less the same changes as their
counterparts (and they may even be _exactly_ the same except for the
parentage, but then again, they may not if the changes they make do not
apply in the same way on top of TR1.1).
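
The point about divergence can be checked in a throwaway repository. This is
only a sketch, assuming a history shaped like the diagram in the first
message; the `commit` helper, the file names, and the placeholder identity
are invented for illustration and are not part of the real xinit tree.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"       # placeholder identity for the throwaway repo
git config user.email "user@example.com"

# Hypothetical helper: record one commit that adds a single small file.
commit() { echo "$1" > "file-$1.txt"; git add "file-$1.txt"; git commit -q -m "$1"; }

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2
git checkout -q -b BL1.2 TR1.2
commit A; commit B; commit C
git checkout -q -b BL1.1 TR1.1

# The point where the two histories last diverged is TR1.1, not TR1.2.
base=$(git merge-base BL1.1 BL1.2)
git describe --tags "$base"

# Everything reachable from BL1.2 but not from BL1.1 is what a merge brings in.
incoming=$(git log --format=%s BL1.1..BL1.2)
echo "$incoming"
```

The second listing includes 3, 4 and 5 alongside A, B and C, which is why
"git merge BL1.2" cannot pick up the last three alone.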

> Running "git log TR1.2..BL1.2" does show me exactly the changes I wish
> to propagate, but "git merge TR1.2..BL1.2" says "not something we can
> merge".  Sigh.
> 
> How can I get it to merge just the changes from the "base" of the BL1.2
> branch to its head?
> 
> Is using either git-cherry-pick or "git log -p | git-am", the only way
> to do this?  Which way best preserves Git's ability to realize if a
> change has already been included on the target branch, if any?

Yes, you must cherry-pick or use rebase (which is a more featureful
version of the pipeline you mentioned). Either way will produce an
equivalent set of commits (cherry-pick is useful when you are picking a
couple of commits; rebase is useful for rewriting a whole stretch of
history. It sounds like you want to do the latter).

The resulting commits will have different commit ids, but git generally
does a good job at merging such things, because it looks only at the
result state and not the intermediate commits.  If both sides have made
an equivalent change, then there is no conflict.
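
As a rough sketch of the cherry-pick route, again in a throwaway repository
shaped like the diagram from the first message: the `commit` helper and file
names are hypothetical, and the range form of cherry-pick assumes a Git
version that accepts a revision range (older releases take one commit at a
time). "git cherry" is used afterwards to ask whether equivalent changes are
already present on the target.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"       # placeholder identity for the throwaway repo
git config user.email "user@example.com"

# Hypothetical helper: record one commit that adds a single small file.
commit() { echo "$1" > "file-$1.txt"; git add "file-$1.txt"; git commit -q -m "$1"; }

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2
git checkout -q -b BL1.2 TR1.2
commit A; commit B; commit C

# Start BL1.1 at the older release and replay only A, B and C onto it.
git checkout -q -b BL1.1 TR1.1
git cherry-pick TR1.2..BL1.2 >/dev/null

# New commits A', B', C' now sit on top of TR1.1; 3, 4 and 5 are absent.
git log --format=%s TR1.1..BL1.1

# Ask which of A, B, C already have an equivalent patch on BL1.1.
# A leading "-" means an equivalent change is present; "+" means it is not.
report=$(git cherry BL1.1 BL1.2 TR1.2)
echo "$report"
```

The copies have new commit ids, but because their patches match the
originals, "git cherry" can still recognise them as already applied.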

> Is there any way to get "git log --graph" (and/or gitk) to show me all
> the branch heads, not just the current/specified one?

Try "--all" with either gitk or "git log". Or if you want a subset of
heads, just name them.
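
For illustration, the two suggestions ("--decorate" and "--all") can be
combined in one command. The repository below is a made-up stand-in for the
diagram in the first message; the `commit` helper and file names are
hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"       # placeholder identity for the throwaway repo
git config user.email "user@example.com"

# Hypothetical helper: record one commit that adds a single small file.
commit() { echo "$1" > "file-$1.txt"; git add "file-$1.txt"; git commit -q -m "$1"; }

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2
git checkout -q -b BL1.2 TR1.2
commit A; commit B; commit C

# One line per commit, drawn as a graph, with tag and branch names attached,
# starting from every ref rather than only the current branch.
graph=$(git log --graph --oneline --decorate --all)
echo "$graph"

# A subset of heads can be named explicitly instead of using --all.
git log --graph --oneline --decorate master BL1.2
```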

Hope that helps,
-Peff

* Re: "git merge" merges too much!
  2009-11-29  5:14 ` Jeff King
@ 2009-11-30 18:12   ` Greg A. Woods
  2009-11-30 19:22     ` Dmitry Potapov
  2009-12-02 20:09     ` Jeff King
  0 siblings, 2 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-11-30 18:12 UTC (permalink / raw)
  To: Jeff King; +Cc: The Git Mailing List

Thank you very much for confirming my understanding of why "git merge"
was merging more changes than I had desired it to merge.

The way "git merge" works does concern me somewhat though as I try to
figure out how I might use "topic" branches to develop local features
and then merge them onto each supported release branch.  Some guides
I've read suggest this methodology, but I'm not sure how well this will
work, even when the remote project uses release branches to manage
official releases.  Perhaps I really should look at StGit, but I'm not
sure about it either.

At Sun, 29 Nov 2009 00:14:27 -0500, Jeff King <peff@peff.net> wrote:
Subject: Re: "git merge" merges too much!
> 
> On Sat, Nov 28, 2009 at 10:21:25PM -0500, Greg A. Woods wrote:
> > 
> > master branch to represent release points -- is there any way to get
> > "git log" to show which tags are associated with a given commit and/or
> 
> Try "git log --decorate".

Excellent!  That's exactly what I was looking for.

(From a first pass through the documentation I would never have guessed
that "tags" were also a form of "refs".  All these different names for
things in Git vs. many other VCSes, like "ref names", are _really_
confusing to anyone like me with too much experience using those other
revision control systems.  Part of the problem is that Git documentation
seems to be two-or-more minded about several of its concepts and
features.  Even the gitglossary(7) is somewhat inconsistent on how it
uses "ref" and "refs".  Perhaps all that's needed is some firm editing
and clean-up of the manuals and documentation by a good strong technical
editor.)

> Yes, you must cherry-pick or use rebase (which is a more featureful
> version of the pipeline you mentioned).

"git rebase" will not work for me unless it grows a "copy" option,
i.e. one which does not delete the original branch (i.e. avoids the
"reset" phase of its operation).  This option would likely only make
sense when used with the "--onto" option, I would guess.

"git rebase -i" would certainly give me all the control I could possibly
want when copying changes from branch to branch.

It likely wouldn't make sense to base this new "copy" feature directly
on "git rebase" though, especially in light of all the warnings about
how "git rebase" isn't friendly when applied to already published
branches.  I think in theory this "copy" feature won't cause problems
for already-published branches.

Perhaps it should be a whole new top-level command such as "git copy",
but then it's so much like "git merge" I'm not sure....  Ideally if "git
merge" could be taught how to work with just the changes from the base
of a branch then it would do what I'm looking for directly.

> The resulting commits will have different commit ids, but git generally
> does a good job at merging such things, because it looks only at the
> result state and not the intermediate commits.  If both sides have made
> an equivalent change, then there is no conflict.

> > Is there any way to get "git log --graph" (and/or gitk) to show me all
> > the branch heads, not just the current/specified one?
> 
> Try "--all" with either gitk or "git log". Or if you want a subset of
> heads, just name them.

Awesome!  Those options provide just what I wanted to see!

(git-log(1) is worse than ls(1) for having too many options, but worst
of all in the release I'm still using it doesn't respond sensibly nor
consistently with other commands when given the "-?" option.)

-- 
						Greg A. Woods

+1 416 218-0098                VE3TCP          RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>      Secrets of the Weird <woods@weird.com>

* Re: "git merge" merges too much!
  2009-11-30 18:12   ` Greg A. Woods
@ 2009-11-30 19:22     ` Dmitry Potapov
  2009-12-01 18:52       ` Greg A. Woods
  2009-12-02 20:09     ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-11-30 19:22 UTC (permalink / raw)
  To: The Git Mailing List; +Cc: Jeff King

On Mon, Nov 30, 2009 at 01:12:31PM -0500, Greg A. Woods wrote:
> 
> The way "git merge" works does concern me somewhat though as I try to
> figure out how I might use "topic" branches to develop local features
> and then merge them onto each supported release branch.

The basic idea of using topic branches is that development is done on
separate branches, which are merged to the release branch only when they
are ready to be released. These branches are based on the oldest branch in
which they may be included. This means that fixes are normally based on the
stable branch and new features are based on the master branch, i.e. the
branch that contains changes for the next new feature release. Not all
branches get merged immediately into master. For instance, the git
project has 'pu' (proposed updates) and 'next' branches. Only when a new
feature has proved itself to be useful and reliable is it "graduated" to
the master branch. Thus the master branch is rather stable and it is
released at regular intervals (no need for a long stabilization period).

The key difference compared to what you may be used to is that branches
are normally based on the oldest branch in which the feature may be
included. Thus normally changes are not backported to old branches,
because you can merge them directly.
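
A minimal sketch of that idea, assuming release points TR1.1 and TR1.2 as in
the first message and a topic branch named "fix" (a name made up here, as are
the `commit` helper and the file names). Because the topic is forked from the
oldest release, it can be merged into each newer line without dragging in
anything else.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"       # placeholder identity for the throwaway repo
git config user.email "user@example.com"

# Hypothetical helper: record one commit that adds a single small file.
commit() { echo "$1" > "file-$1.txt"; git add "file-$1.txt"; git commit -q -m "$1"; }

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2

# Fork the topic from the OLDEST release that should receive it.
git checkout -q -b fix TR1.1
commit A; commit B

# The older line simply fast-forwards: it gains A and B and nothing else.
git checkout -q -b BL1.1 TR1.1
git merge -q --ff-only fix

# The newer line gets a true merge; 3, 4 and 5 were already part of it.
git checkout -q -b BL1.2 TR1.2
git merge -q -m "Merge fix into BL1.2" fix

# Only the merge commit plus A and B are new relative to TR1.2.
git log --format=%s TR1.2..BL1.2
```

Both lines now share the very same commits A and B, so a later correction on
"fix" can be merged into each of them again.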

> > Yes, you must cherry-pick or use rebase (which is a more featureful
> > version of the pipeline you mentioned).
> 
> "git rebase" will not work for me unless it grows a "copy" option,
> i.e. one which does not delete the original branch (i.e. avoids the
> "reset" phase of its operation).

There is no reset phase... It is just reassigning the head of the branch to
point to a different commit-id. If you want to copy a branch instead of
rebasing the old one, you create a new branch (a new name) that points
to the same commit as the branch that you want to copy, and after that you
rebase this new branch. You can do that like this:

$ git branch new-foo foo

$ git rebase --onto newbase oldbase new-foo
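
Applied to the diagram from the first message, the two commands might look
like the sketch below. It assumes TR1.1 plays the role of newbase, TR1.2 the
role of oldbase, and BL1.1 the role of new-foo; the `commit` helper and the
file names are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"       # placeholder identity for the throwaway repo
git config user.email "user@example.com"

# Hypothetical helper: record one commit that adds a single small file.
commit() { echo "$1" > "file-$1.txt"; git add "file-$1.txt"; git commit -q -m "$1"; }

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2
git checkout -q -b BL1.2 TR1.2
commit A; commit B; commit C
old_tip=$(git rev-parse BL1.2)

# "Copy": a second branch name pointing at the same tip as BL1.2.
git branch BL1.1 BL1.2

# Replay the commits in TR1.2..BL1.1 (A, B, C) on top of TR1.1.
# Only the BL1.1 name is moved; BL1.2 keeps its original commits.
git rebase -q --onto TR1.1 TR1.2 BL1.1

git log --format=%s TR1.1..BL1.1     # the three replayed commits
git rev-parse BL1.2                  # still the original tip
```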

> It likely wouldn't make sense to base this new "copy" feature directly
> on "git rebase" though, especially in light of all the warnings about
> how "git rebase" isn't friendly when applied to already published
> branches.  I think in theory this "copy" feature won't cause problems
> for already-published branches.

The "copy" does not have the problem of rebase, but it has a different
problem: You have two series of commits instead of one. If you find
a bug in one of those commits, you will have to patch each series
separately. Also, git merge may produce additional conflicts... So,
copying commits is not something that I would recommend to do often.

Dmitry

* Re: "git merge" merges too much!
  2009-11-30 19:22     ` Dmitry Potapov
@ 2009-12-01 18:52       ` Greg A. Woods
  2009-12-01 20:50         ` Dmitry Potapov
  2009-12-02 10:20         ` Nanako Shiraishi
  0 siblings, 2 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-12-01 18:52 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: The Git Mailing List

At Mon, 30 Nov 2009 22:22:12 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: "git merge" merges too much!
> 
> The key difference compared to what you may be used to is that branches
> are normally based on the oldest branch in which the feature may be
> included. Thus normally changes are not backported to old branches,
> because you can merge them directly.

Hmmm... the idea of creating topic branches based on the oldest branch
where the feature might be used is indeed neither intuitive, nor is it
mentioned anywhere I've so far read about using topic branches in Git.

To use topic branches effectively this way, especially in managing local
and custom changes to a large remote project where separate working
directories are needed for long-running builds, I think some additional
software configuration management tool must be used to create
"configuration" branches where all the desired change sets (topic
branches) are merged.

I spent half my dreaming time early this morning running through
scenarios of how to use topic branches, with true merging (not
re-basing), in a usable work-flow.

At the moment I'm leaning towards a process where the configuration
branch is re-created for every build -- i.e. the merges are redone from
every topic branch to a freshly configured branch forked from the
locally supported release branch, hopefully making use of git-rerere to
solve most conflicts in as automated a fashion as is possible.

This may not be a sane thing to do though -- it may be too much work to
do for every fix.  It somewhat goes against the current natural trend in
many of the projects I work on to develop changes on the trunk and then
back-port (some of) them to release branches.

Perhaps Stacked-Git really is the best answer.  I will have to
investigate more.

> > > Yes, you must cherry-pick or use rebase (which is a more featureful
> > > version of the pipeline you mentioned).
> > 
> > "git rebase" will not work for me unless it grows a "copy" option,
> > i.e. one which does not delete the original branch (i.e. avoids the
> > "reset" phase of its operation).
> 
> There is no reset phase...

By "reset phase" I meant this part, from git-rebase(1):

       The current branch is reset to <upstream>, or <newbase> if the --onto
       option was supplied. This has the exact same effect as git reset --hard
       <upstream> (or <newbase>).

> It is just reassigning the head of branch to
> point to a different commit-id. If you want to copy a branch instead of
> rebasing the old one, you create a new branch (a new name) that points
> to the same commit as the branch that you want to copy, after that you
> rebase this new branch. You can do that like this:
> 
> $ git branch new-foo foo
> 
> $ git rebase --onto newbase oldbase new-foo

Hmmm.... I'll have to think about that.  It makes some sense, but I
don't intuitively read the command-line parameters well enough to
predict the outcome in all of the scenarios I'm interested in.

What is "oldbase" there?  I'm guessing it means "base of foo" (and for
the moment, "new-foo" too)?

It's confusing because the manual page uses the word "upstream" to
describe this parameter.

From my experiments it looks like what I might want to do to copy a
local branch to port its changes from one release branch to another is
something like this (where local-v2.0 is a branch with local changes
forked from release branch REL-v2.0, and I want to back-port these
changes to a new local branch forked from the release branch REL-v1.0):

	$ git branch local-base-v1.0 REL-v1.0	# mark base of new branch
	$ git branch local-v1.0 local-v2.0	# dup head of src branch
	$ git rebase --onto local-base-v1.0 REL-v2.0 local-v1.0
	$ git branch -d local-base-v1.0

The first and last steps may not be necessary if REL-v1.0 really is a
branch, but in my play project it is just a tag on the trunk.  In the
case that it were really already a branch then hopefully this would do:

	$ git branch local-v1.0 local-v2.0	# dup head of src branch
	$ git rebase --onto REL-v1.0 REL-v2.0 local-v1.0

The trick here seems to be to invent the name of the new branch based on
where it's going to be rebased to.

I think this does suffice very nicely as a "git copy" operation!

> The "copy" does not have the problem of rebase, but it has a different
> problem: You have two series of commits instead of one. If you found
> a bug in one of those commits, you will have to patch each series
> separately. Also, git merge may produce additional conflicts... So,
> copying commits is not something that I would recommend to do often.

Indeed.

-- 
						Greg A. Woods

+1 416 218-0098                VE3TCP          RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>      Secrets of the Weird <woods@weird.com>

* Re: "git merge" merges too much!
  2009-12-01 18:52       ` Greg A. Woods
@ 2009-12-01 20:50         ` Dmitry Potapov
  2009-12-01 21:58           ` Greg A. Woods
  2009-12-02 10:20         ` Nanako Shiraishi
  1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-12-01 20:50 UTC (permalink / raw)
  To: The Git Mailing List

On Tue, Dec 01, 2009 at 01:52:18PM -0500, Greg A. Woods wrote:
> At Mon, 30 Nov 2009 22:22:12 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: "git merge" merges too much!
> > 
> > The key difference compared to what you may be used to is that branches
> > are normally based on the oldest branch in which the feature may be
> > included. Thus normally changes are not backported to old branches,
> > because you can merge them directly.
> 
> Hmmm... the idea of creating topic branches based on the oldest branch
> where the feature might be used is indeed neither intuitive, nor is it
> mentioned anywhere I've so far read about using topic branches in Git.

Most things that we consider "intuitive" are those that we got used to.
Git is different in many aspects from other VCSes (such as CVS/SVN), and
the workflow that is good for those VCSes may not be optimal for Git. There
is a good description that provides basic knowledge of how to use Git:

man gitworkflows

or online:

http://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html

If you do not base your changes on the oldest branch then you will not
be able to merge changes, which implies you will have to cherry-pick
manually without the ability to automatically track which changes were
merged and which were not. This is a recipe for disaster...

> At the moment I'm leaning towards a process where the configuration
> branch is re-created for every build -- i.e. the merges are redone from
> every topic branch to a freshly configured branch forked from the
> locally supported release branch, hopefully making use of git-rerere to
> solve most conflicts in as automated a fashion as is possible.

I am not quite sure that I fully understood your idea of configuration
branches, but I want to warn you about one serious limitation of
git-rerere -- it stores conflict resolutions on a per-file basis. This
means that if the resolution of some conflict implies some change to
another file then git-rerere will not help you here. So, it handles maybe
80-90% of cases, but not all of them.

> 
> Perhaps Stacked-Git really is the best answer.  I will have to
> investigate more.

There is also TopGit. I have never used either of them, but if you are
interested in a patch management system, you probably should look at both
of them. StGit is modelled after quilt, while TopGit aims to be
better integrated with Git and a better fit for work in a distributed
environment. But as I said, I do not have any first-hand experience
with either of them. (Personally, I would look at TopGit first, but maybe
I am biased here).

> > 
> > $ git branch new-foo foo
> > 
> > $ git rebase --onto newbase oldbase new-foo
> 
> Hmmm.... I'll have to think about that.  It makes some sense, but I
> don't intuitively read the command-line parameters well enough to
> predict the outcome in all of the scenarios I'm interested in.
> 
> what is "oldbase" there?  I'm guessing it means "base of foo" (and for
> the moment, "new-foo" too)?

You have:

 o---o---o---o---o  newbase
       \
        o---o---o---o---o  oldbase
                         \
                          o---o---o  foo

and you want this:

 o---o---o---o---o  newbase
     |            \
     |             o'--o'--o'  new-foo
      \
       o---o---o---o---o  oldbase
                         \
                          o---o---o  foo

Dmitry

* Re: "git merge" merges too much!
  2009-12-01 20:50         ` Dmitry Potapov
@ 2009-12-01 21:58           ` Greg A. Woods
  2009-12-02  0:22             ` Dmitry Potapov
  0 siblings, 1 reply; 33+ messages in thread
From: Greg A. Woods @ 2009-12-01 21:58 UTC (permalink / raw)
  To: The Git Mailing List

At Tue, 1 Dec 2009 23:50:57 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: "git merge" merges too much!
> 
> > > 
> > > $ git branch new-foo foo
> > > 
> > > $ git rebase --onto newbase oldbase new-foo
> > 
> > Hmmm.... I'll have to think about that.  It makes some sense, but I
> > don't intuitively read the command-line parameters well enough to
> > predict the outcome in all of the scenarios I'm interested in.
> > 
> > what is "oldbase" there?  I'm guessing it means "base of foo" (and for
> > the moment, "new-foo" too)?
> 
> You have:
> 
>  o---o---o---o---o  newbase
>        \
>         o---o---o---o---o  oldbase
>                          \
>                           o---o---o  foo

Yes, sort of -- in the ideal situation, but not in my particular example
where "oldbase" is just a tag, not a real branch.

So yes, "oldbase" is in fact "base of foo".  Trickier still is when the
"oldbase" branch has one or more commits newer than "base of foo".  Does
Git not have a symbolic name for the true base of a branch?  I.e. is
there not some form of symbolic name for "N" in the following?

   o---o---o---o---o---o---o---o  master
            \
             o---o---N---o---o  release-1
                      \
                       o---o---o  local-release-1

(now of course if it is discovered that "release-1" has progressed since
the base of "foo" then "foo" should be rebased first, but perhaps there
is not time to do this before the other release has to be supported)

> and you want this:
> 
>  o---o---o---o---o  newbase
>      |            \
>      |             o'--o'--o'  new-foo
>       \
>        o---o---o---o---o  oldbase
>                          \
>                           o---o---o  foo

Yes, sort of I suppose, if you trim all the non-relevant branches.

What I really want, I think, is something like this where at least the
non-relevant "master" branch is still shown:

                   1'--2'--3'  new-foo
                  /
         o---o---o  newbase
        /
   o---o---o---o---o---o---o---o  master
                \
                 o---o---o  oldbase
                          \
                           1---2---3  foo

Here's part of my confusion -- "newbase" as used above is actually older
than "oldbase".  :-) so ideally "oldbase" should always be described in
terms of "foo", not just given an arbitrary unrelated name.

Of course that doesn't rule out the following scenario either where
"newbase" really is newer than "oldbase" -- in my world a given project
might become locally supported first on either a newer release, or an
older release, so both above and below might happen:

                           1'--2'--3'  new-foo
                          /
                 o---o---o  newbase
                /
   o---o---o---o---o---o---o---o  master
        \
         o---o---o  oldbase
                  \
                   1---2---3  foo

And eventually I want to also merge whatever is still relevant from foo
to a "local" branch off master so that those changes can be sent
(usually as patches) upstream.

Sometimes I want to do development on a topic branch as close to the tip
of "master" so that it can most easily be pushed upstream, and then
back-port those changes to older release branches.

In fact the latter is exactly how I picture release branches to work in
normal development, and this is how several of the big projects I'd like
to get using Git are doing development (now usually with CVS).

Note too that in these kinds of projects "topic" branches are _always_
forked from the current tip of "master", long-running ones sometimes
rebased to keep up with "master", small fixes and changes are made
directly to the master branch; and small fixes, as well as relevant
features, sometimes those developed on "topic" branches, are back-ported
to release branches.

Note I'm not talking about ideals of best practises specific for Git
here -- I'm talking about actual working operational practises that
people are _very_ familiar with and which have been well proven using a
wide variety of different VCSes in the past.  For example I
seriously doubt any of the developers of the projects I'm thinking of
that I'd like to switch to using Git are ever going to want to fork
their topic branches from the oldest release branch base that they
intend to support, and many such projects will necessarily always have
at least a few long-running topic branches that will have to be
frequently rebased to keep up with the trunk so that their eventual
merging will go as smoothly as possible, and yet once any of these topic
branches is finally "closed" their changes may also have to be
back-ported to release branches.

To me the natural way to do these kinds of back-porting "merges" is to
restrict the merge to select only the commits on the branch, i.e. from
its base to its tip, thus the motivation for the topic of my thread (and
I think the motivation for the "What is the best way to backport a
feature?" thread as well).  I think if Git could do this kind of
"partial" merging directly without having to "copy" deltas with "rebase"
or "cherry-pick" or "am" or whatever, and thus create separate histories
for them, then it would be much better at supporting this traditional
practice of using branches to manage releases.  Without such ability it
truly does look as though some form of "patch" management tool is also a
necessary thing (evil?), as "rebase" and "cherry-pick" could quickly get
way out of control and be way too much work otherwise.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/

* Re: "git merge" merges too much!
  2009-12-01 21:58           ` Greg A. Woods
@ 2009-12-02  0:22             ` Dmitry Potapov
  0 siblings, 0 replies; 33+ messages in thread
From: Dmitry Potapov @ 2009-12-02  0:22 UTC (permalink / raw)
  To: The Git Mailing List

On Tue, Dec 01, 2009 at 04:58:34PM -0500, Greg A. Woods wrote:
> At Tue, 1 Dec 2009 23:50:57 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: "git merge" merges too much!
> > 
> > > > 
> > > > $ git branch new-foo foo
> > > > 
> > > > $ git rebase --onto newbase oldbase new-foo
> > > 
> > > Hmmm.... I'll have to think about that.  It makes some sense, but I
> > > don't intuitively read the command-line parameters well enough to
> > > predict the outcome in all of the scenarios I'm interested in.
> > > 
> > > what is "oldbase" there?  I'm guessing it means "base of foo" (and for
> > > the moment, "new-foo" too)?
> > 
> > You have:
> > 
> >  o---o---o---o---o  newbase
> >        \
> >         o---o---o---o---o  oldbase
> >                          \
> >                           o---o---o  foo
> 
> Yes, sort of -- in the ideal situation, but not in my particular example
> where "oldbase" is just a tag, not a real branch.

It does not matter whether it is a tag or a branch or just a SHA-1. You can
use any two references as newbase and oldbase. They specify two points
in the DAG. The only thing that has to be a branch in my example is new-foo.

> 
> So yes, "oldbase" is in fact "base of foo".  Trickier still is when the
> "oldbase" branch has one or more commits newer than "base of foo".  Does
> Git not have a symbolic name for the true base of a branch?  I.e. is
> there not some form of symbolic name for "N" in the following?
> 
>    o---o---o---o---o---o---o---o  master
>             \
>              o---o---N---o---o  release-1
>                       \
>                        o---o---o  local-release-1

You can always find SHA-1 for N using the following command:

  git merge-base release-1 local-release-1

but you do not have to do that to rebase your changes. You can just run:

   # create a copy of local-release-1, so it will not disappear
   git branch copy-release-1 local-release-1

   # rebase the branch to master
   git rebase --onto master release-1 copy-release-1
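
To see why naming N is unnecessary, the quoted diagram can be rebuilt in a
throwaway repository. This is a sketch under assumptions: the `commit` helper,
the commit labels (m1, r1, L1 and so on) and the file names are invented, and
"N" is simply the label given to the fork point.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"       # placeholder identity for the throwaway repo
git config user.email "user@example.com"

# Hypothetical helper: record one commit that adds a single small file.
commit() { echo "$1" > "file-$1.txt"; git add "file-$1.txt"; git commit -q -m "$1"; }

commit m1; commit m2
git checkout -q -b release-1
commit r1; commit r2; commit N
git checkout -q -b local-release-1
commit L1; commit L2; commit L3
orig_tip=$(git rev-parse local-release-1)
git checkout -q release-1
commit r3; commit r4                      # release-1 moves on past N
git checkout -q master
commit m3; commit m4

# merge-base finds the fork point even though release-1 has advanced.
n=$(git merge-base release-1 local-release-1)
git log -1 --format=%s "$n"

# The range release-1..copy-release-1 already excludes N and everything
# before it, so the branch name works as the old base.
git branch copy-release-1 local-release-1
git rebase -q --onto master release-1 copy-release-1
git log --format=%s master..copy-release-1
```

Only the three local commits end up on top of master; local-release-1 itself
is left where it was.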


Dmitry

* Re: "git merge" merges too much!
  2009-12-01 18:52       ` Greg A. Woods
  2009-12-01 20:50         ` Dmitry Potapov
@ 2009-12-02 10:20         ` Nanako Shiraishi
  1 sibling, 0 replies; 33+ messages in thread
From: Nanako Shiraishi @ 2009-12-02 10:20 UTC (permalink / raw)
  To: The Git Mailing List; +Cc: Dmitry Potapov

Quoting "Greg A. Woods" <woods@planix.com> writes:

> At Mon, 30 Nov 2009 22:22:12 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: "git merge" merges too much!
>> 
>> The key difference compared to what you may be used to is that branches
>> are normally based on the oldest branch in which the feature may be
>> included. Thus normally changes are not backported to old branches,
>> because you can merge them directly.
>
> Hmmm... the idea of creating topic branches based on the oldest branch
> where the feature might be used is indeed neither intuitive, nor is it
> mentioned anywhere I've so far read about using topic branches in Git.

You may want to add the result of googling 

  "Fun with" site:gitster.livejournal.com

to the list of Git documents you read. "Fork from the oldest 
branch" is one of the techniques Junio teaches often, and many 
of his other techniques are built upon it.

He not only teaches useful techniques but also explains a lot about 
the reasoning behind them in his Git book. His blog articles 
give the same explanations of many topics that I saw in his book 
but nowhere else. The blog is a useful substitute until his 
book gets translated to English for people who don't read 
Japanese.

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

* Re: "git merge" merges too much!
  2009-11-30 18:12   ` Greg A. Woods
  2009-11-30 19:22     ` Dmitry Potapov
@ 2009-12-02 20:09     ` Jeff King
  2009-12-03  1:21       ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-02 20:09 UTC (permalink / raw)
  To: The Git Mailing List

On Mon, Nov 30, 2009 at 01:12:31PM -0500, Greg A. Woods wrote:

> (From a first pass through the documentation I would never have guessed
> that "tags" were also a form of "refs".  All these different names for

I find git is much simpler to use and understand if you start "at the
bottom" with the basic concepts (because for the most part, git is
really a set of tools for manipulating the few basic data structures).
For a short intro, try:

  http://eagain.net/articles/git-for-computer-scientists/

I think Scott Chacon's "Pro Git" book also takes a similar approach, but
I confess that I have not actually read it carefully. At this point, I
know enough about git to make reading it not very interesting. :) You
can find it online at:

  http://progit.org/book/

> features.  Even the gitglossary(7) is somewhat inconsistent on how it
> uses "ref" and "refs".  Perhaps all that's needed is some firm editing
> and clean-up of the manuals and documentation by a good strong technical
> editor.)

I skimmed it and didn't see any inconsistency. If you have something
specific in mind, please point it out so we can fix it.

> "git rebase" will not work for me unless it grows a "copy" option ,
> i.e. one which does not delete the original branch (i.e. avoids the
> "reset" phase of its operation).  This option would likely only make
> sense when used with the "--onto" option, I would guess.

I think Dmitry already mentioned this, but you probably want to create a
new branch to hold your rebased history if you don't want to modify the
existing branch.

> (git-log(1) is worse than ls(1) for having too many options, but worst
> of all in the release I'm still using it doesn't respond sensibly nor
> consistently with other commands when given the "-?" option.)

$ ls -?
ls: invalid option -- '?'
Try `ls --help' for more information.

$ ls --help
[copious usage information]

$ git log -?
fatal: unrecognized argument: -?

$ git log --help
[the man page]

$ git log -h
usage: git log [<options>] [<since>..<until>] [[--] <path>...]
   or: git show [options] <object>...

$ cd /outside/of/git/repo
$ git log -?
fatal: Not a git repository (or any of the parent directories): .git

So "-?" is bogus for both ls and git. But there are two failings I see:

  1. Outside of a repository, "git log" does not even get to the
     argument-parsing phase to see that "-?" is bogus. We short-circuit
     "-h" and "--help" to avoid actually looking for a git repository,
     but obviously cannot do so for every "--bogus" argument we see.
     We could potentially also short-circuit "-?" (and probably map it
     to "-h" if we were going to do that). However, I didn't think "-?"
     was in common use.

  2. "git log -h" doesn't mention any of the options specifically,
     though other git commands do (e.g., try "git archive -h"). This is
     because the option list is generated by our parseopt library, but
     the revision and diff options (which are the only ones that "git
     log" takes) do not use parseopt. Maybe we should point to "--help"
     for the full list in that case.

-Peff

* Re: Git documentation consistency (was: "git merge" merges too much!)
  2009-12-02 20:09     ` Jeff King
@ 2009-12-03  1:21       ` Greg A. Woods
  2009-12-03  1:34         ` Git documentation consistency Junio C Hamano
  2009-12-03  2:07         ` Git documentation consistency (was: "git merge" merges too much!) Jeff King
  0 siblings, 2 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-12-03  1:21 UTC (permalink / raw)
  To: The Git Mailing List

At Wed, 2 Dec 2009 15:09:04 -0500, Jeff King <peff@peff.net> wrote:
Subject: Re: "git merge" merges too much!
> 
> I find git is much simpler to use and understand if you start "at the
> bottom" with the basic concepts (because for the most part, git is
> really a set of tools for manipulating the few basic data structures).

I think that's the problem actually -- I don't really want to know too
much about how it works under the hood (yet), I just want to use it in
the most effective way for my purposes.

There's lots of talk about using Git as the basis for a true high-level
VCS and SCM system, yet it doesn't look to me that anyone has created
such a VCS or SCMS using Git.

> I skimmed it and didn't see any inconsistency. If you have something
> specific in mind, please point it out so we can fix it.

I think anyone who's been participating on this list for any significant
amount of time is far too close to the subject to be able to serve as a
candid independent technical editor who could really help clean things
up and make the documentation much more consistent.  Obviously such an
editor would also require the help of experts at all the details too. :-)

Unfortunately I'm not a very good technical editor, and I don't really
have time to devote to doing such editing of documentation either.

> > (git-log(1) is worse than ls(1) for having too many options, but worst
> > of all in the release I'm still using it doesn't respond sensibly nor
> > consistently with other commands when given the "-?" option.)
> 
> $ ls -?
> ls: invalid option -- '?'
> Try `ls --help' for more information.

Please keep in mind all the world is not GNU:

	$ ls -?
	ls: unknown option -- ?
	usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]

My point was that _most_ other Git sub-commands already do respond to
"-?" sensibly with real, helpful, information; usually a summary of the
command options and parameters.

I.e. this is yet another form of inconsistency in "documentation" in
Git.  :-)

The reference to "ls" was just as a comparison with its somewhat
extensive variety of options.  In fact "git log" is way more complex
than "ls" because its parameters are not all just simple flags like
those for "ls" -- they often have their own parameters too.

> $ git log -h
> usage: git log [<options>] [<since>..<until>] [[--] <path>...]
>    or: git show [options] <object>...

Indeed, so why the heck can't it do something similar with '-?'.  That's
just sloppy programming, no?  Most other commands know '-?', and despite
the silliness with GNU Ls, use of '-?' to request summary usage
information is pretty much a de facto standard for unix commands.

Your point about mentioning "--help" in the summary usage information is
a good one though -- especially for a command with a very complex set of
command-line parameters.  However that alone isn't sufficient -- users
still need the summary as that alone helps trigger associations and may
be sufficient to allow a user to proceed quickly to get the command to
do what they want.

(the whole "fatal: not a git repository" error for "git foo -[h?]"
handling is also a rather silly one -- but I guess when something grows
quickly and from many inputs there's not always time to keep some of
these basic things clean and consistent)

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/

* Re: Git documentation consistency
  2009-12-03  1:21       ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods
@ 2009-12-03  1:34         ` Junio C Hamano
  2009-12-03  7:22           ` Greg A. Woods
  2009-12-03  2:07         ` Git documentation consistency (was: "git merge" merges too much!) Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-12-03  1:34 UTC (permalink / raw)
  To: Greg A. Woods; +Cc: The Git Mailing List

"Greg A. Woods" <woods@planix.com> writes:

> 	$ ls -?
> 	ls: unknown option -- ?
> 	usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]
> ...
> Most other commands know '-?', and despite
> the silliness with GNU Ls, use of '-?' to request summary usage
> information is pretty much a de facto standard for unix commands.

I think you are showing ignorance here, as -? is *not* even close to
standard, nor even widely used practice at all.  I somehow doubt your ls
responds to "ls -X" any differently from "ls -?"; it is most likely giving
the same canned response to any unknown option.

The "usage: ls [-AaBbC...] [file...]" indeed is much better than abstract
"usage: frotz <options> <args>" that does not list what <options> are, but
that is a totally different thing.  On that point, I think Peff already
made a good suggestion of giving the full help text in such a case.

* Re: Git documentation consistency
  2009-12-03  1:34         ` Git documentation consistency Junio C Hamano
@ 2009-12-03  7:22           ` Greg A. Woods
  2009-12-03  7:45             ` Jeff King
  2009-12-03 16:22             ` Marko Kreen
  0 siblings, 2 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-12-03  7:22 UTC (permalink / raw)
  To: The Git Mailing List

At Wed, 02 Dec 2009 17:34:01 -0800, Junio C Hamano <gitster@pobox.com> wrote:
Subject: Re: Git documentation consistency
> 
> I think you are showing ignorance here, as -? is *not* even close to
> standard, nor even widely used practice at all.

I think I should know something about Unix command line and option
parsers, having used them for some 25 years or so now.  In fact I've
used most every kind of unix that ever was, and I've worked on the
source to more than a few.

>  I somehow doubt your ls
> would respond to "ls -X" any differently from "ls -?", but is giving the
> same canned response to any unknown option.

Indeed.  Whether it is useful for "-?" to give a more detailed summary
than the basic one-line usage message often depends on how complex the
command-line features are.  My preference is for '-?' (and if possible
'-h' as well) to give detailed usage info, though hopefully limited to
just one screen full if possible (though with Git's free use of $PAGER
in many contexts it might be OK to feed longer usage info to $PAGER
too).

So, the basic usage messages for command-line "syntax" problems should
be a one-line summary (following the error message).

If more detail could be useful then '-?' (and if possible '-h') should
display a more detailed usage message.  (and if one line suffices then
'-?' and '-h' don't really have to be treated specially)

> The "usage: ls [-AaBbC...] [file...]" indeed is much better than abstract
> "usage: frotz <options> <args>" that does not list what <options> are, but
> that is a totally different thing.  On that point, I think Peff already
> made a good suggestion of giving the full help text in such a case.

Indeed the abstract content-free "<options>" style message is almost
useless in most cases.

It would also be useful to reflect in the usage message what the driver
program saw as its argv[0], not what the internal command saw:

$ git rebase -X
Usage: /usr/pkg/libexec/git-core//git-rebase [--interactive | -i] [-v] [--onto <newbase>] <upstream> [<branch>]

Or perhaps this could be cleaned up with something as simple as always
stripping off the pathname with the likes of:

     argv0 = (argv0 = strrchr(argv[0], '/')) ? argv0 + 1 : argv[0];
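
If the command printing the usage line happens to be a shell script rather
than C, the same stripping can be sketched with parameter expansion. This
is only an illustration: "strip_dir" is a made-up helper name, and the path
is simply the one from the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the last path component of its argument.
# "${1##*/}" removes the longest prefix ending in '/', so doubled slashes
# and names without any slash are both handled.
strip_dir() {
    printf '%s\n' "${1##*/}"
}

prog=$(strip_dir "/usr/pkg/libexec/git-core//git-rebase")
bare=$(strip_dir "git-rebase")

echo "Usage: $prog [--interactive | -i] [-v] [--onto <newbase>] <upstream> [<branch>]"
```

A real script would pass "$0" instead of a literal path.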

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/

* Re: Git documentation consistency
  2009-12-03  7:22           ` Greg A. Woods
@ 2009-12-03  7:45             ` Jeff King
  2009-12-03 15:24               ` Uri Okrent
  2009-12-03 16:22             ` Marko Kreen
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-03  7:45 UTC (permalink / raw)
  To: The Git Mailing List

On Thu, Dec 03, 2009 at 02:22:41AM -0500, Greg A. Woods wrote:

> > I think you are showing ignorance here, as -? is *not* even close to
> > standard, nor even widely used practice at all.
> 
> I think I should know something about Unix command line and option
> parsers, having used them for some 25 years or so now.  In fact I've
> used most every kind of unix that ever was, and I've worked on the
> source to more than a few.

I don't want to nitpick, because the main thrust of your point (that
"git foo --bogus" should show a more useful message) is not affected by
this subpoint at all.  But if we are considering special-casing "-?", I
would like to see some evidence that it is actually in use.

I can't seem to find it respected anywhere, as shown below (and note
that yes, some of this output does show a useful help message, but that
is because --bogus would show the same message, and I am not disputing
that we should handle that case):

# My linux box
$ uname -sr
Linux 2.6.31-1-686
$ ls -?
ls: invalid option -- '?'
Try `ls --help' for more information.

# Solaris
$ uname -sr
SunOS 5.8
$ /bin/ls -?
/bin/ls: illegal option -- ?
usage: ls -1RaAdCxmnlogrtucpFbqisfL [files]
$ /usr/ucb/ls -? ;# appears to ignore bogus options entirely?
foo
$ /usr/ucb/fold -?
/usr/ucb/fold: illegal option -- ?
Usage: fold [-bs] [-w width | -width ] [file...]
$ /usr/xpg4/bin/ls -?
/usr/xpg4/bin/ls: illegal option -- ?
usage: ls -1RaAdCxmnlogrtucpFbqisfL [files]
$ /usr/xpg6/bin/ls -?
bash: /usr/xpg6/bin/ls: No such file or directory

# AIX
$ uname -svr
AIX 1 6
$ /bin/ls -?
/bin/ls: illegal option -- ?
usage: ls [-1ACFHLNRabcdefgilmnopqrstuxEUX] [File...]

So what systems _do_ treat "-?" specially?

-Peff

* Re: Git documentation consistency
  2009-12-03  7:45             ` Jeff King
@ 2009-12-03 15:24               ` Uri Okrent
  0 siblings, 0 replies; 33+ messages in thread
From: Uri Okrent @ 2009-12-03 15:24 UTC (permalink / raw)
  To: Jeff King; +Cc: The Git Mailing List

On Wed, Dec 2, 2009 at 11:45 PM, Jeff King <peff@peff.net> wrote:
> So what systems _do_ treat "-?" specially?
>
> -Peff

Windows seems to (of course you need to use '/' instead of '-'):

Microsoft Windows [Version 6.0.6001]

c:\>dir -h
 Volume in drive C has no label.
 Volume Serial Number is B09A-B49F

 Directory of c:\

File Not Found

c:\>dir /h
Invalid switch - "h".

c:\>dir /help
Invalid switch - "help".

c:\>dir /?
Displays a list of files and subdirectories in a directory.

DIR [drive:][path][filename] [/A[[:]attributes]] [/B] [/C] [/D] [/L] [/N]
  [/O[[:]sortorder]] [/P] [/Q] [/R] [/S] [/T[[:]timefield]] [/W] [/X] [/4]

  [drive:][path][filename]
              Specifies drive, directory, and/or files to list.

  /A          Displays files with specified attributes.
  attributes   D  Directories                R  Read-only files
               H  Hidden files               A  Files ready for archiving
               S  System files               I  Not content indexed files
               L  Reparse Points             -  Prefix meaning not
  /B          Uses bare format (no heading information or summary).
  /C          Display the thousand separator in file sizes.  This is the
              default.  Use /-C to disable display of separator.
  /D          Same as wide but files are list sorted by column.
  /L          Uses lowercase.
  /N          New long list format where filenames are on the far right.
  /O          List by files in sorted order.
  sortorder    N  By name (alphabetic)       S  By size (smallest first)
               E  By extension (alphabetic)  D  By date/time (oldest first)
               G  Group directories first    -  Prefix to reverse order
  /P          Pauses after each screenful of information.
  /Q          Display the owner of the file.
  /R          Display alternate data streams of the file.
  /S          Displays files in specified directory and all subdirectories.
  /T          Controls which time field displayed or used for sorting
  timefield   C  Creation
              A  Last Access
              W  Last Written
  /W          Uses wide list format.
  /X          This displays the short names generated for non-8dot3 file
              names.  The format is that of /N with the short name inserted
              before the long name. If no short name is present, blanks are
              displayed in its place.
  /4          Displays four-digit years

Switches may be preset in the DIRCMD environment variable.  Override
preset switches by prefixing any switch with - (hyphen)--for example, /-W.

c:\>

-- 
   Uri

* Re: Git documentation consistency
  2009-12-03  7:22           ` Greg A. Woods
  2009-12-03  7:45             ` Jeff King
@ 2009-12-03 16:22             ` Marko Kreen
  2009-12-09 19:56               ` Greg A. Woods
  1 sibling, 1 reply; 33+ messages in thread
From: Marko Kreen @ 2009-12-03 16:22 UTC (permalink / raw)
  To: The Git Mailing List

On 12/3/09, Greg A. Woods <woods@planix.com> wrote:
> At Wed, 02 Dec 2009 17:34:01 -0800, Junio C Hamano <gitster@pobox.com> wrote:
>  Subject: Re: Git documentation consistency
>  > I think you are showing ignorance here, as -? is *not* even close to
>  > standard, nor even widely used practice at all.
>
>  I think I should know something about Unix command line and option
>  parsers, having used them for some 25 years or so now.  In fact I've
>  used most every kind of unix that ever was, and I've worked on the
>  source to more than a few.

'?' is what getopt(3) is supposed to return for unknown options.
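
The shell's getopts builtin follows the same convention, which makes the
point easy to see without compiling anything. The sketch below is a toy
parser that only knows -a and -b; the "parse_one" helper and the variable
names are assumptions for the illustration, not anything from git.

```shell
#!/bin/sh
# Parse exactly one option with a parser that only knows -a and -b.
# The leading ':' in the optstring selects silent error reporting, so the
# offending character is left in OPTARG instead of being printed on stderr.
parse_one() {
    OPTIND=1
    getopts ":ab" opt "$@"
}

# Quote '-?' so the shell cannot expand it as a glob pattern first.
parse_one '-?'; q_opt=$opt; q_bad=$OPTARG
parse_one -X;   x_opt=$opt; x_bad=$OPTARG

echo "-? gives opt=$q_opt (offending character: $q_bad)"
echo "-X gives opt=$x_opt (offending character: $x_bad)"
```

In both cases the parser hands back '?', so a program built on it cannot
tell "-?" apart from any other unknown option unless it inspects the
offending character itself.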

-- 
marko

* Re: Git documentation consistency
  2009-12-03 16:22             ` Marko Kreen
@ 2009-12-09 19:56               ` Greg A. Woods
  0 siblings, 0 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-12-09 19:56 UTC (permalink / raw)
  To: The Git Mailing List

At Thu, 3 Dec 2009 18:22:27 +0200, Marko Kreen <markokr@gmail.com> wrote:
Subject: Re: Git documentation consistency
> 
> On 12/3/09, Greg A. Woods <woods@planix.com> wrote:
> > At Wed, 02 Dec 2009 17:34:01 -0800, Junio C Hamano <gitster@pobox.com> wrote:
> >  Subject: Re: Git documentation consistency
> >  > I think you are showing ignorance here, as -? is *not* even close to
> >  > standard, nor even widely used practice at all.
> >
> >  I think I should know something about Unix command line and option
> >  parsers, having used them for some 25 years or so now.  In fact I've
> >  used most every kind of unix that ever was, and I've worked on the
> >  source to more than a few.
> 
> '?' is what getopt(3) is supposed to return for unknown options.

Indeed, which is why it cannot ever, in general, be used as a valid
option with some command-specific meaning, and so why the one-line form
of the usage message should always be displayed when the user explicitly
gives a '-?' option (i.e. in the same way as if any unknown option is
given).

If the one-line usage message is too terse to explain more complex
command line syntax then '-h' (and --help) should display a short
multi-line summary of the command's usage.

I guess what I should have suggested in the first place is that all Git
sub-commands should respond with a one-line[*] usage message when they
encounter an unknown option, such as '-?', and that they should (only)
display a more detailed multi-line help summary when given '-h' or
'--help' options.
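
As a rough sketch of that policy in shell (the command name "frotz", its
options, and the helper names are all placeholders invented for the
example, not a proposal for any particular git command):

```shell
#!/bin/sh
# Sketch only: "frotz" and its options -a and -b are made-up placeholders.
prog=frotz

short_usage() {
    echo "usage: $prog [-ab] [-h] [file ...]"
}

long_help() {
    short_usage
    echo "  -a    placeholder option a"
    echo "  -b    placeholder option b"
    echo "  -h    show this help"
}

run() {
    OPTIND=1
    while getopts ":abh" opt "$@"; do
        case $opt in
            a|b) ;;                       # known flags, nothing to do here
            h)   long_help; return 0 ;;   # explicit request: detailed help
            \?)  echo "$prog: unknown option -- $OPTARG" >&2
                 short_usage >&2          # mistake: error plus one line
                 return 2 ;;
        esac
    done
    echo "ok"
}

run -h
run '-?' || echo "exit status: $?"
```

The detailed help goes to stdout with a zero exit status because it was
asked for; the one-line usage goes to stderr with a non-zero status
because it reports a mistake.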

I was most surprised when I didn't get a one-line usage summary from
"git log -?", just getopt(3)'s error message.

[*] commands which have multiple distinct modes of operation with
separate and unique command-line syntax for each mode, should of course
display a one-line summary for each command mode, such as in this
example:

void
usage(void)
{
	fprintf(stderr, "Usage:  %s [-abcdef]\n", getprogname());
	fprintf(stderr, "        %s [-l]\n", getprogname());
	exit(2);
}

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/

* Re: Git documentation consistency (was: "git merge" merges too much!)
  2009-12-03  1:21       ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods
  2009-12-03  1:34         ` Git documentation consistency Junio C Hamano
@ 2009-12-03  2:07         ` Jeff King
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff King @ 2009-12-03  2:07 UTC (permalink / raw)
  To: The Git Mailing List

On Wed, Dec 02, 2009 at 08:21:39PM -0500, Greg A. Woods wrote:

> > I find git is much simpler to use and understand if you start "at the
> > bottom" with the basic concepts (because for the most part, git is
> > really a set of tools for manipulating the few basic data structures).
> 
> I think that's the problem actually -- I don't really want to know too
> much about how it works under the hood (yet), I just want to use it in
> the most effective way for my purposes.
>
> There's lots of talk about using Git as the basis for a true high-level
> VCS and SCM system, yet it doesn't look to me that anyone has created
> such a VCS or SCMS using Git.

Sure, I can understand that. And I invite you (or anyone) to work on
such a VCS. I am sure I am not alone on the list in sincerely hoping
you succeed, and in offering whatever support core git developers
can give. But we have seen people try this in the past, and it never quite
seems to work.

All of the cogito people ended up migrating to git. I was one of them.
In my case, the git tools offered better access to the fundamental
operations, which is what I found interesting and powerful about it. I
suspect some others migrated for the same reasons, though perhaps many
did simply because cogito did not keep up with core git in terms of
features.

There is also "eg" these days, which attempts to do what you're saying.
I don't know how big a userbase it has; I've never been personally
interested in it.

> I think anyone who's been participating on this list for any significant
> amount of time is far too close to the subject to be able to serve as a
> candid independent technical editor who could really help clean things
> up and make the documentation much more consistent.  Obviously such an
> editor would also require the help of experts at all the details too. :-)

Sure, I think an outsider doing a really nice job of overhauling the
documentation would be nice. There are some git books, some by insiders,
and some not. For the same reason that you mention, it would be hard for me
to assess their quality with too much objectivity. :)

My question was more of a "leaving aside overhauling the documentation,
did you see something obvious that we can fix right now" kind of thing.

> > $ ls -?
> > ls: invalid option -- '?'
> > Try `ls --help' for more information.
> 
> Please keep in mind all the world is not GNU:
> 
> 	$ ls -?
> 	ls: unknown option -- ?
> 	usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]

Right, but my point is unchanged. Neither ls actually _recognizes_
"-?". They do the same for "--bogosity".

> My point was that _most_ other Git sub-commands already do respond to
> "-?" sensibly with real, helpful, information; usually a summary of the
> command options and parameters.

Yes, for the same reason that "ls" does: they don't recognize it. But if
you are asking for "git log" to produce the short usage message, then
that is part of my issue (1) from the last message: "log" doesn't use
the same parseopt library as (most of) the rest of git[1].

Yes, it's inconsistent. Those inconsistencies were introduced over time
(before we had parseopt), and we are slowly fixing them over time.
Patches welcome. :)

[1] There are other inconsistencies because of this, too. You can't say
"git log -pz", but must say "git log -p -z".

> > $ git log -h
> > usage: git log [<options>] [<since>..<until>] [[--] <path>...]
> >    or: git show [options] <object>...
> 
> Indeed, so why the heck can't it do something similar with '-?'.  That's
> just sloppy programming, no?  Most other commands know '-?', and despite
> the silliness with GNU Ls, use of '-?' to request summary usage
> information is pretty much a de facto standard for unix commands.

Nothing "knows" -?, but it is true that the parseopt-ified commands
behave differently (and IMHO, better). You can call it sloppy, I guess;
it is really an artifact of commands being written and changing over
time. As I said, we are slowly converging on consistency.

And no, I don't want to get into a big debate over whether it is better
to plan your software up front, or to let it evolve over time. I have
opinions that do not necessarily line up with how git came into being,
but the end product is useful enough that I like to use it and hack on
it. :)

> (the whole "fatal: not a git repository" error for "git foo -[h?]"
> handling is also a rather silly one -- but I guess when something grows
> quickly and from many inputs there's not always time to keep some of
> these basic things clean and consistent)

The problem here is that there are two chunks of code: the "git"
wrapper, and the "log" command. The wrapper knows a few things about
each command like "does it need to be in a git repository?", and checks
that before we even look at the command-line options. There is an
explicit "check for --help" hack. Fixing that startup procedure to be
more sane would be possible, but there are a lot of hidden demons
lurking in changing the order of the startup sequence. Again, patches
welcome. :)

-Peff

* Re: "git merge" merges too much!
  2009-11-29  3:21 "git merge" merges too much! Greg A. Woods
  2009-11-29  5:14 ` Jeff King
@ 2009-11-29  5:15 ` Junio C Hamano
  2009-11-30 18:40   ` Greg A. Woods
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-11-29  5:15 UTC (permalink / raw)
  To: Greg A. Woods; +Cc: The Git Mailing List

In order to make things smoother and easier in the future, you may want to
learn "topic branch" workflows (found in many git tutorial materials).

But it is too late for the history you already created; "cherry-pick" is
your friend to recover from the shape of your existing history.
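
A minimal sketch of that recovery, using the names from the original
post's diagram (TR1.1, TR1.2, BL1.2, A, B, C, BL1.1) in a scratch
repository. The temporary directory, the "commit" helper, the file names
and the demo identity are assumptions for the illustration.

```shell
#!/bin/sh
# Sketch only: rebuilds the shape of the history from the original post.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one new file per commit, so nothing can conflict.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2
git checkout -q -b BL1.2 TR1.2
commit A; commit B; commit C

# Start the backport branch at the older tag and copy A, B, C one by one,
# oldest first.  A merge here would also drag in 3, 4 and 5.
git checkout -q -b BL1.1 TR1.1
for c in $(git rev-list --reverse TR1.2..BL1.2); do
    git cherry-pick "$c" >/dev/null
done

backported=$(git rev-list --count TR1.1..BL1.1)
skipped=$(git rev-list --count BL1.1..TR1.2)
echo "commits added on top of TR1.1: $backported"
echo "commits from TR1.2 left out of BL1.1: $skipped"
```

The loop is used instead of handing a range to cherry-pick so that the
sketch does not depend on a Git version that accepts ranges there.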

* Re: "git merge" merges too much!
  2009-11-29  5:15 ` "git merge" merges too much! Junio C Hamano
@ 2009-11-30 18:40   ` Greg A. Woods
  2009-11-30 20:50     ` Junio C Hamano
  2009-11-30 21:17     ` Dmitry Potapov
  0 siblings, 2 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-11-30 18:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: The Git Mailing List

At Sat, 28 Nov 2009 21:15:05 -0800, Junio C Hamano <gitster@pobox.com> wrote:
Subject: Re: "git merge" merges too much!
> 
> In order to make things smoother and easier in the future, you may want to
> learn "topic branch" workflows (found in many git tutorial material).

I was thinking hard about topic branches too, but I'm still having a
hard time figuring out how they might work best for my purposes.

One hard problem is how to create "clean" topic branches and use them
effectively when the local working environment requires all or many of
the whole set of local changes.  Bootstrapping these local changes as
topic branches may be one thing; but going further with topic branches
to create local features or fixes, some of which are to be submitted
upstream for eventual inclusion in the origin/master branch, and others
of which will only ever be ported forward to future official releases,
is quite another thing altogether.

These new-feature topic branches will have to be worked on from the
main (most well supported) local branch, which will be forked from the
main release branch (or based on the trunk at the main release tag), but
yet care will have to be taken to make sure the merge doesn't include
dependencies on local-only changes (i.e. so that it can be safely
submitted upstream).

Some projects also only wish to receive patches that work against their
trunk branch, so a given local topic branch will have to be merged onto
yet another branch forking from the trunk where it may likely encounter
conflicts (since the topic branch is based on older code from an
existing release).  This isn't really a Git problem I suppose, except
for the fact that it means the lack of easy support for multiple working
directories that track different branches makes this kind of development
somewhat more difficult to do with Git than with, say, CVS.

While it may be quite convenient in small projects to quickly move a
single working directory from one branch to another and do various
builds and tests from the result, large projects (say where a compile
takes the better part of a working day or more and where testing
requires multi-day processes) demand that working directories remain
"stable", and multiple lines of development therefore demand multiple
working directories.  Developing procedures around Git to manage this
with "push" and "pull" into multiple local copies of the repository,
each with their own working directory, is of course possible (though not
necessarily easy), but once again if the repositories are similarly huge
then it may not be possible to support multiple repo copies for each
developer in a given working environment.

So, ideally it seems from my understanding at this point that I want to
be able to repeatedly merge topic branch changes (as they are worked on)
into multiple local configuration branches (each of which perhaps
includes multiple local topics and other changes), each of which pushes
its changes out into possibly multiple working directories.

> But it is too late for the history you already created; "cherry-pick" is
> your friend to recover from the shape of your existing history.

The problem is that it's not _my_ existing history -- it's from the
remote project I'm trying to work with.

I think you'll agree there's nothing wrong with a project using tags alone
to manage its releases.

However this means I've got to work out how to do merges of my local
changes onto multiple locally created branches which fork off from these
tags.  Perhaps using this cherry-pick tool is truly the best I can do in
this situation.

(it makes me worry though about how I might manage a super-project which
pulls from many remote sub-projects (and which includes large amounts of
its own code) where some of those remote projects use release branches,
and some just use tags, etc.)

-- 
						Greg A. Woods

+1 416 218-0098                VE3TCP          RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>      Secrets of the Weird <woods@weird.com>


* Re: "git merge" merges too much!
  2009-11-30 18:40   ` Greg A. Woods
@ 2009-11-30 20:50     ` Junio C Hamano
  2009-11-30 21:17     ` Dmitry Potapov
  1 sibling, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2009-11-30 20:50 UTC (permalink / raw)
  To: The Git Mailing List; +Cc: Junio C Hamano

"Greg A. Woods" <woods@planix.com> writes:

> ....  This isn't really a Git problem I suppose, except
> for the fact that it means the lack of easy support for multiple working
> directories that track different branches makes this kind of development
> somewhat more difficult to do with Git than with, say, CVS.

You want to google for git-new-workdir then; it is found in
contrib/workdir and fairly widely used.


* Re: "git merge" merges too much!
  2009-11-30 18:40   ` Greg A. Woods
  2009-11-30 20:50     ` Junio C Hamano
@ 2009-11-30 21:17     ` Dmitry Potapov
  2009-12-01  0:24       ` Greg A. Woods
  1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-11-30 21:17 UTC (permalink / raw)
  To: The Git Mailing List; +Cc: Junio C Hamano

On Mon, Nov 30, 2009 at 01:40:38PM -0500, Greg A. Woods wrote:
> 
> While it may be quite convenient in small projects to quickly move a
> single working directory from one branch to another and do various
> builds and tests from the result, large projects (say where a compile
> takes the better part of a working day or more and where testing
> requires multi-day processes) demand that working directories remain
> "stable", and multiple lines of development therefore demand multiple
> working directories.

It depends on the project and what tools are used, but using ccache and
proper dependencies helps a lot to reduce the cost of switching. In fact,
it may be faster to switch to another branch and recompile a few
files than to go into another working directory, because when you go to
another working directory, you hit a cold cache and things get very slow.

And then if a project is huge and takes a lot of time to compile and
test everything, I do not think it is a good idea to build that in your
work tree. Instead, you make a snapshot using git-archive and then run a
full build and test on it. In this way, you know that you test exactly
what you have committed (you can amend any commit later until you
publish it).
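
As a rough sketch of that workflow (the repository, the file names, and
the "snapshot/" prefix are made up for illustration, and a throwaway
repository stands in for a real project), something like the following
exports exactly what is committed and leaves uncommitted files behind:

```shell
# Sketch only: a throwaway repository under a temp dir; all names are illustrative.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.email "dev@example.com"
git config user.name "Dev"

# One committed source file.
echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -q -m "initial commit"

# Uncommitted garbage that should not influence the tested build.
echo 'stray' > scratch.tmp

# Export exactly what HEAD contains into a separate directory.
git archive --format=tar --prefix=snapshot/ HEAD | (cd "$work" && tar xf -)

# List what ended up in the snapshot.
ls "$work/snapshot"
```

The full build and test would then run inside the exported directory
rather than in the work tree.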

Dmitry


* Re: "git merge" merges too much!
  2009-11-30 21:17     ` Dmitry Potapov
@ 2009-12-01  0:24       ` Greg A. Woods
  2009-12-01  5:47         ` Dmitry Potapov
  0 siblings, 1 reply; 33+ messages in thread
From: Greg A. Woods @ 2009-12-01  0:24 UTC (permalink / raw)
  To: The Git Mailing List; +Cc: Dmitry Potapov

At Tue, 1 Dec 2009 00:17:44 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: "git merge" merges too much!
> 
> It depends on the project and what tools are used, but using ccache and
> proper dependencies helps a lot to reduce the cost of switching. In fact,
> it may be faster to switch to another branch and recompile a few
> files than to go into another working directory, because when you go to
> another working directory, you hit a cold cache and things get very slow.

Perhaps, sometimes, but at least with some tools that can more often
than not just end up as a confusing mess if you happen to change
something at exactly the wrong time, and with Git it seems almost too
easy to wildly change many files in the working directory, even if you
can get them back into their previous state relatively quickly.

Things get even weirder if you happen to be playing with older branches
too -- most build tools don't have the ability to follow files that go back
in time as they assume any product files newer than the sources are
already up-to-date, no matter how much older the sources might become on
a second build.

From a good software hygiene perspective the only safe way I can see to
build a Git working directory after manipulating any branches with Git
is to do a complete "make clean && make" cycle.  That is until Git also
incorporates, or integrates with, really good build tools....  :-)

> And then if a project is huge and takes a lot of time to compile and
> test everything, I do not think it is a good idea to build that in your
> work tree. Instead, you make a snapshot using git-archive and then run a
> full build and test on it. In this way, you know that you test exactly
> what you have committed (you can amend any commit later until you
> publish it).

I think this "git-new-workdir" script is the thing to try.  It probably
means keeping separate "configuration" branches, one for each build
working directory, but I think that's OK.

These source trees are big enough that one doesn't just go throwing
around entire copies of them willy-nilly.  Disk bandwidth is also a
limited resource we are very concerned about, not just disk space.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/


* Re: "git merge" merges too much!
  2009-12-01  0:24       ` Greg A. Woods
@ 2009-12-01  5:47         ` Dmitry Potapov
  2009-12-01 17:59           ` multiple working directories for long-running builds (was: "git merge" merges too much!) Greg A. Woods
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-12-01  5:47 UTC (permalink / raw)
  To: The Git Mailing List

On Mon, Nov 30, 2009 at 07:24:14PM -0500, Greg A. Woods wrote:
> 
> Things get even weirder if you happen to be playing with older branches
> too -- most build tools don't have the ability to follow files that go back
> in time as they assume any product files newer than the sources are
> already up-to-date, no matter how much older the sources might become on
> a second build.

No, files do not go back in time when you switch between branches. The
timestamp on files is the time when they are written to your working
tree (and it is so with other VCSes that I worked with, so I do not
know where you get the idea of file timestamps going back in time). Thus
any build tool such as 'make' should work fine provided you have correct
dependencies. I have switched between widely divergent branches, and I
have never had any problem with that.

Dmitry


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01  5:47         ` Dmitry Potapov
@ 2009-12-01 17:59           ` Greg A. Woods
  2009-12-01 18:51             ` Dmitry Potapov
  0 siblings, 1 reply; 33+ messages in thread
From: Greg A. Woods @ 2009-12-01 17:59 UTC (permalink / raw)
  To: The Git Mailing List; +Cc: Dmitry Potapov

At Tue, 1 Dec 2009 08:47:34 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: "git merge" merges too much!
> 
> On Mon, Nov 30, 2009 at 07:24:14PM -0500, Greg A. Woods wrote:
> > 
> > Things get even weirder if you happen to be playing with older branches
> > too -- most build tools don't have the ability to follow files that go back
> > in time as they assume any product files newer than the sources are
> > already up-to-date, no matter how much older the sources might become on
> > a second build.
> 
> No, files do not go back in time when you switch between branches. The
> timestamp on files is the time when they are written to your working
> tree

Hmmm, I didn't really say anything in particular about file timestamps
-- I meant the file content may go back in time.  More correctly I
should have said that the file content may become inconsistent with the
state of other files that have just been compiled.

If the timestamps do not get set back to commit time, but rather are
simply updated to move the last modify time to the time each change is
made to a working file (which is as you said, to be expected),
regardless of whether its content goes back in time or not, then this
may or may not help a currently running build to figure out what really
needs to be re-compiled.  Likely it won't even for a recursive-make
style update build, but certainly not for one where all build actions
are pre-determined before any of them are started.

If the content of one or more files goes back in time to an earlier
state while the compile is happening then ultimately the result must be
considered to be undefined.  The best you can hope for is a break in the
compile.

This is why I agreed with you that a build should never be done in a
working directory where any file editing or VCS action is occurring
simultaneously.

I just disagreed that "git archive" was a reasonable alternative to
leaving the working directory alone during the entire time of the build.
It is not really reasonable for large projects any more than stopping
all work on the sources is reasonable.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01 17:59           ` multiple working directories for long-running builds (was: "git merge" merges too much!) Greg A. Woods
@ 2009-12-01 18:51             ` Dmitry Potapov
  2009-12-01 18:58               ` Greg A. Woods
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-12-01 18:51 UTC (permalink / raw)
  To: The Git Mailing List

On Tue, Dec 01, 2009 at 12:59:58PM -0500, Greg A. Woods wrote:
> At Tue, 1 Dec 2009 08:47:34 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: "git merge" merges too much!
> > 
> > On Mon, Nov 30, 2009 at 07:24:14PM -0500, Greg A. Woods wrote:
> > > 
> > > Things get even weirder if you happen to be playing with older branches
> > > too -- most build tools don't have the ability to follow files that go back
> > > in time as they assume any product files newer than the sources are
> > > already up-to-date, no matter how much older the sources might become on
> > > a second build.
> > 
> > No, files do not go back in time when you switch between branches. The
> > timestamp on files is the time when they are written to your working
> > tree
> 
> Hmmm, I didn't really say anything in particular about file timestamps
> -- I meant the file content may go back in time.  More correctly I
> should have said that the file content may become inconsistent with the
> state of other files that have just been compiled.

There is no difference between content going back in time and going
forward. If a file is changed, any decent build system should recompile
the corresponding files. If the build does not handle dependencies
properly, you can end up with an inconsistent state just by editing some
files.

> If the timestamps do not get set back to commit time, but rather are
> simply updated to move the last modify time to the time each change is
> made to a working file (which is as you said, to be expected),

More precisely, Git does not do anything about modification time during
checkout. The system automatically updates the modification time when
a file is written, and Git does not mess with it.

> regardless of whether its content goes back in time or not, then this
> may or may not help a currently running build to figure out what really
> needs to be re-compiled.

Obviously, switching branches while running a build may produce very
confusing results, but it is not any different from editing files by
hand during a build -- any concurrent modification may confuse the build
system.

> I just disagreed that "git archive" was a reasonable alternative to
> leaving the working directory alone during the entire time of the build.

Using "git archive" allows you to avoid running a long procedure such as a
full clean build and testing in the working tree. Also, it is guaranteed
that you test exactly what you put in Git and that other garbage in your
working tree does not affect the result. But my point was that switching
between branches and recompiling a few changed files may be faster than
going to another working tree.

Dmitry


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01 18:51             ` Dmitry Potapov
@ 2009-12-01 18:58               ` Greg A. Woods
  2009-12-01 21:18                 ` Dmitry Potapov
  0 siblings, 1 reply; 33+ messages in thread
From: Greg A. Woods @ 2009-12-01 18:58 UTC (permalink / raw)
  To: The Git Mailing List

At Tue, 1 Dec 2009 21:51:14 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: multiple working directories for long-running builds (was:	"git merge" merges too much!)
> 
> Obviously, switching branches while running a build may produce very
> confusing results, but it is not any different from editing files by
> hand during a build -- any concurrent modification may confuse the build
> system.

That's what I said.  This is why support for multiple working
directories is an essential feature for any significantly large project.


> > I just disagreed that "git archive" was a reasonable alternative to
> > leaving the working directory alone during the entire time of the build.
> 
> Using "git archive" allows you to avoid running a long procedure such as a
> full clean build and testing in the working tree. Also, it is guaranteed
> that you test exactly what you put in Git and that other garbage in your
> working tree does not affect the result.

Sure, but let's be very clear here:  "git archive" is likely even more
impossible for some large projects to use than "git clone" would be to
use to create build directories.

Disk bandwidth is almost always more expensive than disk space.

>   But my point was that switching
> between branches and recompiling a few changed files may be faster than
> going to another working tree.

That's possibly going to generate even more unnecessary churn in the
working directory, and thus even more unnecessary re-compiles.

Multiple working directories are really the only sane solution
sometimes.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01 18:58               ` Greg A. Woods
@ 2009-12-01 21:18                 ` Dmitry Potapov
  2009-12-01 22:25                   ` Jeff Epler
                                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Dmitry Potapov @ 2009-12-01 21:18 UTC (permalink / raw)
  To: The Git Mailing List

On Tue, Dec 01, 2009 at 01:58:05PM -0500, Greg A. Woods wrote:
> 
> > > I just disagreed that "git archive" was a reasonable alternative to
> > > leaving the working directory alone during the entire time of the build.
> > 
> > Using "git archive" allows you to avoid running a long procedure such as a
> > full clean build and testing in the working tree. Also, it is guaranteed
> > that you test exactly what you put in Git and that other garbage in your
> > working tree does not affect the result.
> 
> Sure, but let's be very clear here:  "git archive" is likely even more
> impossible for some large projects to use than "git clone" would be to
> use to create build directories.

AFAIK, "git archive" is cheaper than git clone. I do not say it is fast
for a huge project, but if you want to run a process such as a clean build
and test that takes a long time anyway, it does not add much to the
total time.

> 
> Disk bandwidth is almost always more expensive than disk space.

Disk bandwidth is certainly more expensive than disk space, and the
whole point was to avoid a lot of disk bandwidth by using a hot cache.
If you have two working trees then it is likely that only one will be
in the hot cache; that is why you can switch (and recompile a few
files) faster than you can go to another working tree. It has never been
about disk space, it is about disk cache and keeping it hot.

> 
> Multiple working directories are really the only sane solution
> sometimes.

Sure, sometimes... I do not know the details to say what will be better in
your case, but I just wanted to say that you should weigh that against
switching, because switching in Git is very fast. Much faster than with
any other VCS...

Another thing to consider is that if you put a really huge project in one
Git repo then Git may not be as fast as you may want, because Git tracks
the whole project as a whole. So, you may want to split your project into
a few relatively independent modules (see git submodule).

Dmitry


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01 21:18                 ` Dmitry Potapov
@ 2009-12-01 22:25                   ` Jeff Epler
  2009-12-01 22:44                   ` Greg A. Woods
  2009-12-02  2:09                   ` multiple working directories for long-running builds Junio C Hamano
  2 siblings, 0 replies; 33+ messages in thread
From: Jeff Epler @ 2009-12-01 22:25 UTC (permalink / raw)
  To: The Git Mailing List

On Wed, Dec 02, 2009 at 12:18:30AM +0300, Dmitry Potapov wrote:
> AFAIK, "git archive" is cheaper than git clone. I do not say it is fast
> for a huge project, but if you want to run a process such as a clean build
> and test that takes a long time anyway, it does not add much to the
> total time.

If you want to keep a separate copy of your source tree in order to get
consistent builds, "git archive" is not much cheaper in disk space or in
time, at least on this unix system:

$ find orig -exec md5sum {} + > /dev/null 2>&1 # ensure hot cache
$ time git clone orig temp-clone
Initialized empty Git repository in /usr/local/jepler/src/temp-clone/.git/
0.6 real 0.3 user 0.6 system
$ time (GIT_DIR=orig/.git git archive --format tar --prefix temp-archive/ HEAD | tar xf -)
0.5 real 0.2 user 0.5 system
$ du -s orig temp-clone temp-archive
41880   orig
14640   temp-clone  # du excludes files already accounted for by 'orig'
14304   temp-archive

.. and the next run to bring temp-clone up to date can be even faster,
since it's just 'git pull' and will only touch changed files.
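
Spelled out as a sketch, with a throwaway repository and made-up names
standing in for the real tree, the clone-once-then-pull cycle looks
roughly like this:

```shell
# Sketch only: "orig" and "build" are illustrative names in a temp dir.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

# The developer's repository with one commit.
git init -q orig
(
  cd orig || exit 1
  git config user.email "dev@example.com"
  git config user.name "Dev"
  echo 'first version' > file.txt
  git add file.txt
  git commit -q -m "first"
)

# One-time cost: create the separate build copy.
git clone -q orig build

# Later, development continues in the original repository...
(
  cd orig || exit 1
  echo 'second version, updated' > file.txt
  git commit -q -a -m "second"
)

# ...and the build copy is brought up to date incrementally.
(cd build && git pull -q)

cat build/file.txt
```

Only the files that changed between the two commits are rewritten in
the build copy by the pull.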

Jeff


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01 21:18                 ` Dmitry Potapov
  2009-12-01 22:25                   ` Jeff Epler
@ 2009-12-01 22:44                   ` Greg A. Woods
  2009-12-02  0:10                     ` Dmitry Potapov
  2009-12-02  2:09                   ` multiple working directories for long-running builds Junio C Hamano
  2 siblings, 1 reply; 33+ messages in thread
From: Greg A. Woods @ 2009-12-01 22:44 UTC (permalink / raw)
  To: The Git Mailing List

At Wed, 2 Dec 2009 00:18:30 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: multiple working directories for long-running builds (was:	"git merge" merges too much!)
> 
> AFAIK, "git archive" is cheaper than git clone.

It depends on what you mean by "cheaper".  It's clearly going to require
less disk space.  However it's also clearly going to require more disk
bandwidth, potentially a _LOT_ more disk bandwidth.

> I do not say it is fast
> for a huge project, but if you want to run a process such as a clean build
> and test that takes a long time anyway, it does not add much to the
> total time.

I think you need to try throwing around an archive of, say, 50,000 small
files a few times simultaneously on your system to appreciate the issue.

(i.e. consider the load on a storage subsystem, say a SAN or NAS, where
with your suggestion there might be a dozen or more developers running
"git archive" frequently enough that even three or four might be doing
it at the same time, and this on top of all the i/o bandwidth required
for the builds all of the other developers are also running at the same
time.)

> > Disk bandwidth is almost always more expensive than disk space.
> 
> Disk bandwidth is certainly more expensive than disk space, and the
> whole point was to avoid a lot of disk bandwidth by using hot cache.

Huh?  Throwing around the archive has nothing to do with the build
system in this case.

Please let me worry about optimizing the builds -- that's well under
control already and is not really an issue for the VCS, at least
not yet, and maybe never in many cases.

I'm just not willing to even consider using what would really be the
most simplistic and most expensive form of updating a working directory
as could ever be imagined.  "Git archive" is truly unintelligent, as-is.

Perhaps if "git archive" could talk intelligently to an rsync process
and be smart about updating an existing working directory it would be
the ideal answer, but _NEVER_ with the current method of just unpacking
an archive over an existing directory!  (Now there's a good Google SoC,
or masters, project for someone eager to learn about rsync & git
internals!)

Local filesystem "git clone" is usable in many scenarios, but it just
won't work nearly so efficiently in a scenario where users have local
repos on their workstations and use an NFS NAS to feed the build
servers.  As I understand it this 'git-new-workdir' script will work
though since it uses symlinks that can be pointed across the mount back
to the local disk on the user's workstation.  They can just mount the
build directory and go into it and run a "git checkout" and start
another build on the build server(s).

A major further advantage of multiple working directories is that this
eliminates one more point of failure -- i.e. you don't end up with
multiple copies of the repo that _should_ be effectively read-only for
everything but "push", and perhaps then only to one branch.  I don't
like giving developers too much rope, especially in all the wrong
places.  "git archive" does achieve the same even better I suppose, but
without something like a "--format=rsync" option it's completely out of
the question.

> Another thing to consider is that if you put a really huge project in one
> Git repo then Git may not be as fast as you may want, because Git tracks
> the whole project as a whole. So, you may want to split your project into
> a few relatively independent modules (see git submodule).

Indeed -- but sometimes I think this is not feasible either.

I know of at least three very real-world projects where there are tens
of thousands of small files that really must be managed as one unit, and
where running a build in that tree could take a whole day or two on even
the fastest currently available dedicated build server.  Eg. pkgsrc.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-01 22:44                   ` Greg A. Woods
@ 2009-12-02  0:10                     ` Dmitry Potapov
  2009-12-03  5:11                       ` Greg A. Woods
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-12-02  0:10 UTC (permalink / raw)
  To: The Git Mailing List

On Tue, Dec 01, 2009 at 05:44:15PM -0500, Greg A. Woods wrote:
> At Wed, 2 Dec 2009 00:18:30 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: multiple working directories for long-running builds (was:	"git merge" merges too much!)
> > 
> > AFAIK, "git archive" is cheaper than git clone.
> 
> It depends on what you mean by "cheaper"

You said:

>>> "git archive" is likely even more
>>> impossible for some large projects to use than "git clone" 

My point was that I do not see why you believe "git archive" is more
expensive than "git clone". According to Jeff Epler's numbers,
"git archive" is 20% faster than "git clone"...

> It's clearly going to require
> less disk space.  However it's also clearly going to require more disk
> bandwidth, potentially a _LOT_ more disk bandwidth.

Well, it does, but as I said earlier this is not a case where I expect
things to be instantaneous. If you do a full build and test, then it
is going to take a lot of time anyway, and git-archive is not likely to
be a big overhead in relative terms.

> > I do not say it is fast
> > for a huge project, but if you want to run a process such as a clean build
> > and test that takes a long time anyway, it does not add much to the
> > total time.
> 
> I think you need to try throwing around an archive of, say, 50,000 small
> files a few times simultaneously on your system to appreciate the issue.
> 
> (i.e. consider the load on a storage subsystem, say a SAN or NAS, where
> with your suggestion there might be a dozen or more developers running
> "git archive" frequently enough that even three or four might be doing
> it at the same time, and this on top of all the i/o bandwidth required
> for the builds all of the other developers are also running at the same
> time.)

First of all, I do not see why it should be done frequently. The full
build and test may be run once or twice a day, and the full build and
test may take an hour. git-archive will probably take less than a minute
for 50,000 small files (especially if you use tmpfs). In other words, it
is 1% overhead, but you get a clean build, which is fully tested. You
can be sure that no garbage is left in the worktree that could influence
the result.

> 
> > > Disk bandwidth is almost always more expensive than disk space.
> > 
> > Disk bandwidth is certainly more expensive than disk space, and the
> > whole point was to avoid a lot of disk bandwidth by using hot cache.
> 
> Huh?  Throwing around the archive has nothing to do with the build
> system in this case.

"git archive" is for doing a full build and test, which is rarely done.
Normally, you just switch between branches, which means a few files are
changed and rebuilt, and no archive is involved here.

> 
> I'm just not willing to even consider using what would really be the
> most simplistic and most expensive form of updating a working directory
> as could ever be imagined.  "Git archive" is truly unintelligent, as-is.

"git archive" is NOT for updating your working tree. You use "git
checkout" to switch between branches. "git checkout" is intelligent
enough to overwrite only those files that actually differ between
two versions.
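
A small experiment along these lines (throwaway repository, illustrative
file names, and a marker file used only to compare modification times)
shows which files a branch switch actually rewrites:

```shell
# Sketch only: all names are illustrative; runs in a temp dir.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.email "dev@example.com"
git config user.name "Dev"

# Two files on the starting branch.
echo 'same on both branches' > common.txt
echo 'version 1' > changed.txt
git add common.txt changed.txt
git commit -q -m "base"

# A second branch that modifies only one of them.
git checkout -q -b topic
echo 'version 2, modified on the topic branch' > changed.txt
git commit -q -a -m "topic change"

# Record a point in time between the commits and the switch back.
sleep 1
touch "$work/marker"
sleep 1

# Switch back to the previous branch.
git checkout -q -

# Files whose modification time is newer than the marker were rewritten.
rewritten=$(find changed.txt -newer "$work/marker")
untouched=$(find common.txt -newer "$work/marker")
echo "rewritten by checkout: ${rewritten:-none}"
echo "unchanged file rewritten: ${untouched:-none}"
```

A timestamp-based build tool would therefore see only the file that
differs between the two branches as modified.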

> A major further advantage of multiple working directories is that this
> eliminates one more point of failure -- i.e. you don't end up with
> multiple copies of the repo that _should_ be effectively read-only for
> everything but "push", and perhaps then only to one branch.

Multiple copies of the same repo are never a problem (except for taking
some disk space). I really do not understand why you say that some copies
should be effectively read-only... You can start to work on some feature
at one place (using one repo) and then continue in another place using
another repo. (Obviously, it will require fetching changes from the
first repo before you are able to continue, but it is just one
command). In other words, I really do not understand what you are
talking about here.

> 
> I know of at least three very real-world projects where there are tens
> of thousands of small files that really must be managed as one unit, and
> where running a build in that tree could take a whole day or two on even
> the fastest currently available dedicated build server.  Eg. pkgsrc.

Tens of thousands of files should not be a problem... For instance, the
Linux kernel has around 30 thousand files and Git works very well in this
case. But I would consider splitting if it has hundreds of thousands...

Dmitry


* Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
  2009-12-02  0:10                     ` Dmitry Potapov
@ 2009-12-03  5:11                       ` Greg A. Woods
  0 siblings, 0 replies; 33+ messages in thread
From: Greg A. Woods @ 2009-12-03  5:11 UTC (permalink / raw)
  To: The Git Mailing List

At Wed, 2 Dec 2009 03:10:21 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: multiple working directories for long-running builds (was:	"git merge" merges too much!)
> 
> My point was that I do not see why you believe "git archive" is more
> expensive than "git clone". According to Jeff Epler's numbers,
> "git archive" is 20% faster than "git clone"...

Really!?!?!?  You don't see it?  Why is this so hard to understand?
Sorry for my incredulity, but I thought this issue was obvious.

The slightly more expensive "git clone" happens only _ONCE_.  After that
you just run "git pull" I think (plus maybe "git reset --hard"?), but in
any case it's a heck of a lot less I/O and CPU than "git archive".

And of course you skip even the one-time "git clone" operation if you
use the even faster and simpler git-new-workdir script.

"git archive" has to be run _EVERY_ time you need to update a working
directory and it currently has no choice but to toss every bit of the
whole working directory, up from the filesystem, across a pipe, and back
down to the filesystem.  It literally couldn't be more expensive!

Sure, no matter how you do it, updating the working directory might not
always be the biggest part of the operation, but it's insane to use the
most expensive mechanism ever possible when there are far cheaper
alternatives.

BTW, there cannot, and MUST NOT, be any integrity advantage to using
"git archive" over using multiple working directories.  "git archive
branch" must, by definition, produce exactly the same result as if you
did "git checkout branch; rm -rf .git" or else it is buggy.
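
That equivalence can be checked with a sketch like the following,
assuming a throwaway repository with made-up file names and no
export-ignore or export-subst attributes in .gitattributes (those
deliberately make "git archive" output differ from a checkout):

```shell
# Sketch only: all names are illustrative; runs in a temp dir.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.email "dev@example.com"
git config user.name "Dev"

# A small tree with a subdirectory.
mkdir src
echo 'hello' > src/a.txt
echo 'world' > b.txt
git add .
git commit -q -m "base"
branch=$(git symbolic-ref --short HEAD)

# Tree as produced by "git archive <branch>".
mkdir "$work/from-archive"
git archive --format=tar "$branch" | (cd "$work/from-archive" && tar xf -)

# Tree as produced by a checkout with the .git directory removed.
git clone -q "$work/repo" "$work/from-checkout"
rm -rf "$work/from-checkout/.git"

# Compare file names and contents recursively.
if diff -r "$work/from-archive" "$work/from-checkout" > /dev/null; then
  same=yes
else
  same=no
fi
echo "trees identical: $same"
```

The comparison covers file names and contents only; modification times
will of course differ between the two trees.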

Note also that the build directories created with git-new-workdir can be
treated as read-only, and perhaps even forced to be read-only by mount
options or maybe just by a corporate policy directive.  (in all projects
I'm working on the source tree can be read-only -- product files are
always generated elsewhere)

> Multiple copies of the same repo are never a problem (except for taking
> some disk space).

Exactly -- gigabytes of disk space per copy in the cases I'm concerned
about (i.e. where hard links are impossible).  I've heard that at least
one very large project has an 8GB repository currently.  Three of the
large projects I work on now are about a gigabyte per copy.  That's just
what's under .git too, not including the whole working directory as
well.  I can't even manage a "git clone" from HTTP of one of them
without increasing my default process limits as it is so big and uses up
too much memory.

I guess one could skip the initial more-expensive "git clone" operation
by copying the repo using low-level bit moving commands, like "cp -r" or
whatever, and then tweak the result to make it appear as if it had been
cloned, but even that requires moving gigabytes of data unnecessarily
across what is likely to be a network connection of some sort.

Are you fighting against git-new-workdir, or the concept of multiple
working directories?

> > A major further advantage of multiple working directories is that this
> > eliminates one more point of failure -- i.e. you don't end up with
> > multiple copies of the repo that _should_ be effectively read-only for
> > everything but "push", and perhaps then only to one branch.
> 
> I really do not understand why you say that some copies
> should be effectively read-only... You can start to work on some feature
> at one place (using one repo) and then continue in another place using
> another repo. (Obviously, it will require fetching changes from the
> first repo before you will be able to continue, but it is just one
> command). In other words, I really do not understand what you are
> talking about here.

Developers, especially more junior ones, work on code, and they (are
supposed to) spend almost all of their intellectual energy on the issues
to do with creating and modifying code -- they are not expected to be
integration engineers, nor are they expected to be VCS and SCM experts.

The more steps you put in place for them to do, and the more places you
allow them to store changes, etc., etc., etc., the more mistakes that
they will make.

Besides, in some scenarios build directories will be checked out from
integration branches which shouldn't have any direct commits made to
them, especially not to fix a problem in a build.

BTW, pkgsrc has well over 50,000 files, FreeBSD ports is over 100,000.
Neither can really be split in any rational way.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/


* Re: multiple working directories for long-running builds
  2009-12-01 21:18                 ` Dmitry Potapov
  2009-12-01 22:25                   ` Jeff Epler
  2009-12-01 22:44                   ` Greg A. Woods
@ 2009-12-02  2:09                   ` Junio C Hamano
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2009-12-02  2:09 UTC
  To: Dmitry Potapov; +Cc: The Git Mailing List

Dmitry Potapov <dpotapov@gmail.com> writes:

> On Tue, Dec 01, 2009 at 01:58:05PM -0500, Greg A. Woods wrote:
>> 
>> > > I just disagreed that "git archive" was a reasonable alternative to
>> > > leaving the working directory alone during the entire time of the build.
>> > 
>> > Using "git archive" allows you to avoid running long procedures such as
>> > a full clean build and testing in the working tree. Also, it guarantees
>> > that you test exactly what you put in Git and that other garbage in your
>> > working tree does not affect the result.
>> 
>> Sure, but let's be very clear here:  "git archive" is likely even more
>> impossible for some large projects to use than "git clone" would be to
>> use to create build directories.
>
> AFAIK, "git archive" is cheaper than git clone. I do not say it is fast
> for a huge project, but if you want to run a process such as a clean build
> and test that takes a long time anyway, it does not add much to the
> total time.

I do not understand people who advocate for "git archive" to be used in
this manner at all.

I do use a set of separate build directories, and I typically run 5 to 10
full builds (in each) per day, but I rarely if ever make fixes in them.
Perhaps the usage pattern expected by people who want others to use "git
archive" to prepare separate build directories is different from how I
use them.

I see two downsides in using "git archive":

 - "archive" piped to "tar xf -" will overwrite _all_ files every time you
   refresh the build area, causing extra work on "make" and any build
   procedure based on file timestamps.  Sure, you can work around it by
   using ccache, but why make your life complicated?

 - When a build in these separate build areas fails, you would want to go
   there and try to diagnose or even fix the problem in there, not in your
   primary working area (after all, the whole point of keeping a separate
   build area is so that you do not have to switch branches too much in
   the primary working area).  A directory structure prepared by "archive"
   piped to "tar xf -" however is not a work tree, and any experimental
   changes (e.g. "debugf()") or fixes you make there need to be reverted
   or taken back manually to be placed in the primary working area.

If your build area is prepared with new-workdir, then you share the
history and you even share the ref namespace, so that "reset --hard" will
remove all the debugf() added while diagnosing, and "diff" will give you
the patch you need to take home.
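
A rough sketch of that diagnose-then-clean-up cycle follows.  It is only
an illustration: the build area here is a plain local clone standing in
for one made with new-workdir, and the repository layout, file name and
identity are hypothetical.

```shell
#!/bin/sh
# Sketch: fix something in a build area, take the patch home, then
# return the build area to a pristine state.  All names are made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Primary working area with one tracked source file.
git init -q primary
cd primary
git config user.email "dev@example.invalid"
git config user.name "Example Dev"
printf 'int answer(void)\n{\n    return 41;\n}\n' > answer.c
git add answer.c
git commit -q -m "add answer()"
cd ..

# Build area (assumption: a clone instead of a new-workdir tree).
git clone -q primary build
cd build

# Diagnose and fix the problem in place.
sed 's/41/42/' answer.c > answer.c.new && mv answer.c.new answer.c

# "diff" gives the patch to take home ...
git diff > ../fix.patch

# ... and "reset --hard" throws away every local edit to tracked files.
git reset -q --hard
build_state=$(git status --porcelain)

# Apply the fix where it belongs, in the primary working area.
cd ../primary
git apply ../fix.patch
grep -n 'return' answer.c
```

The patch is written outside the build area so that it does not show up
there as an untracked file.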

You could even make a commit from your build area, but this cuts both
ways.  You need to be aware that after committing on a branch in one
repository other repositories that have the same branch checked out will
become out of sync.  It is however less of an issue in practice, because
the build areas are typically used to check out integration branches
(e.g. 'master' and 'next' in git.git) that you do not directly commit to
anyway, and you will become very aware of the tentative nature of the
tree, as the update procedure for such a build area prepared with
new-workdir is always:

    cd /buildfarm/<branch>/ && git reset --hard

This will not touch any file that does not have to get updated, so your
"make" won't get confused.
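
One way to see the timestamp behaviour, as a sketch under stated
assumptions: the build area below is a plain clone rather than a
new-workdir tree, so it needs an explicit fetch and a
"reset --hard origin/<branch>" instead of a bare "reset --hard", and all
directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: after "reset --hard" to a newer commit, only the files that
# actually changed get rewritten.  All names are made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Upstream history: two files, one of which will change later.
git init -q upstream
cd upstream
git config user.email "dev@example.invalid"
git config user.name "Example Dev"
echo 'stable' > unchanged.c
echo 'v1' > changed.c
git add .
git commit -q -m "v1"
branch=$(git symbolic-ref --short HEAD)
cd ..

# Build area, checked out at "v1".
git clone -q upstream build

# Timestamp marker, padded with sleeps so later writes are clearly newer.
sleep 1
touch marker
sleep 1

# A new commit that modifies only changed.c.
cd upstream
echo 'v2' > changed.c
git commit -q -am "v2"
cd ..

# Refresh the build area.
cd build
git fetch -q origin
git reset -q --hard "origin/$branch"

# List checked-out files that were rewritten after the marker was made.
updated=$(find . -path ./.git -prune -o -type f -newer ../marker -print)
echo "$updated"
```

A timestamp-based "make" run in the build area would then rebuild only
what depends on the files listed.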


end of thread, other threads:[~2009-12-09 19:56 UTC | newest]

Thread overview: 33+ messages
2009-11-29  3:21 "git merge" merges too much! Greg A. Woods
2009-11-29  5:14 ` Jeff King
2009-11-30 18:12   ` Greg A. Woods
2009-11-30 19:22     ` Dmitry Potapov
2009-12-01 18:52       ` Greg A. Woods
2009-12-01 20:50         ` Dmitry Potapov
2009-12-01 21:58           ` Greg A. Woods
2009-12-02  0:22             ` Dmitry Potapov
2009-12-02 10:20         ` Nanako Shiraishi
2009-12-02 20:09     ` Jeff King
2009-12-03  1:21       ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods
2009-12-03  1:34         ` Git documentation consistency Junio C Hamano
2009-12-03  7:22           ` Greg A. Woods
2009-12-03  7:45             ` Jeff King
2009-12-03 15:24               ` Uri Okrent
2009-12-03 16:22             ` Marko Kreen
2009-12-09 19:56               ` Greg A. Woods
2009-12-03  2:07         ` Git documentation consistency (was: "git merge" merges too much!) Jeff King
2009-11-29  5:15 ` "git merge" merges too much! Junio C Hamano
2009-11-30 18:40   ` Greg A. Woods
2009-11-30 20:50     ` Junio C Hamano
2009-11-30 21:17     ` Dmitry Potapov
2009-12-01  0:24       ` Greg A. Woods
2009-12-01  5:47         ` Dmitry Potapov
2009-12-01 17:59           ` multiple working directories for long-running builds (was: "git merge" merges too much!) Greg A. Woods
2009-12-01 18:51             ` Dmitry Potapov
2009-12-01 18:58               ` Greg A. Woods
2009-12-01 21:18                 ` Dmitry Potapov
2009-12-01 22:25                   ` Jeff Epler
2009-12-01 22:44                   ` Greg A. Woods
2009-12-02  0:10                     ` Dmitry Potapov
2009-12-03  5:11                       ` Greg A. Woods
2009-12-02  2:09                   ` multiple working directories for long-running builds Junio C Hamano
