* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Han-Wen Nienhuys @ 2006-11-16 23:40 UTC
To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611161436230.3349@woody.osdl.org>
Linus Torvalds wrote:
> A lot of the complaints seem to not be about the interfaces, but about
> people not _understanding_ and knowing what the interfaces do. If you were
From the point of view of a user, there is not really a difference
between the two. As a user, you form a mental model of how things
work by looking at the interface. If the interface is bad, the user
creates a faulty model in his head, and starts doing things that
are perfectly logical in the faulty model, but stupid and silly when
you consider the actual internals.
A nice book about this is "The Design of Everyday Things" by Donald
Norman.
--
* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Johannes Schindelin @ 2006-11-16 23:36 UTC
To: Linus Torvalds; +Cc: Han-Wen Nienhuys, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611161436230.3349@woody.osdl.org>
Hi,
On Thu, 16 Nov 2006, Linus Torvalds wrote:
> On Thu, 16 Nov 2006, Johannes Schindelin wrote:
> >
> > - a terrible UI.
>
> Why? We _do_ have the temporary branch. It's called FETCH_HEAD.
It is a terrible UI, because it was not that obvious to me. And I consider
myself not a git newbie.
Besides, it is not really a temporary branch. If it was, the pull would
_not_ download all these objects again, would it?
> > _Also_, git-pull not storing the fetched branches at least temporarily
> > often annoyed me: the pull did not work, and the SHA1 was so far away I
> > could not even scroll to it.
>
> Again, why didn't you use FETCH_HEAD?
Because I am a Jar-HEAD?
> If the user doesn't give us a head to write to, we clearly MUST NOT write
> to any long-term branch. That would be a _horrible_ mistake.
I was _not_ suggesting a long-term branch. Just a way to do-what-i-want
and not waste bandwidth.
> And your "solution" is obviously totally unusable. git ABSOLUTELY MUST NOT
> overwrite any existing branches unless explicitly told to do so by the
> user.
Guess three times why I did not post the patches.
But the real problem is not necessarily the behaviour; it is the obscure
fashion of the behaviour. You may not understand that problem, because you
were there from the beginning. You saw the big-bang and how all the
quarks formed all of a sudden, and how matter and eventually planets
and suns came into being.
But others (me included) were not there. Or they did not really watch. And
now they see all these creatures, and plants, and bacteria, and they do
not understand how these are all connected, because of that. And now they
think "wow that must have been some intelligent design, and really a
miracle, and I cannot understand how it works." But that is not true
(the latter part of course).
There is something to be said about the simplicity of Mercurial. Its
inner workings may suck, but people get easily attracted by it.
I do not claim we should imitate Mercurial, or even hide the index (even
if I sometimes wonder if the index is not just a clever way to accelerate
commits, and nothing more).
> So I really don't see your point.
>
> A lot of the complaints seem to not be about the interfaces, but about
> people not _understanding_ and knowing what the interfaces do.
But the interfaces should be usable interfaces! They should _explain_ what
they do. Other software does so, it can't be _that_ hard.
> git merge "$(git fmt-merge-msg < .git/FETCH_HEAD)" HEAD FETCH_HEAD
I find that quite easy to understand. Why? Because I happen to _know_ the
syntax of -merge and -fmt-merge-msg. For similar reasons I _understand_
why -pull behaves like it does. But others don't; they will shudder and
then run.
Maybe it is not important that -pull fetches all objects all over again.
But it _is_ important to make things like merging branches (local or
remote) trivial. It _is_ important to make the user experience be fun.
Ciao,
Dscho
* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Han-Wen Nienhuys @ 2006-11-16 23:32 UTC
To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611160958170.3349@woody.osdl.org>
Linus Torvalds wrote:
>> You're misunderstanding me: the multi-repo is at git.sv.gnu.org is the
>> remote one. The example I gave was about locally creating a single
>> project repo from a remote multiproject repo.
>
> Ahh.
>
> Ok, try the patch I just sent out, and see if it works for you. It
> _should_ allow you to do exactly that
I'm leaving for a short holiday tomorrow, but will do when I come back.
>> From UI perspective it would be nice if this could also be done with clone,
>>
>> git clone . ssh+git://....
>
> The creation of a new archive tends to need special rights (with _real_
> ssh access and a shell you could do it, but "ssh+git" really means "git
> protocol over a connection that was opened with ssh, but doesn't
> necessarily have a real shell at the other end").
What happens on savannah is that the sysadmins set up an empty GIT
repo with access, and leave it to you to push the stuff. Of course,
if the initial import gets packed automatically, that's also ok.
> So I think the above syntax is actually not a good one, because it cannot
> work in the general case. It's much better to get used to setting up a
> repo first, and then pushing into it, and just accepting that it's a
> two-phase thing.
Perhaps; from a UI viewpoint, it would be nice though, even if it
were aliased to a simple push. (Darcs has a get command analogous to
git-clone, but also a put command for which git lacks an equivalent.)
>> * why are objects downloaded twice? If I do
>>
>> git --bare fetch git://git.sv.gnu.org/lilypond.git web/master
>>
>> it downloads stuff, but I don't get a branch.
> [..]
>> If I then do
>>
>> git --bare fetch git://git.sv.gnu.org/lilypond.git web/master:master
>>
>> it downloads the same stuff again.
>
> Right. So you can either
> [..]
> See?
No, I don't understand. In the fetch all the objects with their SHA1s
were already downloaded. I'd expect that the fetch with a refspec
would simply write a HEAD and a refs/heads/master, notice that all
the actual data was already downloaded, and not download it again.
--
* Re: Cleaning up git user-interface warts
From: Linus Torvalds @ 2006-11-16 23:22 UTC
To: Johannes Schindelin; +Cc: Han-Wen Nienhuys, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.63.0611162353250.13772@wbgn013.biozentrum.uni-wuerzburg.de>
On Fri, 17 Nov 2006, Johannes Schindelin wrote:
>
> Never ever underestimate pet peeves. If we give many people an obvious
> reason (however trivial and bike-shed-coloured) to complain, they will
> complain.
I do actually think that this discussion has been informative, partly
because I never even realized that some people would ever think to do
"init-db" + "pull".
Making things like that work is easy enough, it's just that I never saw
any point until people complained. And when they complained, the initial
complaint wasn't actually obvious. Only when Han-Wen actually gave
something that didn't work, was it clear that the real issue wasn't so
much _naming_, as just expectations about the _work_flow_.
> And hopefully you also agree that enhancing the syntax of git-merge to
> grok "git-merge [-m message] <branch>" and "git-merge [-m message]
> <url-or-remote> <branch>" would be a lovely thing, luring even more
> people into using git.
I definitely think we can make "git merge" have a more pleasant syntax.
I'm just still not sure that people should actually use it ;)
My real point was/is that usually it's really not the "naming details"
that people _really_ have problems with. The real problems tend to be in
learning a new workflow.
We can make some of those workflows easier, but I would heartily recommend
that people not worry about naming of "pull" vs "fetch", because that's
almost certainly not really the issue. Instead, if you have a problem,
rather than concentrating on the names of the programs, say:
- what do you want to get done.
Most likely it's _trivial_ to do with git, it's just that somebody used
the wrong approach, and then it didn't work at all.
- give actual examples of a workflow that didn't work or was complex.
(again, the "init-db" + "pull" example).
And yes, in many cases, it might well be a case of "sure, we can make
that _other_ workflow work too". But somebody like me, who has used git
for a year and a half, and used BK before it, probably simply uses a
different workflow than somebody who comes from CVS.
For example, I suspect that your gripe with "git fetch" was just from
using it in a really awkward manner. Maybe we could make your workflow
work with git too, but maybe it really already (and always) did, you just
used a particular tool in a way that made the use be really really
painful.
Sometimes it's just a question of "ok, use it like _this_, and now it's
actually really simple". Other times it's "ok, I didn't even realize that
you wanted to use it like _that_, and yeah, that's incredibly
inconvenient, and we can change it".
I just got involved in this discussion because I thought people were
talking about all the wrong things. Command naming really can't be _that_
big of a deal. I really don't believe that we should have some people use
"gh" instead of "git" just because they think "pull" should mean not to
merge or something.
* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Linus Torvalds @ 2006-11-16 23:08 UTC
To: Johannes Schindelin; +Cc: Han-Wen Nienhuys, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611161436230.3349@woody.osdl.org>
On Thu, 16 Nov 2006, Linus Torvalds wrote:
>
> git merge "$(git fmt-merge-msg < .git/FETCH_HEAD)" HEAD FETCH_HEAD
Btw, I'd like to claim that this is a _great_ user interface.
Yeah, it's different from other SCMs. I don't think you'd really want to
script a merge like this in CVS, especially not using standard UNIX
pipelines etc. But it's an example of how a lot of git operations - even
the "high level ones" - are pretty scriptable, using very basic and very
simple standard UNIX shell scripting.
So even though I'd not actually _do_ the above one-liner, I think it's a
great example of how git really works, and how scriptable it can be,
without a lot of huge problems.
So considering that "FETCH_HEAD" works pretty much everywhere, and that
you can also use the totally non-scripting approach of doing "standard"
SCM things like
git diff ..FETCH_HEAD
or
gitk HEAD...FETCH_HEAD
to look at what got fetched (and in the latter case look at both the
current HEAD _and_ FETCH_HEAD, and what was in one but not the other), I
really think it's unfair to say that "git fetch" does not have a nice UI.
It's just that "git fetch" can be used two totally different ways:
- "git fetch" to get something temporary: use FETCH_HEAD, and do _not_
specify a destination branch
- "git fetch" as a way to update the branches you already have, by either
using explicit branch specifiers (which would be unusual, but works),
or by just having the branch relationships listed in your .git/remotes/
file or .git/config file.
both are actually very natural things to do.
What is probably _not_ that natural is to do the explicit branch
specifier, ie
git fetch somerepo remotebranch:localbranch
which obviously works, but you wouldn't want to actually do this very
often. Either you do something once (and use FETCH_HEAD, which is actually
nicer than a real branch in some respects: it also tells you where you
fetched _from_, and it can contain data on merging from _multiple_
branches), or you set up a "real translation" in your configuration files.
So I would say that the natural thing to do is:
- "git pull somerepo"
This will _also_ fetch all the branches you've said you want to track,
of course.
- "git fetch somerepo somebranch"
Look at FETCH_HEAD, and be happy
- "git fetch somerepo"
This is kind of strange, but it can be useful if you are basically just
mirroring another repo, and want to fetch all the branches you've said
you want to track, but don't actually want to check them out.
while the "complicated" scenario like the following is something you
should generally _avoid_, because it's just confusing and complex:
- "git fetch somerepo branch1:mybranch1 branch2:mybranch2"
This works, and I'm sure it's useful, and I've even used it (usually
with just one branch, though), but let's face it - it's too damn
complicated to be anything you want to do _normally_.
So git is definitely powerful, but I think some people have looked at the
_complicated_ cases more than the simple cases (ie maybe people have
looked too much at that last case, not realizing that there really isn't
much reason to use it - and FETCH_HEAD is one big reason why you seldom
need the complicated format).
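The rule above, that a fetch writes a long-term branch only when the refspec names one, can be modelled in a few lines. This is a hypothetical Python sketch, not git's code: `parse_refspec` and `refs_written` are illustrative names, and it covers only an explicit command-line refspec. Tracking set up in .git/remotes or .git/config, and the expansion of a short name like `master` to refs/heads/master, are left out.

```python
from typing import List, Optional, Tuple


def parse_refspec(spec: str) -> Tuple[str, Optional[str], bool]:
    """Split "[+]src[:dst]" into (src, dst or None, forced)."""
    forced = spec.startswith("+")      # a leading "+" allows non-fast-forward updates
    if forced:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return src, (dst or None), forced


def refs_written(spec: str) -> List[str]:
    """Local refs a fetch of `spec` would update in this simplified model."""
    _, dst, _ = parse_refspec(spec)
    written = ["FETCH_HEAD"]           # the fetched tip is always recorded here
    if dst is not None:
        written.append(dst)            # a branch is touched only when named explicitly
    return written


if __name__ == "__main__":
    print(refs_written("web/master"))         # ['FETCH_HEAD']
    print(refs_written("web/master:master"))  # ['FETCH_HEAD', 'master']
```

The first call corresponds to Han-Wen's `git fetch ... web/master`, which leaves only FETCH_HEAD behind; the second corresponds to his `web/master:master` variant.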
* Re: Cleaning up git user-interface warts
From: Johannes Schindelin @ 2006-11-16 23:01 UTC
To: Richard CURNOW; +Cc: git
In-Reply-To: <20061116075153.GA29363@tigerwolf.bri.st.com>
Hi,
On Thu, 16 Nov 2006, Richard CURNOW wrote:
> In contrast to Linus's case of wanting to record where the remote merge
> came from, I expressly don't want to record that - I want the merge
> commit to describe conceptually what was being merged with what.
>
> OK, I could probably use pull with --no-commit, but I've already
> trained my fingers to type out the merge syntax. They'd be happier with
> 'git merge -m "Merge feature foo with fixes for bar" bar' though.
For the moment, if you forget --no-commit, you can always do a "git-commit
--amend" -- even with merges.
Hth,
Dscho
* Re: Cleaning up git user-interface warts
From: Johannes Schindelin @ 2006-11-16 23:00 UTC
To: Linus Torvalds; +Cc: Han-Wen Nienhuys, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611151908130.3349@woody.osdl.org>
Hi,
On Wed, 15 Nov 2006, Linus Torvalds wrote:
> People seem to believe that changing a few names or doing other totally
> _minimal_ UI changes would somehow magically make things understandable.
Never ever underestimate pet peeves. If we give many people an obvious
reason (however trivial and bike-shed-coloured) to complain, they will
complain.
If we pull (pun intended) that reason away under their collective
backsides, they will have to find another reason to complain. But by the
time they have found something, they will already be happy git users!
But since you just provided a patch to make life easier on non-gitters, I
guess you agree with that already.
And hopefully you also agree that enhancing the syntax of git-merge to
grok "git-merge [-m message] <branch>" and "git-merge [-m message]
<url-or-remote> <branch>" would be a lovely thing, luring even more
people into using git.
Maybe they even start complaining about subversion and CVS calling a merge
"update", who knows?
Ciao,
Dscho
* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Linus Torvalds @ 2006-11-16 22:49 UTC
To: Johannes Schindelin; +Cc: Han-Wen Nienhuys, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.63.0611162315110.13772@wbgn013.biozentrum.uni-wuerzburg.de>
On Thu, 16 Nov 2006, Johannes Schindelin wrote:
>
> - a terrible UI.
Why? We _do_ have the temporary branch. It's called FETCH_HEAD.
> _Also_, git-pull not storing the fetched branches at least temporarily
> often annoyed me: the pull did not work, and the SHA1 was so far away I
> could not even scroll to it.
Again, why didn't you use FETCH_HEAD?
If the user doesn't give us a head to write to, we clearly MUST NOT write
to any long-term branch. That would be a _horrible_ mistake.
So all your complaints seem totally misplaced. The UI is both usable and
practical, and your complaint that git pull doesn't store the fetched
branches is just NOT TRUE.
And your "solution" is obviously totally unusable. git ABSOLUTELY MUST NOT
overwrite any existing branches unless explicitly told to do so by the
user.
So I really don't see your point.
A lot of the complaints seem to not be about the interfaces, but about
people not _understanding_ and knowing what the interfaces do. If you were
confused about something (like not realizing that FETCH_HEAD is there and
very much usable), how about sending in a patch to make FETCH_HEAD use
clearer in whatever docs you looked at and didn't find it mentioned in.
Now, there is no question that some of the interfaces can get a bit
"interesting" to use. For example, if you really don't want to re-fetch
for some reason, FETCH_HEAD actually does contain enough information that
you should be able to just re-do a failed merge, for example, including
the message generation. But at that point it really _does_ get a bit
complicated, and you end up doing something like
git merge "$(git fmt-merge-msg < .git/FETCH_HEAD)" HEAD FETCH_HEAD
which should _work_, but I'm not going to claim that it's all that easy to
understand.
(That said, read that one-liner a few times, and suddenly it doesn't seem
_that_ complicated any more, now does it? You can probably even guess what
it's really going to do, even if you don't know git all that well. It's
not unreadable line noise, is it?)
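To make "FETCH_HEAD contains enough information" concrete, here is a hypothetical Python reader for it. It assumes the line format that current git writes, one fetched ref per line as `<object ID> TAB <empty or "not-for-merge"> TAB <description>`; the 2006 format may differ in detail, and `parse_fetch_head` and `merge_heads` are illustrative names. The entries not marked not-for-merge are the ones a merge of FETCH_HEAD uses, and their descriptions are roughly what git fmt-merge-msg builds the merge message from.

```python
from typing import List, NamedTuple


class FetchHeadEntry(NamedTuple):
    object_id: str
    for_merge: bool      # False when the line is marked "not-for-merge"
    description: str     # e.g. "branch 'master' of git://example.org/repo.git"


def parse_fetch_head(text: str) -> List[FetchHeadEntry]:
    """Parse FETCH_HEAD text, assuming tab-separated ID, marker, description."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        object_id, marker, description = line.split("\t", 2)
        entries.append(FetchHeadEntry(object_id, marker != "not-for-merge", description))
    return entries


def merge_heads(text: str) -> List[str]:
    """Object IDs that a merge of FETCH_HEAD would merge into HEAD."""
    return [e.object_id for e in parse_fetch_head(text) if e.for_merge]
```

The URL in the comment is a made-up example; on a real repository you would simply read .git/FETCH_HEAD.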
Of course, if I had a merge that failed (the most common reason being that
I had some uncommitted patch in a file that wanted to be updated by the
merge), I'd never actually do the above one-liner. I'd just re-do the
pull. But if networking was _really_ slow, and I _really_ cared, maybe I'd
do the above.
(And no, I didn't actually test the above one-liner. Maybe it doesn't work
for some reason. Somebody should check, just for fun).
* Re: multi-project repos
From: Junio C Hamano @ 2006-11-16 22:44 UTC
To: Johannes Schindelin; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.63.0611162315110.13772@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> _If_ you use git-fetch directly you virtually always want to store the
> result. I was tempted quite often to submit a patch which adds a command
> line switch --no-warn, which is passed to git-fetch by git-pull, and
> without which git-fetch complains if the branch-to-be-fetched is not
> stored right away (and refuses to go along).
>
> _Also_, git-pull not storing the fetched branches at least temporarily
> often annoyed me: the pull did not work, and the SHA1 was so far away I
> could not even scroll to it. The result: I had to pull (and fetch!) the
> whole darned objects again. Again, I was tempted quite often to submit a
> patch which makes git-pull fetch the branches into refs/fetch-temp/* and
> only throw them away when the merge succeeded.
I think the earlier write-up by Linus on magic HEADs would help
documenting FETCH_HEAD better.
* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Johannes Schindelin @ 2006-11-16 22:21 UTC
To: Linus Torvalds; +Cc: Han-Wen Nienhuys, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611160958170.3349@woody.osdl.org>
Hi,
On Thu, 16 Nov 2006, Linus Torvalds wrote:
> On Thu, 16 Nov 2006, Han-Wen Nienhuys wrote:
> >
> > * why are objects downloaded twice? If I do
> >
> > git --bare fetch git://git.sv.gnu.org/lilypond.git web/master
> >
> > it downloads stuff, but I don't get a branch.
>
> A "fetch" by default won't actually generate a local branch unless you
> told it to.
This is actually a perfect example for
- a script that is porcelain as well as plumbing (you are supposed to use
it directly, or via pull), and for
- a terrible UI.
_If_ you use git-fetch directly you virtually always want to store the
result. I was tempted quite often to submit a patch which adds a command
line switch --no-warn, which is passed to git-fetch by git-pull, and
without which git-fetch complains if the branch-to-be-fetched is not
stored right away (and refuses to go along).
_Also_, git-pull not storing the fetched branches at least temporarily
often annoyed me: the pull did not work, and the SHA1 was so far away I
could not even scroll to it. The result: I had to pull (and fetch!) the
whole darned objects again. Again, I was tempted quite often to submit a
patch which makes git-pull fetch the branches into refs/fetch-temp/* and
only throw them away when the merge succeeded.
Ciao,
Dscho
* Re: Cleaning up git user-interface warts
From: Petr Baudis @ 2006-11-16 22:20 UTC
To: Junio C Hamano; +Cc: Carl Worth, git, Andy Whitcroft, Nicolas Pitre
In-Reply-To: <7vr6w33vv3.fsf@assigned-by-dhcp.cox.net>
On Thu, Nov 16, 2006 at 10:49:36PM CET, Junio C Hamano wrote:
> I would like to keep it that way.
I agree - I certainly don't want to infect Git with bash dependency.
> And "POSIX says shell should behave that way" is _not_ what I want to
> hear about.
Actually, which sane platforms we care about have /bin/sh that is NOT
POSIX compatible?
> Things I would want to change:
What about [ instead of test? And
if foo; then
instead of
if foo
then
?
Am I the only one who hates
case "$log_given" in
tt*)
die "Only one of -c/-C/-F can be used." ;;
*tm*|*mt*)
die "Option -m cannot be combined with -c/-C/-F." ;;
esac
instead of having this stuff in explicit variables and writing out some
explicit boolean expressions? (There _are_ a few cases where case is
cool, but they are rare.)
It would be really great if Git had something like Cogito's
optparse infrastructure. I'm not sure if you can implement it in Bourne
sh with reasonable performance, though...
I think addressing these three particular points would make the scripts
hugely more coder-friendly. (And well, I usually say that coding style
is not *that* important and is frequently overemphasised. But that holds
only to a certain point. ;-)
> Things I do not want to change:
..snip all those I agree with..
> - Do not use locals.
It's a pity. :-( Which shell doesn't support them?
It's not that huge a deal, though.
> - Do not use shell arrays.
This is quite a larger deal, I think; but the portability concerns are
very real, I guess. :|
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
* [DRAFT] Branching and merging with git
From: linux @ 2006-11-16 22:17 UTC
To: git; +Cc: linux
I know it took me a while to get used to playing with branches, and I
still get nervous when doing something creative. So I've been trying
to get more comfortable, and wrote the following to document what I've
learned.
It's a first draft - I just finished writing it, so there are probably
some glaring errors - but I thought it might be of interest anyway.
* Branching and merging in git
In CVS, branches are difficult and awkward to use, and generally
considered an advanced technique. Many people use CVS for a long time
without departing from the trunk.
Git is very different. Branching and merging are central to effective use
of git, and if you aren't comfortable with them, you won't be comfortable
with git. In particular, they are required to share work with other
people.
The only things that are a bit confusing are some of the names.
In particular, at least when beginning:
- You create new branches with "git checkout -b".
"git branch" should only be used to list and delete branches.
- You share work with "git fetch" and "git push". These are opposites.
- You merge with "git pull", not "git merge". "git pull" can
also do a "git fetch", but that's optional. What's not optional
is the merge.
* A brief digression on command names.
Originally, all git commands were named "git-foo". When there got to
be over a hundred, people started complaining about the clutter in
/usr/bin. After some discussion, the following solution was reached:
- It's now possible to place all of the git-foo commands into a separate
directory. (Despite the complaints, not too many people are doing it
yet.)
- One option for git users is to add that directory to their $PATH.
- Another is provided by a wrapper called just "git". It's intended to
live in a public directory like /usr/bin, and knows the location of
the separate directory. When you type "git foo", it finds and executes
"git-foo".
- Some simple commands are built into the git wrapper. When you type
"git add", it just does it internally. (On the git mailing list,
you will see patches like "make git diff a builtin"; this is what
they're talking about.)
- For compatibility, for each builtin, there is a "git-add" file,
which is just a link to the "git" wrapper. It looks at the name it
was invoked as to figure out what it should do.
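The dispatch rules above fit in a few lines. This is a hypothetical Python model of the behaviour described, not the wrapper's actual C implementation; `dispatch`, the `BUILTINS` subset and the /opt/git/bin exec directory are made up for illustration.

```python
import posixpath
from typing import List, Tuple

# Hypothetical subset of built-in commands; the real list is larger.
BUILTINS = {"add", "diff", "log"}


def dispatch(argv0: str, args: List[str], exec_dir: str) -> Tuple[str, str, List[str]]:
    """Decide how a "git" or "git-foo" invocation is handled.

    Returns ("builtin", command, args) or ("exec", program_path, args).
    """
    invoked = posixpath.basename(argv0)
    if invoked.startswith("git-"):
        # Run through a "git-add"-style link: the link name picks the command.
        command, rest = invoked[len("git-"):], args
    elif args:
        # Run as "git foo ...": the first argument picks the command.
        command, rest = args[0], args[1:]
    else:
        raise ValueError("usage: git <command> [<args>]")
    if command in BUILTINS:
        return ("builtin", command, rest)
    # Everything else is a separate "git-foo" program in the exec directory.
    return ("exec", posixpath.join(exec_dir, "git-" + command), rest)
```

So `git add hello.c` and `git-add hello.c` end up in the same place, while `git fetch origin` becomes a run of /opt/git/bin/git-fetch with `origin` as its argument.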
The one confusing thing is that, although people usually type "git foo"
in examples, "git foo" and "git-foo" are interchangeable in practice. I go back and forth
for no good reason. The main caveat is that to get the man page, you
still need to type "man git-foo". Fortunately, there are two other ways
to get the man page:
1) "git help foo"
2) "git foo --help"
Git doesn't have a specialized built-in help system; it just shows you
the man pages.
One outstanding problem with git's man pages is that often the most detail
is in the command page that was written first, not the user-friendly
one that you should use. For example, there are a number of special
cases of the "git diff" command that were written first, and the man
pages for these commands (git-diff-index, git-diff-files, git-diff-tree,
and git-diff-stages) are considerably more informative than the page for
plain git-diff, even though that's the command that you should use 99%
of the time.
* Git's representation of history
As you recall from Git 101, there are exactly four kinds of objects in
Git's object database. All of them have globally unique 40-character hex
names made by hashing their type, size, and contents. Blob objects record file
contents; they contain bytes. Tree objects record directory contents;
they contain file names, permissions, and the associated tree or blob
object names. Tag objects are shareable pointers to other objects;
they're generally used to store a digital signature.
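Object naming is easy to reproduce. Git runs SHA-1 over a short header (the type, a space, the content length in decimal, and a NUL byte) followed by the content. A minimal Python sketch; `object_id` is an illustrative name, and for blobs the result should match what `git hash-object` prints for the same file contents.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over "<type> <length>" plus a NUL byte, followed by the raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(object_id("blob", b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(object_id("blob", b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

For trees and commits the same function applies, but only if `content` is already in git's exact serialized form; the sketch does not build that form for you.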
And then, we come to commit objects. Every commit points to (contains
the name of) an associated tree object which records the state of the
source code at the time of the commit, and some descriptive data (time,
author, committer, commit comment) about the commit.
And most importantly, it contains a list of "parent commits", older
commits from which this one is derived. These pointers are what produce
the history graph.
Typically only one commit (the initial commit) has zero parents. It's
possible to have more than one such commit (if you merge two projects
with different history), but that's unusual.
Many commits have exactly one parent. These are made by a normal commit
after editing. From a branching and merging point of view, they're not
too exciting.
And then there are commits which have multiple parents. Two is most
common, but git allows many more. (There's a limit of sixteen in the
source code, and the most anyone's ever used in real life is 12, and
that was generally regarded as overdoing it. Google on "dodecapus"
for discussion of it.)
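The parent lists are all it takes to recover the history graph. A hypothetical Python sketch: commits are reduced to a dict mapping a short made-up name to its parent list (real git uses 40-digit IDs and reads parents out of the commit objects). `classify` sorts commits by parent count, and `ancestors` walks the parent pointers to collect everything reachable from a tip.

```python
from typing import Dict, List, Set


def classify(parents: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each commit by its number of parents."""
    labels = {}
    for commit, ps in parents.items():
        if not ps:
            labels[commit] = "root"
        elif len(ps) == 1:
            labels[commit] = "ordinary"
        else:
            labels[commit] = f"merge of {len(ps)}"
    return labels


def ancestors(tip: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from `tip` by following parent pointers."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


if __name__ == "__main__":
    # A is the initial commit, B and C both build on it, D merges them.
    history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
    print(classify(history))
    print(sorted(ancestors("D", history)))  # ['A', 'B', 'C', 'D']
```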
Finally, there are references, stored in the .git/refs directory.
These are the human-readable names associated with commits, and the
"root set" from which all other commits should be reachable.
These references are generally divided into two types, although
there is no fundamental difference:
- Tags are references that are intended to be immutable.
The "v1.2" tag is a historical record. A tag may point to
a tag object (which will hold a signature), or just to a commit
directly. The latter isn't cryptographically authenticated, but
works just fine for everyday use.
- Heads are references that are intended to be updated. "Head"
is actually synonymous with "branch", although one emphasizes the
tip more, while the other directs your attention to the entire
path that got there.
Either way, each one is just a 41-byte file that contains a 40-byte hex
object ID, plus a newline. Tags are stored in .git/refs/tags, and heads
are stored in .git/refs/heads. Creating a new branch is literally just
picking a file name and writing the ID of an existing commit into it.
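A hypothetical Python sketch of exactly that (`write_ref` is an illustrative name, not part of git). It writes the 41-byte file directly; real git commands such as git-update-ref also take a lock file and can log the change, and newer git versions may keep some refs in a single packed-refs file instead, so on a real repository use the git commands.

```python
import os
import re
import tempfile

FULL_ID = re.compile(r"[0-9a-f]{40}")


def write_ref(git_dir: str, ref: str, object_id: str) -> str:
    """Point `ref` (e.g. "refs/heads/topic") at `object_id`; return the file path."""
    if not FULL_ID.fullmatch(object_id):
        raise ValueError("expected a full 40-digit lowercase hex object ID")
    path = os.path.join(git_dir, ref)
    # Ref names may contain slashes, so create any subdirectories first.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="\n") as f:
        f.write(object_id + "\n")    # 40 hex digits + newline = 41 bytes
    return path


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()     # a throwaway directory standing in for .git
    path = write_ref(scratch, "refs/heads/topic", "0" * 40)
    print(os.path.getsize(path))     # 41
```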
The git programs enforce the immutability of tags, but that's a safety
feature, not something fundamental. You can rename a tag to the heads
directory and go wild.
The only limit on branches is clutter. A number of git commands have
ways to operate on "all heads", and if you have too many, it can get
annoying. If you're not using a branch, either delete it, or move it
somewhere (like the tags directory) where it won't clutter up the list of
"currently active heads".
(Note that CVS doesn't have this all-heads default, so people tend to
use longer branch names and keep them around after they've been merged
into the trunk. Old CVS repositories converted to git generally need
an old-branch cleanup.)
Another thing that's worth mentioning is that head and tag names can
contain slashes; i.e. you're allowed to make subdirectories in the
.git/refs/heads and .git/refs/tags directories. See the man page
for "git-check-ref-format" for full details of legal names.
* Naming revisions
CVS encourages you to tag like crazy, because the only other way to
find a given revision is by date. Git makes it a lot easier, so most
revisions don't need names.
You can find a full description in the git-rev-parse man page, but here's
a summary.
First of all, every commit has a globally unique name, its 40-digit hex
object ID. It's a bit long and awkward, but always works. This is useful
for talking about a specific commit on a mailing list. You can abbreviate
it to a unique prefix; most people find about 8 digits sufficient.
(Subversion is easier yet, because it assigns a sequential number to each
commit. However, that isn't possible in a distributed system like git.)
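Expanding an abbreviation just means finding the one known ID with that prefix. A hypothetical Python sketch over a plain list of IDs (`expand_abbrev` is an illustrative name; git itself searches its object store and has additional rules of its own):

```python
from typing import Iterable, Optional


def expand_abbrev(prefix: str, object_ids: Iterable[str]) -> Optional[str]:
    """Return the single full ID starting with `prefix`, or None if none or several match."""
    prefix = prefix.lower()
    matches = {oid for oid in object_ids if oid.startswith(prefix)}
    if len(matches) == 1:
        return matches.pop()
    return None    # unknown, or ambiguous: more digits are needed
```

An ambiguous prefix returns None here, which is the situation where you simply type a few more digits.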
Second, you can refer to a head or tag name. Git looks in the
following places, in order, for a name:
1) .git
2) .git/refs
3) .git/refs/tags
4) .git/refs/heads
So if a tag and a head share a name, the bare name means the tag. You
should avoid having e.g. a head and a tag with the same name, but
if you do, you can specify one or the other with heads/foo and tags/foo.
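The search can be modelled with a few file tests. A hypothetical Python sketch (`resolve_name` and `make_demo_git_dir` are illustrative names) using the order given in the git-rev-parse man page, where refs/tags is tried before refs/heads. It ignores symbolic refs such as HEAD, packed refs, and the refs/remotes locations that current git versions also check.

```python
import os
import tempfile
from typing import Optional

# Directories searched, in order, relative to the .git directory.
SEARCH_ORDER = ["", "refs", "refs/tags", "refs/heads"]


def resolve_name(git_dir: str, name: str) -> Optional[str]:
    """Return the object ID in the first ref file that matches `name`, or None."""
    for subdir in SEARCH_ORDER:
        path = os.path.join(git_dir, subdir, name)
        if os.path.isfile(path):
            with open(path) as f:
                return f.read().strip()
    return None


def make_demo_git_dir() -> str:
    """Build a throwaway directory holding both refs/heads/foo and refs/tags/foo."""
    git_dir = tempfile.mkdtemp()
    for ref, oid in [("refs/heads/foo", "a" * 40), ("refs/tags/foo", "b" * 40)]:
        path = os.path.join(git_dir, ref)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(oid + "\n")
    return git_dir


if __name__ == "__main__":
    g = make_demo_git_dir()
    print(resolve_name(g, "foo") == "b" * 40)        # True: the tag wins
    print(resolve_name(g, "heads/foo") == "a" * 40)  # True: found via .git/refs
```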
Third, you can specify a commit relative to another. The simplest
one is "the parent", specified by appending ^ to a name. E.g. HEAD^
or deadbeef^. If there are multiple parents, then ^ is the same as ^1,
and the others are ^2, ^3, etc.
So the last few commits you've made are HEAD, HEAD^, HEAD^^, HEAD^^^, etc.
After a while, counting carets becomes annoying, so you can abbreviate
^^^^ as ~4. Note that this only lets you specify the first parent.
If you want to follow a side branch, you have to specify something like
"master~305^2~22".
* Converting between names
Git has two helpers (programs designed mainly for use in shell scripts)
to convert between global object IDs and human-readable names.
The first is git-rev-parse. This is a general git shell script helper,
which validates the command line and converts object names to absolute
object IDs. Its man page has a detailed description of the object
name syntax.
The second is git-name-rev, which converts the other way around. It's
particularly useful for seeing which tags a given commit falls between.
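The parent-relative suffixes that git-rev-parse resolves can be modelled on a toy history. A hypothetical Python sketch: `resolve_rev` is an illustrative name and handles only a name followed by ^, ^N and ~N, not the full syntax in the man page; commits have short made-up names and their parent lists are given directly.

```python
import re
from typing import Dict, List

# Toy history: S1-S2 is a side branch from A, merged into the main line at D.
HISTORY = {
    "A": [], "B": ["A"], "C": ["B"],
    "S1": ["A"], "S2": ["S1"],
    "D": ["C", "S2"], "E": ["D"],
}
REFS = {"master": "E"}


def resolve_rev(expr: str, refs: Dict[str, str], parents: Dict[str, List[str]]) -> str:
    """Resolve NAME followed by any mix of ^, ^N and ~N suffixes."""
    m = re.match(r"[^~^]+", expr)
    rest = expr[m.end():] if m else ""
    if not m or not re.fullmatch(r"(?:[~^]\d*)*", rest):
        raise ValueError(f"unsupported revision syntax: {expr!r}")
    commit = refs.get(m.group(0), m.group(0))   # a ref name, or a commit name as-is
    for op, digits in re.findall(r"([~^])(\d*)", rest):
        n = int(digits) if digits else 1
        if op == "^":
            if n > 0:                            # ^0 means the commit itself
                commit = parents[commit][n - 1]  # ^N: the N-th parent
        else:
            for _ in range(n):                   # ~N: first parent, N times
                commit = parents[commit][0]
    return commit


if __name__ == "__main__":
    print(resolve_rev("master~2", REFS, HISTORY))      # C
    print(resolve_rev("master^^2", REFS, HISTORY))     # S2
    print(resolve_rev("master~1^2~1", REFS, HISTORY))  # S1
```

The last call is the same shape as "master~305^2~22": walk the first-parent chain, step onto the side branch with ^2, then walk that branch's first parents.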
* Working with branches, the trivial cases.
By convention, the local "trunk" of git development is called "master".
This is just the name of the branch it creates when you start an empty
repository. You can delete it if you don't like the name.
If you create your repository by cloning someone else's repository, the
remote "master" branch is copied to a local branch named "origin". You
get your own "master" branch which is not tied to the remote repository.
There is always a current head, known as HEAD. (This is actually a
symbolic link, .git/HEAD, to a file like refs/heads/master.) Git requires
that this always point to the refs/heads directory.
Minor technical details:
1) HEAD used to be a Unix symlink, and can still be thought of that
way, but for Microsoft Windows support, it is now what's called a
"symbolic reference" or symref: a plain file containing
"ref: refs/heads/master". Git treats it just like a symlink.
The git-symbolic-ref helper reads and writes these.
2) While HEAD must point to refs/heads, it's legal for it to
point to a file that doesn't exist. This is what happens
before the first commit in a brand new repository.
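The HEAD lookup described in these notes can be sketched in Python. This is a minimal sketch assuming loose refs only: it reads .git/HEAD whether it is an old-style symlink or a "ref: ..." symref file, then reads the branch file it names. Real git can also keep refs in a packed-refs file, which this ignores, and the function name read_head is made up.

```python
import os


def read_head(git_dir):
    """Return (branch_ref, sha_or_None) for HEAD, assuming loose refs only.

    A missing branch file (a brand-new repository) gives sha None.
    """
    head = os.path.join(git_dir, "HEAD")
    if os.path.islink(head):
        # Old style: HEAD is a Unix symlink to refs/heads/<branch>.
        ref = os.readlink(head)
    else:
        with open(head) as f:
            content = f.read().strip()
        if not content.startswith("ref: "):
            raise ValueError("HEAD is not a symbolic ref: %r" % content)
        ref = content[len("ref: "):]
    target = os.path.join(git_dir, ref)
    if not os.path.exists(target):
        return ref, None  # legal: no commit on this branch yet
    with open(target) as f:
        return ref, f.read().strip()
```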
When you do "git commit", a new commit object is created with the old
HEAD as a parent, and the new commit is written to the current head
(pointed to by HEAD).
* The three uses of "git checkout"
Git checkout can do three separate things:
1) Change to a new head
git checkout [-f|-m] <branch>
This makes <branch> the new HEAD, and copies its state to the index
and the working directory.
If a file has unsaved changes in the working directory, this tries
to preserve them. This is a simple attempt, and requires that the
modified file(s) are not altered between the old and new HEADs.
In that case, the version in the working directory is left untouched.
A more aggressive option is -m, which will try to do a three-way
(intra-file) merge. This can fail, leaving unmerged files in the
index.
An alternative is to use -f, which will overwrite any unsaved changes
in the working directory. This option can be used with no <branch>
specified (defaults to HEAD) to undo local edits.
2) Revert changes to a small number of files.
git checkout [<revision>] [--] <paths>
will copy the version of the <paths> from the index to the working
directory. If a <revision> is given, the index for those paths will
be updated from the given revision before copying from the index to
the working tree.
Unlike the version with no <paths> specified, this does NOT update
HEAD, even if <paths> is ".".
3) Create a branch.
git checkout [-f|-m] -b <branch> [<revision>]
will create, and switch to, a new branch with the given name.
This is equivalent to
git branch <branch> [<revision>]
git checkout [-f|-m] <branch>
If <revision> is omitted, it defaults to the current HEAD, in which
case no working directory files are altered.
This is the usual way that one checks out a revision that does not
have an existing head pointing to it.
* Deleting branches
"git branch -d <head>" is safe. It deletes the given <head>, but first
it checks that the commit is reachable some other way. That is, you
merged the branch in somewhere, or you never did any edits on that branch.
It's a good idea to create a "topic branch" when you're working on
anything bigger than a one-liner, but it's also a good idea to delete
it when you're done. Its commits are still there in the history.
* Doing rude things to heads: git reset
If you need to overwrite the current HEAD for some reason, the tool to
do it with is "git reset". There are three levels of reset:
git reset --soft <head>
This overwrites the current HEAD with the contents of <head>.
If you omit <head>, it defaults to HEAD, so this does nothing.
git reset [<head>]
git reset --mixed [<head>]
These overwrite the current HEAD, and copy it to the index,
undoing any git-update-index commands you may have executed.
If you omit <head>, it defaults to HEAD, so there is no change
to the current branch, but all index changes are undone.
git reset --hard [<head>]
This does everything mentioned above, and updates the
working directory. This throws away all of your in-progress
edits and gets you a clean copy. This is also commonly
used without an explicit <head>, in which case the current
HEAD is used.
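One way to keep the three levels straight is a toy model in which HEAD, the index and the working directory are just labels. This Python sketch (the Repo class and the reset and fresh functions are hypothetical, not a git API) shows which of the three each mode overwrites.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    # Toy model: trees are plain labels such as "T1"; nothing is hashed.
    commits: dict   # commit name -> the tree it records
    head: str       # commit the current branch points to
    index: str      # tree currently staged in the index
    worktree: str   # tree currently in the working directory


def reset(repo, target=None, mode="mixed"):
    """Model 'git reset --soft/--mixed/--hard [<target>]'; target defaults to HEAD."""
    if target is None:
        target = repo.head
    repo.head = target                        # every mode moves the branch
    if mode in ("mixed", "hard"):
        repo.index = repo.commits[target]     # --mixed and --hard reset the index
    if mode == "hard":
        repo.worktree = repo.commits[target]  # --hard also overwrites your files
    return repo


def fresh():
    """HEAD at c2, with something staged and further unsaved edits."""
    return Repo(commits={"c1": "T1", "c2": "T2"}, head="c2",
                index="T2+staged", worktree="T2+staged+edits")
```

In this model, reset(fresh(), "c1", "soft") moves only HEAD, reset(fresh()) keeps HEAD but unstages everything, and reset(fresh(), mode="hard") also discards the unsaved edits.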
* Using git-reset to fix mistakes
"Oh, no! I didn't mean to commit *that*! How do I undo it?"
If you just want to undo a commit, then you can use "git reset HEAD^"
to return the current HEAD to the previous version. If you want to leave
the commit in the index (this only applies to you if you are familiar with
using the index; see below), then you can use "git reset --soft HEAD^".
And if you want to blow away every record of the changes you made,
you can use "git reset --hard HEAD^".
If you just made a stupid, trivial mistake and want to replace the most
recent commit with a corrected one, "git commit --amend" is your friend.
It makes a new commit with HEAD^ rather than HEAD as its parent.
* Fixing mistakes without git-reset
git-reset has the problem that it doesn't preserve hacking in progress
in the working directory. It can leave the working directory alone
(making everything a "hack in progress"), but it can't merge changes
like git checkout.
So, suppose you've been trying something that should have been simple, and
made three commits before realizing that the problem is harder than you
thought and you want your work so far to be on a new branch of its own;
committing them on the current HEAD (I'll call it "old") was a mistake.
You don't want to erase anything, just rename it. Make "new" a copy of
the current "old" and move old back to HEAD^^^ (three commits ago).
There are ways to do that using git-reset, but it is far better
to use "git branch -f":
git checkout -b new
Create (and switch to) the "new" branch.
git branch -f old HEAD^^^
Forcibly move "old" back three versions.
(You could also use old~3 or new^^^ or any synonymous name.)
You can use a similar trick to rename a branch. If it's the current
HEAD, then:
git checkout -b newname
git branch -d oldname
and if it's not, then
git branch newname oldname
git branch -d oldname
An alternative in the latter case is to just use mv on the raw
.git/refs/heads/oldname file.
* How do I check out an old version?
A very common beginning question is how to check out an old version.
Say you need to compile an old release for test purposes. "git checkout
v1.2" gives a funny error message. What's going on?
Well, "git checkout" makes the current HEAD point to the head that
you specify. And, as previously mentioned, git requires that it point
to something in the .git/refs/heads directory. So you can't do that.
If you're busy doing things in your working directory, and don't want to
overwrite your work with an old version, then you can get a snapshot with
the (old) git-tar-tree or (new) git-archive commands. These produce a
tar file (git-archive can also produce a zip file) which is a snapshot
of any version you like. You can then unpack this file in a different
directory and build it.
However, if you haven't got any edits in progress, and want to check out
the old version into your working directory, just create a temp branch!
git checkout -b temp v1.2
Will do what you want. This will also do what you want if you have a
local edit (like the "#define DEBUG 1" mentioned above) that you want
to preserve while working on the old version.
You'll see this in use if you ever use the (highly recommended) git-bisect
tool. It creates a branch called "bisect" for the duration of the bisect.
(Yes, I have to confess, I sometimes wish that git would enforce the
"HEAD must point to .git/refs/heads" rule when committing (checking in)
rather than when checking out, but that's the way git has grown up.)
Note that if you want *exactly* an old version, with no local hacks,
make sure there are none (with "git status") when doing this. It's more
convenient if you do it before the checkout, but you'll get the same
answer if you ask afterwards.
Now, what about the complex case: you have local hacks that you
want to keep, but not have polluting the old version?
Well, one way or the other, you'll have to commit it. If you don't mind
committing your changes to the current branch ("git commit -a"), do that.
If they're not ready to commit, you can commit them anyway, and back
them out when you're done:
git commit -a -m "Temp commit"
git checkout -b temp v1.2
make ; make test ; whatever
git checkout master
git branch -d temp
git reset HEAD^
This leaves both the working directory and the master head in the states
they were in at the beginning.
If you don't like committing to the master branch, you can make a new one.
In this example, it's "work in progress", a.k.a. "wip":
git checkout -b wip
git commit -a -m "Temp commit"
git checkout -b temp v1.2
make ; make test ; whatever
git checkout wip
git branch -d temp
git reset master
git checkout master # Won't change working directory
git branch -d wip
* Examining history: git-log and git-rev-list
In another example of docs being better on the first command written,
the all-purpose utility for examining history is "git log", but all of
the examples of clever ways to use it are in the git-rev-list man page.
And git-log also has most of git-diff's options.
Other utilities, notably the gitk and qgit GUIs, also use the git-rev-list
command-line options, so it's well worth learning them.
git-rev-list gives you a filtered subset of the repository history.
There are two basic ways that you can do the filtering:
1) By ancestry. You specify a set of commits to include all the
ancestors of, and another set to exclude all the ancestors of.
(For this purpose, a commit is considered an ancestor of itself.)
So if you want to see all commits between v1.1 and v1.2, you
can specify
git log ^v1.1 v1.2
or, with a more convenient syntax
git log v1.1..v1.2
However, there are times when you want to specify something more
complex. For example, if a big branch that had been in progress since
v1.0.7 was merged between v1.1 and v1.2, but you don't want to see it,
you could specify any of:
git log v1.2 ^v1.1 ^bigbranch
git log ^bigbranch v1.1..v1.2
git log ^v1.1 bigbranch..v1.2
They're all equivalent. Another special syntax that's sometimes
handy is
git log branch1...branch2
Note the three dots. This generates the symmetric difference between
the two: the commits reachable from one branch or the other, but
not from both.
"git log" by default pipes its output through less(1), and generates
its output from newest to oldest on the fly, so there's no great
speed penalty for not limiting how far back it goes. It'll generate a
few screenfuls more than you look at, but not waste any more effort
than that.
2) By path name. This is a feature which appears to be unique to git.
If you give git-rev-list (or git-log, or gitk, or qgit) a list of
pathname prefixes, it will list only commits which touch those
paths. So "git log drivers/scsi include/scsi" will list only
commits which alter a file whose name begins with drivers/scsi
or include/scsi.
(If there's any possible ambiguity between a path name and a commit
name, git-rev-list will refuse to proceed. You can resolve it by
including "--" on the command line. Everything before that is a
commit name; everything after is a path.)
This filter is in addition to the ancestry filter. It's also rather
clever about omitting unnecessary detail. In particular, if there's
a side branch which does not touch drivers/scsi, then the entire branch,
and the merge at the end, will be removed from the log.
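The ancestry filter in item 1 is just set arithmetic on "reachable from". This minimal Python sketch, over a hypothetical graph loosely modeled on the v1.1/v1.2/bigbranch example, computes the same sets as "^v1.1 v1.2", "^bigbranch v1.1..v1.2" and "a...b". It returns unordered sets, whereas git-rev-list prints commits in order, and it does not attempt the path filter of item 2. All names here are made up.

```python
def ancestors(commit, parents):
    """Every commit reachable from commit, counting commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(include, exclude, parents):
    """Commits reachable from some commit in include but from none in exclude."""
    result = set()
    for c in include:
        result |= ancestors(c, parents)
    for c in exclude:
        result -= ancestors(c, parents)
    return result


def symmetric_difference(a, b, parents):
    """What "a...b" selects: commits reachable from exactly one of a and b."""
    return ancestors(a, parents) ^ ancestors(b, parents)


# Hypothetical history: bigbranch (big1, big2) forked at v1.0.7 and was
# merged between v1.1 and v1.2.
PARENTS = {
    "v1.0.7": [],
    "big1": ["v1.0.7"],
    "big2": ["big1"],      # the tip of bigbranch
    "v1.1": ["v1.0.7"],
    "fix": ["v1.1"],
    "merge": ["fix", "big2"],
    "v1.2": ["merge"],
}
```

For example, rev_list(["v1.2"], ["v1.1", "big2"], PARENTS) plays the role of "git log ^bigbranch v1.1..v1.2" and leaves only fix, merge and v1.2.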
You can additionally limit the commits to a certain number, or by date,
author, committer, and so on.
By default, "git log" only shows the commit messages, so it's important to
write good ones. Other tools compress commit messages down to
the first line, so try to make that as informative as possible.
* History diagrams
When talking about various situations involving multiple branches,
people often find it handy to draw pictures. Gitk draws nice pictures
vertically, but for e-mail, ASCII art drawn horizontally is often easier.
Commits are shown as "o", and the links between them with lines drawn with
- / and \. Time goes left to right, and heads may be labelled with names.
For example:
        o--o--o <-- Branch A
       /
o--o--o <-- master
       \
        o--o--o <-- Branch B
If someone needs to talk about a particular commit, the character "o"
may be replaced with another letter or number.
* Trivial merges: fast-forward and already up-to-date.
There are two kinds of merge that are particularly simple, and you will
encounter them in git a great deal. They are mirror images.
Suppose that you are working on branch A and merge in branch B, but no
work has been done to branch B since the last time you merged, or since
you spawned branch A from it. That is, the history looks like
o--o--o--o <-- B
          \
           o--o--o <-- A
or
o--o--o--o--o--o <-- B
    \           \
     o--o--o--o--o <-- A
If you then merge B into A, A is described as "already up to date".
It is already a strict superset of B, and the merge does nothing.
In particular, git will not create a dummy commit to record the fact that
a merge was done. It turns out that there are a number of bad things that would
happen if you did this, but for now, I'll just say that git doesn't do it.
Now, the opposite scenario is the "fast-forward" merge. Suppose you
merge A into B. Again, A is a strict superset of B.
In this case, git will simply change the head B to point to the same
commit as A and say that it did a "fast-forward" merge. Again, no commit
object is created to reflect this fact.
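Both trivial cases come down to one ancestry test. This minimal Python sketch (classify_merge and the graph are hypothetical, not git code) decides what merging "other" into "current" would do; the graph is the first "already up to date" picture, plus one extra commit on B.

```python
def ancestors(commit, parents):
    """Every commit reachable from commit, counting commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def classify_merge(current, other, parents):
    """What merging other into current would do."""
    if other in ancestors(current, parents):
        return "already up to date"   # current already contains all of other
    if current in ancestors(other, parents):
        return "fast-forward"         # just move the current head up to other
    return "real merge"               # both sides have new work


# A (a1..a3) was spawned from B's tip b4; b5 is an extra commit on B.
PARENTS = {
    "b1": [], "b2": ["b1"], "b3": ["b2"], "b4": ["b3"],
    "a1": ["b4"], "a2": ["a1"], "a3": ["a2"],
    "b5": ["b4"],
}
```

Merging b4 into a3 is "already up to date", merging a3 into b4 is a "fast-forward", and only once b5 exists does merging it into a3 need a real merge commit.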
The effect is to unclutter the git history. If I create a topic branch to
work on a feature, do some hacking, and then merge the result back into
the (untouched!) master, the history will look just like I did all the
work on the master directly. If I then delete the topic branch (because
I'm done using it), the repository state is indistinguishable from having
done the work on master all along.
While the topic branch existed, you could have done something to the
master branch, in which case the final merge would have been non-trivial,
but if that didn't happen, git produces a simple, easy-to-follow linear
history.
Some people used to heavyweight branches find this confusing; they
think a merge is a big deal and it should be memorialized, but there
are actually excellent reasons for doing this.
The most important one is that a fit of merging back and forth will
eventually end. Suppose that branches A and B are maintained by separate
developers who like to track each other's work closely.
If the fast-forward case did create a commit, then merging A into B
would produce
o--o--o--o---------o <-- B
          \       /
           o--o--o <-- A
then merging B into A would produce:
o--o--o--o---------o <-- B
          \       / \
           o--o--o---o <-- A
and further merges would produce more and more dummy commits, all without
ever reaching a steady state, and without making it obvious that the
two heads are actually identical.
Since history lasts forever, cluttering it up with unimportant stuff is a
burden to all future users, and not a good idea. Allowing the merge of a
branch to be seamless in the simple case encourages lightweight branches.
If you _might_ need a separate branch, create it. If it turned out that
you didn't, it won't make a difference.
* Exchanging work with other repositories
The basic tools for exchanging work with other repositories are "git
fetch" and "git push". The fact that "git pull" is not the opposite of
"git push" (it's a superset of "git fetch") is often confusing to
beginners, but that's the terminology that has grown up.
The unit of sharing in git is the branch. If you've used branches in
CVS, you'll be familiar with using "cvs update" to pull changes from your
"current branch" in the repository into your working directory.
In Git, you don't pull into the working directory, but rather into a
tracking branch. You set up a branch in your repository which will be
a copy of the branch in the remote repository. For example, if you use
"git clone", then the remote "master" branch is tracked by the local
"origin" branch.
Then, when you do a "git fetch", git fetches all of the new commits
and sets the origin head to point to the newly fetched head of the
remote branch.
By default, git checks that this is a trivial fast-forward merge, that
is, one that does not throw away history. If it finds something like:
o--o--o--o--o--o <-- remote master
       \
        o <-- Local origin
It will complain and abort the fetch. This is usually a warning that
something has gone wrong - in particular, you forgot that this was
supposed to be a tracking branch and committed some work to it - and it
aborts before throwing your work away.
However, sometimes the remote git user will have a branch name that they
delete and re-create frequently. There are plenty of reasons to do this.
The most common is doing a "test merge" between various branches in
progress. They're all unfinished, so the developer of branch A doesn't
want to merge in all the new bugs in branch B, but a tester might want
to create a merged version with both sets of bugs for testing.
The merged version is not intended to be a permanent part of history -
it'll get deleted after the test - but it can still be useful to have
a draft copy.
In this case, you can mark the source branch with a leading "+", to
disable this sanity check. (See the git-fetch man page for details.)
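The "+" can be thought of as a flag on the refspec. In this minimal Python sketch (parse_refspec and may_update are hypothetical helpers, and the graph is made up to match the "remote master" versus "Local origin" picture), a tracking branch may move only to a descendant of its old value unless the refspec is forced.

```python
def parse_refspec(spec):
    """Split a '[+]src:dst' refspec into (force, src, dst)."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return force, src, dst


def ancestors(commit, parents):
    """Every commit reachable from commit, counting commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def may_update(old, new, force, parents):
    """May a tracking branch move from old to new?"""
    if force or old is None:
        return True                        # forced, or the branch is new
    return old in ancestors(new, parents)  # otherwise only a fast-forward


# Remote master r1..r6; the local origin has an extra commit l1 on top of r3.
PARENTS = {
    "r1": [], "r2": ["r1"], "r3": ["r2"], "r4": ["r3"], "r5": ["r4"], "r6": ["r5"],
    "l1": ["r3"],
}
```

Here moving origin from r3 to r6 is allowed, moving it from l1 to r6 is refused, and the same update succeeds once the refspec carries a leading "+".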
Note that in this case, you should specifically avoid merging from such
a branch into any non-test branches of your own. It is, as mentioned,
not intended to be a permanent part of history, so don't make it part
of your permanent history. (You still might want to test-merge it with
your work in progress, of course.)
The fact that you should know to treat such branches specially is why
git doesn't try to automatically cope with them.
* Alternate branch naming
The original git scheme mixes tracking branches with all the other heads.
This requires that you remember which branches are tracking branches and
which aren't. Hopefully, you remember what all your branches are for,
but if you track a lot of remote repositories, you might not remember
what every remote branch is for and what you called it locally.
* Remotes files
You can specify what to fetch on the git-fetch command line. However,
if you intend to monitor another repository on an ongoing basis,
it's generally easier to set up a short-cut by placing the options in
.git/remotes/<name>.
The syntax is explained in the git-fetch man page. When this is set
up, "git fetch <name>" will retrieve all the branches listed in the
.git/remotes/<name> file. The ability to fetch multiple branches at
once (such as release, beta, and development) is an advantage of using
a remotes file.
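A remotes file is a short list of "URL:", "Pull:" and "Push:" lines. The sample here is illustrative (the URL is git's own public repository, and the branch mapping is only an example), and the Python parser is a minimal sketch with a made-up name, not git's own reader; the git-fetch man page remains the authority on the format.

```python
SAMPLE = """\
URL: git://git.kernel.org/pub/scm/git/git.git
Pull: refs/heads/master:refs/heads/origin
Pull: +refs/heads/pu:refs/heads/pu
"""


def parse_remotes_file(text):
    """Collect the URL and the Pull:/Push: refspecs from a remotes file."""
    remote = {"url": None, "pull": [], "push": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "url":
            remote["url"] = value
        elif key in ("pull", "push"):
            remote[key].append(value)
    return remote
```

In the sample, the second Pull: line starts with "+", so the pu branch may be rewound without the fetch aborting, and the first Pull: entry is the branch that git-pull merges when given just the remotes name.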
You can also create the remotes file "origin" (not necessarily any
relation to the branch named "origin"), which is the default for
git-fetch. If you have a single primary "upstream" repository that
you sync to, place it in the origin remotes file, and you can just type
"git fetch" to get all the latest changes.
Note that branches to fetch are identified by "Pull: " lines in the
remotes file. This is another example of the fetch/pull confusion.
git-pull will be explained eventually.
* Remote tags
TODO: Figure out how remote tags work, under what circumstances
they are fetched, and what git does if there are conflicts.
* Exchanging work with other repositories, part II: git-push
It's simpler to set up git sharing on a pull basis. If your source
code isn't secret, you can set up a public read-only server very easily
(see the git-daemon man page for details), and have others fetch from that.
However, N developers all pulling from each other is an N^2 mess.
Some centralization helps.
One way is to have a central coordinator (like Linus) who pulls from
all of the developers, and who they in turn pull from.
The other is to have a central repository that people can push to.
This generally requires an ssh login on the server. You can use git-shell
as the login shell if all you want to allow the account to do is git
fetch and push. (You can use the hook scripts to enforce rules about
who's allowed to do what to which branch.)
Git-push to the remote machine works exactly like git-fetch from the
remote machine. The objects are moved over, and the branches pushed to
are fast-forwarded. If fast-forward is impossible, you get an error.
So if you have multiple people committing to a branch on the server,
you will not be allowed to push if someone has pushed more to that branch
since the last time you fetched it.
You have to merge the changes locally, and re-try the push when you've
got a new head that includes the most recently pushed work as an ancestor.
This is exactly like "cvs commit" not working if your recent checkout
wasn't the (current) tip of the branch, but git can upload more than
one commit.
The simplest way to resolve the conflict is to merge the remote head with
your local head. This is easiest if you have different local branches
for fetching the remote repository and for pushing to it.
That is, you have one head that just tracks the master repository's
main branch, and another that you add your work to, and push from.
This makes merging simpler when there are conflicts.
Another use for git-push, even for a solo developer, is sharing your work
with the world. You can set up a public git server on a high-bandwidth
machine (possibly rented from a hosting service) and then push to it to
publish something.
* Merging (finally!)
I went through everything else first because the most common merge case
is local changes with remote changes. Not that you can't merge two
branches of your own, but you don't need to do that nearly as often.
The primitive that does the merging is called (guess what?) git-merge.
And you can use that if you want. If you want to create a so-called
octopus merge, with more than two parents, you have to.
However, it's usually easier to use the git-pull wrapper. This merges
the changes from some other branch into the current HEAD and generates
a commit message automatically.
git-merge lets you specify the commit message (rather than generating it
automatically) and use a non-HEAD destination branch, but those options
are usually more annoying than useful.
The basic git-pull syntax is
git-pull <repository> <branch>
The repository can be any URL that git supports. Including, particularly,
a local file. So to do a simple local merge, you just type
git-pull . <branch>
So after doing some hacking on branch "foo", you would
git checkout master
git pull . foo
and ba-boom, all is done.
Now, you can also specify a remote repository to merge from, using a
git://, http:// or git+ssh:// URL. This is what Linus does all day long,
and why the git-pull tool is optimized to allow that. It uses git-fetch
to fetch the remote branch without assigning it a branch name (it gets
the special name FETCH_HEAD temporarily), and then merges it into the
current HEAD directly.
There is absolutely nothing wrong with doing that, but beginners often
find it confusing to have a single short command do quite so much.
And if you are working closely with someone, it's often more convenient
and less confusing to keep local tracking branches. Then you can
git fetch upstream # Fetches 'origin'
git pull . origin
It's also possible to give just a single remotes file name to git-pull:
git pull upstream
That does a git fetch, updating all of the listed branches as usual,
then merges the _first_ listed branch into HEAD.
By the way: don't blink, you might miss it! As I mentioned, pulling is
a very big part of Linus's daily routine, and he's made sure it's fast.
(Actually, it produces a fair bit of output, so you'll see.)
Just to clarify, because people often get confused:
git-pull is a MERGING tool. It always does a merge, as well as an optional
fetch. If you just want to LOOK at a remote branch, use git-fetch.
* Undoing a merge
If you discover that a merge was a mistake, it can be undone just like
any other commit. The HEAD you merged to is the first parent, so just do
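git reset --hard HEAD^
(If the merge was a fast-forward, there is no merge commit to remove;
"git reset --hard ORIG_HEAD" returns to the head you had before the merge.)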
This is why Linus likes a git-pull command that does so much in one shot -
if he doesn't like what he pulls, it's easy to undo.
* How merging operates
Git uses the basic three-way merge. First, it applies it to whole files,
and then to lines within files.
To do a three-way merge, you need three versions of a file. The versions
A and B you want to merge, and a common ancestor, commonly called O.
That is, history proceeds something like:
        o--o--A
       /
o--o--O
       \
        o--B
The basic idea is "I want the file O, plus all the changes made from O
to A, plus all the changes made from O to B." Since the cases where one
of A or B is a direct ancestor of the other have already been disposed
of, the three commits must be different.
For each file, there are a few cases that are trivial, and git gets
these out of the way immediately:
- If A and B are identical, the merged result is obvious.
- If O and A are the same, then the result should be B.
- If O and B are the same, then the result should be A.
In the completely trivial case when O, A and B are the same,
all three rules apply, and they all produce the same obvious result.
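The three trivial rules translate directly into code. In this minimal Python sketch (trivial_merge is a hypothetical name), each argument is one version of a file's contents, and None means none of the rules applies, so a real content merge is needed.

```python
def trivial_merge(o, a, b):
    """Resolve one file from base O and sides A and B, if a trivial rule applies."""
    if a == b:
        return a    # both sides agree (this also covers O == A == B)
    if o == a:
        return b    # only B changed the file
    if o == b:
        return a    # only A changed the file
    return None     # all three differ: fall through to the line-level merge
```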
The "merge base" version O is generally the most recent common ancestor
of A and B. The only problem is, that's not necessarily unique!
The classic confusing case is called a "criss-cross merge", and looks
like this:
        o--b-o-o--B
       /    \ /
o--o--o      X
       \    / \
        o--a-o-o--A
There are two common ancestors of A and B, marked a and b in the graph
above. And they're not the same. You could use either one and get
reasonable results, but how to choose?
The details are too advanced for this discussion, but the default
"recursive" merge strategy that git uses solves the problem by merging
a and b into a temporary commit and using *that* as the merge base.
Of course, a and b could have the same problem, so merging them could
require another merge of still-older commits. This is why the algorithm
is called "recursive." It's been tested with pathological conditions,
but multiply nested criss-cross merges are very rare, so the recursion
isn't a performance limit in practice.
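To see why the criss-cross case has no unique answer, here is a minimal Python sketch that computes the "best" common ancestors: those that are not themselves ancestors of another common ancestor. The graph encodes the criss-cross diagram with made-up names (tm and sm are the two merges). The recursive strategy would then merge a and b to build its temporary base, which this sketch does not attempt.

```python
def ancestors(commit, parents):
    """Every commit reachable from commit, counting commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}


PARENTS = {
    "r1": [], "r2": ["r1"], "r3": ["r2"],
    # top line: ... b, then a merge (tm) that pulls in a, then B
    "t0": ["r3"], "b": ["t0"], "t1": ["b"], "tm": ["t1", "a"], "B": ["tm"],
    # bottom line: ... a, then a merge (sm) that pulls in b, then A
    "s0": ["r3"], "a": ["s0"], "s1": ["a"], "sm": ["s1", "b"], "A": ["sm"],
}
```

Here merge_bases("A", "B", PARENTS) returns both a and b, while merging a with b has the single base r3, so in this case the recursion stops after one level.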
If all three versions of a given file in O, A and B are different, the
three versions are loaded into the index as "stage 1", "stage 2" and
"stage 3", and a merge strategy driver is called to resolve the mess.
Git then uses the classic line-based three-way merge, looking for isolated
changes and applying the same rules as for files when two of the source
files are the same in some range.
* Alternate merge strategies
In every version control system prior to git, the merging algorithm was
buried deep in the bowels of the software, and very difficult to change.
One of the particularly nice things that git did was allow for easily
replaceable "merge strategies". Indeed, you can try multiple merge
strategies, and the fallback - print an error message and let the user
sort it out - can be thought of as just another merge strategy.
Enabling this is why the index is so important to git. It provides a
place to store an unfinished merge, so you can try various strategies
(including hand-editing) to finish it.
Generally, git's default merge strategies are just fine. There is,
however, one special case that is occasionally useful, specified with the
"-s ours" strategy.
That strategy instructs git that the merged result should be the same
as the current HEAD. Any other branches are recorded as parents, but
their contents are ignored.
What the heck is the use of that? Well, it lets you record the fact
that some work has been done in the history, and that it shouldn't be
merged again. For example, say you write and share a popular patch set.
People are always merging it in to their local source trees. But then
you discover a much better way to achieve the goal of that patch set, and
you want to publish the fact that the new patch supersedes the old one.
If you developed the new set starting from the old one, that would happen
automatically. But another way to achieve the same goal is to merge the
old branch in using the "ours" strategy. Everyone else's git will
notice that the patch is already included, and stop trying to merge it in.
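In graph terms, "-s ours" keeps HEAD's tree but still lists the other branch as a parent, and that parent link is what makes later merges of the old branch trivial. A minimal Python sketch over a hypothetical graph (merge_ours and every commit and tree name are made up):

```python
def ancestors(commit, parents):
    """Every commit reachable from commit, counting commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_ours(head, others, trees, parents, name):
    """Record a '-s ours' merge commit called name in the toy graph."""
    trees[name] = trees[head]               # contents: exactly HEAD's tree
    parents[name] = [head] + list(others)   # history: every branch is a parent
    return name


parents = {"base": [], "old-set": ["base"], "new-set": ["base"]}
trees = {"base": "T0", "old-set": "T-old", "new-set": "T-new"}
tip = merge_ours("new-set", ["old-set"], trees, parents, "supersede")
```

Because old-set is now an ancestor of the new tip, anyone who later merges old-set into a tree that contains "supersede" finds it already up to date.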
* When merging goes wrong
This is the fun part. Git's default recursive-merge strategy is pretty
clever, but sometimes changes truly do conflict and need manual fix-up.
When git is unable to complete a merge, it leaves the three different
versions in the index and places a file with CVS-style conflict markers
in the working directory.
As long as there is a "staged" file in the index, you will not be able
to commit. You must resolve the conflict, and update the index with the
resolved versions. You can do this one at a time with git-update-index,
or at the end by giving the files as arguments to git-commit.
Doing them one at a time is probably safest; checking in a file which still
has conflict markers makes a bit of a mess. Note that git will still
use the automatically generated commit message when you finally commit.
(It's in .git/MERGE_MSG, if you care.)
Note that "git diff" knows how to be useful with a staged file.
By default, it displays a multi-way diff. For example, suppose I take a
(slightly buggy) hello.c:
--- hello.c ---
#include <stdio.h>

int
main(void)
{
	printf("Hello, world!");
}
--- end ---
Now, suppose that in branch A, I fix some bugs - add the missing newline
and "return 0;". In branch B, I display my angst and change it to
"Goodbye, cruel world!". When I try to merge A into B, obviously I'll
get a conflict. The resultant file, with conflict markers, looks like:
--- hello.c ---
#include <stdio.h>

int
main(void)
{
<<<<<<< HEAD/hello.c
	printf("Goodbye, cruel world!");
=======
	printf("Hello, world!\n");
	return 0;
>>>>>>> edadc53fc7a8aef2a672a4fa9d09aa16f4e14706/hello.c
}
--- end ---
and the result of "git diff" is
diff --cc hello.c
index 4b7f550,948a5f8..0000000
--- a/hello.c
+++ b/hello.c
@@@ -3,5 -3,6 +3,10 @@@
  int
  main(void)
  {
++<<<<<<< HEAD/hello.c
 +	printf("Goodbye, cruel world!");
++=======
+ 	printf("Hello, world!\n");
+ 	return 0;
++>>>>>>> edadc53fc7a8aef2a672a4fa9d09aa16f4e14706/hello.c
  }
Notice how this is not a standard diff! It has two columns of diff
symbols, and shows the difference from each of the ancestors to the
current hello.c contents. I can also use "git diff -1" to compare
against the common ancestor, or "-2" or "-3" to compare against each of
the merged copies individually.
* Alternatives to merging
The bigger and more active your source tree, the more important it is to
keep the history reasonably clean. Just because git can do a merge in
under a second doesn't mean that you should do one daily. When you look
back at a feature's development history, you'd like to see meaningful
changes recorded and not a lot of meaningless ones.
Now, once you have shared a commit with others, and they have incorporated
it into their development, it becomes impossible to undo. But git
provides tools that are useful for "rewriting history" before public
release. These can be used to edit a commit for publication.
* Test merging
One way to keep the history clean is to simply not merge other branches
into your development branch. If you want to use your new features and
other people's code changes, make a test merge and use that, but don't
make that merge part of your branch.
This is slightly more work (you have to change to a test branch and do
your merging there), but not very much.
Sometimes, when doing this, a conflict appears between your changes and
someone else's development. If you get tired of fixing the same conflict
every time you do a test merge, have a look at the git-rerere tool.
This remembers resolved conflicts and tries to apply the same resolution
patch the next time.
It's written specifically to help you not do an extra merge unnecessarily.
Although its man page is well worth reading, you never invoke git-rerere
explicitly; it's invoked automatically by the merge and patch tools if
you create a .git/rr-cache directory.
* Cherry picking
If you have a series of patches on a branch, but you want a subset
of them, or in a different order, there's a handy utility called
"git-cherry-pick" which will find the diff and apply it as a patch to
the current HEAD. It automatically recycles the commit message from
the original commit.
If the patch can't be applied, it leaves the versions in the index and
conflict markers in the working directory just like a failed merge.
And just like a merge, it remembers the commit message and provides it
as a default when you finally commit.
Note that this can only work on a chain of single-parent commits.
If a commit has multiple parents, there's no single patch to apply.
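As an illustration, here is a small sketch that assumes a recent git and a POSIX shell. It takes the third and then the first of three patches on a hypothetical branch "series" and applies them, in that order, to a new branch "picked" based on master. Each new commit reuses the original commit message.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base >base.txt && git add base.txt && git commit -q -m base

# A series of three independent patches.
git checkout -q -b series
for n in 1 2 3; do
	echo "$n" >"file$n" && git add "file$n" && git commit -q -m "add file$n"
done

# Take only patches 3 and 1, in that order, onto a branch off master.
git checkout -q -b picked master
git cherry-pick series >/dev/null       # tip of series: "add file3"
git cherry-pick series~2 >/dev/null     # "add file1"
git log --format=%s master..picked
```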
You can get the list of commits on a branch with git-log or git-rev-list,
but for more complex cases, the git-cherry tool is designed to generate
the list of commits to merge. It has a rather neat approximate-match
function built in which identifies patches that appear to already be
present in the target branch.
* Rebasing
A special case of cherry-picking is if you want to move a whole branch
to a newer "base" commit. This is done by git-rebase. You specify
the branch to move (default HEAD) and where to move it to (no default),
and git cherry-picks every patch out of that branch, applies it on top
of the target, and moves the refs/heads/<branch> pointer to the newly
created commits.
By default, "the branch" is every commit back to the last common
ancestor of the branch head and the target, but you can override that
with command-line arguments.
If you want to avoid merge conflicts due to the master code changing out
from under your edits, but not have "cleanup" merges in your history,
git-rebase is the tool to use.
Git-rebase will also use git-rerere if enabled ("mkdir .git/rr-cache").
If rebasing encounters a conflict it can't resolve, it will stop halfway
and ask you to resolve the problem by hand. However, it still knows it
has a job to finish! The unapplied patches are remembered until you do
one of
git-rebase --continue
This will check in the current index. You should
run git-update-index <files> on the files whose conflicts
you resolved, but NOT do an actual git-commit.
git-rebase --continue will do the commit.
git-rebase --skip
This will skip the conflicting patch. You
don't have to resolve the conflicts; git will
just back up and try the next patch in the series.
git-rebase --abort
This will abandon the whole rebase operation (including
any half-done work) and return you to where you began.
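You can try the stop-and-continue cycle with a sketch like the following. It assumes a recent git, where `git add` is the usual way to mark a resolved file (it does the same job as git-update-index here). The branch and file names are invented. Setting GIT_EDITOR=true only keeps git from opening an editor while the script runs.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com
export GIT_EDITOR=true              # accept commit messages unchanged

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base >f && git add f && git commit -q -m base
git checkout -q -b dev
echo dev >f && git commit -q -am "dev edit"
git checkout -q master
echo upstream >f && git commit -q -am "upstream edit"
git checkout -q dev
before=$(git rev-parse HEAD)

# The rebase stops on the conflicting patch...
git rebase master >/dev/null 2>&1 || echo "rebase stopped"
# ...and --abort returns dev to exactly where it was.
git rebase --abort
[ "$(git rev-parse HEAD)" = "$before" ] && echo "aborted cleanly"

# Second attempt: resolve, stage the file, and let --continue commit.
git rebase master >/dev/null 2>&1 || true
echo "dev on top of upstream" >f
git add f
git rebase --continue >/dev/null 2>&1
git log --format=%s
```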
Git-rebase can also help you divide up work. Suppose you've mixed up
development of two features in the current HEAD, a branch called "dev".
You want to divide them up into "dev1" and "dev2". Assuming that HEAD
is a branch off master, then you can either look through
git log master..HEAD
or just get a raw list of the commits with
git rev-list master..HEAD
Either way, suppose you save the commits that you want in dev1 to a
file called commit_list, oldest first (each patch has to apply on top
of the previous one), and create that branch:
git checkout -b dev1 master
for i in `cat commit_list`; do
git-cherry-pick $i
done
You can use the other half of the list you edited to generate the dev2
branch, but if you're not sure if you forgot something, or just don't
feel like doing that manual work, then you can use git-rebase to do it
for you...
git checkout -b dev2 dev # Create dev2 branch
git-rebase --onto master dev1 # Subtract dev1 and rebase
This will find all patches that are in dev and not in dev1,
apply them on top of master, and call the result dev2.
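Put together, the split might look like the following sketch, assuming a recent git. The branch contents are invented. The command `git rev-list --reverse --grep=feature1` stands in for the hand-edited commit_list, purely so the script can run unattended. It also emits the list oldest first, which is the order cherry-pick needs.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

repo=$(mktemp -d)
list=$(mktemp)                      # plays the role of commit_list
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base >README && git add README && git commit -q -m base

# "dev" mixes two features.
git checkout -q -b dev
echo one >f1 && git add f1 && git commit -q -m "feature1 part 1"
echo two >f2 && git add f2 && git commit -q -m "feature2 part"
echo more >>f1 && git commit -q -am "feature1 part 2"

# dev1: cherry-pick the feature1 commits, oldest first.
git rev-list --reverse --grep=feature1 master..dev >"$list"
git checkout -q -b dev1 master
for i in `cat "$list"`; do
	git cherry-pick "$i" >/dev/null
done

# dev2: whatever in dev is not (patch-wise) already in dev1, on master.
git checkout -q -b dev2 dev
git rebase --onto master dev1 >/dev/null 2>&1
git log --format=%s master..dev2
```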
* Experimenting with merging
To play with non-trivial merging, get an existing git repository of
a non-trivial project (git itself and the Linux kernel are readily
available). Fire up gitk to look at history, find some interesting-looking
merges, and redo them yourself on a test branch.
As long as you do everything on test branches, you aren't going to screw
anything up. So play!
You can use gitk to search for "Conflicts:" in the commit comments to
find merges that didn't go smoothly and see what happens. (Or you can
search in "git log" output. gitk just draws prettier pictures.)
You can also set up two repositories on the same machine and try pulling
and pushing between them.
To identify arbitrary commits, the full 40-character hex ID is probably easiest;
you can cut-and-paste them from the gitk window.
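Here is a self-contained sketch of the exercise, assuming a recent git, in which `git merge` accepts a raw commit ID directly and so sidesteps the `git pull .` error shown below. It builds a tiny invented history containing one merge. It then replays that merge from the merge's first parent on a test branch and checks that the resulting tree matches the original.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base >common && git add common && git commit -q -m base
git checkout -q -b side
echo side >side.txt && git add side.txt && git commit -q -m "side work"
git checkout -q master
echo main >main.txt && git add main.txt && git commit -q -m "master work"
git merge -q --no-edit side
merge=$(git rev-parse HEAD)         # the "interesting merge" to redo

# Start a test branch at the first parent and merge the second by ID.
git checkout -q -b test "$merge^1"
git merge -q --no-edit "$(git rev-parse "$merge^2")"

# Exit status 0 means the replayed tree is identical to the original.
git diff --quiet "$merge" HEAD && echo "replayed merge matches"
```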
For example, in the git repository,
3f69d405d749742945afd462bff6541604ecd420
looks like an interesting merge. Its parents are
Parent: 7d55561986ffe94ca7ca22dc0a6846f698893226
Parent: 097dc3d8c32f4b85bf9701d5e1de98999ac25c1c
Let's try doing that manually:
$ git checkout -b test 7d55561986ffe94ca7ca22dc0a6846f698893226
$ git pull . 097dc3d8c32f4b85bf9701d5e1de98999ac25c1c
error: no such remote ref refs/heads/097dc3d8c32f4b85bf9701d5e1de98999ac25c1c
Fetch failure: .
Cool! I didn't know that wasn't allowed. (I'll have to ask why it's
not; perhaps it's because it uses the branch name in the automatic
commit message.) I could do it by hand with git-merge, but I'll just
give it a branch name:
$ git branch test2 097dc3d8c32f4b85bf9701d5e1de98999ac25c1c
$ git pull . test2
Merging HEAD with 097dc3d8c32f4b85bf9701d5e1de98999ac25c1c
Merging:
7d55561986ffe94ca7ca22dc0a6846f698893226 Merge branch 'jc/dirwalk-n-cache-tree' into jc/cache-tree
097dc3d8c32f4b85bf9701d5e1de98999ac25c1c Remove "tree->entries" tree-entry list from tree parser
found 2 common ancestor(s):
d9b814cc97f16daac06566a5340121c446136d22 Add builtin "git rm" command
288c0384505e6c25cc1a162242919a0485d50a74 Merge branch 'js/fetchconfig'
Merging:
d9b814cc97f16daac06566a5340121c446136d22 Add builtin "git rm" command
288c0384505e6c25cc1a162242919a0485d50a74 Merge branch 'js/fetchconfig'
found 1 common ancestor(s):
63dffdf03da65ddf1a02c3215ad15ba109189d42 Remove old "git-grep.sh" remnants
Auto-merging Makefile
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in Makefile
Auto-merging builtin.h
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in builtin.h
Auto-merging cache.h
Removing check-ref-format.c
Auto-merging git.c
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in git.c
Auto-merging read-cache.c
Auto-merging update-index.c
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in update-index.c
Renaming apply.c => builtin-apply.c
Auto-merging builtin-apply.c
Renaming read-tree.c => builtin-read-tree.c
Auto-merging builtin-read-tree.c
Auto-merging .gitignore
Auto-merging Makefile
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in Makefile
Auto-merging builtin.h
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in builtin.h
Auto-merging cache.h
Auto-merging fsck-objects.c
Removing git-format-patch.sh
Auto-merging git.c
merge: warning: conflicts during merge
CONFLICT (content): Merge conflict in git.c
Auto-merging update-index.c
Automatic merge failed; fix conflicts and then commit the result.
$ git status
Hey, look, lots of interesting stuff. Particularly, see
# Changed but not updated:
# (use git-update-index to mark for commit)
#
# unmerged: Makefile
# modified: Makefile
# unmerged: builtin.h
# modified: builtin.h
# unmerged: git.c
# modified: git.c
The "unmerged" (a.k.a. "staged") files are ones that need manual resolution.
(I notice that update-index.c isn't listed, despite being mentioned
as a conflict in the message. Can someone explain that?)
Fixing those is easy, but as you can see from the original commit comment
and diffs, there were some additional changes that were necessary to
make that compile.
You can test before committing the change, or do it the git way - commit
anyway, then test and "git commit --amend" with the fixes, if any.
Unlike a centralized VCS, committing is not the same as pushing upstream.
You can use test branches in the repository to save as much work as
you like. While it's still nice to keep the public repository clean,
you don't have to worry about "breaking the tree" every time you commit.
You can do all kinds of stuff in test branches, and clean it up later.
This is why all the git merge tools do the commit without waiting for
you to test it. The merge is usually okay, and it saves time. If not,
^ permalink raw reply
* [PATCH] gitweb: Atom feeds (was: gitweb: Make RSS feed output prettier)
From: Andreas Fuchs @ 2006-11-16 21:45 UTC (permalink / raw)
To: git
In-Reply-To: <ejdmlb$77s$1@sea.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 2301 bytes --]
Jakub Narebski wrote:
> Andreas Fuchs <asf@boinkor.net> wrote:
>
>> * Wrap the commit message in <pre>
> We use <div class="pre"> in "commit" view if I remember correctly.
That's ok for rendered HTML output, but in my experience, the way feed
readers interpret that ranges from "badly" to "not at all"; it's better
to stick to explicit structure hints only in feeds. /-:
So, this is the only thing I haven't fixed in the attached patch (:
>> * Make file names into an unordered list
> Good idea.
>
>> * Add links (diff, conditional blame, history) to the file list.
> I'd rather keep RSS output as simple as possible, no frills.
I can see that, but it would be very useful on aggregation sites like
http://planet.sbcl.org/. You mentioned on IRC that you'd prefer to
forward-port the current RSS generation to a more modern feed format
like Atom 1.0.
I took the liberty to back out that change from the RSS generator, and
implement Atom 1.0 output that is more fully featured. For testing, I
left both in. Both feeds validate at feedvalidator.org for me, the
choice is yours (:
> esc_html does to_utf8, so to_utf8 is unnecessary (and spurious).
> But it is a good catch: esc_html is certainly needed.
The attached patch doesn't use to_utf8 on already-escaped strings anymore.
> We have introduced esc_path for escaping pathnames. Use it!
I changed that, too.
> Two unnecessary calls to git command. Use
> my %difftree = parse_difftree_raw_line($line)
> instead. The conditions would probably be
> next if (!$difftree{'from_id'});
> (or equivalent).
Thanks for the hint; I included that in the Atom output and re-worked
the RSS generator to not use the hideous regexp anymore.
> esc_url, not esc_html here. Or use the href() subroutine with -full=>1
> option (after applying the patch I send which added this to href()).
Okay; I changed all occurrences of esc_html where URLs are escaped to
use esc_url.
In addition to the above points, the attached patch emits a
Last-Modified: HTTP response header field, and doesn't compute the feed
body if the HTTP request type was HEAD. This helps keep the web server
load down for well-behaved feed readers that check if the feed needs
updating.
Hope you like it,
--
Andreas Fuchs, (http://|im:asf@|mailto:asf@)boinkor.net, antifuchs
[-- Attachment #2: atom-and-last-modified.diff --]
[-- Type: text/plain, Size: 9067 bytes --]
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e54a29e..b39dc65 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -425,6 +425,7 @@ my %actions = (
"history" => \&git_history,
"log" => \&git_log,
"rss" => \&git_rss,
+ "atom" => \&git_atom,
"search" => \&git_search,
"search_help" => \&git_search_help,
"shortlog" => \&git_shortlog,
@@ -1180,7 +1181,9 @@ sub parse_date {
$days[$wday], $mday, $months[$mon], 1900+$year, $hour ,$min, $sec;
$date{'mday-time'} = sprintf "%d %s %02d:%02d",
$mday, $months[$mon], $hour ,$min;
-
+ $date{'iso-8601'} = sprintf "%04d-%02d-%02dT%02d:%02d:%02dZ",
+ 1900+$year, $mon+1, $mday, $hour ,$min, $sec;
+
$tz =~ m/^([+\-][0-9][0-9])([0-9][0-9])$/;
my $local = $epoch + ((int $1 + ($2/60)) * 3600);
($sec, $min, $hour, $mday, $mon, $year, $wday, $yday) = gmtime($local);
@@ -1653,6 +1656,9 @@ #provides backwards capability for those
printf('<link rel="alternate" title="%s log" '.
'href="%s" type="application/rss+xml"/>'."\n",
esc_param($project), href(action=>"rss"));
+ printf('<link rel="alternate" title="%s log" '.
+ 'href="%s" type="application/atom+xml"/>'."\n",
+ esc_param($project), href(action=>"atom"));
} else {
printf('<link rel="alternate" title="%s projects list" '.
'href="%s" type="text/plain; charset=utf-8"/>'."\n",
@@ -1724,6 +1730,8 @@ sub git_footer_html {
}
print $cgi->a({-href => href(action=>"rss"),
-class => "rss_logo"}, "RSS") . "\n";
+ print $cgi->a({-href => href(action=>"atom"),
+ -class => "rss_logo"}, "Atom") . "\n";
} else {
print $cgi->a({-href => href(project=>undef, action=>"opml"),
-class => "rss_logo"}, "OPML") . " ";
@@ -4097,14 +4105,29 @@ sub git_rss {
or die_error(undef, "Open git-rev-list failed");
my @revlist = map { chomp; $_ } <$fd>;
close $fd or die_error(undef, "Reading git-rev-list failed");
- print $cgi->header(-type => 'text/xml', -charset => 'utf-8');
+
+ my %latest_commit;
+ my %latest_date;
+ if (defined($revlist[0])) {
+ %latest_commit = parse_commit($revlist[0]);
+ %latest_date = parse_date($latest_commit{'committer_epoch'});
+ print $cgi->header(-type => 'text/xml', -charset => 'utf-8',
+ -last_modified => $latest_date{'rfc2822'});
+ } else {
+ print $cgi->header(-type => 'text/xml', -charset => 'utf-8');
+ }
+
+ # Optimization: skip generating the body if client asks only
+ # for Last-Modified date.
+ return if ($cgi->request_method() eq 'HEAD');
+
print <<XML;
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
-<title>$project $my_uri $my_url</title>
-<link>${\esc_html("$my_url?p=$project;a=summary")}</link>
-<description>$project log</description>
+<title>${\esc_html("$project $my_uri $my_url")}</title>
+<link>${\esc_url("$my_url?p=$project;a=summary")}</link>
+<description>${\esc_html($project)} log</description>
<language>en</language>
XML
@@ -4128,32 +4151,138 @@ XML
"</title>\n" .
"<author>" . esc_html($co{'author'}) . "</author>\n" .
"<pubDate>$cd{'rfc2822'}</pubDate>\n" .
- "<guid isPermaLink=\"true\">" . esc_html("$my_url?p=$project;a=commit;h=$commit") . "</guid>\n" .
- "<link>" . esc_html("$my_url?p=$project;a=commit;h=$commit") . "</link>\n" .
+ "<guid isPermaLink=\"true\">" . esc_url("$my_url?p=$project;a=commit;h=$commit") . "</guid>\n" .
+ "<link>" . esc_url("$my_url?p=$project;a=commit;h=$commit") . "</link>\n" .
"<description>" . esc_html($co{'title'}) . "</description>\n" .
"<content:encoded>" .
"<![CDATA[\n";
my $comment = $co{'comment'};
+ print "<pre>\n";
foreach my $line (@$comment) {
- $line = to_utf8($line);
- print "$line<br/>\n";
+ $line = esc_html($line);
+ print "$line\n";
}
- print "<br/>\n";
+ print "</pre><ul>\n";
foreach my $line (@difftree) {
- if (!($line =~ m/^:([0-7]{6}) ([0-7]{6}) ([0-9a-fA-F]{40}) ([0-9a-fA-F]{40}) (.)([0-9]{0,3})\t(.*)$/)) {
- next;
- }
- my $file = esc_path(unquote($7));
- $file = to_utf8($file);
- print "$file<br/>\n";
+ my %difftree = parse_difftree_raw_line($line);
+ next if !$difftree{'from_id'};
+
+ my $file_name = $difftree{'file'};
+ my $file = esc_path($file_name);
+
+ print "<li>$file</li>\n";
}
- print "]]>\n" .
+ print "</ul>]]>\n" .
"</content:encoded>\n" .
"</item>\n";
}
print "</channel></rss>";
}
+sub git_atom {
+ open my $fd, "-|", git_cmd(), "rev-list", "--max-count=150",
+ git_get_head_hash($project), "--"
+ or die_error(undef, "Open git-rev-list failed");
+ my @revlist = map { chomp; $_ } <$fd>;
+ close $fd or die_error(undef, "Reading git-rev-list failed");
+
+ my %latest_commit;
+ my %latest_date;
+ if (defined($revlist[0])) {
+ %latest_commit = parse_commit($revlist[0]);
+ %latest_date = parse_date($latest_commit{'committer_epoch'});
+ print $cgi->header(-type => 'application/atom+xml', -charset => 'utf-8',
+ -last_modified => $latest_date{'rfc2822'});
+ } else {
+ print $cgi->header(-type => 'application/atom+xml', -charset => 'utf-8');
+ }
+
+ # Optimization: skip generating the body if client asks only
+ # for Last-Modified date.
+ return if ($cgi->request_method() eq 'HEAD');
+
+ print <<XML;
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom">
+<title type="html">${\esc_html("$project $my_uri $my_url")}</title>
+<link rel="alternate" type="text/html" href="${\esc_url("$my_url?p=$project;a=summary")}" />
+<link rel="self" type="application/atom+xml" href="${\esc_url("$my_url?p=$project;a=atom")}" />
+<subtitle>$project log</subtitle>
+<id>${\esc_url("$my_url?p=$project")}</id>
+XML
+ if (!%latest_date) {
+ # dummy date to keep the feed valid until commits trickle in:
+ print "<updated>1970-01-01T00:00:00Z</updated>\n";
+ } else {
+ print "<updated>".$latest_date{'iso-8601'}."</updated>\n";
+ }
+
+ for (my $i = 0; $i <= $#revlist; $i++) {
+ my $commit = $revlist[$i];
+ my %co = parse_commit($commit);
+ # we read 150, we always show 20 and the ones more recent than 48 hours
+ if (($i >= 20) && ((time - $co{'committer_epoch'}) > 48*60*60)) {
+ last;
+ }
+ my %cd = parse_date($co{'committer_epoch'});
+
+ open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
+ $co{'parent'}, $co{'id'}, "--"
+ or next;
+ my @difftree = map { chomp; $_ } <$fd>;
+ close $fd
+ or next;
+ print "<entry>\n" .
+ "<title type=\"html\">" .
+ esc_html($co{'author_name'}) . ": " . esc_html($co{'title'}) .
+ "</title>\n" .
+ "<updated>$cd{'iso-8601'}</updated>\n" .
+ "<author><name>" . esc_html($co{'author_name'}) . "</name></author>\n" .
+ "<published>$cd{'iso-8601'}</published>\n" .
+ "<link rel=\"alternate\" type=\"text/html\" href=\"" .
+ esc_url("$my_url?p=$project;a=commit;h=$commit") . "\" />\n" .
+ "<id>" . esc_url("$my_url?p=$project;a=commit;h=$co{'id'}") . "</id>\n" .
+ "<content type=\"html\" xml:base=\"".esc_url($my_url)."\">" .
+ "<![CDATA[\n";
+ my $comment = $co{'comment'};
+ print "<pre>\n";
+ foreach my $line (@$comment) {
+ $line = esc_html($line);
+ print "$line\n";
+ }
+ print "</pre><ul>\n";
+ foreach my $line (@difftree) {
+ my %difftree = parse_difftree_raw_line($line);
+ next if !$difftree{'from_id'};
+
+ my $file_name = $difftree{'file'};
+ my $file = esc_path($file_name);
+ my $parent = $co{'parent'};
+ my $hash = $difftree{'to_id'};
+ my $hashparent = $difftree{'from_id'};
+
+ print "<li>[";
+ print "<a title=\"diff\" href=\"".
+ esc_url("$my_url?p=$project;a=blobdiff;f=$file;h=$hash;hp=$hashparent;hb=$commit;hpb=$parent") .
+ "\">D</a>";
+ if (gitweb_check_feature('blame')) {
+ print "<a title=\"blame\" href=\"".
+ esc_url("$my_url?p=$project;a=blame;f=$file;hb=$commit") .
+ "\">B</a>";
+ }
+ print "<a title=\"history\" href=\"".
+ esc_url("?p=$project;a=history;f=$file;h=$commit") .
+ "\">H</a>";
+ print "] $file";
+ print "</li>\n";
+ }
+ print "</ul>]]>\n" .
+ "</content>\n" .
+ "</entry>\n";
+ }
+ print "</feed>";
+}
+
sub git_opml {
my @list = git_get_projects_list();
^ permalink raw reply related
* Re: Cleaning up git user-interface warts
From: Junio C Hamano @ 2006-11-16 21:49 UTC (permalink / raw)
To: Petr Baudis; +Cc: Carl Worth, git, Andy Whitcroft, Nicolas Pitre
In-Reply-To: <20061116051240.GV7201@pasky.or.cz>
Petr Baudis <pasky@suse.cz> writes:
> (vi) Coding issues. This is probably very subjective, but a blocker for
> me. I have no issues about C here, but about the shell part of Git.
> Well, how to say it... It's just fundamentally incompatible with me. I
> *could* do things in/with it, but it's certainly something I wouldn't
> _enjoy_ doing _at all_, on a deep level. I think the current shell code
> is really hard to read, the ancient constructs are frequently strange at
> best, etc. It's surely fine code at functional level and there'll be
> people who hate _my_ style of coding and my shell code which isn't
> perfect either, but it's just how it is with me.
I've been thinking about revamping the style of shell scripts in
git-core Porcelain-ish for some time, and I have a feeling that
now may be a good time to do so, after one feature release is
out and the list is discussing UI improvements.
But before mentioning the specifics, let me mention one tangent.
I recently installed an OpenBSD bochs (it was actually a qemu
image) without knowing much about the way of the land, and after
adjusting myself to necessary glitches (like "make" being called
"gmake" there), I saw git properly built and pass its selftest.
I was pleasantly surprised when I noticed there was no 'bash' on
the system after all that.
I would like to keep it that way.
I'll list things I would want to and not want to change.
Comments from the list are very appreciated. You can say things
in two ways:
* I guarantee that the _default_ shell on all sane platforms we
care about handles this construct correctly, although it was
not in the original Bourne. There is no reason to stay away
from it these days.
or
* You've stayed away from this construct but now you say you
feel it is Ok to use it. Don't. It would break with the
shell on my platform (or "it is a bad practice because of
such and such reasons").
I do not think many people can say the former with authority
unless you have a portability lab (the company I work for used
to be like that and it was an interesting experience to learn
all about irritating implementation differences). And "POSIX
says shell should behave that way" is _not_ what I want to hear
about.
But the latter should be a lot easier to say, and would be
appreciated because it would help us avoid regressions.
Things I would want to change:
- One indent level is one tab and the tab-width is eight
columns. Some of our scripts tend to use less than eight
spaces for indentation to avoid line wrapping.
- More use of shell functions is fine. Especially if the
above change makes lines too long, the logic should be
refactored.
- It is so 80-ish to follow certain portability and performance
wisdom. The following should go:
. Use "case" when you do not have to use "if test".
. Avoid ${parameter##word} and friends and use `expr` instead
to pick a string apart.
. Avoid "export name=word", write "name=word; export name"
instead.
. Avoid ${parameter:-word} and friends when ${parameter-word}
would do.
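To make the list concrete, here is a small sketch that contrasts each retired idiom with the construct that would replace it. The variable names are invented. Everything used is POSIX sh, but in keeping with the mail, treat it as something to verify on the shells you care about rather than as a portability guarantee.

```shell
#!/bin/sh
path=dir/sub/file.c

# Picking a string apart: `expr` versus ${parameter##word}.
old_base=`expr "$path" : '.*/\(.*\)'`
new_base=${path##*/}
echo "$old_base $new_base"              # file.c file.c

# Exporting: the two-step form versus "export name=word".
GREETING=hello; export GREETING
export FAREWELL=bye
sh -c 'echo "$GREETING $FAREWELL"'      # hello bye

# ${parameter-word} fires only when unset; ${parameter:-word} also when empty.
empty=
echo "[${empty-default}] [${empty:-default}]"   # [] [default]

# "case" versus "if test" for a simple suffix check.
case "$path" in
*.c)	echo "C source (case)" ;;
esac
if test "${path%.c}" != "$path"
then
	echo "C source (if test)"
fi
```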
Things I do not want to change:
- The shell scripts should start with #!/bin/sh, not
#!/bin/bash (nor even worse "#!/usr/bin/env sh").
- Shell functions are written as "name () { ... }" without
"function" noiseword.
- 'foo && bar || exit' exits with the error code of what
failed; no need to say 'exit $?'.
- String equality check for "test" is a single =, not ==.
- Do not use locals.
- Do not use shell arrays.
- In general, if something does not behave the same way in ksh,
bash and dash, don't use it (that does not mean these three
are special; it just means if something is not even portable
across these three, it is a definite no-no).
I do not think I need to list other common-sense shell idioms in
the latter category (e.g. 'using "test z$name = zexpected" when
we do not know what $name contains' falls into that).
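Here is a short sketch of the rules to keep, again with invented names. It shows a function written as `name () { ... }` with no locals, a single-`=` comparison guarded with a `z` prefix, and `&& ... || exit` propagating the status of the command that failed.

```shell
#!/bin/sh
# Function definition without the "function" noiseword (and no "local").
die () {
	echo >&2 "fatal: $*"
	exit 1
}

# A value that some "test" implementations could mistake for an option.
name=-n
if test "z$name" = "z-n"
then
	echo "matched"
fi

# 'foo && bar || exit' exits with the status of whatever failed.
( true && false || exit )
status=$?
echo "status=$status"                   # status=1

( die "something broke" ) 2>/dev/null || echo "die exited with $?"
```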
^ permalink raw reply
* Re: Cleaning up git user-interface warts
From: Linus Torvalds @ 2006-11-16 19:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Han-Wen Nienhuys, git
In-Reply-To: <7virhf8985.fsf@assigned-by-dhcp.cox.net>
On Thu, 16 Nov 2006, Junio C Hamano wrote:
>
> As you said, pull inherently involves a merge which implies the
> existence of associated working tree, so I do not think there is
> any room for --bare to get in the picture.
Fair enough. Feel free to add the signed-off-by from me too,
^ permalink raw reply
* Re: Cleaning up git user-interface warts
From: Junio C Hamano @ 2006-11-16 19:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Han-Wen Nienhuys, git
In-Reply-To: <Pine.LNX.4.64.0611161027020.3349@woody.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> On Thu, 16 Nov 2006, Linus Torvalds wrote:
>> @@ -95,6 +100,12 @@ case "$merge_head" in
>> ;;
>> esac
>>
>> +if test -z "$orig_head"
>> +then
>> + git-update-ref -m "initial pull" HEAD $merge_head "" || exit 1
>> + exit
>> +fi
>> +
>
> So this is the place that probably wants a "git-checkout" before the
> exit, otherwise you'd (illogically) have to do it by hand for that
> particular case.
>
> Of course, we should _not_ do it if the "--bare" flag has been set, so you
> might want to tweak the exact logic here.
As you said, pull inherently involves a merge which implies the
existence of associated working tree, so I do not think there is
any room for --bare to get in the picture. We already do the
checkout when we recover from a fetch that is used incorrectly
and updated the current branch head underneath us.
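For reference, later versions of git handle this case in the way this patch proposes. The following sketch assumes a recent git and uses invented repository names. It pulls into a freshly initialised repository: HEAD ends up at the fetched commit and the working tree is checked out, with no merge involved.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

top=$(mktemp -d)

# A source repository with one commit.
git init -q "$top/src"
cd "$top/src"
git symbolic-ref HEAD refs/heads/master
echo hello >f && git add f && git commit -q -m first

# An empty repository: HEAD points at a branch that does not exist yet.
git init -q "$top/empty"
cd "$top/empty"
git symbolic-ref HEAD refs/heads/master
git pull -q "$top/src" master
cat f
```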
To give the list a summary of the discussion so far, here is a
consolidated patch.
-- >8 --
From: Linus Torvalds <torvalds@osdl.org>
Subject: git-pull: allow pulling into an empty repository
We used to complain that we cannot merge anything we fetched
with a local branch that does not exist yet. Just treat the
case as a natural extension of fast forwarding and make the
local branch's tip point at the same commit we just fetched.
After all an empty repository without an initial commit is an
ancestor of any commit.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/git-pull.sh b/git-pull.sh
index ed04e7d..e23beb6 100755
--- a/git-pull.sh
+++ b/git-pull.sh
@@ -44,10 +44,10 @@ do
shift
done
-orig_head=$(git-rev-parse --verify HEAD) || die "Pulling into a black hole?"
+orig_head=$(git-rev-parse --verify HEAD 2>/dev/null)
git-fetch --update-head-ok --reflog-action=pull "$@" || exit 1
-curr_head=$(git-rev-parse --verify HEAD)
+curr_head=$(git-rev-parse --verify HEAD 2>/dev/null)
if test "$curr_head" != "$orig_head"
then
# The fetch involved updating the current branch.
@@ -80,6 +80,11 @@ case "$merge_head" in
exit 0
;;
?*' '?*)
+ if test -z "$orig_head"
+ then
+ echo >&2 "Cannot merge multiple branches into empty head"
+ exit 1
+ fi
var=`git-repo-config --get pull.octopus`
if test -n "$var"
then
@@ -95,6 +100,13 @@ case "$merge_head" in
;;
esac
+if test -z "$orig_head"
+then
+ git-update-ref -m "initial pull" HEAD $merge_head "" &&
+ git-read-tree --reset -u HEAD || exit 1
+ exit
+fi
+
case "$strategy_args" in
'')
strategy_args=$strategy_default_args
^ permalink raw reply related
* Re: Is cp -al safe with git?
From: Junio C Hamano @ 2006-11-16 19:19 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git
In-Reply-To: <ejibnp$mmq$1@sea.gmane.org>
Johannes Sixt <johannes.sixt@telecom.at> writes:
> For one reason or another I would like to "clone" a local repo including the
> checked-out working tree with cp -al instead of cg-clone/git-clone, i.e.
> have all files hard-linked instead of copied.
>
> Can the copies be worked on independently without interference (with the git
> tool set)?
>
> One thing I noticed is that git-reset or probably git-checkout-index breaks
> links of files that need not be changed by the reset. Example:
>
> # make 2 files, commit
> $ mkdir orig && cd orig
> $ git-init-db
> defaulting to local storage area
> $ echo foo > a && cp a b && git-add a b && git-commit -a -m 1
> Committing initial tree 99b876dbe094cb7d3850f1abe12b4c5426bb63ea
>
> # 2nd commit modifies only one file:
> $ echo bar > a && git-commit -a -m 2
>
> # create the copy:
> $ cd ..
> $ cp -al orig copy
> $ cd copy
>
> # working files are hard-linked:
> $ ls -l
> total 8
> -rw-r--r-- 2 jsixt users 4 Nov 16 19:24 a
> -rw-r--r-- 2 jsixt users 4 Nov 16 19:23 b
>
> # nuke a commit:
> $ git-reset --hard HEAD^
> $ ls -l
> total 8
> -rw-r--r-- 1 jsixt users 4 Nov 16 19:24 a
> -rw-r--r-- 1 jsixt users 4 Nov 16 19:24 b
>
> I'd have expected that the hard-link of b remained and only a's link were
> broken. Does it mean that git-reset writes every single file also for large
> trees like the kernel? I cannot believe this. Can someone scratch the
> tomatoes off my eyes please?
Most likely you didn't run "update-index --refresh" after "cp -l"?
Not just in the new copied repository but in the original
repository I would suspect you would see this. This is because
the index caches ctime and making a new hardlink manipulates the
files' inodes, thus making the cached information stale.
^ permalink raw reply
* Re: Is cp -al safe with git?
From: Linus Torvalds @ 2006-11-16 19:19 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git
In-Reply-To: <ejibnp$mmq$1@sea.gmane.org>
On Thu, 16 Nov 2006, Johannes Sixt wrote:
>
> For one reason or another I would like to "clone" a local repo including the
> checked-out working tree with cp -al instead of cg-clone/git-clone, i.e.
> have all files hard-linked instead of copied.
It works, but I don't think you should depend on it.
> Can the copies be worked on independently without interference (with the git
> tool set)?
We _tried_ to make sure it is ok, but since it's not a normal mode of
operation, I would not guarantee it.
> One thing I noticed is that git-reset or probably git-checkout-index breaks
> links of files that need not be changed by the reset.
Yes and no. They do _not_ actually break links of files that they know
stay the same, but your example breaks the internal knowledge by using
that "cp -al". That changes the inode change time (ctime) of the files, so git
thinks that the files _may_ have changed, and when you do a "git reset",
it will overwrite them all.
> Example:
>
> # make 2 files, commit
> $ mkdir orig && cd orig
> $ git-init-db
> defaulting to local storage area
> $ echo foo > a && cp a b && git-add a b && git-commit -a -m 1
> Committing initial tree 99b876dbe094cb7d3850f1abe12b4c5426bb63ea
>
> # 2nd commit modifies only one file:
> $ echo bar > a && git-commit -a -m 2
>
> # create the copy:
> $ cd ..
> $ cp -al orig copy
> $ cd copy
>
> # working files are hard-linked:
> $ ls -l
> total 8
> -rw-r--r-- 2 jsixt users 4 Nov 16 19:24 a
> -rw-r--r-- 2 jsixt users 4 Nov 16 19:23 b
>
> # nuke a commit:
> $ git-reset --hard HEAD^
> $ ls -l
> total 8
> -rw-r--r-- 1 jsixt users 4 Nov 16 19:24 a
> -rw-r--r-- 1 jsixt users 4 Nov 16 19:24 b
>
> I'd have expected that the hard-link of b remained and only a's link were
> broken. Does it mean that git-reset writes every single file also for large
> trees like the kernel? I cannot believe this. Can someone scratch the
> tomatoes off my eyes please?
If you do a
git update-index --refresh
(or, more easily, a "git status", which will do the index refresh for you)
before you do the "git reset", you will get:
$ ls -l
total 8
-rw-r--r-- 1 jsixt users 4 Nov 16 19:24 a
-rw-r--r-- 2 jsixt users 4 Nov 16 19:24 b
like you want to. The reason "git reset" overwrites _both_ files in your
example is that the stat() information for those files changed, so "git
reset" thinks they are both dirty and both need to be rewritten.
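The suggestion can be checked with a sketch like the following, which replays the example from the original mail with the refresh added. It assumes GNU cp (for -al), a filesystem with hard links, and a recent git. The directory names are invented. After the refresh, the reset rewrites only a, so b keeps its link count of 2.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

top=$(mktemp -d)
cd "$top"
mkdir orig && cd orig
git init -q
echo foo >a && cp a b && git add a b && git commit -q -m 1
echo bar >a && git commit -q -am 2
cd ..

cp -al orig copy                    # GNU cp: hard-link every file
cd copy
git update-index --refresh >/dev/null   # or simply "git status"
git reset -q --hard HEAD^

# Link count and name for each file: a should be 1, b should be 2.
ls -l a b | awk '{print $2, $NF}'
```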
That said, I would seriously suggest that you try these things out, and
realize that most people do _not_ use the hardlinked approach. For all I
know, some piece of git might change some files in-place. I don't _think_
we do, and it would strictly speaking be a bug, but because people don't
use it that way, you'd be the guinea pig.
I think we'll happily fix any bugs you find, but that may not make you any
happier if the bug corrupted your life's work ;)
In general, you might want to use
git clone -l -s
instead, but that will _not_ hardlink the actual checked-out contents, so
it's not going to get the kind of sharing you look for. On the other hand,
especially with good maintenance (doing "git repack -l -d -a" etc), you
may end up sharing _more_ that way at least in the repository object
database (but never in the actual checked-out directories).
^ permalink raw reply
* Re: multi-project repos (was Re: Cleaning up git user-interface warts)
From: Linus Torvalds @ 2006-11-16 19:01 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611160958170.3349@woody.osdl.org>
On Thu, 16 Nov 2006, Linus Torvalds wrote:
>
> A "fetch" by default won't actually generate a local branch unless you
> told it to. It just squirrels the end result into the magic FETCH_HEAD
> name [...]
Btw, the magic heads are probably not all that well documented. They do
come up in the man-pages, but I don't think there is any central place
talking about them. We have:
- "HEAD" itself, which is obviously the default pointer for a lot of
operations, and that specifies the current branch (ie it should
currently always be a symref, although we've talked about relaxing
that)
- "ORIG_HEAD" is very useful indeed, and it's the head _before_ a merge
(or some other operations, like "git rebase" and "git reset": think of
it as a "original head before we did some uncontrolled operation
where we otherwise can't use HEAD^ or similar")
I use "gitk ORIG_HEAD.." a lot, and if I don't like something I see
when I do it, I end up doing "git reset --hard ORIG_HEAD" to undo a
pull I've done. This is important exactly because ORIG_HEAD is _not_
the same as the first parent of a merge, since a merge could have been
just a fast-forward.
- "FETCH_HEAD" as mentioned. Normally you'd only use this in scripting, I
suspect, but it's potentially useful if you prefer to do a fetch first
and then check it out (perhaps cherry-picking stuff instead of merging,
for example).
So you could do (for example)
git fetch some-other-repo branch
gitk ..FETCH_HEAD
git cherry-pick <some-particular-commit-you-picked>
- "MERGE_HEAD" is kind of the opposite of "ORIG_HEAD" when you're in
the middle of a merge: it's the "other" branch that you're merging.
It's mainly useful for merge resolution, ie
git log -p HEAD...MERGE_HEAD -- some/file/with/conflicts
is a great way to see what happened along both branches (note the
_triple_ dot: it's a symmetric difference), to see _why_ the conflict
happened.
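A small throwaway demonstration of the ORIG_HEAD point above, assuming a current git. The --ff-only flag and the demo identity are not from the original mail. After a fast-forward, HEAD^ is just the previous commit on the merged branch. ORIG_HEAD still names where the branch was before, so "git reset --hard ORIG_HEAD" undoes the whole thing.

```shell
#!/bin/sh
# Demo: after a fast-forward merge, HEAD^ is not where we started,
# but ORIG_HEAD is. Runs in a throwaway repository.
set -e
# Placeholder identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo one > file && git add file && git commit -q -m one
start=$(git rev-parse HEAD)

# Two commits on a side branch, then fast-forward the original branch.
git checkout -q -b side
echo two >> file && git commit -q -a -m two
echo three >> file && git commit -q -a -m three
git checkout -q -            # back to the branch we started on
git merge -q --ff-only side

orig=$(git rev-parse ORIG_HEAD)
parent=$(git rev-parse HEAD^)
echo "start=$start ORIG_HEAD=$orig HEAD^=$parent"

# Throw the whole "pull" away again, as with "git reset --hard ORIG_HEAD".
git reset -q --hard ORIG_HEAD
```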
Most of the above are used implicitly in various cases, not just HEAD. The
"--merge" flag to git-rev-list (and thus git log and friends) is just
shorthand for the above "HEAD...MERGE_HEAD" behaviour (with the addition
of also limiting the result to just conflicting files), so
git log -p --merge
is basically exactly the same as the above (except for _all_ files that
have conflicts in them rather than just one hand-specified one).
Anyway, maybe somebody didn't know about these, and finds them useful.
Normally, the only one you would _really_ use is "ORIG_HEAD" (which is
described in several of the tutorials and examples, so people hopefully
already know about it). Most of the others tend to be used mostly
implicitly, not by explicitly naming them - although you _can_.
* Is cp -al safe with git?
From: Johannes Sixt @ 2006-11-16 18:47 UTC (permalink / raw)
To: git
For one reason or another I would like to "clone" a local repo including the
checked-out working tree with cp -al instead of cg-clone/git-clone, i.e.
have all files hard-linked instead of copied.
Can the copies be worked on independently without interference (with the git
tool set)?
One thing I noticed is that git-reset or probably git-checkout-index breaks
links of files that need not be changed by the reset. Example:
# make 2 files, commit
$ mkdir orig && cd orig
$ git-init-db
defaulting to local storage area
$ echo foo > a && cp a b && git-add a b && git-commit -a -m 1
Committing initial tree 99b876dbe094cb7d3850f1abe12b4c5426bb63ea
# 2nd commit modifies only one file:
$ echo bar > a && git-commit -a -m 2
# create the copy:
$ cd ..
$ cp -al orig copy
$ cd copy
# working files are hard-linked:
$ ls -l
total 8
-rw-r--r-- 2 jsixt users 4 Nov 16 19:24 a
-rw-r--r-- 2 jsixt users 4 Nov 16 19:23 b
# nuke a commit:
$ git-reset --hard HEAD^
$ ls -l
total 8
-rw-r--r-- 1 jsixt users 4 Nov 16 19:24 a
-rw-r--r-- 1 jsixt users 4 Nov 16 19:24 b
I had expected b's hard link to remain and only a's link to be broken.
Does this mean that git-reset writes every single file, even for large
trees like the kernel? I cannot believe this. Can someone scratch the
tomatoes off my eyes, please?
-- Hannes
* Re: git-PS1 bash prompt setting
From: Junio C Hamano @ 2006-11-16 18:35 UTC (permalink / raw)
To: Sean; +Cc: git
In-Reply-To: <BAYC1-PASMTP037FDA6C6465F0541AC613AEE90@CEZ.ICE>
Sean <seanlkml@sympatico.ca> writes:
> Ted mentioned in the wart thread that having multiple branches per repo
> means that the standard bash prompt isn't as much help as it could be.
Yes, I think this is a common issue for everybody, not just people
who worked with BK or mercurial. I find myself almost typing
"pwd" to find out what branch I am on (I do not go as far as
typing "git cd" to switch branches, though).
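For the prompt itself, here is a minimal sketch. The function name git_branch is made up for this example. It reads HEAD with "git symbolic-ref -q", which fails quietly on a detached HEAD or outside a repository, and strips the refs/heads/ prefix. Newer git also ships a fuller __git_ps1 in contrib/completion/git-prompt.sh.

```shell
# Print the current branch name (nothing if HEAD is detached or we are
# not inside a repository), for use in a shell prompt.
git_branch () {
	local ref
	# -q: fail silently instead of erroring when HEAD is not a symref.
	ref=$(git symbolic-ref -q HEAD 2>/dev/null) || return 0
	printf '%s' "${ref#refs/heads/}"
}

# Example bash prompt; single quotes so $(...) is re-run for every prompt:
# PS1='\w [$(git_branch)]\$ '
```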
* Re: multi-project repos
From: Junio C Hamano @ 2006-11-16 18:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <Pine.LNX.4.64.0611160958170.3349@woody.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> Yeah, you're supposed to "init-db" and "push". Right now, that tends to
> unpack everything (which is bad), although that is hopefully getting fixed
> (ie the receiving end shouldn't unpack any more if it is recent. Junio?)
Correct.
> See?
>
> Linus
Saw.
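The "init-db and push" sequence, sketched with current command names. "git init --bare" is today's spelling of init-db for a bare repository, and the names project and public.git are placeholders. In current git, receive.unpackLimit decides whether the receiving end explodes the pushed pack into loose objects. If it is unset, transfer.unpackLimit is used instead.

```shell
#!/bin/sh
# Sketch: publish a project by creating an empty repository and pushing.
set -e
# Placeholder identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q project
(cd project && echo hello > README && git add README && git commit -q -m initial)

# The empty target ("init-db" in 2006 terms); bare, since nobody works in it.
git init -q --bare public.git

# Push the current branch into it as master.
(cd project && git push -q ../public.git HEAD:refs/heads/master)
git --git-dir=public.git log --oneline
```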
* Re: Cleaning up git user-interface warts
From: Linus Torvalds @ 2006-11-16 18:28 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0611160932340.3349@woody.osdl.org>
On Thu, 16 Nov 2006, Linus Torvalds wrote:
> @@ -95,6 +100,12 @@ case "$merge_head" in
> ;;
> esac
>
> +if test -z "$orig_head"
> +then
> + git-update-ref -m "initial pull" HEAD $merge_head "" || exit 1
> + exit
> +fi
> +
So this is the place that probably wants a "git-checkout" before the
exit, otherwise you'd (illogically) have to do it by hand for that
particular case.
Of course, we should _not_ do it if the "--bare" flag has been set, so you
might want to tweak the exact logic here.
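One way the tweak could look, written as a standalone shell function rather than a patch to git-pull.sh. The function name initial_pull and its argument are hypothetical. "git rev-parse --is-bare-repository" stands in for whatever the real "--bare" check would be. The function updates HEAD as in the quoted hunk, then populates the index and work tree with "git checkout -f" unless the repository is bare.

```shell
# Hypothetical helper modelled on the quoted hunk; not the real git-pull.sh.
# $1 is the commit to start the empty branch at (e.g. from FETCH_HEAD).
initial_pull () {
	local merge_head=$1
	# The empty old value makes update-ref refuse if the branch HEAD points
	# to already exists, so this only applies to a truly empty repository.
	git update-ref -m "initial pull" HEAD "$merge_head" "" || return 1
	# A bare repository has no work tree to populate.
	if test "$(git rev-parse --is-bare-repository)" = true
	then
		return 0
	fi
	# Otherwise check out the new HEAD, creating the index and the files.
	git checkout -q -f
}
```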
* Re: Cleaning up git user-interface warts
From: Junio C Hamano @ 2006-11-16 18:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0611160932340.3349@woody.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> On Thu, 16 Nov 2006, Linus Torvalds wrote:
>>
>> (And the real reason for that is simple: "git pull" simply wants to have
>> something to _start_ with. It's not hugely fundamental, it's just how it
>> was written).
>
> Here's a very lightly tested patch that allows you to use "git pull" to
> populate an empty repository.
>
> I'm not at all sure this is necessarily the nicest way to do it, but it's
> fairly straightforward.
>
> Junio, what do you think?
Yeah, I talked about making "merge" treat missing HEAD as a
special case of fast forward, but I like yours better. It is a
lot cleaner and to the point.
* Re: Cleaning up git user-interface warts
From: Carl Worth @ 2006-11-16 18:23 UTC (permalink / raw)
To: Michael K. Edwards; +Cc: git
In-Reply-To: <f2b55d220611160957s2e68059dk99bbe902e7e1f416@mail.gmail.com>
On Thu, 16 Nov 2006 09:57:00 -0800, "Michael K. Edwards" wrote:
> What do you want all of those branches for? They haven't been
> published to you (that's a human interaction that doesn't go through
> git), so for all you know they're just upstream experiments, and doing
> things with them is probably shooting yourself in the foot.
The same "what do you want them all for" question could be asked of
git-clone, which also fetches all available branches. I really just
want to be able to easily watch what's going on in multiple
repositories.
I want to be able to just say "git update" (or whatever) and then be
able to list and browse and explore the stuff locally.
Yes, some outside communication is still necessary, but if I can easily
track all the remote branches and browse and explore them locally, that
communication can be even less formal. For example, I might not even
know the name of the branch:
Me: Have you pushed a branch for your new work on the frob-widget?
Friend: Yes
And then I can "git fetch" and see "cool-new-frob" come in without
having to be told that name. Or I could have even just fetched
without the specific communication if I was already expecting it for
some reason.
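A sketch of that workflow with a current git. The repository names and the demo identity are placeholders. A clone's default refspec, +refs/heads/*:refs/remotes/origin/*, copies every upstream branch into a remote-tracking ref. A plain "git fetch" therefore brings in a branch nobody named. "git remote update" does the same for all configured remotes.

```shell
#!/bin/sh
# Sketch: new upstream branches show up after a plain fetch, without
# knowing their names in advance.
set -e
# Placeholder identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q upstream
(cd upstream && echo base > f && git add f && git commit -q -m base)
git clone -q upstream mine

# Later, the friend creates a branch we were never told about.
(cd upstream && git branch cool-new-frob)

cd mine
git config --get remote.origin.fetch   # the default fetch refspec
git fetch -q origin
git branch -r                          # now lists origin/cool-new-frob too
```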
-Carl