Git development
* Re: Complete http-pull; where should it go?
From: Linus Torvalds @ 2005-05-01 20:46 UTC
  To: Daniel Barkalow; +Cc: git, Petr Baudis
In-Reply-To: <Pine.LNX.4.21.0505011544120.30848-100000@iabervon.org>



On Sun, 1 May 2005, Daniel Barkalow wrote:
> 
> Right; what I want to make programs able to do is take explicit
> references, instead of only taking the objects they reference. So you
> say heads/master or heads/linus instead of
> "198b0fb635ed8a007bac0c16eab112c5e2c7995c".

That's fine. 

This is really just an issue of having a function that does "get_sha1()", 
and then making the things that take command line arguments use that 
one instead of "get_sha1_hex()".

Then you can have rules like:
 - if it's a hex number, take it
 - if it's a filename, look it up
 - if ".git/refs/" + str is a filename, look it up.

Something like

	int get_sha1(char *str, unsigned char *result)
	{
		static char pathname[PATH_MAX];

		if (get_sha1_hex(str, result) == 0)
			return 0;
		if (get_sha1_file(str, result) == 0)
			return 0;
		snprintf(pathname, sizeof(pathname), ".git/refs/%s", str);
		if (get_sha1_file(pathname, result) == 0)
			return 0;
		...
	}

where you have

	int get_sha1_file(char *path, unsigned char *result)
	{
		char buffer[60];
		int fd = open(path, O_RDONLY);
		int len;

		if (fd < 0)
			return -1;
		len = read(fd, buffer, sizeof(buffer));
		close(fd);
		if (len < 40)
			return -1;
		return get_sha1_hex(buffer, result);
	}
			
or whatever.

The _only_ thing I want to be careful about is that all the _internal_
stuff still has to use the strict "get_sha1_hex()" function, ie we should
never _ever_ accept a tree object where the "sha1" ends up being anything
but the hex thing. So this "generalized get_sha1()" would have to be used 
only on real user input (ie argv[] array and the like).

		Linus


* Re: Should git-prune-script warn about dircache?
From: Junio C Hamano @ 2005-05-01 20:41 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vll6yyiax.fsf@assigned-by-dhcp.cox.net>

>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> How about making git-prune-script first run "git-update-cache --refresh",
LT> and checking the return value of it (this, of course, assumes that
LT> git-update-cache --refresh would return non-zero if it can't refresh a
LT> file, which is currently not true, but should be easily fixable).

JCH> Or just check if it sees anything in the output, especially
JCH> "needs update" line.

Well, we were both wrong.  The problem is not about the work
tree changes since the last git-update-cache, but about the
blobs recorded in the cache but still not committed.

I think we should do something like this.

    git-ls-files --cached | "sed to SHA1 only" | sort >,,1
    git-fsck-cache --unreachable | "sed to SHA1 only" | sort >,,2
    comm -13 ,,1 ,,2 | "sed to .git/objects/ path" | xargs -r rm -f
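
The "comm -13" step keeps only the lines that appear in the second list
(unreachable objects) and not in the first (objects recorded in the cache).
A rough sketch of that step in C, assuming both lists are already sorted in
strcmp() order; only_in_second() is a hypothetical name, not an existing
git function:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy into out[] every entry of b[] that does not appear in a[].
 * Both inputs must be sorted in strcmp() order. out[] needs room
 * for nb entries. Returns the number of entries written.
 */
size_t only_in_second(const char **a, size_t na,
		      const char **b, size_t nb,
		      const char **out)
{
	size_t i = 0, j = 0, n = 0;

	while (j < nb) {
		/* Once a[] is exhausted, everything left in b[] is kept. */
		int cmp = (i < na) ? strcmp(a[i], b[j]) : 1;

		if (cmp < 0) {
			i++;			/* only in a: dropped */
		} else if (cmp == 0) {
			i++;			/* in both: dropped */
			j++;
		} else {
			out[n++] = b[j++];	/* only in b: kept */
		}
	}
	return n;
}
```

With a[] holding the cached SHA1s and b[] the unreachable ones, out[]
would hold the objects that could be removed.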



* Re: Quick command reference
From: David Greaves @ 2005-05-01 20:31 UTC
  To: H. Peter Anvin; +Cc: omb, Paul Mackerras, git, Linus Torvalds, Petr Baudis
In-Reply-To: <427537C6.9070806@zytor.com>

H. Peter Anvin wrote:

> Brian O'Mahoney wrote:
>
>> Thank you both for taking the time and trouble to do this, particularly
>> with the name changes and new options; why don't you merge your efforts
>> and produce a GIT-Mini-HOWTO BTW send it off as a patch again!
>
>
> Even better... man page(s)!

Of course.

Eventually.

But I'd probably get limited reviewers if I posted nroff to the list...
(probably more than if I included html in my emails though)

:)

Nice to see there's a lot of support for and interest in getting the
docs though.
[Hmm, making statements like that with Linus on the cc - I could be in
politics ;) ]

David



* Re: Complete http-pull; where should it go?
From: Daniel Barkalow @ 2005-05-01 20:30 UTC
  To: Linus Torvalds; +Cc: git, Petr Baudis
In-Reply-To: <Pine.LNX.4.58.0505011237410.2296@ppc970.osdl.org>

On Sun, 1 May 2005, Linus Torvalds wrote:

> For example, if I want to know what objects I have in my object directory 
> that are needed for a release, I want to be able to tell fsck to list the 
> objects that are extraneous for that release _regardless_ of the fact that 
> I may have .git/refs/*/* files that point to other things.
>
> So if fsck-cache automatically looks up references in .git/refs/ like in
> one of your earlier patches, then instead of adding value to the program,
> you actually _remove_ value from it by making it less flexible, and
> enforcing a world-view that is not necessarily the only view.

It's true that you might not want to include all of the refs; but doesn't
it make more sense to support the standard arrangement of refs (i.e.,
they're in .git/refs/kind/name) for the ones you want to include, rather
than having to pull out the hex to pass in yourself?

> This is why I want the true _plumbing_ to not care about these things, and 
> if you include references to trees, you _list_ them explicitly. 

Right; what I want to make programs able to do is take explicit
references, instead of only taking the objects they reference. So you say
heads/master or heads/linus instead of
"198b0fb635ed8a007bac0c16eab112c5e2c7995c".

The part that makes this important is that the user may be trying to look
up a reference on a remote machine using the same connection that the
objects will come over, and this is impractical without having the program
know how to handle reference files.

> And if you want to have a command that takes implied references, then just 
> make a script that does that for you, rather than making the core plumbing 
> understand it.

Agreed; which references to use are up to either the power user or the
script, not the core. I'm just interested in having a core implementation
for using them when specified.

	-Daniel
*This .sig left intentionally blank*



* Re: [PATCH] Really fix git-merge-one-file-script this time.
From: Linus Torvalds @ 2005-05-01 20:29 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vzmveu6zs.fsf@assigned-by-dhcp.cox.net>



On Sun, 1 May 2005, Junio C Hamano wrote:
> 
> Linus, have you decided to like or dislike the behaviour of
> git-merge-one-file-script touching the work tree in some cases
> but not in other cases?  A straightforward merge implementation
> that does a "git-read-tree -m" followed by a "git-merge-cache
> git-merge-one-file-script" does the following to your work tree
> and the cache:
> 
>  - Paths merged unsuccessfully make the git-merge-cache phase fail
>    and the work tree is not affected for such paths.
> 
>  - Paths merged trivially by "git-read-tree -m" do not change the
>    work tree, and the "git-read-tree -m" result is kept in the cache.
> 
>  - Paths merged by "merge" successfully, and paths chosen from a
>    single side by "git-merge-one-file-script" change the work
>    tree, possibly checking out the file if you started out from
>    an empty work tree.
>  
> I am not worried about the first case where you will have to
> manually examine and resolve anyway. I am wondering if the rest
> is the desired behavior for _your_ way of using the GIT merge.
> After a successful merge, what kind of verification would you
> typically do?

I don't care about the _successful_ merge, since a successful merge is 
basically always followed by a "git-checkout-cache -f -a" anyway (and 
update-cache + remove now-stale files etc).

So let's totally ignore the case of "the tree was up-to-date before, and 
the merge is successful". It's not an interesting case.

No, the reason I'd prefer to be consistent is for the _strange_ cases, 
where the merge fails. There's two of those:
 - we had local modifications that weren't checked in
 - we had a real conflict that wasn't automatically merged.

and in both of these cases we end up having to fix things up, and I
generally think that we're better off if we do _not_ update the working
tree.

In particular, the "local modifications" case is much nicer to handle if
we can just do the merge totally (and successfully) in the index, and then
handle the "local modifications" as a failure case of "git-checkout-cache"
instead.

In particular, I think the "apply the patch forward" (that cogito does) is
as wrong with the "local modifications" as it is for the merge itself, and
that a truly good merge would actually have _another_ three-way merge on
the working file - the "original" is the version in our old HEAD branch,
and the two branches being merged are "working copy before the merge" and
"merge results".

Notice? See how this _nice_ handling of the local modifications actually
meant that our merge itself should never have touched the working tree
file. We'd actually commit the merge, and then do the "checkout-cache -f
-a", and leave the dirty files with the result of being merged with the
new (which may, of course, have a merge clash: the user sees that very
clearly from the output of "git-diff-cache").

The other case is the "real conflict" case, and that's the case where I
again don't like modifying the working tree, because I think it's a
perfectly natural thing to do to say "ok, the merge didn't work out this
way", so let's not do it at all. Again, that means that the working tree
should not have been modified, and we should _not_ have written out the
conflict file to the same file that was conflicting. We'd be much better
off if we left _all_ checked-out files in the original state instead.

So my personal preference is still that if we actually have a real 
conflict, we don't actually "consummate" the merge at all, and that very 
much means that we don't write out some partially merged state. We'd leave 
the working directory alone, and now we can fairly easily create a MERGE 
directory which has its .git file as a symlink to ../.git, and which 
contains all the files that had conflicts in them.

Then, if you decide to not go forward with the merge, just doing

	read-tree $(cat .git/HEAD)
	rm -rf MERGE

does exactly that. Boom, it's gone.

See? THAT is good behaviour, I think.

> I am wondering if the following changes would make sense and
> make things easier for you:
> 
>  * git-merge-one-file-script is changed to register the path
>    with --cacheinfo using magic SHA1 0{40} instead of using the
>    resulting file on the filesystem.

This sounds fine.

>					  Do keep the current
>    behaviour of leaving the merge results of trivial merges
>    (both kinds) in the work tree.

I'd actually prefer not to. Exactly because it fails _both_ the "dirty
files" case _and_ the "merge didn't complete" case.

But if the "magic SHA1" meant that we look for it in a special merge 
directory, that would work.

>  * git-write-tree is changed to refuse to write from a cache
>    that records the magic SHA1.
> 
>  * git-ls-files acquires a new option --merged to notice the
>    magic SHA1 and shows the paths that have such SHA1.
> 
>  * git-update-cache acquires a new option --resolve to notice
>    the magic SHA1 and:
> 
>    - if the named path is not in the work tree anymore, delete
>      the entry.
> 
>    - if the named path exists in the work tree, compute the
>      latest SHA1 for that file and update the entry.

Sounds sane.

On the other hand, I think it would actually be easier to just make your 
"magic SHA1" be just another "stage".

		Linus


* Re: Quick command reference
From: H. Peter Anvin @ 2005-05-01 20:10 UTC
  To: omb; +Cc: David Greaves, Paul Mackerras, git, Linus Torvalds, Petr Baudis
In-Reply-To: <4274F373.6030001@khandalf.com>

Brian O'Mahoney wrote:
> Thank you both for taking the time and trouble to do this, particularly
> with the name changes and new options; why don't you merge your efforts
> and produce a GIT-Mini-HOWTO BTW send it off as a patch again!

Even better... man page(s)!

	-hpa


* graphing commit trees
From: Lennert Buytenhek @ 2005-05-01 20:01 UTC
  To: git

Hi!

As a 5-minute hack to see how easy it'd be to create trees from commit
objects: the attached perl script creates a file suitable for feeding
to dot/dotty from the graphviz suite.  Example graph at:

	http://www.liacs.nl/~buytenh/graph_42d4dc3f4e1ec1396371aac89d0dccfdd977191b.png

Warning: big image (2746x41363), many apps can't display it properly.
(mozilla gives an error, eog shows only the top 32768 pixel rows, gimp
seems to work.)


--L


#!/usr/bin/perl

my %processed;

sub traverse {
        my $commit = shift;
        my $parent;
        my @parents;

        return if (defined $processed{$commit});
        $processed{$commit} = "";

        @parents = split(" ", `git-cat-file commit $commit | grep "^parent " | awk '{print \$2}'`);

        foreach $parent (@parents) {
                print "\"$parent\" -> \"$commit\"\n";
                traverse($parent);
        }
}

sub mk_graph {
        my $root = shift;

        print "digraph blah_$root {\n";
        traverse($root);
        print "}\n";
}

$root = `cat .git/HEAD`;
chomp $root;

mk_graph($root);




* Re: Complete http-pull; where should it go?
From: Linus Torvalds @ 2005-05-01 19:44 UTC
  To: Daniel Barkalow; +Cc: git, Petr Baudis
In-Reply-To: <Pine.LNX.4.21.0505011508270.30848-100000@iabervon.org>



On Sun, 1 May 2005, Daniel Barkalow wrote:
>
> My question is, where does this belong? It's based on adding to the core
> as it has been the knowledge that .git/refs/*/* consists of hex-format
> hash files, both locally and on remote servers.

So the main reason I _don't_ like programs that automatically look up the 
refs etc is that it's often simply WRONG.

For example, if I want to know what objects I have in my object directory 
that are needed for a release, I want to be able to tell fsck to list the 
objects that are extraneous for that release _regardless_ of the fact that 
I may have .git/refs/*/* files that point to other things.

So if fsck-cache automatically looks up references in .git/refs/ like in
one of your earlier patches, then instead of adding value to the program,
you actually _remove_ value from it by making it less flexible, and
enforcing a world-view that is not necessarily the only view.

This is why I want the true _plumbing_ to not care about these things, and 
if you include references to trees, you _list_ them explicitly. 

And if you want to have a command that takes implied references, then just 
make a script that does that for you, rather than making the core plumbing 
understand it.

This is a classic "ease of use" vs "power-user" issue. I'm very
fundamentally of the opinion that power-users are good, and that ease of
use is done by having scripts that turn normal ops into "power user"  
operations.

That's the unix way, really.

		Linus


* Complete http-pull; where should it go?
From: Daniel Barkalow @ 2005-05-01 19:29 UTC
  To: git; +Cc: Linus Torvalds, Petr Baudis

I've been working on http-pull, and I've made it able to download the
target commit from ...git/refs/<dir>/<name> (instead of making you figure
it out yourself), and also write the target it looks up to your local
.git/refs/<w-d>/<w-n> (which doesn't have to be at all related to the
source one). In fact, I just got the latest Linus tree with:

git-http-pull -t -w heads/linus heads/master \
  http://www.kernel.org/pub/scm/git/git.git/
git-read-tree $(cat .git/refs/heads/linus)
git-checkout-cache -a
git-update-cache --refresh

(and I didn't get any of the history, although I could have if I wanted
to; and I could get it now if I decided I needed it).

My question is, where does this belong? It's based on adding to the core
as it has been the knowledge that .git/refs/*/* consists of hex-format
hash files, both locally and on remote servers. I think this level of
information belongs in the plumbing; at least, if people are to be able to
use different git-based systems to access the same repositories, they have
to agree. And there seems to be that much agreement, and so it makes sense
to make it part of the core.

(For that matter, people seem to agree that refs/heads/ has heads, and
refs/heads/master is the one you want to pull if you don't know
otherwise; I didn't include this information at all)

	-Daniel
*This .sig left intentionally blank*



* Re: Quick command reference
From: Junio C Hamano @ 2005-05-01 19:27 UTC
  To: David Greaves; +Cc: git
In-Reply-To: <42750D06.70004@dgreaves.com>

>>>>> "DG" == David Greaves <david@dgreaves.com> writes:

After I sent my endorsements, I noticed some nits so I pick them
here.  As you already know, you would also need to adjust to the
big git-* renaming.

DG> 	[Eventually may be replaced with <tree> if <tree> means
DG> 	<tree/commit> in all commands]

Probably not.  I think commit-tree should insist on its first
parameter being a tree not a commit for example, so I would drop
this comment.

Also tags are included in tree/commit class for some but
probably not all commands these days.  How about coming up with
a short-and-sweet name like <tree-id> and use it instead of
<tree/commit>?  You would need <commit-id> as well because tags
can be auto-dereferenced to commits by certain commands.

DG> <type>
DG> 	Indicates that an object type is required.
DG> 	Currently one of: blob/tree/commit

That's:

	Currently one of: blob/commit/tag/tree

DG> <file>
DG> 	Indicates a filename - often includes leading path
DG> <path>
DG> 	Indicates the path of a file (is this ever useful?)

I do not know what you wanted to distinguish by having separate
<file> and <path>.  There is only one thing.

We may want to mention that Core GIT expects the commands to run
from the directory that corresponds to the root level of the
tree structure GIT_INDEX_FILE describes, and the path/file
(whichever name you pick) are expected to be relative to that
directory.  No absolute paths, no ./relative paths with leading
dot-slash.
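
To make the path rule concrete, here is a minimal sketch of a check a
Porcelain script's C helper might apply before handing a path to the core
commands. The function name is_tree_relative_path() is hypothetical, and
rejecting the empty string is an extra assumption not stated above.

```c
#include <string.h>

/* Returns 1 if path follows the rules described above, 0 otherwise. */
int is_tree_relative_path(const char *path)
{
	if (path == NULL || path[0] == '\0')
		return 0;	/* assumption: empty paths are rejected */
	if (path[0] == '/')
		return 0;	/* no absolute paths */
	if (strncmp(path, "./", 2) == 0)
		return 0;	/* no leading dot-slash */
	return 1;
}
```

So "Documentation/README" would pass, while "/etc/passwd" and "./Makefile"
would not.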

DG> ################################################################
DG> cat-file
DG> 	cat-file (-t | <type>) <object>
DG> ...
DG> <type>
DG> 	One of: blob/tree/commit
DG> ...
DG> Output
DG> If -t is specified, one of:
DG>         blob/tree/commit

Let's not list the type but refer the reader to the top part of
the document that lists the type.

DG> ################################################################
DG> checkout-cache
DG> ... Note that the file contents are
DG> restored - NOT the file permissions.
DG> ??? l 58 checkout-cache.c says restore executable bit.

So which is correct?

DG> ################################################################
DG> diff-tree-helper
DG> 	diff-tree-helper [-z]

Update:
	diff-tree-helper [-z] [-R]

Add:
        -R	generate the patch in reverse.

DG> ################################################################
DG> fsck-cache
DG> 	fsck-cache [[--unreachable] <commit>*]

--root?

DG> ################################################################
DG> show-diff
DG> 	show-diff [-p] [-q] [-s] [-z] [paths...]

After big git-* rename this became git-diff-files; just to keep
an eye on when you do the updates.

DG> ################################################################
DG> show-files

And this one is now git-ls-files.

DG> ################################################################
DG> unpack-file
DG> 	unpack-file <blob>

Add:

    Note that the temporary file is created with mkstemp(3) and it
    would have permission 0600 or 0666 depending on your glibc
    version.  Make sure to fix the permission if you use this in
    your script.
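
    For a caller written in C rather than shell, the same precaution
    might look like the sketch below. make_private_temp() is a
    hypothetical helper, not part of unpack-file; it simply forces the
    mode to 0600 after mkstemp(3), whatever the C library chose.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file from a mkstemp(3) template (which must end
 * in "XXXXXX") and force its permission bits to 0600.
 * Returns an open file descriptor, or -1 on failure.
 */
int make_private_temp(char *template_path)
{
	int fd = mkstemp(template_path);

	if (fd < 0)
		return -1;
	if (fchmod(fd, 0600) < 0) {
		/* Do not leave a file with unknown permissions behind. */
		close(fd);
		unlink(template_path);
		return -1;
	}
	return fd;
}
```

    A shell script would get the same effect with chmod on the file
    name that the command prints.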

DG> ################################################################
DG> Generating patches

Please drop the following part.  GIT_DIFF_CMD is not supported
anymore:

DG>    The first part of the above command-line can be customized via
DG>    the environment variable GIT_DIFF_CMD...
DG>    ... 
DG>    Caution:  Do not use more than two '%s' in GIT_DIFF_CMD.

Drop it also from "git Environment Variables" section.



* not really a [PATCH] Make git-apply-patch-script
From: Junio C Hamano @ 2005-05-01 18:58 UTC
  To: Linus Torvalds; +Cc: git

Because I did not have you pull from my repo (because I do not
have a publicly accessible rsync repo), but sent the patch in
the diff/patch form, git-apply-patch-script ended up missing its
executable bit.  Here is a non-patch ;-).

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

    chmod +x git-apply-patch-script




* Re: Quick command reference
From: Junio C Hamano @ 2005-05-01 18:51 UTC
  To: Linus Torvalds, David Greaves; +Cc: git
In-Reply-To: <42750D06.70004@dgreaves.com>

I suspect by now my endorsement would count at least a bit, so ...

DG> Please commit the version below (into a Documentation dir)
DG> and then all I have to do is send you patches to commit and
DG> you don't have to put too much effort into keeping it
DG> updated. It's 'only' docs - so although I may get it wrong,
DG> people like Junio will be sure to correct me.

Linus, I'm with David on this one.  I haven't reviewed his stuff
in this latest incarnation but I did review the draft a round
before and I felt it was accurate and ready for public (meaning
Porcelain layer writers and brave end users) consumption.

DG> I chose text to start since it could easily be read on the
DG> mailing list.  I'll gladly put it into some kind of markup
DG> later when the features start to stabilise.  For now people
DG> can wrap it in <pre> tags.

Linus, I'm with David on this one.  Text is Good.



* Re: [PATCH] Really fix git-merge-one-file-script this time.
From: Junio C Hamano @ 2005-05-01 18:38 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vd5sbz436.fsf@assigned-by-dhcp.cox.net>

Following up to my own message...

Linus, have you decided to like or dislike the behaviour of
git-merge-one-file-script touching the work tree in some cases
but not in other cases?  A straightforward merge implementation
that does a "git-read-tree -m" followed by a "git-merge-cache
git-merge-one-file-script" does the following to your work tree
and the cache:

 - Paths merged unsuccessfully make the git-merge-cache phase fail
   and the work tree is not affected for such paths.

 - Paths merged trivially by "git-read-tree -m" do not change the
   work tree, and the "git-read-tree -m" result is kept in the cache.

 - Paths merged by "merge" successfully, and paths chosen from a
   single side by "git-merge-one-file-script" change the work
   tree, possibly checking out the file if you started out from
   an empty work tree.
 
I am not worried about the first case where you will have to
manually examine and resolve anyway. I am wondering if the rest
is the desired behavior for _your_ way of using the GIT merge.
After a successful merge, what kind of verification would you
typically do?

First of all, would you usually do the merge in an empty work
tree, or in a populated work tree?  Secondly, would you care
about the distinction between "git-read-tree -m" trivial merges
and "merge" trivial merges when reviewing the result?

If you work in an empty work tree, and never review the merge
result while in that tree, then not touching the work tree in
git-merge-one-file-script at all may be desirable, especially if
you really want to keep things only in the cache.  On the other
hand, if you do review there, leaving the merge result in the
work tree is desirable.  Especially, if you want to verify the
resulting files that are "merge" trivial but are not
"git-read-tree -m" trivial, the files you see in the work tree
are the only ones you need to check.

If you do your merge in a populated work tree, and assuming your
starting work tree matches one of the commits being merged
[*1*], it becomes harder to review the changes to the "merge"
trivial but not "git-read-tree -m" trivial files.  The cache
does not tell you which ones are which with the current
implementation of "git-merge-one-file-script".  "git-diff-cache"
against the tree before the merge would report all merges,
including "git-read-tree -m" trivial ones, so you end up needing
to save the output from git-merge-one-file-script and decide
which paths to check.

I am wondering if the following changes would make sense and
make things easier for you:

 * git-merge-one-file-script is changed to register the path
   with --cacheinfo using magic SHA1 0{40} instead of using the
   resulting file on the filesystem.  Do keep the current
   behaviour of leaving the merge results of trivial merges
   (both kinds) in the work tree.

 * git-write-tree is changed to refuse to write from a cache
   that records the magic SHA1.

 * git-ls-files acquires a new option --merged to notice the
   magic SHA1 and shows the paths that have such SHA1.

 * git-update-cache acquires a new option --resolve to notice
   the magic SHA1 and:

   - if the named path is not in the work tree anymore, delete
     the entry.

   - if the named path exists in the work tree, compute the
     latest SHA1 for that file and update the entry.

Changes other than the first two listed above are purely
optional, since the Porcelain layer can implement them without
the Plumbing support.  Not doing them would keep the Plumbing
somewhat cleaner by not having to know about this magic SHA1
convention.  On the other hand, we already use that convention
in git-diff-cache, so it may even be a consistent change to make
these optional changes.  Essentially, the magic SHA1 in the
cache means "I know the user wants me to keep an eye on this
path when it matters" [*2*].
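
As a sketch of what the first two changes imply in code: the magic SHA1
is 20 zero bytes, and git-write-tree would refuse to run while any entry
still carries it. Everything named below (is_magic_sha1(), struct
sketch_entry, can_write_tree()) is hypothetical and much simpler than the
real cache entry layout.

```c
#include <stddef.h>
#include <string.h>

#define SHA1_RAW_LEN 20

/* Returns 1 if sha1 is the all-zero "magic" value, 0 otherwise. */
int is_magic_sha1(const unsigned char *sha1)
{
	static const unsigned char zero[SHA1_RAW_LEN];	/* all zero bytes */

	return memcmp(sha1, zero, SHA1_RAW_LEN) == 0;
}

/* Hypothetical cache entry: just a path and its recorded SHA1. */
struct sketch_entry {
	const char *path;
	unsigned char sha1[SHA1_RAW_LEN];
};

/*
 * Returns 0 if a tree may be written from these entries, or -1 if
 * any entry still records the magic SHA1 and so is unresolved.
 */
int can_write_tree(const struct sketch_entry *entries, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (is_magic_sha1(entries[i].sha1))
			return -1;
	return 0;
}
```

The optional --merged and --resolve changes would then be variations on
the same loop: list the entries that match, or replace their SHA1 with
one computed from the work tree.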

Please veto if these changes would not help _your_ use pattern.

[Footnotes]

*1* That is, you do "read-tree -m O A B" and your work tree
before the merge matches A (e.g. linux-2.6.git or your
yet-to-be-published descendant of it), B is a subsystem tree
(e.g. rmk/linux-2.6-serial.git) and O is the common ancestor.

*2* This convention would also make an implementation of "SCM
add" in the Porcelain layer a bit more efficient.  A typical
workflow without such a convention would consist of:

 * Create a file and start editing.
 * "SCM add" file, causing "git-update-cache --add -- file".
 * Do more changes, and review.
 * "SCM commit" which does "git-update-cache" on changed files,
   "git-write-tree" and "git-commit-tree" to commit.

which wastes one extra blob object per "SCM add".  My gut
feeling is that more than 80% of the time "SCM add" is followed
by some edit to the added file before "SCM commit", unless it is
the initial import.  If we adopt that convention, "SCM add"
would register with --cacheinfo with the magic SHA1 without
creating the useless blob, and "SCM commit" will be written to
lazily pick things up from the work tree.





* [PATCH] Make pull not assume anything about current objects
From: Daniel Barkalow @ 2005-05-01 17:33 UTC
  To: Linus Torvalds; +Cc: git

Previously, pull assumed that, if you have a commit, you either have or
don't want everything it references. Change this to actually check
locally on everything you want, to be completely sure.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Index: pull.c
===================================================================
--- 6f0f1d99169f9d90aa44e47d1bcff7b1dd4d8ea0/pull.c  (mode:100644 sha1:86a7b6901fe69a82c12c3470b456982ef52cebd0)
+++ 661b090ca8652d2cfa299b4cac3ffceebdd2b43c/pull.c  (mode:100644 sha1:90d2d41ed2c56580f72f020bc93c3e1b8a3befa5)
@@ -48,8 +48,6 @@
 	if (get_history) {
 		struct commit_list *parents = obj->parents;
 		for (; parents; parents = parents->next) {
-			if (has_sha1_file(parents->item->object.sha1))
-				continue;
 			if (fetch(parents->item->object.sha1)) {
 				/* The server might not have it, and
 				 * we don't mind. 



* Re: Trying to use AUTHOR_DATE
From: Edgar Toernig @ 2005-05-01 17:23 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505010934480.2296@ppc970.osdl.org>

Linus Torvalds wrote:
>
> 	Date: Fri, 08 Apr 2005 02:20:10 0200 -> bad
> 	Date: Mon, 18 Apr 05 15:05:29 Hora oficial do Brasil -> bad
> 	Date: 2002/04/11 18:29:07 -> bad
> 
> The second one is funny. Not just the "Hora oficial do Brasil" (hey, I 
> could add it as a real timezone and my parser would do the right thing ;) 
> but also because my parser decides that "05" is not a year, but the day in 
> the month, so it doesn't see the year.
> 
> I can fake out that year thing pretty easily ("if it starts with '0' it's 
> not a day of the month"), but it does show just how _strange_ stuff 
> there's out there.

And what happens then with the first example?  2008 Apr 2005?


I thought about missing timezones once more.  Don't you think it's
better to default to -0000?  Afaics, it was defined for just these
cases.  Simply appending an arbitrary timezone seems wrong.

Ciao, ET.


* Re: Should git-prune-script warn about dircache?
From: Junio C Hamano @ 2005-05-01 17:20 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505010916510.2296@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> How about making git-prune-script first run "git-update-cache --refresh",
LT> and checking the return value of it (this, of course, assumes that
LT> git-update-cache --refresh would return non-zero if it can't refresh a
LT> file, which is currently not true, but should be easily fixable).

Or just check if it sees anything in the output, especially
"needs update" line.

I do not think it is such a big deal [*1*] but I should point
out that, "git-update-cache --refresh" needs to be run on all of
them if the user (or the porcelain layer) is using more than one
GIT_INDEX_FILEs [*2*].


[Footnotes]

*1* Because git-prune-script is just an example and it already
assumes it knows where the valid heads are; right now it looks
only at .git/HEAD and not .git/refs/*/*.  Each Porcelain layer
implementation should provide its own prune script anyway.

*2* I do not do this anymore but an earlier incarnation of my
little SCM on GIT [*3*] allowed a user to keep snapshots of work
tree state and switch between them by juggling multiple
GIT_INDEX_FILE.  I just create commits off of the current state
when making a snapshot in the latest version so it is not a
problem anymore for me.

*3* (PLUG) found in http://members.cox.net/junkio/.




* Re: Quick command reference
From: David Greaves @ 2005-05-01 17:08 UTC
  To: Linus Torvalds; +Cc: omb, Paul Mackerras, git, Petr Baudis
In-Reply-To: <Pine.LNX.4.58.0505010927040.2296@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]

Linus Torvalds wrote:

>On Sun, 1 May 2005, David Greaves wrote:
>  
>
>>I've spent many many hours doing this and I'm happy to spend many more -
>>but I'm at that frustrated point where it makes no sense until I know
>>it's of use.
>>    
>>
>I just tend to be concentrating on the technology itself, so docs 
>invariably fall a bit behind for me, until I get to the point where I 
>start looking at what (for me) ends up being the secondary things.
>  
>
Yep - I expected as much. That's why _I_ did the README. Not sexy but
since I needed it, I figured others would...
And frankly I've been trying to balance badgering without being annoying
- but you keep saying what a thick skin you have so WTH, I kept on at it :)

>Anyway, what I'd really appreciate is a whole "Documentation" 
>subdirectory, and preferably in some standard format. Maybe real 
>old-fashioned man-pages, but hey, especially with something like this, 
>just html would be good too.
>  
>
That was the intent... it was supposed to look manpage-ish.
It's all in one file for now to make it easier to skip around and
cut'n'paste.
(Good job too since the command names still aren't stable.)

>(And no, by "standard format" I do _not_ mean xml or stuff like that. I 
>mean something that is actually easy to read ;)
>  
>
I chose text to start since it could easily be read on the mailing list.
I'll gladly put it into some kind of markup later when the features
start to stabilise.
For now people can wrap it in <pre> tags.

Please commit the version below (into a Documentation dir) and then all
I have to do is send you patches to commit and you don't have to put too
much effort into keeping it updated. It's 'only' docs - so although I
may get it wrong, people like Junio will be sure to correct me.

If you're OK with it then I can accept patches and track comments on the
list and act as collator/sub-editor. I can also do housekeeping like
keeping usage() strings consistent with the docs.

What say you?

David

Reference documentation for the core git commands.

Signed-off-by: David Greaves <david@dgreaves.com>

---



[-- Attachment #2: README.reference --]
[-- Type: text/plain, Size: 35927 bytes --]

This file contains reference information for the core git commands.
It is actually based on the source from Petr Baudis' tree and may
therefore contain a few 'extras' that may or may not make it upstream.

The README contains much useful definition and clarification info -
read that first.  And of the commands, I suggest reading
'update-cache' and 'read-tree' first - I wish I had!

Thanks to original email authors and proof readers esp Junio C Hamano
<junkio@cox.net>

David Greaves <david@dgreaves.com>
24/4/05

Identifier terminology used:

<object>
	Indicates any object sha1 identifier

<blob>
	Indicates a blob object sha1 identifier

<tree>
	Indicates a tree object sha1 identifier

<commit>
	Indicates a commit object sha1 identifier

<tree/commit>
	Indicates a tree or commit object sha1 identifier (usually
	because the command can read the <tree> a <commit> contains).
	[Eventually may be replaced with <tree> if <tree> means
	<tree/commit> in all commands]

<type>
	Indicates that an object type is required.
	Currently one of: blob/tree/commit

<file>
	Indicates a filename - often includes leading path

<path>
	Indicates the path of a file (is this ever useful?)



################################################################
cat-file
	cat-file (-t | <type>) <object>

Provide contents or type of objects in the repository. The type is
required if -t is not being used to find the object type.

<object>
	The sha1 identifier of the object.

-t
	show the object type identified by <object>

<type>
	One of: blob/tree/commit

Output

If -t is specified, one of:
        blob/tree/commit

Otherwise the raw (though uncompressed) contents of the <object> will
be returned.


################################################################
check-files
	check-files <file>...

Check that a list of files are up-to-date between the filesystem and
the cache. Used to verify a patch target before doing a patch.

Files that do not exist on the filesystem are considered up-to-date
(whether or not they are in the cache).

Emits an error message on failure.
preparing to update existing file <file> not in cache
	  <file> exists but is not in the cache

preparing to update file <file> not uptodate in cache
	  <file> on disk is not up-to-date with the cache

exits with a status code indicating success if all files are
up-to-date.

see also: update-cache


################################################################
checkout-cache
	checkout-cache [-q] [-a] [-f] [-n] [--prefix=<string>]
		       [--] <file>...

Will copy all files listed from the cache to the working directory
(not overwriting existing files). Note that the file contents are
restored - NOT the file permissions.
??? l 58 checkout-cache.c says restore executable bit.

-q
	be quiet if files exist or are not in the cache

-f
	forces overwrite of existing files

-a
	checks out all files in the cache (will then continue to
	process listed files).
-n
	Don't checkout new files, only refresh files already checked
	out.

--prefix=<string>
	When creating files, prepend <string> (usually a directory
	including a trailing /)

--
	Do not interpret any more arguments as options.

Note that the order of the flags matters:

	checkout-cache -a -f file.c

will first check out all files listed in the cache (but not overwrite
any old ones), and then force-checkout file.c a second time (ie that
one _will_ overwrite any old contents with the same filename).
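
The left-to-right processing can be sketched in Python. This is only a
model of the behaviour described above, not the real checkout-cache
source: plan_checkout and its (file, force) tuples are made-up names,
and only -a, -f and -- are handled.

```python
def plan_checkout(args, cache_files):
    """Return the (file, force) actions in the order they would happen."""
    force = False
    options_done = False
    actions = []
    for arg in args:
        if not options_done and arg.startswith("-"):
            if arg == "-f":
                force = True            # affects everything that follows
            elif arg == "-a":
                # -a acts immediately, with the flags seen so far
                actions.extend((name, force) for name in cache_files)
            elif arg == "--":
                options_done = True     # the rest are filenames
            continue                    # other flags are ignored in this sketch
        actions.append((arg, force))
    return actions
```

With this model, "-a -f file.c" checks out everything without force and
then file.c with force, while an empty argument list produces no work
at all.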

Also, just doing "checkout-cache" does nothing. You probably meant
"checkout-cache -a". And if you want to force it, you want
"checkout-cache -f -a".

Intuitiveness is not the goal here. Repeatability is. The reason for
the "no arguments means no work" thing is that from scripts you are
supposed to be able to do things like

	find . -name '*.h' -print0 | xargs -0 checkout-cache -f --

which will force all existing *.h files to be replaced with their
cached copies. If an empty command line implied "all", then this would
force-refresh everything in the cache, which was not the point.

To update and refresh only the files already checked out:

   checkout-cache -n -f -a && update-cache --ignore-missing --refresh

Oh, and the "--" is just a good idea when you know the rest will be
filenames. Just so that you wouldn't have a filename of "-a" causing
problems (not possible in the above example, but get used to it in
scripting!).

The prefix ability basically makes it trivial to use checkout-cache as
a "export as tree" function. Just read the desired tree into the
index, and do a
  
        checkout-cache --prefix=export-dir/ -a
  
and checkout-cache will "export" the cache into the specified
directory.
  
NOTE! The final "/" is important. The exported name is literally just
prefixed with the specified string, so you can also do something like
  
        checkout-cache --prefix=.merged- Makefile
  
to check out the currently cached copy of "Makefile" into the file
".merged-Makefile".


################################################################
commit-tree
	commit-tree <tree> [-p <parent commit>]*   < changelog

Creates a new commit object based on the provided tree object and
emits the new commit object id on stdout. If no parent is given then
it is considered to be an initial tree.

A commit object usually has 1 parent (a commit after a change) or up
to 16 parents.  More than one parent represents a merge of branches
that led to them.

While a tree represents a particular directory state of a working
directory, a commit represents that state in "time", and explains how
to get there.

Normally a commit would identify a new "HEAD" state, and while git
doesn't care where you save the note about that state, in practice we
tend to just write the result to the file ".git/HEAD", so that we can
always see what the last committed state was.

Options

<tree>
	An existing tree object

-p <parent commit>
	Each -p indicates the id of a parent commit object.
	

Commit Information

A commit encapsulates:
	all parent object ids
	author name, email and date
	committer name and email and the commit time.

If not provided, commit-tree uses your name, hostname and domain to
provide author and committer info. This can be overridden using the
following environment variables.
	AUTHOR_NAME
	AUTHOR_EMAIL
	AUTHOR_DATE
	COMMIT_AUTHOR_NAME
	COMMIT_AUTHOR_EMAIL
(nb <,> and '\n's are stripped)

A commit comment is read from stdin (max 999 chars). If a changelog
entry is not provided via '<' redirection, commit-tree will just wait
for one to be entered and terminated with ^D

see also: write-tree


################################################################
diff-cache
	diff-cache [-p] [-r] [-z] [--cached] <tree/commit>

Compares the content and mode of the blobs found via a tree object
with the content of the current cache, optionally ignoring the
stat state of the file on disk.

<tree/commit>
	The id of a tree or commit object to diff against.

-p
	generate patch (see section on generating patches)

-r
	recurse

-z
	\0 line termination on output

--cached
	do not consider the on-disk file at all

Output format:

See "Output format from diff-cache, diff-tree and show-diff" section.

Operating Modes

You can choose whether you want to trust the index file entirely
(using the "--cached" flag) or ask the diff logic to show any files
that don't match the stat state as being "tentatively changed".  Both
of these operations are very useful indeed.

Cached Mode

If --cached is specified, it allows you to ask:
	show me the differences between HEAD and the current index
	contents (the ones I'd write with a "write-tree")

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit
without having to write a new tree object and compare it that way, and to
do that, you just do

	diff-cache --cached $(cat .git/HEAD)

Example: let's say I had renamed "commit.c" to "git-commit.c", and I had 
done an "update-cache" to make that effective in the index file. 
"show-diff" wouldn't show anything at all, since the index file matches 
my working directory. But doing a diff-cache does:
	torvalds@ppc970:~/git> diff-cache --cached $(cat .git/HEAD)
	-100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        commit.c
	+100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        git-commit.c

And as you can see, the output matches "diff-tree -r" output (we
always do equivalent of "-r", since the index is flat).
You can trivially see that the above is a rename.

In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. Except this one is much
nicer for the case where you just want to check where you are.

So doing a "diff-cache --cached" is basically very useful when you are 
asking yourself "what have I already marked for being committed, and 
what's the difference to a previous tree".

Non-cached Mode

The "non-cached" mode takes a different approach, and is potentially
the even more useful of the two in that what it does can't be emulated
with a "write-tree + diff-tree". Thus that's the default mode.  The
non-cached version asks the question

   "show me the differences between HEAD and the currently checked out 
    tree - index contents _and_ files that aren't up-to-date"

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the "diff-tree -r" output to
a tee, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic "all-zero" sha1 to show
that. So let's say that you have edited "kernel/sched.c", but have not
actually done an update-cache on it yet - there is no "object" associated
with the new state, and you get:

	torvalds@ppc970:~/v2.6/linux> diff-cache $(cat .git/HEAD )
	*100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000      kernel/sched.c

ie it shows that the tree has changed, and that "kernel/sched.c" is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.

NOTE! As with other commands of this type, "diff-cache" does not actually 
look at the contents of the file at all. So maybe "kernel/sched.c" hasn't 
actually changed, and it's just that you touched it. In either case, it's 
a note that you need to update-cache it to make the cache be in sync.

NOTE 2! You can have a mixture of files show up as "has been updated" and
"is still dirty in the working directory" together. You can always tell
which file is in which state, since the "has been updated" ones show a
valid sha1, and the "not in sync with the index" ones will always have the
special all-zero sha1.
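
A script can tell the two states apart by looking at the second sha1.
The following Python sketch assumes the whitespace-separated layout
shown in the example above and only handles the "*old->new" lines;
classify and NULL_SHA1 are names invented for the illustration.

```python
NULL_SHA1 = "0" * 40

def classify(line):
    """Classify one '*old->new' line of diff-cache output."""
    if not line.startswith("*"):
        raise ValueError("only '*' (changed) lines are handled here")
    modes, obj_type, sha1s, path = line.strip().split(None, 3)
    old_sha1, new_sha1 = sha1s.split("->")
    if new_sha1 == NULL_SHA1:
        # no object exists yet for the working directory contents
        return path, "dirty in working directory"
    return path, "updated in cache"
```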

################################################################
diff-tree
	diff-tree [-p] [-r] [-z] <tree/commit> <tree/commit> [<pattern>]*

Compares the content and mode of the blobs found via two tree objects.

Note that diff-tree can use the tree encapsulated in a commit object.

<tree/commit>
	The id of a tree or commit object.

<pattern>

	If provided, the results are limited to a subset of files
	matching one of these prefix strings.
	ie file matches /^<pattern1>|<pattern2>|.../
	Note that pattern does not provide any wildcard or regexp features.

-p
	generate patch (see section on generating patches)

-r
	recurse

-z
	\0 line termination on output

Limiting Output

If you're only interested in differences in a subset of files, for
example some architecture-specific files, you might do:

	diff-tree -r <tree/commit> <tree/commit> arch/ia64 include/asm-ia64

and it will only show you what changed in those two directories.

Or if you are searching for what changed in just kernel/sched.c, just do

	diff-tree -r <tree/commit> <tree/commit> kernel/sched.c

and it will ignore all differences to other files.

The pattern is always the prefix, and is matched exactly (ie there are no
wildcards - although matching a directory, which it does support, can
obviously be seen as a "wildcard" for all the files under that directory).
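
As an illustration, the prefix rule can be modelled in Python as a
plain string comparison. This is a sketch of the rule as documented
here, not of the diff-tree source (which may be stricter about
directory boundaries); the function name is made up.

```python
def interesting(path, patterns):
    """True if path should be shown, given the prefix patterns."""
    if not patterns:
        return True     # no patterns means no limiting
    # plain prefix comparison: no wildcards, no regexps
    return any(path.startswith(p) for p in patterns)
```

Note that a pattern such as "kernel/*.c" matches nothing under this
rule unless a path literally starts with those characters.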

Output format:

See "Output format from diff-cache, diff-tree and show-diff" section.

An example of normal usage is:

	torvalds@ppc970:~/git> diff-tree 5319e4d609cdd282069cc4dce33c1db559539b03 b4e628ea30d5ab3606119d2ea5caeab141d38df7
	*100664->100664 blob    ac348b7d5278e9d04e3a1cd417389379c32b014f->a01513ed4d4d565911a60981bfb4173311ba3688      fsck-cache.c

which tells you that the last commit changed just one file (it's from
this one:

	commit 3c6f7ca19ad4043e9e72fa94106f352897e651a8
	tree 5319e4d609cdd282069cc4dce33c1db559539b03
	parent b4e628ea30d5ab3606119d2ea5caeab141d38df7
	author Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
	committer Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005

	Make "fsck-cache" print out all the root commits it finds.

	Once I do the reference tracking, I'll also make it print out all the
	HEAD commits it finds, which is even more interesting.

in case you care).

################################################################
diff-tree-helper
	diff-tree-helper [-z]

Reads output from diff-cache, diff-tree and show-diff and
generates patch format output.

-z
	\0 line termination on input

See also the section on generating patches.

################################################################
fsck-cache
	fsck-cache [[--unreachable] <commit>*]

Verifies the connectivity and validity of the objects in the database.

<commit>
	A commit object to treat as the head of an unreachability
	trace

--unreachable
	print out objects that exist but that aren't reachable from any
	of the specified root nodes

It tests SHA1 and general object sanity, and it does full tracking of
the resulting reachability and everything else. It prints out any
corruption it finds (missing or bad objects), and if you use the
"--unreachable" flag it will also print out objects that exist but
that aren't reachable from any of the specified root nodes.

So for example

	fsck-cache --unreachable $(cat .git/HEAD)

or, for Cogito users:

	fsck-cache --unreachable $(cat .git/heads/*)

will do quite a _lot_ of verification on the tree. There are a few
extra validity tests to be added (make sure that tree objects are
sorted properly etc), but on the whole if "fsck-cache" is happy, you
do have a valid tree.

Any corrupt objects you will have to find in backups or other archives
(ie you can just remove them and do an "rsync" with some other site in
the hopes that somebody else has the object you have corrupted).

Of course, "valid tree" doesn't mean that it wasn't generated by some
evil person, and the end result might be crap. Git is a revision
tracking system, not a quality assurance system ;)

Extracted Diagnostics

expect dangling commits - potential heads - due to lack of head information
	You haven't specified any nodes as heads so it won't be
	possible to differentiate between un-parented commits and
	root nodes.

missing sha1 directory '<dir>'
	The directory holding the sha1 objects is missing.

unreachable <type> <object>
	The <type> object <object>, isn't actually referred to directly
	or indirectly in any of the trees or commits seen. This can
	mean that there's another root node that you're not specifying
	or that the tree is corrupt. If you haven't missed a root node
	then you might as well delete unreachable nodes since they
	can't be used.

missing <type> <object>
	The <type> object <object>, is referred to but isn't present in
	the database.

dangling <type> <object>
	The <type> object <object>, is present in the database but never
	_directly_ used. A dangling commit could be a root node.

warning: fsck-cache: tree <tree> has full pathnames in it
	And it shouldn't...

sha1 mismatch <object>
	The database has an object whose sha1 doesn't match the
	database value.
	This indicates a ??serious?? data integrity problem.
	(note: this error occurred during early git development when
	the database format changed.)
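
To make the difference between "missing", "unreachable" and "dangling"
concrete, here is a small Python model. It is a sketch under
assumptions: the object database is reduced to a dict mapping each
object id to the ids it refers to, the heads given on the command line
are treated as "used", and check is a made-up name. No sha1
verification is done.

```python
def check(objects, heads):
    """objects maps an object id to the list of ids it refers to."""
    reachable, missing = set(), set()
    stack = list(heads)
    while stack:
        oid = stack.pop()
        if oid in reachable:
            continue
        if oid not in objects:
            missing.add(oid)        # referred to, but not in the database
            continue
        reachable.add(oid)
        stack.extend(objects[oid])
    referenced = {ref for refs in objects.values() for ref in refs}
    missing |= referenced - set(objects)
    unreachable = set(objects) - reachable
    # present, but nothing refers to it directly (heads count as used)
    dangling = set(objects) - referenced - set(heads)
    return missing, unreachable, dangling
```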

Environment Variables

SHA1_FILE_DIRECTORY
	used to specify the object database root (usually .git/objects)

################################################################
git-export
	git-export top [base]

probably deprecated:
On Wed, 20 Apr 2005, Petr Baudis wrote:
>> I will probably not buy git-export, though. (That is, it is merged, but
>> I won't make git frontend for it.) My "git export" already does
>> something different, but more importantly, "git patch" of mine already
>> does effectively the same thing as you do, just for a single patch; so I
>> will probably just extend it to do it for an (a,b] range of patches.


That's fine. It was a quick hack, just to show that if somebody wants to, 
the data is trivially exportable.

		Linus

Although in Linus' distribution, git-export is not part of 'core' git.

################################################################
init-db
	init-db

This simply creates an empty git object database - basically a .git
directory.

If the object storage directory is specified via the
SHA1_FILE_DIRECTORY environment variable then the sha1 directories are
created underneath - otherwise the default .git/objects directory is
used.

init-db won't hurt an existing repository.


################################################################
ls-tree
	ls-tree [-r] [-z] <tree/commit>

convert the tree object to a human readable (and script
processable) form.

<tree/commit>
	Id of a tree or commit object.
-r
	recurse into sub-trees

-z
	\0 line termination on output

Output Format
<mode>\t	<type>\t	<object>\t	<path><file>	


################################################################
merge-base
	merge-base <commit> <commit>

merge-base finds as good a common ancestor as possible. Given a
selection of equally good common ancestors it should not be relied on
to decide in any particular way.

The merge-base algorithm is still in flux - use the source...


################################################################
merge-cache
	merge-cache <merge-program> (-a | -- | <file>*) 

This looks up the <file>(s) in the cache and, if there are any merge
entries, unpacks all of them (which may be just one file, of course)
into up to three separate temporary files, and then executes the
supplied <merge-program> with those three files as arguments 1,2,3
(empty argument if no file), and <file> as argument 4.

--
	Interpret all future arguments as filenames

-a
	Run merge against all files in the cache that need merging.

If merge-cache is called with multiple <file>s (or -a) then it
processes them in turn only stopping if merge returns a non-zero exit
code.

Typically this is run with a script calling the merge command from
the RCS package.

A sample script called git-merge-one-file-script is included in the
distribution.
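
The core of such a script is reordering its arguments. A rough Python
sketch, assuming the RCS convention "merge file1 file2 file3" where
file2 is the common ancestor and the result is left in file1; the
function name is invented and the "empty argument" case (a file
missing from one tree) is not handled.

```python
def rcs_merge_command(orig, branch1, branch2):
    """Build the RCS merge command line from merge-cache's arguments 1,2,3."""
    # merge-cache passes the original first; RCS merge wants it in the middle
    return ["merge", branch1, orig, branch2]
```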

ALERT ALERT ALERT! The git "merge object order" is different from the
RCS "merge" program merge object order. In the above ordering, the
original is first. But the argument order to the 3-way merge program
"merge" is to have the original in the middle. Don't ask me why.

Examples:

	torvalds@ppc970:~/merge-test> merge-cache cat MM
	This is MM from the original tree.			# original
	This is modified MM in the branch A.			# merge1
	This is modified MM in the branch B.			# merge2
	This is modified MM in the branch B.			# current contents

or 

	torvalds@ppc970:~/merge-test> merge-cache cat AA MM
	cat: : No such file or directory
	This is added AA in the branch A.
	This is added AA in the branch B.
	This is added AA in the branch B.
	fatal: merge program failed

where the latter example shows how "merge-cache" will stop trying to
merge once anything has returned an error (ie "cat" returned an error
for the AA file, because it didn't exist in the original, and thus
"merge-cache" didn't even try to merge the MM thing).


################################################################
read-tree
	read-tree (<tree/commit> | -m <tree/commit1> [<tree/commit2> <tree/commit3>])

Reads the tree information given by <tree> into the directory cache,
but does not actually _update_ any of the files it "caches". (see:
checkout-cache)

Optionally, it can merge a tree into the cache or perform a 3-way
merge.

Trivial merges are done by read-tree itself.  Only conflicting paths
will be in unmerged state when read-tree returns.

-m
	Perform a merge, not just a read

<tree#>
	The id of the tree object(s) to be read/merged.


Merging
If -m is specified, read-tree performs 2 kinds of merge, a single tree
merge if only 1 tree is given or a 3-way merge if 3 trees are
provided.

Single Tree Merge
If only 1 tree is specified, read-tree operates as if the user did not
specify "-m", except that if the original cache has an entry for a
given pathname, and the contents of the path match the tree
being read, the stat info from the cache is used. (In other words, the
cache's stat()s take precedence over the merged tree's.)

That means that if you do a "read-tree -m <newtree>" followed by a
"checkout-cache -f -a", the checkout-cache only checks out the stuff
that really changed.

This is used to avoid unnecessary false hits when show-diff is
run after read-tree.

3-Way Merge
Each "index" entry has two bits worth of "stage" state. stage 0 is the
normal one, and is the only one you'd see in any kind of normal use.

However, when you do "read-tree" with multiple trees, the "stage"
starts out at 0, but increments for each tree you read. And in
particular, the "-m" flag means "start at stage 1" instead.

This means that you can do

	read-tree -m <tree1> <tree2> <tree3>

and you will end up with an index with all of the <tree1> entries in
"stage1", all of the <tree2> entries in "stage2" and all of the
<tree3> entries in "stage3".

Furthermore, "read-tree" has special-case logic that says: if you see
a file that matches in all respects in the following states, it
"collapses" back to "stage0":

   - stage 2 and 3 are the same; take one or the other (it makes no
     difference - the same work has been done on stage 2 and 3)

   - stage 1 and stage 2 are the same and stage 3 is different; take
     stage 3 (some work has been done on stage 3)

   - stage 1 and stage 3 are the same and stage 2 is different; take
     stage 2 (some work has been done on stage 2)
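
The three rules fit in a few lines of Python. This is a sketch, not
the read-tree code: each stage is represented as a (mode, sha1) tuple,
the function name is made up, and it assumes the file is present in
all three trees (anything else is left unmerged here).

```python
def collapse(stage1, stage2, stage3):
    """Return the entry to store as stage 0, or None to leave it unmerged."""
    if stage1 is None or stage2 is None or stage3 is None:
        return None         # assumption: only handle files present everywhere
    if stage2 == stage3:
        return stage2       # same work done on both sides
    if stage1 == stage2:
        return stage3       # only stage 3 changed
    if stage1 == stage3:
        return stage2       # only stage 2 changed
    return None             # all three differ: a real merge is needed
```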

Write-tree refuses to write a nonsensical tree, so write-tree will
complain about unmerged entries if it sees a single entry that is not
stage 0.

Ok, this all sounds like a collection of totally nonsensical rules,
but it's actually exactly what you want in order to do a fast
merge. The different stages represent the "result tree" (stage 0, aka
"merged"), the original tree (stage 1, aka "orig"), and the two trees
you are trying to merge (stage 2 and 3 respectively).

In fact, the way "read-tree" works, it's entirely agnostic about how
you assign the stages, and you could really assign them any which way,
and the above is just a suggested way to do it (except since
"write-tree" refuses to write anything but stage0 entries, it makes
sense to always consider stage 0 to be the "full merge" state).

So what happens? Try it out. Select the original tree, and two trees
to merge, and look how it works:

 - if a file exists in identical format in all three trees, it will 
   automatically collapse to "merged" state by the new read-tree.

 - a file that has _any_ difference what-so-ever in the three trees
   will stay as separate entries in the index. It's up to "script
   policy" to determine how to remove the non-0 stages, and insert a
   merged version.  But since the index is always sorted, they're easy
   to find: they'll be clustered together.

 - the index file saves and restores with all this information, so you
   can merge things incrementally, but as long as it has entries in
   stages 1/2/3 (ie "unmerged entries") you can't write the result.

So now the merge algorithm ends up being really simple:

 - you walk the index in order, and ignore all entries of stage 0,
   since they've already been done.

 - if you find a "stage1", but no matching "stage2" or "stage3", you
   know it's been removed from both trees (it only existed in the
   original tree), and you remove that entry.

 - if you find a matching "stage2" and "stage3" tree, you remove one
   of them, and turn the other into a "stage0" entry. Remove any
   matching "stage1" entry if it exists too.

 - .. all the normal trivial rules ..
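
A possible shape for that walk, in Python. Treat it as a sketch: the
index is modelled as a list of (path, stage, sha1) tuples sorted by
path and then stage, "matching" is read as "identical sha1", and only
the two rules spelled out above are applied. Everything else is left
unmerged for the script policy to deal with.

```python
from itertools import groupby

def resolve(index):
    """index: list of (path, stage, sha1), sorted by path then stage."""
    result = []
    for path, group in groupby(index, key=lambda entry: entry[0]):
        stages = {stage: sha1 for _, stage, sha1 in group}
        if 0 in stages:
            result.append((path, 0, stages[0]))     # already merged
        elif 2 not in stages and 3 not in stages:
            continue                                # only stage1: removed in both
        elif 2 in stages and 3 in stages and stages[2] == stages[3]:
            result.append((path, 0, stages[2]))     # both sides agree
        else:
            # still unmerged: keep every stage as it was
            result.extend((path, s, stages[s]) for s in sorted(stages))
    return result
```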

Incidentally - it also means that you don't even have to have a separate 
subdirectory for this. All the information literally is in the index file, 
which is a temporary thing anyway. There is no need to worry about what is in 
the working directory, since it is never shown and never used.

see also:
write-tree
show-files


################################################################
rev-list
	rev-list <commit>

Lists commit objects in reverse chronological order starting at the
given commit, taking ancestry relationship into account.  This is
useful to produce human-readable log output.


################################################################
rev-tree
	rev-tree [--edges] [--cache <cache-file>] [^]<commit> [[^]<commit>]

Provides the revision tree for one or more commits.

--edges
	Show edges (ie places where the marking changes between parent
	and child)

--cache <cache-file>
	Use the specified file as a cache. [Not implemented yet]

[^]<commit>
	The commit id to trace (a leading caret means to ignore this
	commit-id and below)

Output:
<date> <commit>:<flags> [<parent-commit>:<flags> ]*

<date>
	Date in 'seconds since epoch'

<commit>
	id of commit object

<parent-commit>
	id of each parent commit object (>1 indicates a merge)

<flags>

	The flags are read as a bitmask representing each commit
	provided on the commandline. eg: given the command:

		 $ rev-tree <com1> <com2> <com3>

	The output:

	    <date> <commit>:5

	 means that <commit> is reachable from <com1>(1) and <com3>(4)
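
Decoding the flags is a simple bit test. A Python sketch (the function
name is made up), assuming bit 0 stands for the first commit on the
command line, bit 1 for the second and so on:

```python
def reachable_from(flags, commits):
    """List the command-line commits whose bit is set in flags."""
    return [c for i, c in enumerate(commits) if flags & (1 << i)]
```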
	
A revtree can get quite large. rev-tree will eventually allow you to
cache previous state so that you don't have to follow the whole thing
down.

So the change difference between two commits is literally

	rev-tree [commit-id1]  > commit1-revtree
	rev-tree [commit-id2]  > commit2-revtree
	join -t : commit1-revtree commit2-revtree > common-revisions

(this is also how to find the most common parent - you'd look at just
the head revisions - the ones that aren't referred to by other
revisions - in "common-revision", and figure out the best one. I
think.)
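
What the join does can be modelled in Python as an intersection on
the part of each line before the first ':', ie "<date> <commit>".
This is only a sketch of the idea with an invented function name; it
does not pick the "best" common parent.

```python
def common_revisions(revtree1_lines, revtree2_lines):
    """Return the '<date> <commit>' keys that appear in both listings."""
    def key(line):
        return line.split(":", 1)[0]
    seen = {key(line) for line in revtree2_lines}
    return [key(line) for line in revtree1_lines if key(line) in seen]
```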


################################################################
show-diff
	show-diff [-p] [-q] [-s] [-z] [paths...]

Compares the files in the working tree and the cache.  When paths
are specified, compares only those named paths.  Otherwise all
entries in the cache are compared.  The output format is the
same as diff-cache and diff-tree.

-p
	generate patch (see section on generating patches)

-q
	Remain silent even on nonexisting files

-s
	Does not do anything other than what -q does.

Output format:

See "Output format from diff-cache, diff-tree and show-diff" section.

################################################################
show-files
	show-files [-z] [-t]
		(--[cached|deleted|others|ignored|stage|unmerged])*
		(-[c|d|o|i|s|u])*
		[-x <pattern>|--exclude=<pattern>]
		[-X <file>|--exclude-from=<file>]

This merges the file listing in the directory cache index with the
actual working directory list, and shows different combinations of the
two.

One or more of the options below may be used to determine the files
shown:

-c|--cached
	Show cached files in the output (default)

-d|--deleted
	Show deleted files in the output

-o|--others
	Show other files in the output

-i|--ignored
	Show ignored files in the output
	Note that this also reverses any exclude list present.

-s|--stage
	Show stage files in the output

-u|--unmerged
	Show unmerged files in the output (forces --stage)

#-t [not in Linus' tree (yet?)]
#	Identify the file status with the following tags (followed by
#	a space) at the start of each line:
#	H	cached
#	M	unmerged
#	R	removed/deleted
#	?	other

-z
	\0 line termination on output

-x|--exclude=<pattern>
	Skips files matching pattern.
	Note that pattern is a shell wildcard pattern.

-X|--exclude-from=<file>
	exclude patterns are read from <file>; 1 per line.
	Allows the use of the famous dontdiff file as follows to find
	out about uncommitted files just as dontdiff is used with
	the diff command:
	     show-files --others --exclude-from=dontdiff

Output
show-files just outputs the filename unless --stage is specified in
which case it outputs:

[<tag> ]<mode> <object> <stage> <file>

"show-files --unmerged" and "show-files --stage" can be used to examine
detailed information on unmerged paths.

For an unmerged path, instead of recording a single mode/SHA1 pair,
the dircache records up to three such pairs; one from tree O in stage
1, A in stage 2, and B in stage 3.  This information can be used by
the user (or Cogito) to see what should eventually be recorded at the
path. (see read-tree for more information on state)
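
A script could collect those pairs from the --stage output roughly
like this. The Python below is a sketch: it assumes the
"<mode> <object> <stage> <file>" layout without the optional tag, and
unmerged_paths is an invented name.

```python
def unmerged_paths(lines):
    """Map each unmerged path to {stage: (mode, sha1)}."""
    paths = {}
    for line in lines:
        mode, obj, stage, name = line.rstrip("\n").split(None, 3)
        if int(stage) != 0:             # stage 0 entries are already merged
            paths.setdefault(name, {})[int(stage)] = (mode, obj)
    return paths
```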

see also:
read-tree


################################################################
unpack-file
	unpack-file <blob>

Creates a file holding the contents of the blob specified by sha1. It
returns the name of the temporary file in the following format:
	.merge_file_XXXXX

<blob>
	Must be a blob id

################################################################
update-cache
	update-cache [--add] [--remove] [--refresh [--ignore-missing]]
		     [--cacheinfo <mode> <object> <path>]*
		     [--] [<file>]*

Modifies the index or directory cache. Each file mentioned is updated
into the cache and any 'unmerged' or 'needs updating' state is
cleared.

The way update-cache handles files it is told about can be modified
using the various options:

--add
	If a specified file isn't in the cache already then it's
	added.
	Default behaviour is to ignore new files.

--remove
	If a specified file is in the cache but is missing then it's
	removed.
	Default behaviour is to ignore removed file.

--refresh
	Looks at the current cache and checks to see if merges or
	updates are needed by checking stat() information.

--ignore-missing
	Ignores missing files during a --refresh

--cacheinfo <mode> <object> <path>
	Directly insert the specified info into the cache.
	
--
	Do not interpret any more arguments as options.

<file>
	Files to act on.
	Note that files beginning with '.' are discarded. This includes
	"./file" and "dir/./file". If you don't want this, then use	
	cleaner names.
	The same applies to directories ending '/' and paths with '//'
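
Read literally, these rules amount to rejecting any path with an
empty component or a component starting with '.'. A Python sketch of
that reading (the function name is made up, and the real check in
update-cache may differ in detail):

```python
def acceptable_path(path):
    """True if update-cache would be expected to accept this path."""
    # an empty component covers a trailing '/' as well as any '//'
    return all(part and not part.startswith(".") for part in path.split("/"))
```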


Using --refresh

--refresh does not calculate a new sha1 file or bring the cache
up-to-date for mode/content changes. But what it _does_ do is to
"re-match" the stat information of a file with the cache, so that you
can refresh the cache for a file that hasn't been changed but where
the stat entry is out of date.

For example, you'd want to do this after doing a "read-tree", to link
up the stat cache details with the proper files.
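
As a rough illustration of "re-matching" stat information, here is a
sketch under several assumptions: the cache is modelled as a plain dict,
only mtime and size are tracked, and a plain SHA-1 of the file contents
stands in for the real object id. The names are invented for this example.

```python
import hashlib
import os


def stat_key(path):
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def refresh(cache, path):
    """cache maps path -> {"sha1": hex digest, "stat": (mtime_ns, size)}."""
    entry = cache[path]
    current = stat_key(path)
    if current == entry["stat"]:
        return "up-to-date"
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    if digest != entry["sha1"]:
        return "needs update"   # content changed; the recorded sha1 is left alone
    entry["stat"] = current     # same content: only the stat info is re-matched
    return "refreshed"
```

The point of the sketch is the last branch: an unchanged file whose stat
entry is stale gets its stat entry fixed, and nothing else is rewritten.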

Using --cacheinfo

--cacheinfo is used to register a file that is not in the current
working directory.  This is useful for minimum-checkout merging.

To pretend you have a file with mode and sha1 at path, say:

 $ update-cache --cacheinfo mode sha1 path

To update and refresh only the files already checked out:

   checkout-cache -n -f -a && update-cache --ignore-missing --refresh


################################################################
write-tree
	write-tree

Creates a tree object using the current cache.

The cache must be merged.

Conceptually, write-tree sync()s the current directory cache contents
into a set of tree files.
In order to have that match what is actually in your directory right
now, you need to have done an "update-cache" phase before you did the
"write-tree".


################################################################

Output format from diff-cache, diff-tree and show-diff.

These commands all compare two sets of things; what is
compared differs from command to command:

    diff-cache <tree/commit>

        compares the <tree/commit> and the files on the filesystem.

    diff-cache --cached <tree/commit>

        compares the <tree/commit> and the cache.

    diff-tree [-r] <tree/commit-1> <tree/commit-2> [paths...]

        compares the trees named by the two arguments.

    show-diff [paths...]

        compares the cache and the files on the filesystem.

The following description uses "old" and "new" to mean those
compared entities.

For files in old but not in new (i.e. removed):
-<mode> \t <type> \t <object> \t <path>

For files not in old but in new (i.e. added):
+<mode> \t <type> \t <object> \t <path>

For files that differ:
*<old-mode>-><new-mode> \t <type> \t <old-sha1>-><new-sha1> \t <path>

<new-sha1> is shown as all 0's if new is a file on the
filesystem and it is out of sync with the cache.  Example:

    *100644->100660 blob    5be4a414b32cf4204f889469942986d3d783da84->0000000000000000000000000000000000000000      file.c
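
For scripts that consume this output, a parsing sketch may help. It
assumes one record per line (i.e. no -z) and splits on any whitespace so
that it also copes with tabs that have been expanded to spaces; the
Change tuple and function name are made up for this example.

```python
from collections import namedtuple

Change = namedtuple(
    "Change", "kind old_mode new_mode type old_sha1 new_sha1 path")

NULL_SHA1 = "0" * 40  # shown when the working file is out of sync with the cache


def parse_diff_line(line):
    line = line.strip()
    kind, rest = line[0], line[1:]
    # The path is the last field and may contain spaces: split at most 3 times.
    mode, obj_type, sha1, path = rest.split(None, 3)
    if kind == "-":
        return Change("removed", mode, None, obj_type, sha1, None, path)
    if kind == "+":
        return Change("added", None, mode, obj_type, None, sha1, path)
    if kind == "*":
        old_mode, new_mode = mode.split("->")
        old_sha1, new_sha1 = sha1.split("->")
        return Change("modified", old_mode, new_mode, obj_type,
                      old_sha1, new_sha1, path)
    raise ValueError("unrecognised line: %r" % line)
```

A caller can then test "change.new_sha1 == NULL_SHA1" to find files that
still need an update-cache.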

################################################################

Generating patches

When diff-cache, diff-tree, or show-diff is run with a -p
option, it does not produce the output described in the "Output
format from diff-cache, diff-tree and show-diff" section.  It
instead produces a patch file.

The patch generation can be customized at two levels.  This
customization also applies to diff-tree-helper.

1. When the environment variable GIT_EXTERNAL_DIFF is not set,
   these commands internally invoke diff like this:

   diff -L k/<path> -L l/<path> -pu <old> <new>

   For added files, /dev/null is used for <old>.  For removed
   files, /dev/null is used for <new>.

   The first part of the above command-line can be customized via
   the environment variable GIT_DIFF_CMD.  For example, if you
   do not want to show the extra level of leading path, you can
   say this:

   GIT_DIFF_CMD="diff -L'%s' -L'%s'" show-diff -p

   Caution:  Do not use more than two '%s' in GIT_DIFF_CMD.

   The diff formatting options can be customized via the
   environment variable GIT_DIFF_OPTS.  For example, if you
   prefer context diff:

   GIT_DIFF_OPTS=-c diff-cache -p $(cat .git/HEAD)


2. When the environment variable GIT_EXTERNAL_DIFF is set, the
   program named by it is called, instead of the diff invocation
   described above.

   For a path that is added, removed, or modified,
   GIT_EXTERNAL_DIFF is called with 7 parameters:

     path old-file old-hex old-mode new-file new-hex new-mode

   where
     <old|new>-file are files GIT_EXTERNAL_DIFF can use to read the
                    contents of <old|new>,
     <old|new>-hex are the 40-hexdigit SHA1 hashes,
     <old|new>-mode are the octal representation of the file modes.

   The file parameters can point at the user's working file
   (e.g. new-file in show-diff), /dev/null (e.g. old-file when a
   new file is added), or a temporary file (e.g. old-file in the
   cache).  GIT_EXTERNAL_DIFF should not worry about
   unlinking the temporary file --- it is removed when
   GIT_EXTERNAL_DIFF exits.

   For a path that is unmerged, GIT_EXTERNAL_DIFF is called with
   1 parameter, path.
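
To make the calling convention concrete, here is a sketch of the logic a
GIT_EXTERNAL_DIFF program could implement. It only relies on the
parameter list described above; the output format, the function names
and the use of Python's difflib are choices made for this example, not
anything git prescribes.

```python
import difflib


def read_lines(name):
    # /dev/null stands in for the missing side of an added or removed file.
    if name == "/dev/null":
        return []
    with open(name, encoding="utf-8", errors="replace") as f:
        return f.readlines()


def external_diff(args):
    """Return the text a GIT_EXTERNAL_DIFF program might print for `args`."""
    if len(args) == 1:  # unmerged path: only the path is passed
        return "* unmerged path %s\n" % args[0]
    if len(args) != 7:
        raise ValueError("expected 1 or 7 parameters, got %d" % len(args))
    path, old_file, old_hex, old_mode, new_file, new_hex, new_mode = args
    out = []
    if old_mode != new_mode:
        out.append("# mode change %s -> %s for %s\n" % (old_mode, new_mode, path))
    out.extend(difflib.unified_diff(read_lines(old_file), read_lines(new_file),
                                    fromfile="a/" + path, tofile="b/" + path))
    return "".join(out)
```

A real helper would call external_diff(sys.argv[1:]) and write the
result to stdout. Note that it never unlinks old-file or new-file, in
line with the remark about temporary files above.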

################################################################

Terminology: - see README for description
Each line contains terms used interchangeably

object database, .git directory
directory cache, index
id, sha1, sha1-id, sha1 hash
type, tag
blob, blob object
tree, tree object
commit, commit object
parent
root object
changeset


git Environment Variables
AUTHOR_NAME
AUTHOR_EMAIL
AUTHOR_DATE
COMMIT_AUTHOR_NAME
COMMIT_AUTHOR_EMAIL
GIT_DIFF_CMD
GIT_DIFF_OPTS
GIT_EXTERNAL_DIFF
GIT_INDEX_FILE
SHA1_FILE_DIRECTORY



* Re: Trying to use AUTHOR_DATE
From: Randy.Dunlap @ 2005-05-01 16:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: froese, git
In-Reply-To: <Pine.LNX.4.58.0505010934480.2296@ppc970.osdl.org>

On Sun, 1 May 2005 09:46:52 -0700 (PDT) Linus Torvalds wrote:

| 
| 
| On Sun, 1 May 2005, Edgar Toernig wrote:
| > 
| > And I had the impression the strict checks in the original
| > version were intentionally ;-)
| 
| Btw, here's my test of every single email in my email archive (which is 
| not that big any more - after the SCO subpoena, I decided that I never 
| want to go through with that kind of crap ever again, so now it's only a 
| month or two of things). 
| 
| Almost everything seems to follow the RFC's or at least be close enough
| that my "accept anything" ends up doing something sane, except for three
| emails:
| 
| 	Date: Fri, 08 Apr 2005 02:20:10 0200 -> bad
| 	Date: Mon, 18 Apr 05 15:05:29 Hora oficial do Brasil -> bad
| 	Date: 2002/04/11 18:29:07 -> bad
| 
| That first one doesn't have a sign in front of the timezone (I'll fix
| things up - right now I end up believing that it's "year 200"), and the
| third one has the sane European date order that sorts nicely (and which
| I'll also fix up).

Third one is almost ISO 8601 standard date format, except that
ISO uses hyphens, e.g., 2002-04-11, so I hope that the
punctuation is a little flexible...

| The second one is funny. Not just the "Hora oficial do Brasil" (hey, I 
| could add it as a real timezone and my parser would do the right thing ;) 
| but also because my parser decides that "05" is not a year, but the day in 
| the month, so it doesn't see the year.
| 
| I can fake out that year thing pretty easily ("if it starts with '0' it's 
| not a day of the month"), but it does show just how _strange_ stuff 
| there's out there.
| 
| ("Hora" is also Swedish for "whore", so that timezone does end up being
| mentally parsed _quite_ the wrong way for somebody like me who doesn't
| speak spanish).


---
~Randy


* Re: Trying to use AUTHOR_DATE
From: Linus Torvalds @ 2005-05-01 16:46 UTC (permalink / raw)
  To: Edgar Toernig; +Cc: git
In-Reply-To: <20050501005434.2d47131a.froese@gmx.de>



On Sun, 1 May 2005, Edgar Toernig wrote:
> 
> And I had the impression the strict checks in the original
> version were intentionally ;-)

Btw, here's my test of every single email in my email archive (which is 
not that big any more - after the SCO subpoena, I decided that I never 
want to go through with that kind of crap ever again, so now it's only a 
month or two of things). 

Almost everything seems to follow the RFC's or at least be close enough
that my "accept anything" ends up doing something sane, except for three
emails:

	Date: Fri, 08 Apr 2005 02:20:10 0200 -> bad
	Date: Mon, 18 Apr 05 15:05:29 Hora oficial do Brasil -> bad
	Date: 2002/04/11 18:29:07 -> bad

That first one doesn't have a sign in front of the timezone (I'll fix
things up - right now I end up believing that it's "year 200"), and the
third one has the sane European date order that sorts nicely (and which
I'll also fix up).

The second one is funny. Not just the "Hora oficial do Brasil" (hey, I 
could add it as a real timezone and my parser would do the right thing ;) 
but also because my parser decides that "05" is not a year, but the day in 
the month, so it doesn't see the year.

I can fake out that year thing pretty easily ("if it starts with '0' it's 
not a day of the month"), but it does show just how _strange_ stuff 
there's out there.
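
For illustration only, the kind of "accept anything" heuristics discussed
here could be sketched as below. This is not the actual parser: the token
rules, the two-digit-year pivot at 70, and treating an unsigned four-digit
number after the time as a positive timezone offset are all assumptions
made for this example.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}


def parse_loose_date(text):
    """Return (year, month, day, hour, minute, second, tz_minutes_or_None)."""
    year = month = day = tz = None
    clock = None
    for tok in text.replace(",", " ").split():
        ymd = re.fullmatch(r"(\d{4})[/-](\d{1,2})[/-](\d{1,2})", tok)
        hms = re.fullmatch(r"(\d{1,2}):(\d{2})(?::(\d{2}))?", tok)
        if ymd:
            year, month, day = (int(g) for g in ymd.groups())
        elif hms:
            clock = (int(hms.group(1)), int(hms.group(2)), int(hms.group(3) or 0))
        elif re.fullmatch(r"[+-]\d{4}", tok):
            sign = -1 if tok[0] == "-" else 1
            tz = sign * (int(tok[1:3]) * 60 + int(tok[3:5]))
        elif tok.isalpha() and tok[:3].lower() in MONTHS:
            month = MONTHS[tok[:3].lower()]
        elif tok.isdigit():
            n = int(tok)
            if len(tok) == 4 and clock is not None and year is not None:
                tz = (n // 100) * 60 + n % 100  # unsigned zone, taken as "+"
            elif len(tok) == 4:
                year = n
            elif day is None:
                day = n
            elif year is None:
                year = n + (2000 if n < 70 else 1900)  # two-digit year pivot
        # Anything else (weekday names, free-text zone names) is ignored.
    return (year, month, day) + (clock or (0, 0, 0)) + (tz,)
```

With these rules the three "bad" headers above all yield a full date;
the free-text zone in the second one is simply dropped, so its offset
comes back as None.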

("Hora" is also Swedish for "whore", so that timezone does end up being
mentally parsed _quite_ the wrong way for somebody like me who doesn't
speak spanish).

			Linus


* Re: Quick command reference
From: Linus Torvalds @ 2005-05-01 16:29 UTC (permalink / raw)
  To: David Greaves; +Cc: omb, Paul Mackerras, git, Petr Baudis
In-Reply-To: <4274FB3F.8090206@dgreaves.com>



On Sun, 1 May 2005, David Greaves wrote:
> 
> I've spent many many hours doing this and I'm happy to spend many more -
> but I'm at that frustrated point where it makes no sense until I know
> it's of use.

I just tend to be concentrating on the technology itself, so docs 
invariably fall a bit behind for me, until I get to the point where I 
start looking at what (for me) ends up being the secondary things.

Anyway, what I'd really appreciate is a whole "Documentation" 
subdirectory, and preferably in some standard format. Maybe real 
old-fashioned man-pages, but hey, especially with something like this, 
just html would be good too.

(And no, by "standard format" I do _not_ mean xml or stuff like that. I 
mean something that is actually easy to read ;)

		Linus


* Re: Should git-prune-script warn about dircache?
From: Linus Torvalds @ 2005-05-01 16:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwtqjxlrv.fsf@assigned-by-dhcp.cox.net>



On Sun, 1 May 2005, Junio C Hamano wrote:
> 
> Maybe a big warning in red ugly bold blinking typeface somewhere
> in the doc?

How about making git-prune-script first run "git-update-cache --refresh",
and checking the return value of it (this, of course, assumes that
git-update-cache --refresh would return non-zero if it can't refresh a
file, which is currently not true, but should be easily fixable).
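
In outline, the guard would look something like the sketch below. The
command lists are passed in by the caller, the function name is made up,
and the whole thing depends on the refresh step exiting non-zero on
failure, which (as said above) is not the case yet.

```python
import subprocess


def run_guarded(refresh_cmd, prune_cmd, run=subprocess.call):
    """Run prune_cmd only if refresh_cmd exits with status 0."""
    status = run(refresh_cmd)
    if status != 0:
        return status  # cache could not be refreshed: do not prune
    return run(prune_cmd)
```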

		Linus


* Re: Quick command reference
From: David Greaves @ 2005-05-01 15:52 UTC (permalink / raw)
  To: omb; +Cc: Paul Mackerras, git, Linus Torvalds, Petr Baudis
In-Reply-To: <4274F373.6030001@khandalf.com>

Brian O'Mahoney wrote:

>Thank you both for taking the time and trouble to do this, particularly
>with the name changes and new options; why don't you merge your efforts
>and produce a GIT-Mini-HOWTO BTW send it off as a patch again!
>  
>
I will happily work on this as soon as Linus says that he'll accept it.
That way when people make changes to the options and behaviour of
commands they can trivially update the README.reference too.

Petr accepted an early version but then it became 'core git' rather than
Cogito so it made more sense to put it in Linus' tree.

I don't know if Linus just doesn't get my mails (ISPs?) or if he's not
bothered about docs right now? or he's plain busy?

I've spent many many hours doing this and I'm happy to spend many more -
but I'm at that frustrated point where it makes no sense until I know
it's of use.

Fingers crossed :)

David



* Re: How to get bash to shut up about SIGPIPE?
From: David A. Wheeler @ 2005-05-01 15:51 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Ryan Anderson, torvalds, rene.scharfe, git, pasky
In-Reply-To: <E1DSDER-0000kS-00@gondolin.me.apana.org.au>

Herbert Xu wrote:
> This issue has been around for years.  The discussion that led to
> Debian setting this option may be helpful in understanding it:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=10494

Thanks for the pointer.  That discussion points
to some alternative fixes that may be more useful
(instead of installing a repaired shell).

One approach is to install a trap for SIGPIPE in a
non-terminating command in a pipeline where the
later items might not process all the data, e.g.:
   (trap {} SIGPIPE; find .) | head -1

<rant>
THIS IS A REALLY, REALLY BAD DECISION BY THE BASH TEAM.
Why should the default be "create annoying spurious error report?".
This should at least be a run-time settable option,
with the OPPOSITE default.
</rant>

Which is sad, bash is generally reasonably good.
I guess now I have to say "bash, once properly configured
using DONT_REPORT_SIGPIPE, is reasonably good."


--- David A. Wheeler


* Re: Quick command reference
From: Brian O'Mahoney @ 2005-05-01 15:19 UTC (permalink / raw)
  To: David Greaves; +Cc: Paul Mackerras, git, Linus Torvalds, Petr Baudis
In-Reply-To: <4274EB3D.2060602@dgreaves.com>

Thank you both for taking the time and trouble to do this, particularly
with the name changes and new options; why don't you merge your efforts
and produce a GIT-Mini-HOWTO? BTW, send it off as a patch again!

regards, Brian

David Greaves wrote:
> Paul Mackerras wrote:
> 
> 
>>As an aid to my understanding of the core git commands, I created this
>>summary of the commands and their options and parameters.  I hope it
>>will be useful to others.  Corrections welcome of course.
>>
>>Paul.
>> 
>>
> 
> 
> Thanks Paul
> 
> Shame to see duplicated effort...
> 
> I've submitted this document to Linus and the list a few times and
> included all the feedback but for some reason it's not gone into any of
> the trees which means that people like you have to redo it from scratch...
> 
> Getting frustrated now...
> 
> David
> 
> 
> 
> ------------------------------------------------------------------------
> 
> This file contains reference information for the core git commands.
> It is actually based on the source from Petr Baudis' tree and may
> therefore contain a few 'extras' that may or may not make it upstream.
> 
> The README contains much useful definition and clarification info -
> read that first.  And of the commands, I suggest reading
> 'update-cache' and 'read-tree' first - I wish I had!
> 
> Thanks to original email authors and proof readers esp Junio C Hamano
> <junkio@cox.net>
> 
> David Greaves <david@dgreaves.com>
> 24/4/05
> 
> Identifier terminology used:
> 
> <object>
> 	Indicates any object sha1 identifier
> 
> <blob>
> 	Indicates a blob object sha1 identifier
> 
> <tree>
> 	Indicates a tree object sha1 identifier
> 
> <commit>
> 	Indicates a commit object sha1 identifier
> 
> <tree/commit>
> 	Indicates a tree or commit object sha1 identifier (usually
> 	because the command can read the <tree> a <commit> contains).
> 	[Eventually may be replaced with <tree> if <tree> means
> 	<tree/commit> in all commands]
> 
> <type>
> 	Indicates that an object type is required.
> 	Currently one of: blob/tree/commit
> 
> <file>
> 	Indicates a filename - often includes leading path
> 
> <path>
> 	Indicates the path of a file (is this ever useful?)
> 
> 
> 
> ################################################################
> cat-file
> 	cat-file (-t | <type>) <object>
> 
> Provide contents or type of objects in the repository. The type is
> required if -t is not being used to find the object type.
> 
> <object>
> 	The sha1 identifier of the object.
> 
> -t
> 	show the object type identified by <object>
> 
> <type>
> 	One of: blob/tree/commit
> 
> Output
> 
> If -t is specified, one of:
>         blob/tree/commit
> 
> Otherwise the raw (though uncompressed) contents of the <object> will
> be returned.
> 
> 
> ################################################################
> check-files
> 	check-files <file>...
> 
> Check that a list of files are up-to-date between the filesystem and
> the cache. Used to verify a patch target before doing a patch.
> 
> Files that do not exist on the filesystem are considered up-to-date
> (whether or not they are in the cache).
> 
> Emits an error message on failure.
> preparing to update existing file <file> not in cache
> 	  <file> exists but is not in the cache
> 
> preparing to update file <file> not uptodate in cache
> 	  <file> on disk is not up-to-date with the cache
> 
> exits with a status code indicating success if all files are
> up-to-date.
> 
> see also: update-cache
> 
> 
> ################################################################
> checkout-cache
> 	checkout-cache [-q] [-a] [-f] [-n] [--prefix=<string>]
> 		       [--] <file>...
> 
> Will copy all files listed from the cache to the working directory
> (not overwriting existing files). Note that the file contents are
> restored - NOT the file permissions.
> ??? l 58 checkout-cache.c says restore executable bit.
> 
> -q
> 	be quiet if files exist or are not in the cache
> 
> -f
> 	forces overwrite of existing files
> 
> -a
> 	checks out all files in the cache (will then continue to
> 	process listed files).
> 
> -n
> 	Don't checkout new files, only refresh files already checked
> 	out.
> 
> --prefix=<string>
> 	When creating files, prepend <string> (usually a directory
> 	including a trailing /)
> 
> --
> 	Do not interpret any more arguments as options.
> 
> Note that the order of the flags matters:
> 
> 	checkout-cache -a -f file.c
> 
> will first check out all files listed in the cache (but not overwrite
> any old ones), and then force-checkout file.c a second time (ie that
> one _will_ overwrite any old contents with the same filename).
> 
> Also, just doing "checkout-cache" does nothing. You probably meant
> "checkout-cache -a". And if you want to force it, you want
> "checkout-cache -f -a".
> 
> Intuitiveness is not the goal here. Repeatability is. The reason for
> the "no arguments means no work" thing is that from scripts you are
> supposed to be able to do things like
> 
> 	find . -name '*.h' -print0 | xargs -0 checkout-cache -f --
> 
> which will force all existing *.h files to be replaced with their
> cached copies. If an empty command line implied "all", then this would
> force-refresh everything in the cache, which was not the point.
> 
> To update and refresh only the files already checked out:
> 
>    checkout-cache -n -f -a && update-cache --ignore-missing --refresh
> 
> Oh, and the "--" is just a good idea when you know the rest will be
> filenames. Just so that you wouldn't have a filename of "-a" causing
> problems (not possible in the above example, but get used to it in
> scripting!).
> 
> The prefix ability basically makes it trivial to use checkout-cache as
> a "export as tree" function. Just read the desired tree into the
> index, and do a
>   
>         checkout-cache --prefix=export-dir/ -a
>   
> and checkout-cache will "export" the cache into the specified
> directory.
>   
> NOTE! The final "/" is important. The exported name is literally just
> prefixed with the specified string, so you can also do something like
>   
>         checkout-cache --prefix=.merged- Makefile
>   
> to check out the currently cached copy of "Makefile" into the file
> ".merged-Makefile".
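
The "literally just prefixed" rule is easy to state in code. A tiny
sketch, with a made-up function name, purely to show why the trailing
"/" matters:

```python
def checkout_name(prefix, cached_name):
    # The prefix is prepended as-is; no path separator is added.
    return prefix + cached_name
```

With "export-dir/" you get "export-dir/Makefile"; leave the slash off
and the result is the single name "export-dirMakefile".
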
> 
> 
> ################################################################
> commit-tree
> 	commit-tree <tree> [-p <parent commit>]*   < changelog
> 
> Creates a new commit object based on the provided tree object and
> emits the new commit object id on stdout. If no parent is given then
> it is considered to be an initial tree.
> 
> A commit object usually has 1 parent (a commit after a change) or up
> to 16 parents.  More than one parent represents a merge of branches
> that led to them.
> 
> While a tree represents a particular directory state of a working
> directory, a commit represents that state in "time", and explains how
> to get there.
> 
> Normally a commit would identify a new "HEAD" state, and while git
> doesn't care where you save the note about that state, in practice we
> tend to just write the result to the file ".git/HEAD", so that we can
> always see what the last committed state was.
> 
> Options
> 
> <tree>
> 	An existing tree object
> 
> -p <parent commit>
> 	Each -p indicates the id of a parent commit object.
> 	
> 
> Commit Information
> 
> A commit encapsulates:
> 	all parent object ids
> 	author name, email and date
> 	committer name and email and the commit time.
> 
> If not provided, commit-tree uses your name, hostname and domain to
> provide author and committer info. This can be overridden using the
> following environment variables.
> 	AUTHOR_NAME
> 	AUTHOR_EMAIL
> 	AUTHOR_DATE
> 	COMMIT_AUTHOR_NAME
> 	COMMIT_AUTHOR_EMAIL
> (nb <,> and '\n's are stripped)
> 
> A commit comment is read from stdin (max 999 chars). If a changelog
> entry is not provided via '<' redirection, commit-tree will just wait
> for one to be entered and terminated with ^D
> 
> see also: write-tree
> 
> 
> ################################################################
> diff-cache
> 	diff-cache [-p] [-r] [-z] [--cached] <tree/commit>
> 
> Compares the content and mode of the blobs found via a tree object
> with the content of the current cache, optionally ignoring the
> stat state of the file on disk (see --cached).
> 
> <tree/commit>
> 	The id of a tree or commit object to diff against.
> 
> -p
> 	generate patch (see section on generating patches)
> 
> -r
> 	recurse
> 
> -z
> 	\0 line termination on output
> 
> --cached
> 	do not consider the on-disk file at all
> 
> Output format:
> 
> See "Output format from diff-cache, diff-tree and show-diff" section.
> 
> Operating Modes
> 
> You can choose whether you want to trust the index file entirely
> (using the "--cached" flag) or ask the diff logic to show any files
> that don't match the stat state as being "tentatively changed".  Both
> of these operations are very useful indeed.
> 
> Cached Mode
> 
> If --cached is specified, it allows you to ask:
> 	show me the differences between HEAD and the current index
> 	contents (the ones I'd write with a "write-tree")
> 
> For example, let's say that you have worked on your index file, and are
> ready to commit. You want to see exactly _what_ you are going to commit
> without having to write a new tree object and compare it that way, and to
> do that, you just do
> 
> 	diff-cache --cached $(cat .git/HEAD)
> 
> Example: let's say I had renamed "commit.c" to "git-commit.c", and I had 
> done an "update-cache" to make that effective in the index file. 
> "show-diff" wouldn't show anything at all, since the index file matches 
> my working directory. But doing a diff-cache does:
> 	torvalds@ppc970:~/git> diff-cache --cached $(cat .git/HEAD)
> 	-100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        commit.c
> 	+100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        git-commit.c
> 
> And as you can see, the output matches "diff-tree -r" output (we
> always do equivalent of "-r", since the index is flat).
> You can trivially see that the above is a rename.
> 
> In fact, "diff-cache --cached" _should_ always be entirely equivalent to
> actually doing a "write-tree" and comparing that. Except this one is much
> nicer for the case where you just want to check where you are.
> 
> So doing a "diff-cache --cached" is basically very useful when you are 
> asking yourself "what have I already marked for being committed, and 
> what's the difference to a previous tree".
> 
> Non-cached Mode
> 
> The "non-cached" mode takes a different approach, and is potentially
> the even more useful of the two in that what it does can't be emulated
> with a "write-tree + diff-tree". Thus that's the default mode.  The
> non-cached version asks the question
> 
>    "show me the differences between HEAD and the currently checked out 
>     tree - index contents _and_ files that aren't up-to-date"
> 
> which is obviously a very useful question too, since that tells you what
> you _could_ commit. Again, the output matches the "diff-tree -r" output to
> a tee, but with a twist.
> 
> The twist is that if some file doesn't match the cache, we don't have a
> backing store thing for it, and we use the magic "all-zero" sha1 to show
> that. So let's say that you have edited "kernel/sched.c", but have not
> actually done an update-cache on it yet - there is no "object" associated
> with the new state, and you get:
> 
> 	torvalds@ppc970:~/v2.6/linux> diff-cache $(cat .git/HEAD )
> 	*100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000      kernel/sched.c
> 
> ie it shows that the tree has changed, and that "kernel/sched.c" is
> not up-to-date and may contain new stuff. The all-zero sha1 means that to
> get the real diff, you need to look at the object in the working directory
> directly rather than do an object-to-object diff.
> 
> NOTE! As with other commands of this type, "diff-cache" does not actually 
> look at the contents of the file at all. So maybe "kernel/sched.c" hasn't 
> actually changed, and it's just that you touched it. In either case, it's 
> a note that you need to update-cache it to make the cache be in sync.
> 
> NOTE 2! You can have a mixture of files show up as "has been updated" and
> "is still dirty in the working directory" together. You can always tell
> which file is in which state, since the "has been updated" ones show a
> valid sha1, and the "not in sync with the index" ones will always have the
> special all-zero sha1.
> 
> ################################################################
> diff-tree
> 	diff-tree [-p] [-r] [-z] <tree/commit> <tree/commit> [<pattern>]*
> 
> Compares the content and mode of the blobs found via two tree objects.
> 
> Note that diff-tree can use the tree encapsulated in a commit object.
> 
> <tree/commit>
> 	The id of a tree or commit object.
> 
> <pattern>
> 
> 	If provided, the results are limited to a subset of files
> 	matching one of these prefix strings.
> 	ie file matches /^<pattern1>|<pattern2>|.../
> 	Note that pattern does not provide any wildcard or regexp features.
> 
> -p
> 	generate patch (see section on generating patches)
> 
> -r
> 	recurse
> 
> -z
> 	\0 line termination on output
> 
> Limiting Output
> 
> If you're only interested in differences in a subset of files, for
> example some architecture-specific files, you might do:
> 
> 	diff-tree -r <tree/commit> <tree/commit> arch/ia64 include/asm-ia64
> 
> and it will only show you what changed in those two directories.
> 
> Or if you are searching for what changed in just kernel/sched.c, just do
> 
> 	diff-tree -r <tree/commit> <tree/commit> kernel/sched.c
> 
> and it will ignore all differences to other files.
> 
> The pattern is always the prefix, and is matched exactly (ie there are no
> wildcards - although matching a directory, which it does support, can
> obviously be seen as a "wildcard" for all the files under that directory).
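
Going by the description above, the selection rule is a plain string
prefix test. A sketch (the function name is invented, and whether the
real command stops at directory boundaries is not something this text
settles):

```python
def path_selected(path, patterns):
    """True if no patterns are given or `path` starts with one of them."""
    return not patterns or path.startswith(tuple(patterns))
```

Note that a pure prefix test would also let "arch/ia64" select a
hypothetical "arch/ia64-old/...", so add the trailing "/" if that
matters.
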
> 
> Output format:
> 
> See "Output format from diff-cache, diff-tree and show-diff" section.
> 
> An example of normal usage is:
> 
> 	torvalds@ppc970:~/git> diff-tree 5319e4d609cdd282069cc4dce33c1db559539b03 b4e628ea30d5ab3606119d2ea5caeab141d38df7
> 	*100664->100664 blob    ac348b7d5278e9d04e3a1cd417389379c32b014f->a01513ed4d4d565911a60981bfb4173311ba3688      fsck-cache.c
> 
> which tells you that the last commit changed just one file (it's from
> this one:
> 
> 	commit 3c6f7ca19ad4043e9e72fa94106f352897e651a8
> 	tree 5319e4d609cdd282069cc4dce33c1db559539b03
> 	parent b4e628ea30d5ab3606119d2ea5caeab141d38df7
> 	author Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
> 	committer Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
> 
> 	Make "fsck-cache" print out all the root commits it finds.
> 
> 	Once I do the reference tracking, I'll also make it print out all the
> 	HEAD commits it finds, which is even more interesting.
> 
> in case you care).
> 
> ################################################################
> diff-tree-helper
> 	diff-tree-helper [-z]
> 
> Reads output from diff-cache, diff-tree and show-diff and
> generates patch format output.
> 
> -z
> 	\0 line termination on input
> 
> See also the section on generating patches.
> 
> ################################################################
> fsck-cache
> 	fsck-cache [[--unreachable] <commit>*]
> 
> Verifies the connectivity and validity of the objects in the database.
> 
> <commit>
> 	A commit object to treat as the head of an unreachability
> 	trace
> 
> --unreachable
> 	print out objects that exist but that aren't reachable from any
> 	of the specified root nodes
> 
> It tests SHA1 and general object sanity, and it does full tracking of
> the resulting reachability and everything else. It prints out any
> corruption it finds (missing or bad objects), and if you use the
> "--unreachable" flag it will also print out objects that exist but
> that aren't reachable from any of the specified root nodes.
> 
> So for example
> 
> 	fsck-cache --unreachable $(cat .git/HEAD)
> 
> or, for Cogito users:
> 
> 	fsck-cache --unreachable $(cat .git/heads/*)
> 
> will do quite a _lot_ of verification on the tree. There are a few
> extra validity tests to be added (make sure that tree objects are
> sorted properly etc), but on the whole if "fsck-cache" is happy, you
> do have a valid tree.
> 
> Any corrupt objects you will have to find in backups or other archives
> (ie you can just remove them and do an "rsync" with some other site in
> the hopes that somebody else has the object you have corrupted).
> 
> Of course, "valid tree" doesn't mean that it wasn't generated by some
> evil person, and the end result might be crap. Git is a revision
> tracking system, not a quality assurance system ;)
> 
> Extracted Diagnostics
> 
> expect dangling commits - potential heads - due to lack of head information
> 	You haven't specified any nodes as heads so it won't be
> 	possible to differentiate between un-parented commits and
> 	root nodes.
> 
> missing sha1 directory '<dir>'
> 	The directory holding the sha1 objects is missing.
> 
> unreachable <type> <object>
> 	The <type> object <object>, isn't actually referred to directly
> 	or indirectly in any of the trees or commits seen. This can
> 	mean that there's another root node that you're not specifying
> 	or that the tree is corrupt. If you haven't missed a root node
> 	then you might as well delete unreachable nodes since they
> 	can't be used.
> 
> missing <type> <object>
> 	The <type> object <object>, is referred to but isn't present in
> 	the database.
> 
> dangling <type> <object>
> 	The <type> object <object>, is present in the database but never
> 	_directly_ used. A dangling commit could be a root node.
> 
> warning: fsck-cache: tree <tree> has full pathnames in it
> 	And it shouldn't...
> 
> sha1 mismatch <object>
> 	The database has an object whose sha1 doesn't match the
> 	database value.
> 	This indicates a ??serious?? data integrity problem.
> 	(note: this error occurred during early git development when
> 	the database format changed.)
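
The difference between "missing", "unreachable" and "dangling" can be
shown on a toy object graph. This sketch assumes the database is given
as a dict from object id to the ids it refers to, and that the heads
named on the command line do not count as dangling; it says nothing
about how fsck-cache is actually implemented.

```python
def classify(objects, heads):
    """objects maps id -> list of ids it refers to; heads is a list of ids."""
    referenced = {ref for refs in objects.values() for ref in refs}
    missing = referenced - set(objects)  # referred to but not in the database

    reachable, todo = set(), list(heads)
    while todo:
        oid = todo.pop()
        if oid in reachable or oid not in objects:
            continue
        reachable.add(oid)
        todo.extend(objects[oid])

    unreachable = set(objects) - reachable
    # Present but never referred to by another object (heads excluded).
    dangling = set(objects) - referenced - set(heads)
    return {"missing": missing, "unreachable": unreachable, "dangling": dangling}
```

In this model every dangling object is also unreachable, but not the
other way round: an object hanging off a dangling commit is unreachable
without being dangling itself.
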
> 
> Environment Variables
> 
> SHA1_FILE_DIRECTORY
> 	used to specify the object database root (usually .git/objects)
> 
> ################################################################
> git-export
> 	git-export top [base]
> 
> probably deprecated:
> On Wed, 20 Apr 2005, Petr Baudis wrote:
> 
>>>I will probably not buy git-export, though. (That is, it is merged, but
>>>I won't make git frontend for it.) My "git export" already does
>>>something different, but more importantly, "git patch" of mine already
>>>does effectively the same thing as you do, just for a single patch; so I
>>>will probably just extend it to do it for an (a,b] range of patches.
> 
> 
> 
> That's fine. It was a quick hack, just to show that if somebody wants to, 
> the data is trivially exportable.
> 
> 		Linus
> 
> Although in Linus' distribution, git-export is not part of 'core' git.
> 
> ################################################################
> init-db
> 	init-db
> 
> This simply creates an empty git object database - basically a .git
> directory.
> 
> If the object storage directory is specified via the
> SHA1_FILE_DIRECTORY environment variable then the sha1 directories are
> created underneath - otherwise the default .git/objects directory is
> used.
> 
> init-db won't hurt an existing repository.
> 
> 
> ################################################################
> ls-tree
> 	ls-tree [-r] [-z] <tree/commit>
> 
> convert the tree object to a human readable (and script
> processable) form.
> 
> <tree/commit>
> 	Id of a tree or commit object.
> -r
> 	recurse into sub-trees
> 
> -z
> 	\0 line termination on output
> 
> Output Format
> <mode>\t	<type>\t	<object>\t	<path><file>	
> 
> 
> ################################################################
> merge-base
> 	merge-base <commit> <commit>
> 
> merge-base finds as good a common ancestor as possible. Given a
> selection of equally good common ancestors it should not be relied on
> to decide in any particular way.
> 
> The merge-base algorithm is still in flux - use the source...
> 
> 
> ################################################################
> merge-cache
> 	merge-cache <merge-program> (-a | -- | <file>*) 
> 
> This looks up the <file>(s) in the cache and, if there are any merge
> entries, unpacks all of them (which may be just one file, of course)
> into up to three separate temporary files, and then executes the
> supplied <merge-program> with those three files as arguments 1,2,3
> (empty argument if no file), and <file> as argument 4.
> 
> --
> 	Interpret all future arguments as filenames
> 
> -a
> 	Run merge against all files in the cache that need merging.
> 
> If merge-cache is called with multiple <file>s (or -a) then it
> processes them in turn only stopping if merge returns a non-zero exit
> code.
> 
> Typically this is run with a script calling the merge command from
> the RCS package.
> 
> A sample script called git-merge-one-file-script is included in the
> distribution.
> 
> ALERT ALERT ALERT! The git "merge object order" is different from the
> RCS "merge" program merge object order. In the above ordering, the
> original is first. But the argument order to the 3-way merge program
> "merge" is to have the original in the middle. Don't ask me why.
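
A minimal sketch of the reordering a wrapper script has to do, assuming the RCS convention "merge <current> <original> <other>", where the result is written into the first file. The function name and the file names in the usage note are hypothetical.

```python
def rcs_merge_argv(original, branch_a, branch_b):
    """Reorder merge-cache's (original, A, B) files for the RCS merge program.

    merge-cache hands over the original first; RCS "merge" wants the
    original in the middle and merges into its first file argument.
    """
    return ["merge", branch_a, original, branch_b]
```

For example, rcs_merge_argv("orig.tmp", "a.tmp", "b.tmp") yields the argument list for "merge a.tmp orig.tmp b.tmp".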
> 
> Examples:
> 
> 	torvalds@ppc970:~/merge-test> merge-cache cat MM
> 	This is MM from the original tree.			# original
> 	This is modified MM in the branch A.			# merge1
> 	This is modified MM in the branch B.			# merge2
> 	This is modified MM in the branch B.			# current contents
> 
> or 
> 
> 	torvalds@ppc970:~/merge-test> merge-cache cat AA MM
> 	cat: : No such file or directory
> 	This is added AA in the branch A.
> 	This is added AA in the branch B.
> 	This is added AA in the branch B.
> 	fatal: merge program failed
> 
> where the latter example shows how "merge-cache" will stop trying to
> merge once anything has returned an error (ie "cat" returned an error
> for the AA file, because it didn't exist in the original, and thus
> "merge-cache" didn't even try to merge the MM thing).
> 
> 
> ################################################################
> read-tree
> 	read-tree (<tree/commit> | -m <tree/commit1> [<tree/commit2> <tree/commit3>])
> 
> Reads the tree information given by <tree> into the directory cache,
> but does not actually _update_ any of the files it "caches". (see:
> checkout-cache)
> 
> Optionally, it can merge a tree into the cache or perform a 3-way
> merge.
> 
> Trivial merges are done by read-tree itself.  Only conflicting paths
> will be in unmerged state when read-tree returns.
> 
> -m
> 	Perform a merge, not just a read
> 
> <tree#>
> 	The id of the tree object(s) to be read/merged.
> 
> 
> Merging
> If -m is specified, read-tree performs 2 kinds of merge, a single tree
> merge if only 1 tree is given or a 3-way merge if 3 trees are
> provided.
> 
> Single Tree Merge
> If only 1 tree is specified, read-tree operates as if the user did not
> specify "-m", except that if the original cache has an entry for a
> given pathname, and the contents of the path match the tree
> being read, the stat info from the cache is used. (In other words, the
> cache's stat()s take precedence over the merged tree's)
> 
> That means that if you do a "read-tree -m <newtree>" followed by a
> "checkout-cache -f -a", the checkout-cache only checks out the stuff
> that really changed.
> 
> This is used to avoid unnecessary false hits when show-diff is
> run after read-tree.
> 
> 3-Way Merge
> Each "index" entry has two bits worth of "stage" state. stage 0 is the
> normal one, and is the only one you'd see in any kind of normal use.
> 
> However, when you do "read-tree" with multiple trees, the "stage"
> starts out at 0, but increments for each tree you read. And in
> particular, the "-m" flag means "start at stage 1" instead.
> 
> This means that you can do
> 
> 	read-tree -m <tree1> <tree2> <tree3>
> 
> and you will end up with an index with all of the <tree1> entries in
> "stage1", all of the <tree2> entries in "stage2" and all of the
> <tree3> entries in "stage3".
> 
> Furthermore, "read-tree" has special-case logic that says: if you see
> a file that matches in all respects in the following states, it
> "collapses" back to "stage0":
> 
>    - stage 2 and 3 are the same; take one or the other (it makes no
>      difference - the same work has been done on stage 2 and 3)
> 
>    - stage 1 and stage 2 are the same and stage 3 is different; take
>      stage 3 (some work has been done on stage 3)
> 
>    - stage 1 and stage 3 are the same and stage 2 is different; take
>      stage 2 (some work has been done on stage 2)
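
The three collapse rules can be written down as a small function. This is only a sketch: the name collapse() is hypothetical, each stage is modelled as a (mode, sha1) pair that is present in all three trees, and paths missing from a tree are left out of scope.

```python
def collapse(stage1, stage2, stage3):
    """Return the entry that becomes stage 0, or None if it stays unmerged.

    stage1 is the original; stage2 and stage3 are the two trees being merged.
    """
    if stage2 == stage3:
        return stage2      # same work done on both sides
    if stage1 == stage2:
        return stage3      # only stage 3 changed
    if stage1 == stage3:
        return stage2      # only stage 2 changed
    return None            # all three differ: left for the merge script
```

A None result corresponds to a path that keeps its stage 1/2/3 entries in the index.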
> 
> Write-tree refuses to write a nonsensical tree, so write-tree will
> complain about unmerged entries if it sees a single entry that is not
> stage 0.
> 
> Ok, this all sounds like a collection of totally nonsensical rules,
> but it's actually exactly what you want in order to do a fast
> merge. The different stages represent the "result tree" (stage 0, aka
> "merged"), the original tree (stage 1, aka "orig"), and the two trees
> you are trying to merge (stage 2 and 3 respectively).
> 
> In fact, the way "read-tree" works, it's entirely agnostic about how
> you assign the stages, and you could really assign them any which way,
> and the above is just a suggested way to do it (except since
> "write-tree" refuses to write anything but stage0 entries, it makes
> sense to always consider stage 0 to be the "full merge" state).
> 
> So what happens? Try it out. Select the original tree, and two trees
> to merge, and look how it works:
> 
>  - if a file exists in identical format in all three trees, it will 
>    automatically collapse to "merged" state by the new read-tree.
> 
>  - a file that has _any_ difference what-so-ever in the three trees
>    will stay as separate entries in the index. It's up to "script
>    policy" to determine how to remove the non-0 stages, and insert a
>    merged version.  But since the index is always sorted, they're easy
>    to find: they'll be clustered together.
> 
>  - the index file saves and restores with all this information, so you
>    can merge things incrementally, but as long as it has entries in
>    stages 1/2/3 (ie "unmerged entries") you can't write the result.
> 
> So now the merge algorithm ends up being really simple:
> 
>  - you walk the index in order, and ignore all entries of stage 0,
>    since they've already been done.
> 
>  - if you find a "stage1", but no matching "stage2" or "stage3", you
>    know it's been removed from both trees (it only existed in the
>    original tree), and you remove that entry.
> 
>  - if you find a matching "stage2" and "stage3" tree, you remove one
>    of them, and turn the other into a "stage0" entry. Remove any
>    matching "stage1" entry if it exists too.
> 
>  .. all the normal trivial rules ..
> 
> Incidentally - it also means that you don't even have to have a separate 
> subdirectory for this. All the information literally is in the index file, 
> which is a temporary thing anyway. There is no need to worry about what is in 
> the working directory, since it is never shown and never used.
> 
> see also:
> write-tree
> show-files
> 
> 
> ################################################################
> rev-list <commit>
> 
> Lists commit objects in reverse chronological order starting at the
> given commit, taking ancestry relationship into account.  This is
> useful to produce human-readable log output.
> 
> 
> ################################################################
> rev-tree
> 	rev-tree [--edges] [--cache <cache-file>] [^]<commit> [[^]<commit>]
> 
> Provides the revision tree for one or more commits.
> 
> --edges
> 	Show edges (ie places where the marking changes between parent
> 	and child)
> 
> --cache <cache-file>
> 	Use the specified file as a cache. [Not implemented yet]
> 
> [^]<commit>
> 	The commit id to trace (a leading caret means to ignore this
> 	commit-id and below)
> 
> Output:
> <date> <commit>:<flags> [<parent-commit>:<flags> ]*
> 
> <date>
> 	Date in 'seconds since epoch'
> 
> <commit>
> 	id of commit object
> 
> <parent-commit>
> 	id of each parent commit object (>1 indicates a merge)
> 
> <flags>
> 
> 	The flags are read as a bitmask representing each commit
> 	provided on the commandline. eg: given the command:
> 
> 		 $ rev-tree <com1> <com2> <com3>
> 
> 	The output:
> 
> 	    <date> <commit>:5
> 
> 	 means that <commit> is reachable from <com1>(1) and <com3>(4)
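
Decoding the flags is plain bit testing. A minimal sketch, assuming bit N of the mask corresponds to the Nth commit on the command line (counting from zero), as the example above suggests; the function name is hypothetical.

```python
def reachable_from(flags, commits):
    """List the command-line commits whose bit is set in a rev-tree flags value."""
    return [commit for bit, commit in enumerate(commits) if flags & (1 << bit)]
```

With commits ["com1", "com2", "com3"], a flags value of 5 (binary 101) selects com1 and com3.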
> 	
> A revtree can get quite large. rev-tree will eventually allow you to
> cache previous state so that you don't have to follow the whole thing
> down.
> 
> So the change difference between two commits is literally
> 
> 	rev-tree [commit-id1]  > commit1-revtree
> 	rev-tree [commit-id2]  > commit2-revtree
> 	join -t : commit1-revtree commit2-revtree > common-revisions
> 
> (this is also how to find the most common parent - you'd look at just
> the head revisions - the ones that aren't referred to by other
> revisions - in "common-revision", and figure out the best one. I
> think.)
> 
> 
> ################################################################
> show-diff
> 	show-diff [-p] [-q] [-s] [-z] [paths...]
> 
> Compares the files in the working tree and the cache.  When paths
> are specified, compares only those named paths.  Otherwise all
> entries in the cache are compared.  The output format is the
> same as diff-cache and diff-tree.
> 
> -p
> 	generate patch (see section on generating patches)
> 
> -q
> 	Remain silent even on nonexisting files
> 
> -s
> 	Does not do anything other than what -q does.
> 
> Output format:
> 
> See "Output format from diff-cache, diff-tree and show-diff" section.
> 
> ################################################################
> show-files
> 	show-files [-z] [-t]
> 		(--[cached|deleted|others|ignored|stage|unmerged])*
> 		(-[c|d|o|i|s|u])*
> 		[-x <pattern>|--exclude=<pattern>]
> 		[-X <file>|--exclude-from=<file>]
> 
> This merges the file listing in the directory cache index with the
> actual working directory list, and shows different combinations of the
> two.
> 
> One or more of the options below may be used to determine the files
> shown:
> 
> -c|--cached
> 	Show cached files in the output (default)
> 
> -d|--deleted
> 	Show deleted files in the output
> 
> -o|--others
> 	Show other files in the output
> 
> -i|--ignored
> 	Show ignored files in the output
> 	Note that this also reverses any exclude list present.
> 
> -s|--stage
> 	Show stage files in the output
> 
> -u|--unmerged
> 	Show unmerged files in the output (forces --stage)
> 
> #-t [not in Linus' tree (yet?)]
> #	Identify the file status with the following tags (followed by
> #	a space) at the start of each line:
> #	H	cached
> #	M	unmerged
> #	R	removed/deleted
> #	?	other
> 
> -z
> 	\0 line termination on output
> 
> -x|--exclude=<pattern>
> 	Skips files matching pattern.
> 	Note that pattern is a shell wildcard pattern.
> 
> -X|--exclude-from=<file>
> 	exclude patterns are read from <file>; 1 per line.
> 	Allows the use of the famous dontdiff file as follows to find
> 	out about uncommitted files just as dontdiff is used with
> 	the diff command:
> 	     show-files --others --exclude-from=dontdiff
> 
> Output
> show-files just outputs the filename unless --stage is specified in
> which case it outputs:
> 
> [<tag> ]<mode> <object> <stage> <file>
> 
> "show-files --unmerged" and "show-files --stage" can be used to examine
> detailed information on unmerged paths.
> 
> For an unmerged path, instead of recording a single mode/SHA1 pair,
> the dircache records up to three such pairs; one from tree O in stage
> 1, A in stage 2, and B in stage 3.  This information can be used by
> the user (or Cogito) to see what should eventually be recorded at the
> path. (see read-tree for more information on state)
> 
> see also:
> read-tree
> 
> 
> ################################################################
> unpack-file
> 	unpack-file <blob>
> 
> Creates a file holding the contents of the blob specified by sha1. It
> returns the name of the temporary file in the following format:
> 	.merge_file_XXXXX
> 
> <blob>
> 	Must be a blob id
> 
> ################################################################
> update-cache
> 	update-cache [--add] [--remove] [--refresh [--ignore-missing]]
> 		     [--cacheinfo <mode> <object> <path>]*
> 		     [--] [<file>]*
> 
> Modifies the index or directory cache. Each file mentioned is updated
> into the cache and any 'unmerged' or 'needs updating' state is
> cleared.
> 
> The way update-cache handles files it is told about can be modified
> using the various options:
> 
> --add
> 	If a specified file isn't in the cache already then it's
> 	added.
> 	Default behaviour is to ignore new files.
> 
> --remove
> 	If a specified file is in the cache but is missing then it's
> 	removed.
> 	Default behaviour is to ignore removed files.
> 
> --refresh
> 	Looks at the current cache and checks to see if merges or
> 	updates are needed by checking stat() information.
> 
> --ignore-missing
> 	Ignores missing files during a --refresh
> 
> --cacheinfo <mode> <object> <path>
> 	Directly insert the specified info into the cache.
> 	
> --
> 	Do not interpret any more arguments as options.
> 
> <file>
> 	Files to act on.
> 	Note that files beginning with '.' are discarded. This includes
> 	"./file" and "dir/./file". If you don't want this, then use	
> 	cleaner names.
> 	The same applies to directories ending '/' and paths with '//'
> 
> 
> Using --refresh
> 
> --refresh does not calculate a new sha1 file or bring the cache
> up-to-date for mode/content changes. But what it _does_ do is to
> "re-match" the stat information of a file with the cache, so that you
> can refresh the cache for a file that hasn't been changed but where
> the stat entry is out of date.
> 
> For example, you'd want to do this after doing a "read-tree", to link
> up the stat cache details with the proper files.
> 
> Using --cacheinfo
> --cacheinfo is used to register a file that is not in the current
> working directory.  This is useful for minimum-checkout merging.
> 
> To pretend you have a file with mode and sha1 at path, say:
> 
>  $ update-cache --cacheinfo mode sha1 path
> 
> To update and refresh only the files already checked out:
> 
>    checkout-cache -n -f -a && update-cache --ignore-missing --refresh
> 
> 
> ################################################################
> write-tree
> 	write-tree
> 
> Creates a tree object using the current cache.
> 
> The cache must be merged.
> 
> Conceptually, write-tree sync()s the current directory cache contents
> into a set of tree files.
> In order to have that match what is actually in your directory right
> now, you need to have done a "update-cache" phase before you did the
> "write-tree".
> 
> 
> ################################################################
> 
> Output format from diff-cache, diff-tree and show-diff.
> 
> These commands all compare two sets of things; what are
> compared are different:
> 
>     diff-cache <tree/commit>
> 
>         compares the <tree/commit> and the files on the filesystem.
> 
>     diff-cache --cached <tree/commit>
> 
>         compares the <tree/commit> and the cache.
> 
>     diff-tree [-r] <tree/commit-1> <tree/commit-2> [paths...]
> 
>         compares the trees named by the two arguments.
> 
>     show-diff [paths...]
> 
>         compares the cache and the files on the filesystem.
> 
> The following description uses "old" and "new" to mean those
> compared entities.
> 
> For files in old but not in new (i.e. removed):
> -<mode> \t <type> \t <object> \t <path>
> 
> For files not in old but in new (i.e. added):
> +<mode> \t <type> \t <object> \t <path>
> 
> For files that differ:
> *<old-mode>-><new-mode> \t <type> \t <old-sha1>-><new-sha1> \t <path>
> 
> <new-sha1> is shown as all 0's if new is a file on the
> filesystem and it is out of sync with the cache.  Example:
> 
>     *100644->100660 blob    5be4a414b32cf4204f889469942986d3d783da84->0000000000000000000000000000000000000000      file.c
> 
> ################################################################
> 
> Generating patches
> 
> When diff-cache, diff-tree, or show-diff is run with a -p
> option, it does not produce the output described in the "Output
> format from diff-cache, diff-tree and show-diff" section.  It
> instead produces a patch file.
> 
> The patch generation can be customized at two levels.  This
> customization also applies to diff-tree-helper.
> 
> 1. When the environment variable GIT_EXTERNAL_DIFF is not set,
>    these commands internally invoke diff like this:
> 
>    diff -L k/<path> -L l/<path> -pu <old> <new>
> 
>    For added files, /dev/null is used for <old>.  For removed
>    files, /dev/null is used for <new>
> 
>    The first part of the above command-line can be customized via
>    the environment variable GIT_DIFF_CMD.  For example, if you
>    do not want to show the extra level of leading path, you can
>    say this:
> 
>    GIT_DIFF_CMD="diff -L'%s' -L'%s'" show-diff -p
> 
>    Caution:  Do not use more than two '%s' in GIT_DIFF_CMD.
> 
>    The diff formatting options can be customized via the
>    environment variable GIT_DIFF_OPTS.  For example, if you
>    prefer context diff:
> 
>    GIT_DIFF_OPTS=-c diff-cache -p $(cat .git/HEAD)
> 
> 
> 2. When the environment variable GIT_EXTERNAL_DIFF is set, the
>    program named by it is called, instead of the diff invocation
>    described above.
> 
>    For a path that is added, removed, or modified,
>    GIT_EXTERNAL_DIFF is called with 7 parameters:
> 
>      path old-file old-hex old-mode new-file new-hex new-mode
> 
>    where
>      <old|new>-file are files GIT_EXTERNAL_DIFF can use to read the
>                     contents of <old|new>,
>      <old|new>-hex are the 40-hexdigit SHA1 hashes,
>      <old|new>-mode are the octal representation of the file modes.
> 
>    The file parameters can point at the user's working file
>    (e.g. new-file in show-diff), /dev/null (e.g. old-file when a
>    new file is added), or a temporary file (e.g. old-file in the
>    cache).  GIT_EXTERNAL_DIFF should not worry about
>    unlinking the temporary file --- it is removed when
>    GIT_EXTERNAL_DIFF exits.
> 
>    For a path that is unmerged, GIT_EXTERNAL_DIFF is called with
>    1 parameter, path.
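
A program named by GIT_EXTERNAL_DIFF therefore has to cope with two calling conventions. The sketch below shows one way to sort them out; the function and dictionary key names are hypothetical, and only the parameter order comes from the description above.

```python
FIELDS = ("path", "old_file", "old_hex", "old_mode",
          "new_file", "new_hex", "new_mode")

def parse_external_diff_args(args):
    """Interpret the parameters passed to a GIT_EXTERNAL_DIFF program."""
    if len(args) == 1:
        # Unmerged path: only the path itself is supplied.
        return {"path": args[0], "unmerged": True}
    if len(args) != len(FIELDS):
        raise ValueError("expected 1 or 7 parameters, got %d" % len(args))
    info = dict(zip(FIELDS, args))
    info["unmerged"] = False
    return info
```

A real helper would call this with sys.argv[1:] and then read old_file and new_file, either of which may be /dev/null.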
> 
> ################################################################
> 
> Terminology: - see README for description
> Each line contains terms used interchangeably
> 
> object database, .git directory
> directory cache, index
> id, sha1, sha1-id, sha1 hash
> type, tag
> blob, blob object
> tree, tree object
> commit, commit object
> parent
> root object
> changeset
> 
> 
> git Environment Variables
> AUTHOR_NAME
> AUTHOR_EMAIL
> AUTHOR_DATE
> COMMIT_AUTHOR_NAME
> COMMIT_AUTHOR_EMAIL
> GIT_DIFF_CMD
> GIT_DIFF_OPTS
> GIT_EXTERNAL_DIFF
> GIT_INDEX_FILE
> SHA1_FILE_DIRECTORY
> 

-- 
With kind regards, Brian.

Dr. Brian O'Mahoney
Mobile +41 (0)79 334 8035 Email: omb@bluewin.ch
Bleicherstrasse 25, CH-8953 Dietikon, Switzerland
PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D  F5 14 39 C9 6D 38 56 D5

^ permalink raw reply

* Re: Quick command reference
From: David Greaves @ 2005-05-01 14:44 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git, Linus Torvalds, Petr Baudis
In-Reply-To: <17012.53862.704670.858276@cargo.ozlabs.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

Paul Mackerras wrote:

>As an aid to my understanding of the core git commands, I created this
>summary of the commands and their options and parameters.  I hope it
>will be useful to others.  Corrections welcome of course.
>
>Paul.
>  
>

Thanks Paul

Shame to see duplicated effort...

I've submitted this document to Linus and the list a few times and
included all the feedback but for some reason it's not gone into any of
the trees which means that people like you have to redo it from scratch...

Getting frustrated now...

David

-- 


[-- Attachment #2: README.reference --]
[-- Type: text/plain, Size: 35927 bytes --]

This file contains reference information for the core git commands.
It is actually based on the source from Petr Baudis' tree and may
therefore contain a few 'extras' that may or may not make it upstream.

The README contains much useful definition and clarification info -
read that first.  And of the commands, I suggest reading
'update-cache' and 'read-tree' first - I wish I had!

Thanks to original email authors and proof readers esp Junio C Hamano
<junkio@cox.net>

David Greaves <david@dgreaves.com>
24/4/05

Identifier terminology used:

<object>
	Indicates any object sha1 identifier

<blob>
	Indicates a blob object sha1 identifier

<tree>
	Indicates a tree object sha1 identifier

<commit>
	Indicates a commit object sha1 identifier

<tree/commit>
	Indicates a tree or commit object sha1 identifier (usually
	because the command can read the <tree> a <commit> contains).
	[Eventually may be replaced with <tree> if <tree> means
	<tree/commit> in all commands]

<type>
	Indicates that an object type is required.
	Currently one of: blob/tree/commit

<file>
	Indicates a filename - often includes leading path

<path>
	Indicates the path of a file (is this ever useful?)



################################################################
cat-file
	cat-file (-t | <type>) <object>

Provide contents or type of objects in the repository. The type is
required if -t is not being used to find the object type.

<object>
	The sha1 identifier of the object.

-t
	show the object type identified by <object>

<type>
	One of: blob/tree/commit

Output

If -t is specified, one of:
        blob/tree/commit

Otherwise the raw (though uncompressed) contents of the <object> will
be returned.


################################################################
check-files
	check-files <file>...

Check that a list of files are up-to-date between the filesystem and
the cache. Used to verify a patch target before doing a patch.

Files that do not exist on the filesystem are considered up-to-date
(whether or not they are in the cache).

Emits an error message on failure.
preparing to update existing file <file> not in cache
	  <file> exists but is not in the cache

preparing to update file <file> not uptodate in cache
	  <file> on disk is not up-to-date with the cache

exits with a status code indicating success if all files are
up-to-date.

see also: update-cache


################################################################
checkout-cache
	checkout-cache [-q] [-a] [-f] [-n] [--prefix=<string>]
		       [--] <file>...

Will copy all files listed from the cache to the working directory
(not overwriting existing files). Note that the file contents are
restored - NOT the file permissions.
[Unverified: line 58 of checkout-cache.c suggests the executable bit
is restored.]

-q
	be quiet if files exist or are not in the cache

-f
	forces overwrite of existing files

-a
	checks out all files in the cache (will then continue to
	process listed files).
-n
	Don't checkout new files, only refresh files already checked
	out.

--prefix=<string>
	When creating files, prepend <string> (usually a directory
	including a trailing /)

--
	Do not interpret any more arguments as options.

Note that the order of the flags matters:

	checkout-cache -a -f file.c

will first check out all files listed in the cache (but not overwrite
any old ones), and then force-checkout file.c a second time (ie that
one _will_ overwrite any old contents with the same filename).
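
The left-to-right processing can be modelled in a few lines. This is a sketch under stated assumptions, not the actual checkout-cache code: only -a, -f and -- are handled, and plan_checkout() and its return format are hypothetical.

```python
def plan_checkout(args, cached_files):
    """Return (path, force) pairs in the order they would be checked out."""
    force = False
    options_done = False
    actions = []
    for arg in args:
        if not options_done and arg == "--":
            options_done = True          # everything after this is a filename
        elif not options_done and arg == "-f":
            force = True                 # affects only what follows
        elif not options_done and arg == "-a":
            actions.extend((name, force) for name in cached_files)
        else:
            actions.append((arg, force))
    return actions
```

With a cache holding Makefile and file.c, the arguments "-a -f file.c" give an unforced pass over both files followed by a forced checkout of file.c, while "-f -a" forces everything.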

Also, just doing "checkout-cache" does nothing. You probably meant
"checkout-cache -a". And if you want to force it, you want
"checkout-cache -f -a".

Intuitiveness is not the goal here. Repeatability is. The reason for
the "no arguments means no work" thing is that from scripts you are
supposed to be able to do things like

	find . -name '*.h' -print0 | xargs -0 checkout-cache -f --

which will force all existing *.h files to be replaced with their
cached copies. If an empty command line implied "all", then this would
force-refresh everything in the cache, which was not the point.

To update and refresh only the files already checked out:

   checkout-cache -n -f -a && update-cache --ignore-missing --refresh

Oh, and the "--" is just a good idea when you know the rest will be
filenames. Just so that you wouldn't have a filename of "-a" causing
problems (not possible in the above example, but get used to it in
scripting!).

The prefix ability basically makes it trivial to use checkout-cache as
a "export as tree" function. Just read the desired tree into the
index, and do a
  
        checkout-cache --prefix=export-dir/ -a
  
and checkout-cache will "export" the cache into the specified
directory.
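
The prefix is treated as a plain string, which the following sketch makes explicit. The function name is hypothetical; the point is that nothing inserts a path separator for you.

```python
def export_path(prefix, cached_name):
    """Name of the file written for a cache entry, given --prefix=<prefix>."""
    # Pure string concatenation: no "/" is added between the two parts.
    return prefix + cached_name
```

So "export-dir/" and "Makefile" give "export-dir/Makefile", whereas dropping the trailing slash gives "export-dirMakefile".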
  
NOTE! The final "/" is important. The exported name is literally just
prefixed with the specified string, so you can also do something like
  
        checkout-cache --prefix=.merged- Makefile
  
to check out the currently cached copy of "Makefile" into the file
".merged-Makefile".


################################################################
commit-tree
	commit-tree <tree> [-p <parent commit>]*   < changelog

Creates a new commit object based on the provided tree object and
emits the new commit object id on stdout. If no parent is given then
it is considered to be an initial tree.

A commit object usually has 1 parent (a commit after a change) or up
to 16 parents.  More than one parent represents a merge of branches
that led to them.

While a tree represents a particular directory state of a working
directory, a commit represents that state in "time", and explains how
to get there.

Normally a commit would identify a new "HEAD" state, and while git
doesn't care where you save the note about that state, in practice we
tend to just write the result to the file ".git/HEAD", so that we can
always see what the last committed state was.

Options

<tree>
	An existing tree object

-p <parent commit>
	Each -p indicates the id of a parent commit object.
	

Commit Information

A commit encapsulates:
	all parent object ids
	author name, email and date
	committer name and email and the commit time.

If not provided, commit-tree uses your name, hostname and domain to
provide author and committer info. This can be overridden using the
following environment variables.
	AUTHOR_NAME
	AUTHOR_EMAIL
	AUTHOR_DATE
	COMMIT_AUTHOR_NAME
	COMMIT_AUTHOR_EMAIL
(nb <,> and '\n's are stripped)
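
A rough sketch of the override-and-strip behaviour described above. The helper names are hypothetical, the defaults are passed in rather than looked up from the system, and only AUTHOR_NAME and AUTHOR_EMAIL are modelled.

```python
def clean_ident(value):
    """Drop the characters that commit-tree is said to strip: '<', '>' and newline."""
    return "".join(ch for ch in value if ch not in "<>\n")

def author_identity(env, default_name, default_email):
    """Pick author name and email, letting the environment override the defaults."""
    name = clean_ident(env.get("AUTHOR_NAME", default_name))
    email = clean_ident(env.get("AUTHOR_EMAIL", default_email))
    return name, email
```

In a script, env would typically be os.environ.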

A commit comment is read from stdin (max 999 chars). If a changelog
entry is not provided via '<' redirection, commit-tree will just wait
for one to be entered and terminated with ^D

see also: write-tree


################################################################
diff-cache
	diff-cache [-p] [-r] [-z] [--cached] <tree/commit>

Compares the content and mode of the blobs found via a tree object
with the content of the current cache, optionally ignoring the stat
state of the file on disk.

<tree/commit>
	The id of a tree or commit object to diff against.

-p
	generate patch (see section on generating patches)

-r
	recurse

-z
	\0 line termination on output

--cached
	do not consider the on-disk file at all

Output format:

See "Output format from diff-cache, diff-tree and show-diff" section.

Operating Modes

You can choose whether you want to trust the index file entirely
(using the "--cached" flag) or ask the diff logic to show any files
that don't match the stat state as being "tentatively changed".  Both
of these operations are very useful indeed.

Cached Mode

If --cached is specified, it allows you to ask:
	show me the differences between HEAD and the current index
	contents (the ones I'd write with a "write-tree")

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit
without having to write a new tree object and compare it that way, and to
do that, you just do

	diff-cache --cached $(cat .git/HEAD)

Example: let's say I had renamed "commit.c" to "git-commit.c", and I had 
done an "update-cache" to make that effective in the index file. 
"show-diff" wouldn't show anything at all, since the index file matches 
my working directory. But doing a diff-cache does:
	torvalds@ppc970:~/git> diff-cache --cached $(cat .git/HEAD)
	-100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        commit.c
	+100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74        git-commit.c

And as you can see, the output matches "diff-tree -r" output (we
always do equivalent of "-r", since the index is flat).
You can trivially see that the above is a rename.
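
Spotting the rename can be automated by pairing removed and added entries that share an object id. The sketch below assumes whitespace-separated fields as in the output above; find_renames() is a hypothetical helper, and it would also report an unrelated file that merely has identical contents.

```python
def find_renames(diff_lines):
    """Pair '-' and '+' entries that carry the same object id."""
    removed = {}   # object id -> path that disappeared
    added = []     # (object id, path) in output order
    for line in diff_lines:
        if not line or line[0] not in "+-":
            continue
        mode, obj_type, obj_id, path = line[1:].split(None, 3)
        if line[0] == "-":
            removed[obj_id] = path
        else:
            added.append((obj_id, path))
    return [(removed[obj_id], path) for obj_id, path in added if obj_id in removed]
```

Fed the two lines above, it returns a single pair: commit.c renamed to git-commit.c.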

In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. Except this one is much
nicer for the case where you just want to check where you are.

So doing a "diff-cache --cached" is basically very useful when you are 
asking yourself "what have I already marked for being committed, and 
what's the difference to a previous tree".

Non-cached Mode

The "non-cached" mode takes a different approach, and is potentially
the even more useful of the two in that what it does can't be emulated
with a "write-tree + diff-tree". Thus that's the default mode.  The
non-cached version asks the question

   "show me the differences between HEAD and the currently checked out 
    tree - index contents _and_ files that aren't up-to-date"

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the "diff-tree -r" output to
a tee, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic "all-zero" sha1 to show
that. So let's say that you have edited "kernel/sched.c", but have not
actually done an update-cache on it yet - there is no "object" associated
with the new state, and you get:

	torvalds@ppc970:~/v2.6/linux> diff-cache $(cat .git/HEAD )
	*100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000      kernel/sched.c

ie it shows that the tree has changed, and that "kernel/sched.c" is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.

NOTE! As with other commands of this type, "diff-cache" does not actually 
look at the contents of the file at all. So maybe "kernel/sched.c" hasn't 
actually changed, and it's just that you touched it. In either case, it's 
a note that you need to update-cache it to make the cache be in sync.

NOTE 2! You can have a mixture of files show up as "has been updated" and
"is still dirty in the working directory" together. You can always tell
which file is in which state, since the "has been updated" ones show a
valid sha1, and the "not in sync with the index" ones will always have the
special all-zero sha1.

################################################################
diff-tree
	diff-tree [-p] [-r] [-z] <tree/commit> <tree/commit> [<pattern>]*

Compares the content and mode of the blobs found via two tree objects.

Note that diff-tree can use the tree encapsulated in a commit object.

<tree/commit>
	The id of a tree or commit object.

<pattern>

	If provided, the results are limited to a subset of files
	matching one of these prefix strings.
	ie file matches /^<pattern1>|<pattern2>|.../
	Note that pattern does not provide any wildcard or regexp features.

-p
	generate patch (see section on generating patches)

-r
	recurse

-z
	\0 line termination on output

Limiting Output

If you're only interested in differences in a subset of files, for
example some architecture-specific files, you might do:

	diff-tree -r <tree/commit> <tree/commit> arch/ia64 include/asm-ia64

and it will only show you what changed in those two directories.

Or if you are searching for what changed in just kernel/sched.c, just do

	diff-tree -r <tree/commit> <tree/commit> kernel/sched.c

and it will ignore all differences to other files.

The pattern is always the prefix, and is matched exactly (ie there are no
wildcards - although matching a directory, which it does support, can
obviously be seen as a "wildcard" for all the files under that directory).
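The prefix rule can be illustrated with a minimal Python sketch. It assumes the pure string-prefix reading given above (no wildcards, no regexps); the function name matches() and the sample paths are made up for the example and are not part of diff-tree.

```python
def matches(path, patterns):
    """Return True if path should be shown for the given prefix patterns."""
    # No patterns means no limiting: everything is shown.
    if not patterns:
        return True
    # Plain string prefix test; no wildcard or regexp handling at all.
    return any(path.startswith(p) for p in patterns)


paths = ["arch/ia64/kernel/head.S", "include/asm-ia64/io.h", "kernel/sched.c"]
shown = [p for p in paths if matches(p, ["arch/ia64", "include/asm-ia64"])]
print(shown)
```

Under this reading a pattern such as "arch/ia64" would also match a hypothetical "arch/ia64x/..." path, so add a trailing slash if only the directory is wanted.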

Output format:

See "Output format from diff-cache, diff-tree and show-diff" section.

An example of normal usage is:

	torvalds@ppc970:~/git> diff-tree 5319e4d609cdd282069cc4dce33c1db559539b03 b4e628ea30d5ab3606119d2ea5caeab141d38df7
	*100664->100664 blob    ac348b7d5278e9d04e3a1cd417389379c32b014f->a01513ed4d4d565911a60981bfb4173311ba3688      fsck-cache.c

which tells you that the last commit changed just one file (it's from
this one:

	commit 3c6f7ca19ad4043e9e72fa94106f352897e651a8
	tree 5319e4d609cdd282069cc4dce33c1db559539b03
	parent b4e628ea30d5ab3606119d2ea5caeab141d38df7
	author Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
	committer Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005

	Make "fsck-cache" print out all the root commits it finds.

	Once I do the reference tracking, I'll also make it print out all the
	HEAD commits it finds, which is even more interesting.

in case you care).

################################################################
diff-tree-helper
	diff-tree-helper [-z]

Reads output from diff-cache, diff-tree and show-diff and
generates patch format output.

-z
	\0 line termination on input

See also the section on generating patches.

################################################################
fsck-cache
	fsck-cache [[--unreachable] <commit>*]

Verifies the connectivity and validity of the objects in the database.

<commit>
	A commit object to treat as the head of an unreachability
	trace

--unreachable
	print out objects that exist but that aren't reachable from any
	of the specified root nodes

It tests SHA1 and general object sanity, and it does full tracking of
the resulting reachability and everything else. It prints out any
corruption it finds (missing or bad objects), and if you use the
"--unreachable" flag it will also print out objects that exist but
that aren't reachable from any of the specified root nodes.

So for example

	fsck-cache --unreachable $(cat .git/HEAD)

or, for Cogito users:

	fsck-cache --unreachable $(cat .git/heads/*)

will do quite a _lot_ of verification on the tree. There are a few
extra validity tests to be added (make sure that tree objects are
sorted properly etc), but on the whole if "fsck-cache" is happy, you
do have a valid tree.

Any corrupt objects you will have to find in backups or other archives
(ie you can just remove them and do an "rsync" with some other site in
the hopes that somebody else has the object you have corrupted).

Of course, "valid tree" doesn't mean that it wasn't generated by some
evil person, and the end result might be crap. Git is a revision
tracking system, not a quality assurance system ;)

Extracted Diagnostics

expect dangling commits - potential heads - due to lack of head information
	You haven't specified any nodes as heads so it won't be
	possible to differentiate between un-parented commits and
	root nodes.

missing sha1 directory '<dir>'
	The directory holding the sha1 objects is missing.

unreachable <type> <object>
	The <type> object <object>, isn't actually referred to directly
	or indirectly in any of the trees or commits seen. This can
	mean that there's another root node that you're not specifying
	or that the tree is corrupt. If you haven't missed a root node
	then you might as well delete unreachable nodes since they
	can't be used.

missing <type> <object>
	The <type> object <object>, is referred to but isn't present in
	the database.

dangling <type> <object>
	The <type> object <object>, is present in the database but never
	_directly_ used. A dangling commit could be a root node.

warning: fsck-cache: tree <tree> has full pathnames in it
	And it shouldn't...

sha1 mismatch <object>
	The database has an object whose sha1 doesn't match the
	database value.
	This indicates a ??serious?? data integrity problem.
	(note: this error occurred during early git development when
	the database format changed.)
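
As a rough illustration of how the "missing", "unreachable" and "dangling" diagnostics relate to each other, here is a minimal Python sketch over a toy object graph. This is not how fsck-cache is implemented; the check() function, the dictionary layout and the object names are all made up for the example.

```python
def check(objects, heads):
    """objects maps an object id to the list of ids it refers to."""
    # Walk everything reachable from the given heads.
    reachable = set()
    todo = list(heads)
    while todo:
        oid = todo.pop()
        if oid in reachable or oid not in objects:
            continue
        reachable.add(oid)
        todo.extend(objects[oid])

    # Every id that some object in the database refers to directly.
    referenced = {ref for refs in objects.values() for ref in refs}

    missing = referenced - set(objects)        # referred to, not present
    unreachable = set(objects) - reachable     # present, not reachable from heads
    dangling = set(objects) - referenced - set(heads)  # never directly used
    return missing, unreachable, dangling


objects = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blobA", "blobB"],   # blobB is not in the database
    "tree1": ["blobA"],
    "blobA": [],
    "commit9": ["tree9"],          # nothing points at commit9
    "tree9": [],
}
missing, unreachable, dangling = check(objects, ["commit2"])
print(sorted(missing), sorted(unreachable), sorted(dangling))
```

Note how "tree9" is unreachable but not dangling: it is used directly by "commit9", which is itself the dangling object.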

Environment Variables

SHA1_FILE_DIRECTORY
	used to specify the object database root (usually .git/objects)

################################################################
git-export
	git-export top [base]

probably deprecated:
On Wed, 20 Apr 2005, Petr Baudis wrote:
>> I will probably not buy git-export, though. (That is, it is merged, but
>> I won't make git frontend for it.) My "git export" already does
>> something different, but more importantly, "git patch" of mine already
>> does effectively the same thing as you do, just for a single patch; so I
>> will probably just extend it to do it for an (a,b] range of patches.


That's fine. It was a quick hack, just to show that if somebody wants to, 
the data is trivially exportable.

		Linus

Although in Linus' distribution, git-export is not part of 'core' git.

################################################################
init-db
	init-db

This simply creates an empty git object database - basically a .git
directory.

If the object storage directory is specified via the
SHA1_FILE_DIRECTORY environment variable then the sha1 directories are
created underneath - otherwise the default .git/objects directory is
used.

init-db won't hurt an existing repository.


################################################################
ls-tree
	ls-tree [-r] [-z] <tree/commit>

Converts the tree object to a human readable (and script
processable) form.

<tree/commit>
	Id of a tree or commit object.
-r
	recurse into sub-trees

-z
	\0 line termination on output

Output Format
<mode>\t	<type>\t	<object>\t	<path><file>	


################################################################
merge-base
	merge-base <commit> <commit>

merge-base finds as good a common ancestor as possible. Given a
selection of equally good common ancestors it should not be relied on
to decide in any particular way.

The merge-base algorithm is still in flux - use the source...


################################################################
merge-cache
	merge-cache <merge-program> (-a | -- | <file>*) 

This looks up the <file>(s) in the cache and, if there are any merge
entries, unpacks all of them (which may be just one file, of course)
into up to three separate temporary files, and then executes the
supplied <merge-program> with those three files as arguments 1,2,3
(empty argument if no file), and <file> as argument 4.

--
	Interpret all future arguments as filenames

-a
	Run merge against all files in the cache that need merging.

If merge-cache is called with multiple <file>s (or -a) then it
processes them in turn only stopping if merge returns a non-zero exit
code.

Typically this is run with a script calling the merge command from
the RCS package.

A sample script called git-merge-one-file-script is included in the
distribution.

ALERT ALERT ALERT! The git "merge object order" is different from the
RCS "merge" program merge object order. In the above ordering, the
original is first. But the argument order to the 3-way merge program
"merge" is to have the original in the middle. Don't ask me why.

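A small Python sketch of the reordering, assuming the RCS "merge file1 file2 file3" convention in which file2 is the common original and the result is written back into file1. The helper name rcs_merge_argv() and the temporary file names are hypothetical.

```python
def rcs_merge_argv(orig, branch_a, branch_b):
    """Build an argv for RCS merge from merge-cache's argument order."""
    # merge-cache hands over: original, first branch, second branch.
    # RCS "merge" wants: file to update, original, other file.
    return ["merge", branch_a, orig, branch_b]


print(rcs_merge_argv("orig.tmp", "a.tmp", "b.tmp"))
```

A wrapper script would run the resulting command and then register the merged contents of the first file with update-cache.
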
Examples:

	torvalds@ppc970:~/merge-test> merge-cache cat MM
	This is MM from the original tree.			# original
	This is modified MM in the branch A.			# merge1
	This is modified MM in the branch B.			# merge2
	This is modified MM in the branch B.			# current contents

or 

	torvalds@ppc970:~/merge-test> merge-cache cat AA MM
	cat: : No such file or directory
	This is added AA in the branch A.
	This is added AA in the branch B.
	This is added AA in the branch B.
	fatal: merge program failed

where the latter example shows how "merge-cache" will stop trying to
merge once anything has returned an error (ie "cat" returned an error
for the AA file, because it didn't exist in the original, and thus
"merge-cache" didn't even try to merge the MM thing).


################################################################
read-tree
	read-tree (<tree/commit> | -m <tree/commit1> [<tree/commit2> <tree/commit3>])

Reads the tree information given by <tree> into the directory cache,
but does not actually _update_ any of the files it "caches". (see:
checkout-cache)

Optionally, it can merge a tree into the cache or perform a 3-way
merge.

Trivial merges are done by read-tree itself.  Only conflicting paths
will be in unmerged state when read-tree returns.

-m
	Perform a merge, not just a read

<tree#>
	The id of the tree object(s) to be read/merged.


Merging
If -m is specified, read-tree performs 2 kinds of merge, a single tree
merge if only 1 tree is given or a 3-way merge if 3 trees are
provided.

Single Tree Merge
If only 1 tree is specified, read-tree operates as if the user did not
specify "-m", except that if the original cache has an entry for a
given pathname and the contents of the path match the tree being
read, the stat info from the cache is used. (In other words, the
cache's stat()s take precedence over the merged tree's.)

That means that if you do a "read-tree -m <newtree>" followed by a
"checkout-cache -f -a", the checkout-cache only checks out the stuff
that really changed.

This is used to avoid unnecessary false hits when show-diff is
run after read-tree.

3-Way Merge
Each "index" entry has two bits worth of "stage" state. stage 0 is the
normal one, and is the only one you'd see in any kind of normal use.

However, when you do "read-tree" with multiple trees, the "stage"
starts out at 0, but increments for each tree you read. And in
particular, the "-m" flag means "start at stage 1" instead.

This means that you can do

	read-tree -m <tree1> <tree2> <tree3>

and you will end up with an index with all of the <tree1> entries in
"stage1", all of the <tree2> entries in "stage2" and all of the
<tree3> entries in "stage3".

Furthermore, "read-tree" has special-case logic that says: if you see
a file that matches in all respects in the following states, it
"collapses" back to "stage0":

   - stage 2 and 3 are the same; take one or the other (it makes no
     difference - the same work has been done on stage 2 and 3)

   - stage 1 and stage 2 are the same and stage 3 is different; take
     stage 3 (some work has been done on stage 3)

   - stage 1 and stage 3 are the same and stage 2 is different; take
     stage 2 (some work has been done on stage 2)
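
The three rules can be written down as a tiny Python function. This is only a sketch: it assumes all three stages are present for the path and represents each entry as a (mode, sha1) tuple; the name collapse() and the short fake sha1 strings are made up for the example.

```python
def collapse(stage1, stage2, stage3):
    """Return the stage 0 entry, or None if the path stays unmerged."""
    if stage2 == stage3:
        # Same work done on both sides: take either one.
        return stage2
    if stage1 == stage2:
        # Only stage 3 differs from the original: take stage 3.
        return stage3
    if stage1 == stage3:
        # Only stage 2 differs from the original: take stage 2.
        return stage2
    # All three differ: leave it for a real merge.
    return None


orig = ("100644", "aaaa")
print(collapse(orig, orig, ("100644", "bbbb")))
```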

Write-tree refuses to write a nonsensical tree, so write-tree will
complain about unmerged entries if it sees a single entry that is not
stage 0.

Ok, this all sounds like a collection of totally nonsensical rules,
but it's actually exactly what you want in order to do a fast
merge. The different stages represent the "result tree" (stage 0, aka
"merged"), the original tree (stage 1, aka "orig"), and the two trees
you are trying to merge (stage 2 and 3 respectively).

In fact, the way "read-tree" works, it's entirely agnostic about how
you assign the stages, and you could really assign them any which way,
and the above is just a suggested way to do it (except since
"write-tree" refuses to write anything but stage0 entries, it makes
sense to always consider stage 0 to be the "full merge" state).

So what happens? Try it out. Select the original tree, and two trees
to merge, and look how it works:

 - if a file exists in identical format in all three trees, it will 
   automatically collapse to "merged" state by the new read-tree.

 - a file that has _any_ difference what-so-ever in the three trees
   will stay as separate entries in the index. It's up to "script
   policy" to determine how to remove the non-0 stages, and insert a
   merged version.  But since the index is always sorted, they're easy
   to find: they'll be clustered together.

 - the index file saves and restores with all this information, so you
   can merge things incrementally, but as long as it has entries in
   stages 1/2/3 (ie "unmerged entries") you can't write the result.

So now the merge algorithm ends up being really simple:

 - you walk the index in order, and ignore all entries of stage 0,
   since they've already been done.

 - if you find a "stage1", but no matching "stage2" or "stage3", you
   know it's been removed from both trees (it only existed in the
   original tree), and you remove that entry.

 - if you find a matching "stage2" and "stage3" tree, you remove one
   of them, and turn the other into a "stage0" entry. Remove any
   matching "stage1" entry if it exists too.

 .. all the normal trivial rules ..
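
The walk described above might look like this in Python. It is a sketch under simplifying assumptions: the index is modelled as a list of (path, stage, sha1) tuples already sorted by path and stage, modes are ignored, and trivial_merge() is a made-up name rather than anything in git.

```python
from itertools import groupby


def trivial_merge(index):
    """Resolve the trivial cases; leave everything else unmerged."""
    result = []
    for path, group in groupby(index, key=lambda entry: entry[0]):
        stages = {stage: sha1 for _, stage, sha1 in group}
        if 0 in stages:
            # Already merged, nothing to do.
            result.append((path, 0, stages[0]))
        elif set(stages) == {1}:
            # Only in the original tree: removed on both sides, drop it.
            continue
        elif 2 in stages and 3 in stages and stages[2] == stages[3]:
            # Both sides agree: keep one copy as stage 0.
            result.append((path, 0, stages[2]))
        else:
            # Not a trivial case: keep the entries for the merge script.
            result.extend((path, s, stages[s]) for s in sorted(stages))
    return result


index = [
    ("a.c", 0, "aaa"),
    ("b.c", 1, "b01"),
    ("c.c", 1, "c01"), ("c.c", 2, "c02"), ("c.c", 3, "c02"),
    ("d.c", 1, "d01"), ("d.c", 2, "d02"), ("d.c", 3, "d03"),
]
merged = trivial_merge(index)
print(merged)
```

Here "b.c" disappears, "c.c" collapses to stage 0, and "d.c" keeps its three stages for a real merge program to deal with.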

Incidentally - it also means that you don't even have to have a separate 
subdirectory for this. All the information literally is in the index file, 
which is a temporary thing anyway. There is no need to worry about what is in 
the working directory, since it is never shown and never used.

see also:
write-tree
show-files


################################################################
rev-list
	rev-list <commit>

Lists commit objects in reverse chronological order starting at the
given commit, taking ancestry relationship into account.  This is
useful to produce human-readable log output.


################################################################
rev-tree
	rev-tree [--edges] [--cache <cache-file>] [^]<commit> [[^]<commit>]

Provides the revision tree for one or more commits.

--edges
	Show edges (ie places where the marking changes between parent
	and child)

--cache <cache-file>
	Use the specified file as a cache. [Not implemented yet]

[^]<commit>
	The commit id to trace (a leading caret means to ignore this
	commit-id and below)

Output:
<date> <commit>:<flags> [<parent-commit>:<flags> ]*

<date>
	Date in 'seconds since epoch'

<commit>
	id of commit object

<parent-commit>
	id of each parent commit object (>1 indicates a merge)

<flags>

	The flags are read as a bitmask representing each commit
	provided on the commandline. eg: given the command:

		 $ rev-tree <com1> <com2> <com3>

	The output:

	    <date> <commit>:5

	 means that <commit> is reachable from <com1>(1) and <com3>(4)
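
Decoding the bitmask is a one-liner; the Python sketch below shows the idea. The function name reachable_from() is made up, and the flags value is assumed to have been converted to an integer already.

```python
def reachable_from(flags, commits):
    """Return the command-line commits whose bit is set in flags."""
    # Bit 0 stands for the first commit on the command line, bit 1 for
    # the second, and so on.
    return [c for i, c in enumerate(commits) if flags & (1 << i)]


print(reachable_from(5, ["<com1>", "<com2>", "<com3>"]))
```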
	
A revtree can get quite large. rev-tree will eventually allow you to
cache previous state so that you don't have to follow the whole thing
down.

So the change difference between two commits is literally

	rev-tree [commit-id1]  > commit1-revtree
	rev-tree [commit-id2]  > commit2-revtree
	join -t : commit1-revtree commit2-revtree > common-revisions

(this is also how to find the most common parent - you'd look at just
the head revisions - the ones that aren't referred to by other
revisions - in "common-revisions", and figure out the best one. I
think.)


################################################################
show-diff
	show-diff [-p] [-q] [-s] [-z] [paths...]

Compares the files in the working tree and the cache.  When paths
are specified, compares only those named paths.  Otherwise all
entries in the cache are compared.  The output format is the
same as diff-cache and diff-tree.

-p
	generate patch (see section on generating patches)

-q
	Remain silent even on nonexisting files

-s
	Does not do anything other than what -q does.

Output format:

See "Output format from diff-cache, diff-tree and show-diff" section.

################################################################
show-files
	show-files [-z] [-t]
		(--[cached|deleted|others|ignored|stage|unmerged])*
		(-[c|d|o|i|s|u])*
		[-x <pattern>|--exclude=<pattern>]
		[-X <file>|--exclude-from=<file>]

This merges the file listing in the directory cache index with the
actual working directory list, and shows different combinations of the
two.

One or more of the options below may be used to determine the files
shown:

-c|--cached
	Show cached files in the output (default)

-d|--deleted
	Show deleted files in the output

-o|--others
	Show other files in the output

-i|--ignored
	Show ignored files in the output
	Note that this also reverses any exclude list present.

-s|--stage
	Show stage files in the output

-u|--unmerged
	Show unmerged files in the output (forces --stage)

#-t [not in Linus' tree (yet?)]
#	Identify the file status with the following tags (followed by
#	a space) at the start of each line:
#	H	cached
#	M	unmerged
#	R	removed/deleted
#	?	other

-z
	\0 line termination on output

-x|--exclude=<pattern>
	Skips files matching pattern.
	Note that pattern is a shell wildcard pattern.

-X|--exclude-from=<file>
	exclude patterns are read from <file>; 1 per line.
	Allows the use of the famous dontdiff file as follows to find
	out about uncommitted files just as dontdiff is used with
	the diff command:
	     show-files --others --exclude-from=dontdiff

Output
show-files just outputs the filename unless --stage is specified in
which case it outputs:

[<tag> ]<mode> <object> <stage> <file>

"show-files --unmerged" and "show-files --stage" can be used to examine
detailed information on unmerged paths.

For an unmerged path, instead of recording a single mode/SHA1 pair,
the dircache records up to three such pairs; one from tree O in stage
1, A in stage 2, and B in stage 3.  This information can be used by
the user (or Cogito) to see what should eventually be recorded at the
path. (see read-tree for more information on state)

see also:
read-tree


################################################################
unpack-file
	unpack-file <blob>

Creates a file holding the contents of the blob specified by sha1. It
returns the name of the temporary file in the following format:
	.merge_file_XXXXX

<blob>
	Must be a blob id

################################################################
update-cache
	update-cache [--add] [--remove] [--refresh [--ignore-missing]]
		     [--cacheinfo <mode> <object> <path>]*
		     [--] [<file>]*

Modifies the index or directory cache. Each file mentioned is updated
into the cache and any 'unmerged' or 'needs updating' state is
cleared.

The way update-cache handles files it is told about can be modified
using the various options:

--add
	If a specified file isn't in the cache already then it's
	added.
	Default behaviour is to ignore new files.

--remove
	If a specified file is in the cache but is missing then it's
	removed.
	Default behaviour is to ignore removed file.

--refresh
	Looks at the current cache and checks to see if merges or
	updates are needed by checking stat() information.

--ignore-missing
	Ignores missing files during a --refresh

--cacheinfo <mode> <object> <path>
	Directly insert the specified info into the cache.
	
--
	Do not interpret any more arguments as options.

<file>
	Files to act on.
	Note that files beginning with '.' are discarded. This includes
	"./file" and "dir/./file". If you don't want this, then use
	cleaner names.
	The same applies to directories ending '/' and paths with '//'
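
One way to read the path rule is "no path component may be empty or start with a dot". The Python sketch below encodes that reading; it is an interpretation of the note above, not a copy of what update-cache does, and acceptable() is a made-up name.

```python
def acceptable(path):
    """Return True if no component is empty or starts with a dot."""
    # Empty components come from a leading or trailing '/' or from '//'.
    return all(part and not part.startswith(".") for part in path.split("/"))


for name in ["kernel/sched.c", "./file", "dir/./file", "dir/", "a//b"]:
    print(name, acceptable(name))
```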


Using --refresh

--refresh does not calculate a new sha1 file or bring the cache
up-to-date for mode/content changes. But what it _does_ do is to
"re-match" the stat information of a file with the cache, so that you
can refresh the cache for a file that hasn't been changed but where
the stat entry is out of date.

For example, you'd want to do this after doing a "read-tree", to link
up the stat cache details with the proper files.

Using --cacheinfo
--cacheinfo is used to register a file that is not in the current
working directory.  This is useful for minimum-checkout merging.

To pretend you have a file with mode and sha1 at path, say:

 $ update-cache --cacheinfo mode sha1 path

To update and refresh only the files already checked out:

   checkout-cache -n -f -a && update-cache --ignore-missing --refresh


################################################################
write-tree
	write-tree

Creates a tree object using the current cache.

The cache must be merged.

Conceptually, write-tree sync()s the current directory cache contents
into a set of tree files.
In order to have that match what is actually in your directory right
now, you need to have done an "update-cache" phase before you did the
"write-tree".


################################################################

Output format from diff-cache, diff-tree and show-diff.

These commands all compare two sets of things; what are
compared are different:

    diff-cache <tree/commit>

        compares the <tree/commit> and the files on the filesystem.

    diff-cache --cached <tree/commit>

        compares the <tree/commit> and the cache.

    diff-tree [-r] <tree/commit-1> <tree/commit-2> [paths...]

        compares the trees named by the two arguments.

    show-diff [paths...]

        compares the cache and the files on the filesystem.

The following description uses "old" and "new" to mean those
compared entities.

For files in old but not in new (i.e. removed):
-<mode> \t <type> \t <object> \t <path>

For files not in old but in new (i.e. added):
+<mode> \t <type> \t <object> \t <path>

For files that differ:
*<old-mode>-><new-mode> \t <type> \t <old-sha1>-><new-sha1> \t <path>

<new-sha1> is shown as all 0's if new is a file on the
filesystem and it is out of sync with the cache.  Example:

    *100644->100660 blob    5be4a414b32cf4204f889469942986d3d783da84->0000000000000000000000000000000000000000      file.c
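
A script consuming this output could split the records along the following lines. This Python sketch assumes the fields are separated by single tab characters with no surrounding spaces and that -z is not in use; parse_record() and the dictionary keys are made up for the example.

```python
ZERO_SHA1 = "0" * 40


def parse_record(line):
    """Parse one '+', '-' or '*' record into a dictionary."""
    status, rest = line[0], line.rstrip("\n")[1:]
    # Split at most three times so a tab inside the path survives.
    mode, kind, sha1, path = rest.split("\t", 3)
    if status == "+":
        return {"status": "added", "type": kind, "path": path,
                "new": (mode, sha1)}
    if status == "-":
        return {"status": "removed", "type": kind, "path": path,
                "old": (mode, sha1)}
    if status == "*":
        old_mode, new_mode = mode.split("->")
        old_sha1, new_sha1 = sha1.split("->")
        return {"status": "changed", "type": kind, "path": path,
                "old": (old_mode, old_sha1), "new": (new_mode, new_sha1),
                # All zeroes: the working file is out of sync with the cache.
                "out_of_sync": new_sha1 == ZERO_SHA1}
    raise ValueError("unknown record: %r" % line)


line = ("*100644->100660\tblob\t"
        "5be4a414b32cf4204f889469942986d3d783da84->" + ZERO_SHA1 + "\tfile.c")
record = parse_record(line)
print(record["path"], record["out_of_sync"])
```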

################################################################

Generating patches

When diff-cache, diff-tree, or show-diff are run with a -p
option, they do not produce the output described in "Output
format from diff-cache, diff-tree and show-diff" section.  They
instead produce a patch file.

The patch generation can be customized at two levels.  This
customization also applies to diff-tree-helper.

1. When the environment variable GIT_EXTERNAL_DIFF is not set,
   these commands internally invoke diff like this:

   diff -L k/<path> -L l/<path> -pu <old> <new>

   For added files, /dev/null is used for <old>.  For removed
   files, /dev/null is used for <new>

   The first part of the above command-line can be customized via
   the environment variable GIT_DIFF_CMD.  For example, if you
   do not want to show the extra level of leading path, you can
   say this:

   GIT_DIFF_CMD="diff -L'%s' -L'%s'" show-diff -p

   Caution:  Do not use more than two '%s' in GIT_DIFF_CMD.

   The diff formatting options can be customized via the
   environment variable GIT_DIFF_OPTS.  For example, if you
   prefer context diff:

   GIT_DIFF_OPTS=-c diff-cache -p $(cat .git/HEAD)


2. When the environment variable GIT_EXTERNAL_DIFF is set, the
   program named by it is called, instead of the diff invocation
   described above.

   For a path that is added, removed, or modified,
   GIT_EXTERNAL_DIFF is called with 7 parameters:

     path old-file old-hex old-mode new-file new-hex new-mode

   where
     <old|new>-file are files GIT_EXTERNAL_DIFF can use to read the
                    contents of <old|new>,
     <old|new>-hex are the 40-hexdigit SHA1 hashes,
     <old|new>-mode are the octal representation of the file modes.

   The file parameters can point at the user's working file
   (e.g. new-file in show-diff), /dev/null (e.g. old-file when a
   new file is added), or a temporary file (e.g. old-file in the
   cache).  GIT_EXTERNAL_DIFF should not worry about
   unlinking the temporary file --- it is removed when
   GIT_EXTERNAL_DIFF exits.

   For a path that is unmerged, GIT_EXTERNAL_DIFF is called with
   1 parameter, path.
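
To make the calling convention concrete, here is a minimal Python sketch of the argument handling a GIT_EXTERNAL_DIFF program might do. The describe() function, its output strings and the sample temporary file name are made up; a real program would read the two files rather than just print a summary.

```python
def describe(args):
    """Summarize one GIT_EXTERNAL_DIFF invocation from its parameters."""
    if len(args) == 1:
        # Unmerged path: only the path itself is passed.
        return "unmerged: %s" % args[0]
    if len(args) != 7:
        raise ValueError("expected 1 or 7 parameters, got %d" % len(args))
    path, old_file, old_hex, old_mode, new_file, new_hex, new_mode = args
    if old_file == "/dev/null":
        return "added: %s (%s)" % (path, new_mode)
    if new_file == "/dev/null":
        return "removed: %s (%s)" % (path, old_mode)
    return "modified: %s (%s -> %s)" % (path, old_mode, new_mode)


print(describe(["file.c", "/tmp/old-file.c", "5be4a414b32cf4204f889469942986d3d783da84",
                "100644", "file.c", "0" * 40, "100660"]))
```

An actual program would call describe(sys.argv[1:]) or similar, and would not need to unlink the temporary files it was given.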

################################################################

Terminology: - see README for description
Each line contains terms used interchangeably

object database, .git directory
directory cache, index
id, sha1, sha1-id, sha1 hash
type, tag
blob, blob object
tree, tree object
commit, commit object
parent
root object
changeset


git Environment Variables
AUTHOR_NAME
AUTHOR_EMAIL
AUTHOR_DATE
COMMIT_AUTHOR_NAME
COMMIT_AUTHOR_EMAIL
GIT_DIFF_CMD
GIT_DIFF_OPTS
GIT_EXTERNAL_DIFF
GIT_INDEX_FILE
SHA1_FILE_DIRECTORY

