* Re: Mercurial 0.3 vs git benchmarks
From: Florian Weimer @ 2005-04-27 15:01 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andrew Morton, Linus Torvalds, magnus.damm, mason, mike.taht, mpm,
linux-kernel, git
In-Reply-To: <426ED20B.9070706@zytor.com>
* H. Peter Anvin:
> While you're doing this anyway, you might want to make sure you enable
> -O +dir_index and run fsck -D.
Directory hashing has a negative impact on some applications (notably
tar and unpatched mutt on large Maildir folders). For git, it's a win
because hashing destroys locality anyway.
* Re: A shortcoming of the git repo format
From: C. Scott Ananian @ 2005-04-27 15:00 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Git Mailing List
In-Reply-To: <426F2671.1080105@zytor.com>
On Tue, 26 Apr 2005, H. Peter Anvin wrote:
> Additionally, there is the question of the handling of strings that may
> contain \n or even \0 (which may be necessary for some applications).
While we're at it, I'll just mention that '\0' is a rather bad delimiter
for zlib-compressed files; it usually ends up enlarging the file by three
or more bytes compared to using any whitespace character. The reason is
obvious: \0 isn't actually used anywhere else in the compressed contents,
so it tends to pollute zlib's dictionary.
It's probably too late to do anything about this, but hey.
--scott
Soviet STANDEL Yakima JMTRAX Hussein Ft. Meade algorithm JMBLUG CIA
SEQUIN Bejing Morwenstow Boston nuclear Sigint Ft. Bragg ZRBRIEF Peking
( http://cscott.net/ )
* Re: I'm missing isofs.h
From: Petr Baudis @ 2005-04-27 13:58 UTC (permalink / raw)
To: Andrew Morton, git
In-Reply-To: <20050427125843.GA9454@delft.aura.cs.cmu.edu>
Dear diary, on Wed, Apr 27, 2005 at 02:58:44PM CEST, I got a letter
where Jan Harkes <jaharkes@cs.cmu.edu> told me that...
> On Tue, Apr 26, 2005 at 09:43:38PM -0700, Andrew Morton wrote:
> > In a current tree, using git-pasky-0.7:
>
> It looks like git-pasky-0.7 doesn't include the following commit, but
> there are also several other diff and merge related fixes that were
> added since then.
Why do you think it doesn't include it? I can see the fix in the code.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
* Darcs-git pulling from the Linux repo: a Linux VM question
From: Juliusz Chroboczek @ 2005-04-27 13:10 UTC (permalink / raw)
To: darcs-devel, Git Mailing List
Hi,
If you are one of the few initiated who can tune the Linux VM, please
skip to the end of this mail and give me some advice. If you are one
of the even fewer initiated who understand Darcs' memory usage, read
the whole of this message and send me a patch. Otherwise, press D.
Now that I've got a Darcs that groks Git repos, I can play with a
fairly large tree -- the Linux 2.6 one. All the experiments described
below were done on a 1.4 GHz Pentium-M with 640 MB of memory, running
Linux 2.6.9 (Debian branded) over Reiserfs.
All the commands that don't need to actually read the underlying blobs
are instantaneous; for example, ``darcs changes'' takes 0.4s.
Commands that require reading the blobs but allow discarding them
straight away are reasonable enough -- ``darcs changes -s'' on all but
the initial import takes a very reasonable 15s, ``darcs changes -s''
including the initial import takes 2m30s real time, (50s CPU time).
The trouble, of course, is with commands that need to read a full tree
and keep it in memory. This is, unfortunately, the case with pull of
the initial commit, which is over 200MB in size. Darcs behaviour when
pulling this initial commit is as follows.
As I'm currently reading the git repository eagerly, Darcs starts by
reading the whole of the initial tree into memory; this takes roughly
2 minutes of real time (at less than 10% CPU), reads 18987 Git files
(blobs and trees), of which 18512 are unique (meaning that less than
500 were read two times or more -- yes, I should be keeping track of
the blobs I've already read). When that is done, Darcs' VMEM usage is
beneath 300MB.
At that point, Darcs stops doing I/O, and starts trying to interpret
the data. It runs between 80% and 100% of CPU, and grows steadily
until its VMEM reaches 550MB. At that point, the system starts
swapping very lightly (no more than 200kB/s or so), and Darcs' VMEM
usage grows up to 720MB after 5 minutes CPU, 8 minutes real time.
When Darcs has grokked the fullness of the Linux kernel, it decides to
write out a patch. So it starts touching all of its memory while
simultaneously writing out data to a patch file at a fairly sustained
rate. It gets pretty close to the end -- over 200MB of patch are
written --, when suddenly the system appears to freeze for a second,
then the OOM killer triggers and kills the Darcs process.
Now obviously there is a problem with Darcs -- it shouldn't be needing
720MB of virtual memory just to grok a 250 MB import --, but there's
also a problem with the VM. A 720 MB process should be reasonable on
a machine with 640 MB, and there's no apparent reason why the kernel
couldn't go more heavily into swap. My completely uninformed guess
would be that the heavy I/O activity generated by Darcs in the final
stage causes a shortage of some resource (probably buffers) that is
essential for the VM to perform the swapping, and that the only way
the kernel sees to get itself out of the tight spot is to invoke the
OOM killer on the process that's causing the I/O activity.
So yes, in the longer term we need to fix Darcs. For now, does anyone
know how I can tune the Linux VM to get a 720 MB process to run
reliably in 640 MB of main memory? Obviously, adding swap or tuning
the overcommit policy doesn't help (the issue is precisely that the VM
refuses to dig into the swap early enough). I don't understand what
``swappiness'' is, but it doesn't appear to help. The
``min_free_kbytes'' and ``dirty_*'' knobs look promising, but nobody
seems to know what they mean.
So what was it you said about self-tuning VM systems?
Juliusz
* Re: [FILE] Docs update
From: David Greaves @ 2005-04-27 11:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Petr Baudis, GIT Mailing Lists
In-Reply-To: <Pine.LNX.4.58.0504241438280.15879@ppc970.osdl.org>
[-- Attachment #1: Type: text/plain, Size: 2014 bytes --]
Linus Torvalds wrote:
>On Sun, 24 Apr 2005, David Greaves wrote:
>
>
>>And I've attached this as a file rather than a patch to make it easier
>>for people to read.
>>
>>
>Suggestion: move "diff-tree" up above "diff-cache", since as it is now,
>you explain "diff-cache" in terms of diff-tree, before you've even
>explained diff-tree in the first place.
>
>
Yes, but this isn't a tutorial.
Ideally it'll become man pages (the chatty kind)
For now keeping it in alphabetical order is probably easier for people
looking for info on a particular command - and I've updated the 'see
diff-tree' comment to point people forward. OK?
>Also, the current diff-tree has an extension:
>
>
added.
I've also incorporated comments from Junio and Daniel.
I know pasky accepted an early version into his tree but now it's purely
core-git, it would be nice to get it into your tree.
What I propose is that you accept it as-is and then it would seem
reasonable for you to ask for relevant patches to include changes to the
docs before you finally push them.
They don't have to be perfect - no need to patch the synopsis or args
spec; just some explanatory text - I'll keep an eye on it and fix up the
editing. And if you're really lazy and just commit with a 'docs need
updating' or something then I guess I can catch that... eventually. And
I'll take care of the rename too when it happens.
And yes, I know it's now out-of-date (it is 3 days old!!) - so part 2 of
my proposal is that I'll continue to send more patches for the missing
commands and I'll also send patches to the code to bring the usage()
text in line with the consistent terminology used in the docs.
And since it's not used, how would you feel about making all the
commands take a -h and returning the usage and possibly a synopsis?
I could patch that in at the same time as I consistency-ise all the
usage strings.
David
Reference documentation for the core git commands.
Signed-off-by: David Greaves <david@dgreaves.com>
---
[-- Attachment #2: README.reference --]
[-- Type: text/plain, Size: 32860 bytes --]
This file contains reference information for the core git commands.
It is actually based on the source from Petr Baudis' tree and may
therefore contain a few 'extras' that may or may not make it upstream.
The README contains much useful definition and clarification info -
read that first. And of the commands, I suggest reading
'update-cache' and 'read-tree' first - I wish I had!
Thanks to original email authors and proof readers esp Junio C Hamano
<junkio@cox.net>
David Greaves <david@dgreaves.com>
24/4/05
Identifier terminology used:
<object>
Indicates any object sha1 identifier
<blob>
Indicates a blob object sha1 identifier
<tree>
Indicates a tree object sha1 identifier
<commit>
Indicates a commit object sha1 identifier
<tree/commit>
Indicates a tree or commit object sha1 identifier (usually
because the command can read the <tree> a <commit> contains).
[Eventually may be replaced with <tree> if <tree> means
<tree/commit> in all commands]
<type>
Indicates that an object type is required.
Currently one of: blob/tree/commit
<file>
Indicates a filename - often includes leading path
<path>
Indicates the path of a file (is this ever useful?)
################################################################
cat-file
cat-file (-t | <type>) <object>
Provide contents or type of objects in the repository. The type is
required if -t is not being used to find the object type.
<object>
The sha1 identifier of the object.
-t
show the object type identified by <object>
<type>
One of: blob/tree/commit
Output
If -t is specified, one of:
blob/tree/commit
Otherwise the raw (though uncompressed) contents of the <object> will
be returned.
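As a rough illustration of the two output modes, the sketch below
parses an object the way cat-file is described above. It assumes the
stored object is a zlib stream holding a header of the form
"<type> <size>\0" followed by the raw contents; that layout and the
name cat_file are assumptions made for illustration, not taken from
this file.

```python
import zlib


def cat_file(compressed, want_type=None):
    """Return the object type (like -t) or the raw contents.

    `compressed` is the zlib-compressed object as bytes.  The
    "<type> <size>\\0<contents>" layout is an assumption of this sketch.
    """
    raw = zlib.decompress(compressed)
    header, _, contents = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(contents):
        raise ValueError("size in header does not match contents")
    if want_type is None:
        # Behaves like "cat-file -t <object>".
        return obj_type.decode()
    if want_type != obj_type.decode():
        raise ValueError("object is a %s, not a %s"
                         % (obj_type.decode(), want_type))
    # Behaves like "cat-file <type> <object>".
    return contents


# Build a sample blob in memory rather than reading .git/objects.
sample = zlib.compress(b"blob 6\0hello\n")
print(cat_file(sample))
print(cat_file(sample, "blob"))
```

Asking for the wrong type raises an error here, which mirrors the
requirement that the type be given when -t is not used.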
################################################################
check-files
check-files <file>...
Check that a list of files are up-to-date between the filesystem and
the cache. Used to verify a patch target before doing a patch.
Files that do not exist on the filesystem are considered up-to-date
(whether or not they are in the cache).
Emits an error message on failure.
preparing to update existing file <file> not in cache
<file> exists but is not in the cache
preparing to update file <file> not uptodate in cache
<file> on disk is not up-to-date with the cache
exits with a status code indicating success if all files are
up-to-date.
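The rules above can be modelled in a few lines. This is only a sketch:
the filesystem and the cache are stood in for by two hypothetical
dictionaries mapping a path to a stat-like fingerprint, and unlike the
real command it collects every message instead of stopping at the
first one.

```python
def check_files(paths, on_disk, cache):
    """Model of the check-files rules described above.

    `on_disk` and `cache` map a path to a stat-like fingerprint (a plain
    string here); both are hypothetical stand-ins for the real state.
    Returns a list of error messages; an empty list means success.
    """
    errors = []
    for path in paths:
        if path not in on_disk:
            # Missing on the filesystem counts as up-to-date.
            continue
        if path not in cache:
            errors.append(
                "preparing to update existing file %s not in cache" % path)
        elif cache[path] != on_disk[path]:
            errors.append(
                "preparing to update file %s not uptodate in cache" % path)
    return errors


disk = {"a.c": "v2", "b.c": "v1", "new.c": "v1"}
cache = {"a.c": "v1", "b.c": "v1", "gone.c": "v1"}
for message in check_files(["a.c", "b.c", "new.c", "gone.c"], disk, cache):
    print(message)
```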
see also: update-cache
################################################################
checkout-cache
checkout-cache [-q] [-a] [-f] [-n] [--prefix=<string>]
[--] <file>...
Will copy all files listed from the cache to the working directory
(not overwriting existing files). Note that the file contents are
restored - NOT the file permissions.
-q
be quiet if files exist or are not in the cache
-f
forces overwrite of existing files
-a
checks out all files in the cache (will then continue to
process listed files).
-n
Don't checkout new files, only refresh files already checked
out.
--prefix=<string>
When creating files, prepend <string> (usually a directory
including a trailing /)
--
Do not interpret any more arguments as options.
Note that the order of the flags matters:
checkout-cache -a -f file.c
will first check out all files listed in the cache (but not overwrite
any old ones), and then force-checkout file.c a second time (ie that
one _will_ overwrite any old contents with the same filename).
Also, just doing "checkout-cache" does nothing. You probably meant
"checkout-cache -a". And if you want to force it, you want
"checkout-cache -f -a".
Intuitiveness is not the goal here. Repeatability is. The reason for
the "no arguments means no work" thing is that from scripts you are
supposed to be able to do things like
find . -name '*.h' -print0 | xargs -0 checkout-cache -f --
which will force all existing *.h files to be replaced with their
cached copies. If an empty command line implied "all", then this would
force-refresh everything in the cache, which was not the point.
To update and refresh only the files already checked out:
checkout-cache -n -f -a && update-cache --ignore-missing --refresh
Oh, and the "--" is just a good idea when you know the rest will be
filenames. Just so that you wouldn't have a filename of "-a" causing
problems (not possible in the above example, but get used to it in
scripting!).
The prefix ability basically makes it trivial to use checkout-cache as
a "export as tree" function. Just read the desired tree into the
index, and do a
checkout-cache --prefix=export-dir/ -a
and checkout-cache will "export" the cache into the specified
directory.
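How --prefix builds the output name can be sketched in a few lines.
This is only an illustration, assuming the prefix is joined to the
cached name by plain string concatenation; export_name is a made-up
helper, not part of git.

```python
def export_name(prefix, cached_path):
    """Name checkout-cache would write to for a given --prefix."""
    # Plain string concatenation: no path separator is inserted.
    return prefix + cached_path


print(export_name("export-dir/", "Makefile"))
print(export_name(".merged-", "Makefile"))
print(export_name("export-dir", "Makefile"))  # no "/" given, so none appears
```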
NOTE! The final "/" is important. The exported name is literally just
prefixed with the specified string, so you can also do something like
checkout-cache --prefix=.merged- Makefile
to check out the currently cached copy of "Makefile" into the file
".merged-Makefile".
################################################################
commit-tree
commit-tree <tree> [-p <parent commit>]* < changelog
Creates a new commit object based on the provided tree object and
emits the new commit object id on stdout. If no parent is given then
it is considered to be an initial tree.
A commit object usually has 1 parent (a commit after a change) or up
to 16 parents. More than one parent represents a merge of branches
that led to them.
While a tree represents a particular directory state of a working
directory, a commit represents that state in "time", and explains how
to get there.
Normally a commit would identify a new "HEAD" state, and while git
doesn't care where you save the note about that state, in practice we
tend to just write the result to the file ".git/HEAD", so that we can
always see what the last committed state was.
Options
<tree>
An existing tree object
-p <parent commit>
Each -p indicates the id of a parent commit object.
Commit Information
A commit encapsulates:
all parent object ids
author name, email and date
committer name and email and the commit time.
If not provided, commit-tree uses your name, hostname and domain to
provide author and committer info. This can be overridden using the
following environment variables.
AUTHOR_NAME
AUTHOR_EMAIL
AUTHOR_DATE
COMMIT_AUTHOR_NAME
COMMIT_AUTHOR_EMAIL
(nb <,> and '\n's are stripped)
A commit comment is read from stdin (max 999 chars). If a changelog
entry is not provided via '<' redirection, commit-tree will just wait
for one to be entered and terminated with ^D
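To make the pieces of a commit concrete, here is a small sketch that
assembles them as text. The line layout (tree, parent, author,
committer, blank line, changelog) is modelled on the commit shown in
the diff-tree section; the helper names clean and build_commit are
hypothetical, and the exact bytes commit-tree writes may differ.

```python
def clean(field):
    """Drop the characters that the text above says are stripped."""
    return "".join(ch for ch in field if ch not in "<>\n")


def build_commit(tree, parents, author, committer, changelog):
    """Assemble the text of a commit (illustrative layout only).

    `author` and `committer` are (name, email, date) tuples.
    """
    lines = ["tree " + tree]
    # One "parent" line per -p option; none for an initial commit.
    lines += ["parent " + parent for parent in parents]
    for role, (name, email, date) in (("author", author),
                                      ("committer", committer)):
        lines.append("%s %s <%s> %s"
                     % (role, clean(name), clean(email), date))
    return "\n".join(lines) + "\n\n" + changelog


who = ("Linus Torvalds", "torvalds@ppc970.osdl.org",
       "Sat Apr 9 12:02:30 2005")
text = build_commit("5319e4d609cdd282069cc4dce33c1db559539b03",
                    ["b4e628ea30d5ab3606119d2ea5caeab141d38df7"],
                    who, who,
                    "Make \"fsck-cache\" print out all the root commits.\n")
print(text)
```

A merge would simply pass more than one id in the parents list.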
see also: write-tree
################################################################
diff-cache
diff-cache [-r] [-z] [--cached] <tree/commit>
Compares the content and mode of the blobs found via a tree object
with the content of the current cache, optionally ignoring the
stat state of the file on disk.
(This is basically a special case of diff-tree that works with the
current cache as the first tree - see diff-tree for more details)
<tree/commit>
The id of a tree or commit object to diff against.
-r
recurse
-z
\0 line termination on output
--cached
do not consider the on-disk file at all
Output format:
For files in the tree but not in the cache
-<mode>\t <type>\t <object>\t <path><file>
For files in the cache but not in the tree
+<mode>\t <type>\t <object>\t <path><file>
For files that differ:
*<tree-mode>-><cache-mode>\t <type>\t <tree-sha1>-><cache-sha1>\t <path><file>
In the special case of the file being changed on disk and out of sync
with the cache, the sha1 is all 0's. Example:
*100644->100660 blob 5be4a414b32cf4204f889469942986d3d783da84->0000000000000000000000000000000000000000 file.c
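For scripts that consume this output, a parser along the following
lines may help. It is a sketch that assumes the fields are separated
by single tab characters with no padding; parse_diff_line and the
dictionary keys are names invented here.

```python
def parse_diff_line(line):
    """Parse one line of diff-cache / diff-tree output into a dict.

    Assumes the fields are separated by single tab characters.
    """
    line = line.rstrip("\n")
    kind, rest = line[0], line[1:]
    mode, obj_type, sha1, path = rest.split("\t", 3)
    entry = {"type": obj_type, "path": path}
    if kind == "*":
        entry["change"] = "changed"
        entry["old_mode"], entry["new_mode"] = mode.split("->")
        entry["old_sha1"], entry["new_sha1"] = sha1.split("->")
    elif kind == "-":
        entry.update(change="removed", old_mode=mode, old_sha1=sha1)
    elif kind == "+":
        entry.update(change="added", new_mode=mode, new_sha1=sha1)
    else:
        raise ValueError("unexpected line: %r" % line)
    # An all-zero sha1 marks a file that is out of sync with the cache.
    entry["dirty"] = entry.get("new_sha1") == "0" * 40
    return entry


sample_line = ("*100644->100660\tblob\t"
               "5be4a414b32cf4204f889469942986d3d783da84->"
               + "0" * 40 + "\tfile.c")
print(parse_diff_line(sample_line))
```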
Operating Modes
You can choose whether you want to trust the index file entirely
(using the "--cached" flag) or ask the diff logic to show any files
that don't match the stat state as being "tentatively changed". Both
of these operations are very useful indeed.
Cached Mode
If --cached is specified, it allows you to ask:
show me the differences between HEAD and the current index
contents (the ones I'd write with a "write-tree")
For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit
without having to write a new tree object and compare it that way, and to
do that, you just do
diff-cache --cached $(cat .git/HEAD)
Example: let's say I had renamed "commit.c" to "git-commit.c", and I had
done an "update-cache" to make that effective in the index file.
"show-diff" wouldn't show anything at all, since the index file matches
my working directory. But doing a diff-cache does:
torvalds@ppc970:~/git> diff-cache --cached $(cat .git/HEAD)
-100644 blob 4161aecc6700a2eb579e842af0b7f22b98443f74 commit.c
+100644 blob 4161aecc6700a2eb579e842af0b7f22b98443f74 git-commit.c
And as you can see, the output matches "diff-tree -r" output (we
always do "-r", since the index is always fully populated
??CHECK??).
You can trivially see that the above is a rename.
In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. Except this one is much
nicer for the case where you just want to check where you are.
So doing a "diff-cache --cached" is basically very useful when you are
asking yourself "what have I already marked for being committed, and
what's the difference to a previous tree".
Non-cached Mode
The "non-cached" mode takes a different approach, and is potentially
the even more useful of the two in that what it does can't be emulated
with a "write-tree + diff-tree". Thus that's the default mode. The
non-cached version asks the question
"show me the differences between HEAD and the currently checked out
tree - index contents _and_ files that aren't up-to-date"
which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the "diff-tree -r" output to
a tee, but with a twist.
The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic "all-zero" sha1 to show
that. So let's say that you have edited "kernel/sched.c", but have not
actually done an update-cache on it yet - there is no "object" associated
with the new state, and you get:
torvalds@ppc970:~/v2.6/linux> diff-cache $(cat .git/HEAD )
*100644->100664 blob 7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000 kernel/sched.c
ie it shows that the tree has changed, and that "kernel/sched.c" is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.
NOTE! As with other commands of this type, "diff-cache" does not actually
look at the contents of the file at all. So maybe "kernel/sched.c" hasn't
actually changed, and it's just that you touched it. In either case, it's
a note that you need to update-cache it to make the cache be in sync.
NOTE 2! You can have a mixture of files show up as "has been updated" and
"is still dirty in the working directory" together. You can always tell
which file is in which state, since the "has been updated" ones show a
valid sha1, and the "not in sync with the index" ones will always have the
special all-zero sha1.
################################################################
diff-tree
diff-tree [-r] [-z] <tree/commit> <tree/commit> [<pattern>]*
Compares the content and mode of the blobs found via two tree objects.
Note that diff-tree can use the tree encapsulated in a commit object.
<tree/commit>
The id of a tree or commit object.
<pattern>
If provided, the results are limited to a subset of files
matching one of these prefix strings.
ie file matches /^<pattern1>|<pattern2>|.../
Note that pattern does not provide any wildcard or regexp features.
-r
recurse
-z
\0 line termination on output
Limiting Output
If you're only interested in differences in a subset of files, for
example some architecture-specific files, you might do:
diff-tree -r <tree/commit> <tree/commit> arch/ia64 include/asm-ia64
and it will only show you what changed in those two directories.
Or if you are searching for what changed in just kernel/sched.c, just do
diff-tree -r <tree/commit> <tree/commit> kernel/sched.c
and it will ignore all differences to other files.
The pattern is always the prefix, and is matched exactly (ie there are no
wildcards - although matching a directory, which it does support, can
obviously be seen as a "wildcard" for all the files under that directory).
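The prefix rule is simple enough to state as code. The sketch below
takes the description literally (a plain "starts with" test against
each pattern); matches is a hypothetical helper and the real
implementation may treat directory boundaries differently.

```python
def matches(path, patterns):
    """True if `path` starts with any of the prefix strings.

    An empty pattern list selects every path.
    """
    return not patterns or any(path.startswith(p) for p in patterns)


wanted = ["arch/ia64", "include/asm-ia64"]
print(matches("arch/ia64/kernel/entry.S", wanted))
print(matches("kernel/sched.c", wanted))
# No wildcard support: "*" is just another character to match.
print(matches("kernel/sched.c", ["kernel/*.c"]))
```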
Output format:
For files in tree1 but not in tree2
-<mode>\t <type>\t <object>\t <path><file>
For files not in tree1 but in tree2
+<mode>\t <type>\t <object>\t <path><file>
For files that differ:
*<tree1-mode>-><tree2-mode>\t <type>\t <tree1 sha1>-><tree2 sha1>\t <path><file>
An example of normal usage is:
torvalds@ppc970:~/git> diff-tree 5319e4d609cdd282069cc4dce33c1db559539b03 b4e628ea30d5ab3606119d2ea5caeab141d38df7
*100664->100664 blob ac348b7d5278e9d04e3a1cd417389379c32b014f->a01513ed4d4d565911a60981bfb4173311ba3688 fsck-cache.c
which tells you that the last commit changed just one file (it's from
this one:
commit 3c6f7ca19ad4043e9e72fa94106f352897e651a8
tree 5319e4d609cdd282069cc4dce33c1db559539b03
parent b4e628ea30d5ab3606119d2ea5caeab141d38df7
author Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
committer Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
Make "fsck-cache" print out all the root commits it finds.
Once I do the reference tracking, I'll also make it print out all the
HEAD commits it finds, which is even more interesting.
in case you care).
################################################################
fsck-cache
fsck-cache [[--unreachable] <commit>*]
Verifies the connectivity and validity of the objects in the database.
<commit>
A commit object to treat as the head of an unreachability
trace
--unreachable
print out objects that exist but that aren't reachable from any
of the specified root nodes
It tests SHA1 and general object sanity, and it does full tracking of
the resulting reachability and everything else. It prints out any
corruption it finds (missing or bad objects), and if you use the
"--unreachable" flag it will also print out objects that exist but
that aren't reachable from any of the specified root nodes.
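The reachability part of the check can be pictured as a graph walk
from the given heads. The sketch below is an assumption about the
general idea, not a description of the fsck-cache source; the object
database is replaced by a hypothetical dictionary mapping each object
id to the ids it refers to.

```python
def trace(objects, heads):
    """Walk the object graph from `heads`.

    `objects` maps an id to the list of ids it refers to (a commit to
    its tree and parents, a tree to its entries, a blob to nothing).
    Returns (unreachable ids, missing ids), both sorted.
    """
    seen = set()
    missing = set()
    stack = list(heads)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        if oid not in objects:
            # Referred to, but not present in the database.
            missing.add(oid)
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return sorted(set(objects) - seen), sorted(missing)


db = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree1": ["blob1"],
    "tree2": ["blob1", "blob2"],   # blob2 is not in the database
    "blob1": [],
    "orphan": [],
}
print(trace(db, ["commit2"]))
```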
So for example
fsck-cache --unreachable $(cat .git/HEAD)
or, for Cogito users:
fsck-cache --unreachable $(cat .git/heads/*)
will do quite a _lot_ of verification on the tree. There are a few
extra validity tests to be added (make sure that tree objects are
sorted properly etc), but on the whole if "fsck-cache" is happy, you
do have a valid tree.
Any corrupt objects you will have to find in backups or other archives
(ie you can just remove them and do an "rsync" with some other site in
the hopes that somebody else has the object you have corrupted).
Of course, "valid tree" doesn't mean that it wasn't generated by some
evil person, and the end result might be crap. Git is a revision
tracking system, not a quality assurance system ;)
Extracted Diagnostics
expect dangling commits - potential heads - due to lack of head information
You haven't specified any nodes as heads so it won't be
possible to differentiate between un-parented commits and
root nodes.
missing sha1 directory '<dir>'
The directory holding the sha1 objects is missing.
unreachable <type> <object>
The <type> object <object> isn't actually referred to directly
or indirectly in any of the trees or commits seen. This can
mean that there's another root node that you're not specifying
or that the tree is corrupt. If you haven't missed a root node
then you might as well delete unreachable nodes since they
can't be used.
missing <type> <object>
The <type> object <object>, is referred to but isn't present in
the database.
dangling <type> <object>
The <type> object <object>, is present in the database but never
_directly_ used. A dangling commit could be a root node.
warning: fsck-cache: tree <tree> has full pathnames in it
And it shouldn't...
sha1 mismatch <object>
The database has an object whose sha1 doesn't match the
database value.
This indicates a ??serious?? data integrity problem.
(note: this error occurred during early git development when
the database format changed.)
Environment Variables
SHA1_FILE_DIRECTORY
used to specify the object database root (usually .git/objects)
################################################################
git-export
git-export top [base]
probably deprecated:
On Wed, 20 Apr 2005, Petr Baudis wrote:
>> I will probably not buy git-export, though. (That is, it is merged, but
>> I won't make git frontend for it.) My "git export" already does
>> something different, but more importantly, "git patch" of mine already
>> does effectively the same thing as you do, just for a single patch; so I
>> will probably just extend it to do it for an (a,b] range of patches.
That's fine. It was a quick hack, just to show that if somebody wants to,
the data is trivially exportable.
Linus
Although in Linus' distribution, git-export is not part of 'core' git.
################################################################
init-db
init-db
This simply creates an empty git object database - basically a .git
directory.
If the object storage directory is specified via the
SHA1_FILE_DIRECTORY environment variable then the sha1 directories are
created underneath - otherwise the default .git/objects directory is
used.
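The choice of directory can be sketched as below. The fallback to
.git/objects follows the text above; the fan-out into 256
subdirectories named 00 to ff is an assumption about what "the sha1
directories" means, and both helper names are invented here. Nothing
is created on disk.

```python
def object_dir(environ):
    """Pick the object database root from an environment mapping."""
    return environ.get("SHA1_FILE_DIRECTORY", ".git/objects")


def sha1_subdirs(environ):
    """List the sha1 directories init-db would create (assumed 00..ff)."""
    root = object_dir(environ)
    return ["%s/%02x" % (root, i) for i in range(256)]


print(object_dir({}))
print(object_dir({"SHA1_FILE_DIRECTORY": "/tmp/objs"}))
print(sha1_subdirs({})[:3])
```

Passing os.environ instead of a literal dictionary gives the behaviour
for the current shell.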
init-db won't hurt an existing repository.
################################################################
ls-tree
ls-tree [-r] [-z] <tree/commit>
convert the tree object to a human readable (and script
processable) form.
<tree/commit>
Id of a tree or commit object.
-r
recurse into sub-trees
-z
\0 line termination on output
Output Format
<mode>\t <type>\t <object>\t <path><file>
################################################################
merge-base
merge-base <commit> <commit>
merge-base finds as good a common ancestor as possible. Given a
selection of equally good common ancestors it should not be relied on
to decide in any particular way.
The merge-base algorithm is still in flux - use the source...
################################################################
merge-cache
merge-cache <merge-program> (-a | -- | <file>*)
This looks up the <file>(s) in the cache and, if there are any merge
entries, unpacks all of them (which may be just one file, of course)
into up to three separate temporary files, and then executes the
supplied <merge-program> with those three files as arguments 1,2,3
(empty argument if no file), and <file> as argument 4.
--
Interpret all future arguments as filenames
-a
Run merge against all files in the cache that need merging.
If merge-cache is called with multiple <file>s (or -a) then it
processes them in turn only stopping if merge returns a non-zero exit
code.
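The stop-on-failure behaviour looks roughly like this. It is a sketch
only: run_merge stands in for executing <merge-program> on one file
and returning its exit code, and both names are hypothetical.

```python
def merge_all(files, run_merge):
    """Run `run_merge` on each file in turn, stopping at the first failure.

    Returns (exit status, files that were never attempted).
    """
    for position, name in enumerate(files):
        status = run_merge(name)
        if status != 0:
            return status, files[position + 1:]
    return 0, []


def fake_merge(name):
    """Pretend merge program: fails for AA, succeeds otherwise."""
    return 1 if name == "AA" else 0


print(merge_all(["AA", "MM"], fake_merge))
print(merge_all(["MM", "ZZ"], fake_merge))
```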
Typically this is run with a script calling the merge command from
the RCS package.
A sample script called git-merge-one-file-script is included in the
distribution.
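A wrapper script of this kind has to hand the files to RCS "merge" in
the order that program expects. The sketch below assumes the usual RCS
convention, "merge file1 file2 file3", where file2 is the common
original and the result is left in file1. rcs_merge_argv is a
hypothetical helper, and which branch goes first is a choice made
here, not something this file specifies.

```python
def rcs_merge_argv(original, branch_a, branch_b):
    """Turn merge-cache's (original, A, B) order into RCS merge's order.

    merge-cache passes the original first; RCS "merge" wants it in
    the middle.
    """
    return ["merge", branch_a, original, branch_b]


print(rcs_merge_argv("orig.tmp", "a.tmp", "b.tmp"))
```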
ALERT ALERT ALERT! The git "merge object order" is different from the
RCS "merge" program merge object order. In the above ordering, the
original is first. But the argument order to the 3-way merge program
"merge" is to have the original in the middle. Don't ask me why.
Examples:
torvalds@ppc970:~/merge-test> merge-cache cat MM
This is MM from the original tree. # original
This is modified MM in the branch A. # merge1
This is modified MM in the branch B. # merge2
This is modified MM in the branch B. # current contents
or
torvalds@ppc970:~/merge-test> merge-cache cat AA MM
cat: : No such file or directory
This is added AA in the branch A.
This is added AA in the branch B.
This is added AA in the branch B.
fatal: merge program failed
where the latter example shows how "merge-cache" will stop trying to
merge once anything has returned an error (ie "cat" returned an error
for the AA file, because it didn't exist in the original, and thus
"merge-cache" didn't even try to merge the MM thing).
################################################################
read-tree
read-tree (<tree/commit> | -m <tree/commit1> [<tree/commit2> <tree/commit3>])
Reads the tree information given by <tree> into the directory cache,
but does not actually _update_ any of the files it "caches". (see:
checkout-cache)
Optionally, it can merge a tree into the cache or perform a 3-way
merge.
Trivial merges are done by read-tree itself. Only conflicting paths
will be in unmerged state when read-tree returns.
-m
Perform a merge, not just a read
<tree#>
The id of the tree object(s) to be read/merged.
Merging
If -m is specified, read-tree performs 2 kinds of merge, a single tree
merge if only 1 tree is given or a 3-way merge if 3 trees are
provided.
Single Tree Merge
If only 1 tree is specified, read-tree operates as if the user did not
specify "-m", except that if the original cache has an entry for a
given pathname, and the contents of the path match the tree being
read, the stat info from the cache is used. (In other words, the
cache's stat()s take precedence over the merged tree's)
That means that if you do a "read-tree -m <newtree>" followed by a
"checkout-cache -f -a", the checkout-cache only checks out the stuff
that really changed.
This is used to avoid unnecessary false hits when show-diff is
run after read-tree.
3-Way Merge
Each "index" entry has two bits worth of "stage" state. stage 0 is the
normal one, and is the only one you'd see in any kind of normal use.
However, when you do "read-tree" with multiple trees, the "stage"
starts out at 0, but increments for each tree you read. And in
particular, the "-m" flag means "start at stage 1" instead.
This means that you can do
read-tree -m <tree1> <tree2> <tree3>
and you will end up with an index with all of the <tree1> entries in
"stage1", all of the <tree2> entries in "stage2" and all of the
<tree3> entries in "stage3".
Furthermore, "read-tree" has special-case logic that says: if you see
a file that matches in all respects in the following states, it
"collapses" back to "stage0":
- stage 2 and 3 are the same; take one or the other (it makes no
difference - the same work has been done on stage 2 and 3)
- stage 1 and stage 2 are the same and stage 3 is different; take
stage 3 (some work has been done on stage 3)
- stage 1 and stage 3 are the same and stage 2 is different; take
stage 2 (some work has been done on stage 2)
Write-tree refuses to write a nonsensical tree, so write-tree will
complain about unmerged entries if it sees a single entry that is not
stage 0.
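The three collapse rules can be written down directly. In this sketch
an entry is any comparable value (for example a (mode, sha1) pair) and
the path is assumed to be present in all three trees; collapse is a
name invented here and the handling of absent entries is left out.

```python
def collapse(stage1, stage2, stage3):
    """Apply the collapse rules to one path.

    Returns the entry that becomes stage 0, or None if the path stays
    unmerged.  All three entries are assumed to exist.
    """
    if stage2 == stage3:
        return stage2      # same work on both sides, take either
    if stage1 == stage2:
        return stage3      # only stage 3 changed
    if stage1 == stage3:
        return stage2      # only stage 2 changed
    return None            # all three differ: left for the merge script


old = ("100644", "aaaa")
ours = ("100644", "bbbb")
theirs = ("100644", "cccc")
print(collapse(old, ours, ours))
print(collapse(old, old, theirs))
print(collapse(old, ours, old))
print(collapse(old, ours, theirs))
```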
Ok, this all sounds like a collection of totally nonsensical rules,
but it's actually exactly what you want in order to do a fast
merge. The different stages represent the "result tree" (stage 0, aka
"merged"), the original tree (stage 1, aka "orig"), and the two trees
you are trying to merge (stage 2 and 3 respectively).
In fact, the way "read-tree" works, it's entirely agnostic about how
you assign the stages, and you could really assign them any which way,
and the above is just a suggested way to do it (except since
"write-tree" refuses to write anything but stage0 entries, it makes
sense to always consider stage 0 to be the "full merge" state).
So what happens? Try it out. Select the original tree, and two trees
to merge, and look how it works:
- if a file exists in identical format in all three trees, it will
automatically collapse to "merged" state by the new read-tree.
- a file that has _any_ difference what-so-ever in the three trees
will stay as separate entries in the index. It's up to "script
policy" to determine how to remove the non-0 stages, and insert a
merged version. But since the index is always sorted, they're easy
to find: they'll be clustered together.
- the index file saves and restores with all this information, so you
can merge things incrementally, but as long as it has entries in
stages 1/2/3 (ie "unmerged entries") you can't write the result.
So now the merge algorithm ends up being really simple:
- you walk the index in order, and ignore all entries of stage 0,
since they've already been done.
- if you find a "stage1", but no matching "stage2" or "stage3", you
know it's been removed from both trees (it only existed in the
original tree), and you remove that entry.
- if you find a matching "stage2" and "stage3" tree, you remove one
of them, and turn the other into a "stage0" entry. Remove any
matching "stage1" entry if it exists too.
- .. all the normal trivial rules ..
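Because the index is sorted, the walk described above can treat each run
of same-named entries as one cluster. A minimal sketch in C, assuming a
hypothetical "struct index_entry" that carries only a name and a stage
(the real index entry holds much more), and covering only the first two
cases:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down view of one index entry. */
struct index_entry {
	const char *name;
	int stage;		/* 0..3 */
};

enum action {
	KEEP_MERGED,		/* only a stage 0 entry: already done */
	REMOVE_PATH,		/* only a stage 1 entry: gone from both trees */
	NEEDS_POLICY		/* anything else: compare stages, or ask a script */
};

/*
 * Look at the cluster of same-named entries starting at e[i].
 * Stores the suggested action in *act and returns the index of the
 * first entry after the cluster.
 */
size_t next_cluster(const struct index_entry *e, size_t n, size_t i,
		    enum action *act)
{
	unsigned seen = 0;	/* bit k set if stage k is present */
	size_t j = i;

	while (j < n && !strcmp(e[j].name, e[i].name)) {
		seen |= 1u << e[j].stage;
		j++;
	}
	if (seen == (1u << 0))
		*act = KEEP_MERGED;
	else if (seen == (1u << 1))
		*act = REMOVE_PATH;
	else
		*act = NEEDS_POLICY;
	return j;
}
```

Calling next_cluster() repeatedly, starting at 0 and feeding the return
value back in as i, visits every path in the index exactly once.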
Incidentally - it also means that you don't even have to have a separate
subdirectory for this. All the information literally is in the index file,
which is a temporary thing anyway. There is no need to worry about what is in
the working directory, since it is never shown and never used.
see also:
write-tree
show-files
################################################################
rev-list <commit>
Lists commit objects in reverse chronological order starting at the
given commit, taking ancestry relationship into account. This is
useful to produce human-readable log output.
################################################################
rev-tree
rev-tree [--edges] [--cache <cache-file>] [^]<commit> [[^]<commit>]
Provides the revision tree for one or more commits.
--edges
Show edges (ie places where the marking changes between parent
and child)
--cache <cache-file>
Use the specified file as a cache. [Not implemented yet]
[^]<commit>
The commit id to trace (a leading caret means to ignore this
commit-id and below)
Output:
<date> <commit>:<flags> [<parent-commit>:<flags> ]*
<date>
Date in 'seconds since epoch'
<commit>
id of commit object
<parent-commit>
id of each parent commit object (>1 indicates a merge)
<flags>
The flags are read as a bitmask representing each commit
provided on the commandline. eg: given the command:
$ rev-tree <com1> <com2> <com3>
The output:
<date> <commit>:5
means that <commit> is reachable from <com1>(1) and <com3>(4)
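Reading the bitmask can be sketched in C like this. The function name
"reachable_from" is made up for the illustration; it assumes, as in the
example above, that the first commit on the command line owns bit value
1, the second 2, the third 4, and so on.

```c
/*
 * n is the 1-based position of a commit on the rev-tree command line
 * (it must be at least 1 and no larger than the number of bits in an
 * unsigned int).  Returns 1 if the flags say the listed commit is
 * reachable from that command-line commit, 0 otherwise.
 */
int reachable_from(unsigned flags, int n)
{
	return (int)((flags >> (n - 1)) & 1u);
}
```

With flags 5 this reports the first and third commits but not the second.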
A revtree can get quite large. rev-tree will eventually allow you to
cache previous state so that you don't have to follow the whole thing
down.
So the change difference between two commits is literally
rev-tree [commit-id1] > commit1-revtree
rev-tree [commit-id2] > commit2-revtree
join -t : commit1-revtree commit2-revtree > common-revisions
(this is also how to find the most common parent - you'd look at just
the head revisions - the ones that aren't referred to by other
revisions - in "common-revisions", and figure out the best one. I
think.)
################################################################
show-diff
show-diff [-R] [-q] [-s] [-z] [paths...]
Shows the difference between the version of the specified file on disk
and the file in the cache.
-R
Reverse the diff
-q
Remain silent even on nonexisting files
-s
Do not show the diff text. Just output SHA1 and name for
changed paths (forces -q)
-z
Machine readable output:
. Each output record has the path name at the end of the
record, instead of the front.
. Each record is terminated with a NUL '\0' character.
. For unchanged files, nothing is output.
. For an unmerged file, the following is output:
U name
. For a deleted file, the following is output:
X name
. For a modified file, the following is output:
SHA1 name
where SHA1 is from the dircache entry.
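A consumer of the -z output could classify each record along these
lines. This is a sketch under assumptions: "classify_record" is a
hypothetical helper, the leading field and the path are assumed to be
separated by a single space as the examples above suggest, and the SHA1
is assumed to be printed as 40 lowercase hex digits.

```c
#include <stddef.h>
#include <string.h>

/*
 * Classify one NUL-terminated record from "show-diff -z".
 * Returns 'U' (unmerged), 'X' (deleted), 'M' (modified) or '?' (not
 * understood).  On any result other than '?' for a record without a
 * space, *name is pointed at the path that ends the record.
 */
int classify_record(const char *rec, const char **name)
{
	const char *sp = strchr(rec, ' ');
	size_t len;

	if (!sp)
		return '?';
	len = (size_t)(sp - rec);
	*name = sp + 1;
	if (len == 1 && rec[0] == 'U')
		return 'U';
	if (len == 1 && rec[0] == 'X')
		return 'X';
	if (len == 40 && strspn(rec, "0123456789abcdef") == 40)
		return 'M';
	return '?';
}
```

Since only the first space is treated as a separator, a path that itself
contains spaces comes through intact.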
Environment variables
GIT_DIFF_CMD Default="diff -L 'a/%s' -L 'b/%s'"
Command used to generate diff
GIT_DIFF_OPTS Default="-p -u"
Options passed to diff command
Although it ships in Linus' distribution, show-diff is not part of 'core' git.
################################################################
show-files
show-files [-z] [-t] (--[cached|deleted|others|ignored|stage|unmerged])*
This merges the file listing in the directory cache index with the
actual working directory list, and shows different combinations of the
two.
One or more of the options below may be used to determine the files
shown:
--cached
Show cached files in the output (default)
--deleted
Show deleted files in the output
--others
Show other files in the output
--ignored
Show ignored files in the output
--stage
Show stage files in the output
--unmerged
Show unmerged files in the output (forces --stage)
-t
Show the following tags (followed by a space) at the start of
each line:
H cached
M unmerged
R removed/deleted
? other
-z
\0 line termination on output
Output
show-files just outputs the filename unless --stage is specified, in
which case it outputs:
[<tag> ]<mode> <object> <stage> <file>
"show-files --unmerged" and "show-files --stage" can be used to examine
detailed information on unmerged paths.
For an unmerged path, instead of recording a single mode/SHA1 pair,
the dircache records up to three such pairs; one from tree O in stage
1, A in stage 2, and B in stage 3. This information can be used by
the user (or Cogito) to see what should eventually be recorded at the
path. (see read-cache for more information on state)
see also:
read-cache
################################################################
unpack-file
unpack-file <blob>
Creates a file holding the contents of the blob specified by sha1. It
returns the name of the temporary file in the following format:
.merge_file_XXXXX
<blob>
Must be a blob id
################################################################
update-cache
update-cache [--add] [--remove] [--refresh [--ignore-missing]]
[--cacheinfo <mode> <object> <path>]*
[--] [<file>]*
Modifies the index or directory cache. Each file mentioned is updated
into the cache and any 'unmerged' or 'needs updating' state is
cleared.
The way update-cache handles files it is told about can be modified
using the various options:
--add
If a specified file isn't in the cache already then it's
added.
Default behaviour is to ignore new files.
--remove
If a specified file is in the cache but is missing then it's
removed.
Default behaviour is to ignore removed files.
--refresh
Looks at the current cache and checks to see if merges or
updates are needed by checking stat() information.
--ignore-missing
Ignores missing files during a --refresh
--cacheinfo <mode> <object> <path>
Directly insert the specified info into the cache.
--
Do not interpret any more arguments as options.
<file>
Files to act on.
Note that files beginning with '.' are discarded. This includes
"./file" and "dir/./file". If you don't want this, then use
cleaner names.
The same applies to directories ending in '/' and paths with '//'.
Using --refresh
--refresh does not calculate a new sha1 file or bring the cache
up-to-date for mode/content changes. But what it _does_ do is to
"re-match" the stat information of a file with the cache, so that you
can refresh the cache for a file that hasn't been changed but where
the stat entry is out of date.
For example, you'd want to do this after doing a "read-tree", to link
up the stat cache details with the proper files.
Using --cacheinfo
--cacheinfo is used to register a file that is not in the current
working directory. This is useful for minimum-checkout merging.
To pretend you have a file with mode and sha1 at path, say:
$ update-cache --cacheinfo mode sha1 path
To update and refresh only the files already checked out:
checkout-cache -n -f -a && update-cache --ignore-missing --refresh
################################################################
write-tree
write-tree
Creates a tree object using the current cache.
The cache must be merged.
Conceptually, write-tree sync()s the current directory cache contents
into a set of tree files.
In order to have that match what is actually in your directory right
now, you need to have done an "update-cache" phase before you did the
"write-tree".
################################################################
Terminology: - see README for description
Each line contains terms used interchangeably
object database, .git directory
directory cache, index
id, sha1, sha1-id, sha1 hash
type, tag
blob, blob object
tree, tree object
commit, commit object
parent
root object
changeset
git Environment Variables
AUTHOR_NAME
AUTHOR_EMAIL
AUTHOR_DATE
COMMIT_AUTHOR_NAME
COMMIT_AUTHOR_EMAIL
GIT_DIFF_CMD
GIT_DIFF_OPTS
GIT_INDEX_FILE
SHA1_FILE_DIRECTORY
* Re: I'm missing isofs.h
From: Jan Harkes @ 2005-04-27 12:58 UTC
To: Andrew Morton; +Cc: Petr Baudis, git
In-Reply-To: <20050426214338.32e9ac27.akpm@osdl.org>
On Tue, Apr 26, 2005 at 09:43:38PM -0700, Andrew Morton wrote:
> In a current tree, using git-pasky-0.7:
It looks like git-pasky-0.7 doesn't include the following commit, but
there are also several other diff and merge related fixes that were
added since then.
Jan
commit 65bc81d6fef619d7aadc5c7116be52860539f17a
tree 9adb399af84228740555d732732983b7a02b019d
parent 93256315b2444601a35484f4fb76cd5723284201
author Petr Baudis <pasky@ucw.cz> Sat, 23 Apr 2005 18:05:07 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> Sat, 23 Apr 2005 18:05:07 -0700
[PATCH] Fix broken diff-cache output on added files
Added files were errorneously reported with the - prefix by diff-cache,
obviously leading to great confusion.
Signed-off-by: Petr Baudis <pasky@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH 7/6] Leftover bits.
From: Junio C Hamano @ 2005-04-27 8:17 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This is a minor cleanup to be applied on top of the last 6
patches (the 6th is an alternate so the count actually is five).
It makes sure that external diff interface functions are called
even when diff-tree detects directory changes. Since it is not
clear what to pass GIT_EXTERNAL_DIFF when we do see directory
changes, such calls are still currently dropped at the diff.c
interface function level, but whatever we will decide to do
later, the interface users should be cleaned up first anyway,
and that is primarily what this patch is about.
It also adds code to unlink temporary files used to call the
external diff command upon SIGINT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff-tree.c | 9 +++------
diff.c | 14 ++++++++++++++
2 files changed, 17 insertions(+), 6 deletions(-)
# - 04/26 23:58 Mark the last of '-p' (patch) work.
# + working tree
--- k/diff-tree.c
+++ l/diff-tree.c
@@ -82,8 +82,7 @@ static void show_file(const char *prefix
}
if (generate_patch) {
- if (!S_ISDIR(mode))
- diff_addremove(prefix[0], mode, sha1, base, path);
+ diff_addremove(prefix[0], mode, sha1, base, path);
}
else
printf("%s%06o\t%s\t%s\t%s%s%c", prefix, mode,
@@ -135,10 +134,8 @@ static int compare_tree_entry(void *tree
return retval;
}
- if (generate_patch) {
- if (!S_ISDIR(mode1))
- diff_change(mode1, mode2, sha1, sha2, base, path1);
- }
+ if (generate_patch)
+ diff_change(mode1, mode2, sha1, sha2, base, path1);
else {
strcpy(old_sha1_hex, sha1_to_hex(sha1));
printf("*%06o->%06o\t%s\t%s->%s\t%s%s%c", mode1, mode2,
--- k/diff.c
+++ l/diff.c
@@ -127,6 +127,7 @@ static void prepare_temp_file(const char
if (one->sha1_valid &&
!memcmp(one->u.sha1, null_sha1, sizeof(null_sha1))) {
+ /* "It is valid but please go to the filesystem." */
one->sha1_valid = 0;
one->u.name = name;
}
@@ -180,6 +181,11 @@ static void remove_tempfile(void)
}
}
+static void remove_tempfile_on_signal(int signo)
+{
+ remove_tempfile();
+}
+
/* An external diff command takes:
*
* diff-cmd name infile1 infile1-sha1 infile1-mode \
@@ -202,6 +208,7 @@ void run_external_diff(const char *name,
temp[1].name == temp[1].tmp_path)) {
atexit_asked = 1;
atexit(remove_tempfile);
+ signal(SIGINT, remove_tempfile_on_signal);
}
}
@@ -211,6 +218,7 @@ void run_external_diff(const char *name,
die("unable to fork");
if (!pid) {
const char *pgm = external_diff();
+
if (pgm) {
if (one && two)
execlp(pgm, pgm,
@@ -243,6 +251,9 @@ void diff_addremove(int addremove, unsig
char concatpath[PATH_MAX];
struct diff_spec spec[2], *one, *two;
+ if (S_ISDIR(mode))
+ return;
+
memcpy(spec[0].u.sha1, sha1, 20);
spec[0].mode = mode;
spec[0].sha1_valid = spec[0].file_valid = 1;
@@ -269,6 +280,9 @@ void diff_change(unsigned old_mode, unsi
char concatpath[PATH_MAX];
struct diff_spec spec[2];
+ if (S_ISDIR(old_mode) || S_ISDIR(new_mode))
+ return;
+
memcpy(spec[0].u.sha1, old_sha1, 20);
spec[0].mode = old_mode;
memcpy(spec[1].u.sha1, new_sha1, 20);
* Re: enforcing DB immutability
From: Wout @ 2005-04-27 8:15 UTC
To: Ingo Molnar; +Cc: git
In-Reply-To: <20050420074948.GA22620@elte.hu>
On Wed, Apr 20, 2005 at 09:49:48AM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > perhaps having a new 'immutable hardlink' feature in the Linux VFS
> > would help? I.e. a hardlink that can only be readonly followed, and
> > can be removed, but cannot be chmod-ed to a writeable hardlink. That i
> > think would be a large enough barrier for editors/build-tools not to
> > play the tricks they already do that makes 'readonly' files virtually
> > meaningless.
>
> immutable hardlinks have the following advantage: a hardlink by design
> hides the information where the link comes from. So even if an editor
> wanted to play stupid games and override the immutability - it doesnt
> know where the DB object is. (sure, it could find it if it wants to, but
> that needs real messing around - editors wont do _that_)
>
> i think this might work.
>
> (the current chattr +i flag isnt quite what we need though because it
> works on the inode, and it's also a root-only feature so it puts us back
> to square one. What would be needed is an immutability flag on
> hardlinks, settable by unprivileged users.)
>
> Ingo
Slightly off-topic for this list. Apologies to those offended.
Would a filesystem that allows sharing of blocks between inodes
be useful here? Each block would need a reference count (refco).
Writing a block would be impossible once refco > 1. If someone
attempts to write to such a block, a new block is allocated for
that particular inode and the refco of the original is decreased.
Next to this there would have to be a clone_file() function:
clone_file(src-file, dst-file, mode)
This function would create file dst-file with a new inode that
references the blocks belonging to src-file (increasing the
blocks' reference counts). The owner and group of dst-file are those
of the caller, not those of src-file.
Things to check for are:
- read permissions for src-file
- write permissions for dst-file
- are src-file and dst-file in the same filesystem (if not,
one could implement copy)
- ...?
Suppose I have a file foo:
foo -> inode1(blk1[1], blk2[1], blk3[1], blk4[1])
The [n] value on the blocks is the reference count.
I now call clone_file("foo", "bar", 0644):
foo -> inode1(blk1[2], blk2[2], blk3[2], blk4[2])
bar -> inode2(blk1[2], blk2[2], blk3[2], blk4[2])
Next I modify blk2 of bar (write):
foo -> inode1(blk1[2], blk2[1], blk3[2], blk4[2])
bar -> inode2(blk1[2], blk5[1], blk3[2], blk4[2])
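The foo/bar walk-through above can be modelled with a toy in-memory
block pool. This is a sketch only: it is not filesystem code, every name
in it ("refco", "alloc_block", "create_file", "clone_file_blocks",
"write_block", "block_refco") is hypothetical, and the mode argument and
permission checks of the proposed clone_file() are left out.

```c
#define NBLOCKS 32	/* size of the toy block pool */
#define MAXBLK 8	/* most blocks one file may have in this sketch */

static int refco[NBLOCKS];	/* reference count per block */
static int next_free;		/* next unused block number */

struct inode {
	int nblk;
	int blk[MAXBLK];
};

/* Hand out a fresh block with a reference count of 1 (no bounds checks). */
int alloc_block(void)
{
	refco[next_free] = 1;
	return next_free++;
}

/* Current reference count of block b. */
int block_refco(int b)
{
	return refco[b];
}

/* Create a file of nblk private blocks. */
void create_file(struct inode *ino, int nblk)
{
	int i;

	ino->nblk = nblk;
	for (i = 0; i < nblk; i++)
		ino->blk[i] = alloc_block();
}

/* New inode that shares every block of src; each refcount goes up by one. */
void clone_file_blocks(const struct inode *src, struct inode *dst)
{
	int i;

	dst->nblk = src->nblk;
	for (i = 0; i < src->nblk; i++) {
		dst->blk[i] = src->blk[i];
		refco[src->blk[i]]++;
	}
}

/* Prepare block idx of ino for writing: unshare it first if needed. */
void write_block(struct inode *ino, int idx)
{
	int old = ino->blk[idx];

	if (refco[old] > 1) {
		refco[old]--;
		ino->blk[idx] = alloc_block();
	}
	/* the new data would be written to ino->blk[idx] here */
}
```

Creating foo with four blocks, cloning it to bar and then calling
write_block() on the second block of bar leaves foo's second block with a
count of 1, gives bar a fresh block with a count of 1, and keeps the
other three blocks shared with a count of 2.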
I see the following uses:
- Checking out a tree of (uncompressed) files with git could be
done using the clone_file() call on each file. This means no
extra disk space is used unless files are edited later.
- Easy way to freeze files for backups. A database (mysql, ...)
could bring its files into an acceptable state, call clone_file()
on them and proceed with its work.
- It could be used to protect user files from external tampering.
Someone mentioned the problems with malware killing his files.
The impact of this could be reduced by having a script that did
a clone_file() on everything as root periodically. If files are
deleted, root would have a backup.
Notes:
- Small changes to files would probably cause all the blocks to
be copied as programs (editors) usually write out the complete
file.
- I don't know anything about implementing filesystems so all of
the above could be complete nonsense.
- The idea isn't mine, I've come across this before under the name
of 'snapshot filesystems' and I think it was patented. I've never
heard of anyone doing this for individual files though.
Wout
* Re: Mercurial 0.3 vs git benchmarks
From: Ingo Molnar @ 2005-04-27 6:34 UTC
To: Andrew Morton
Cc: Magnus Damm, mason, torvalds, mike.taht, mpm, linux-kernel, git
In-Reply-To: <20050426135606.7b21a2e2.akpm@osdl.org>
* Andrew Morton <akpm@osdl.org> wrote:
> Magnus Damm <magnus.damm@gmail.com> wrote:
> >
> > My primitive guess is that it was because
> > the ext3 journal became full.
>
> The default ext3 journal size is inappropriately small, btw. Normally
> you should manually make it 128M or so, rather than 32M. Unless you
> have a small amount of memory and/or a large number of filesystems, in
> which case there might be problems with pinned memory.
>
> Mounting as ext2 is a useful technique for determining whether the fs
> is getting in the way.
on ext3, when juggling patches and trees, the biggest performance boost
for me comes from adding noatime,nodiratime to the mount options in
/etc/fstab:
LABEL=/ / ext3 noatime,nodiratime,defaults 1 1
Ingo
* [PATCH 6/6] Alternative patch to diff-cache.c
From: Junio C Hamano @ 2005-04-27 6:28 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This is a replacement of PATCH 4, in case you have already
applied the "non-cached still looks only at cache" fix I
sent you earlier. If you took it, PATCH 4 may not apply
cleanly, in which case this should be easier to work with.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff-cache.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 63 insertions(+), 10 deletions(-)
--- k/diff-cache.c
+++ l/diff-cache.c
@@ -1,13 +1,41 @@
#include "cache.h"
+#include "diff.h"
static int cached_only = 0;
+static int generate_patch = 0;
static int line_termination = '\n';
/* A file entry went away or appeared */
static void show_file(const char *prefix, struct cache_entry *ce)
{
- printf("%s%o\t%s\t%s\t%s%c", prefix, ntohl(ce->ce_mode), "blob",
- sha1_to_hex(ce->sha1), ce->name, line_termination);
+ if (generate_patch)
+ diff_addremove(prefix[0], ntohl(ce->ce_mode),
+ ce->sha1, ce->name, NULL);
+ else
+ printf("%s%06o\tblob\t%s\t%s%c", prefix, ntohl(ce->ce_mode),
+ sha1_to_hex(ce->sha1), ce->name,
+ line_termination);
+}
+
+/* A file *may* have been added to the working tree */
+static void show_possible_local_add(struct cache_entry *new)
+{
+ static unsigned char no_sha1[20];
+ struct stat st;
+ if (stat(new->name, &st) < 0)
+ /* We signal the missing file by special mode 0 and
+ * let diff-tree-helper notice the missing file when it
+ * tries to open it by path. Sneaky but works.
+ */
+ st.st_mode = 0;
+ else if (cache_match_stat(new, &st))
+ return show_file("+", new);
+
+ if (generate_patch)
+ diff_addremove('+', st.st_mode, no_sha1, new->name, NULL);
+ else
+ printf("+%06o\tblob\t%s\t%s%c", st.st_mode,
+ sha1_to_hex(no_sha1), new->name, line_termination);
}
static int show_modified(struct cache_entry *old, struct cache_entry *new)
@@ -35,11 +63,15 @@ static int show_modified(struct cache_en
if (mode == oldmode && !memcmp(sha1, old->sha1, 20))
return 0;
- strcpy(old_sha1_hex, sha1_to_hex(old->sha1));
- printf("*%o->%o\t%s\t%s->%s\t%s%c", oldmode, mode,
- "blob",
- old_sha1_hex, sha1_to_hex(sha1),
- old->name, line_termination);
+ if (generate_patch)
+ diff_change(oldmode, mode,
+ old->sha1, sha1, old->name, NULL);
+ else {
+ strcpy(old_sha1_hex, sha1_to_hex(old->sha1));
+ printf("*%06o->%06o\tblob\t%s->%s\t%s%c", oldmode, mode,
+ old_sha1_hex, sha1_to_hex(sha1),
+ old->name, line_termination);
+ }
return 0;
}
@@ -54,20 +86,36 @@ static int diff_cache(struct cache_entry
/* No stage 1 entry? That means it's a new file */
if (!same) {
show_file("+", ce);
+ /* ... not so fast. The working tree may
+ * also not have it anymore.
+ */
+ if (cached_only)
+ show_file("+", ce);
+ else
+ show_possible_local_add(ce);
break;
}
/* Show difference between old and new */
show_modified(ac[1], ce);
break;
case 1:
- /* No stage 3 (merge) entry? That means it's been deleted */
+ /* No stage 3 (merge) entry? That means it's been
+ * deleted.
+ */
if (!same) {
+ /* The working tree may have it, but it does
+ * not matter. If you write-tree and commit
+ * you would lose that file, so take notice.
+ */
show_file("-", ce);
break;
}
/* Otherwise we fall through to the "unmerged" case */
case 3:
- printf("U %s%c", ce->name, line_termination);
+ if (generate_patch)
+ diff_unmerge(ce->name);
+ else
+ printf("U %s%c", ce->name, line_termination);
break;
default:
@@ -102,7 +150,8 @@ static void mark_merge_entries(void)
}
}
-static char *diff_cache_usage = "diff-cache [-r] [-z] [--cached] <tree sha1>";
+static char *diff_cache_usage =
+"diff-cache [-r] [-z] [-p] [--cached] <tree sha1>";
int main(int argc, char **argv)
{
@@ -119,6 +168,10 @@ int main(int argc, char **argv)
/* We accept the -r flag just to look like diff-tree */
continue;
}
+ if (!strcmp(arg, "-p")) {
+ generate_patch = 1;
+ continue;
+ }
if (!strcmp(arg, "-z")) {
line_termination = '\0';
continue;
* [PATCH 5/6] Teach diff-tree-helper to handle unmerged paths.
From: Junio C Hamano @ 2005-04-27 6:27 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This patch teaches diff-tree-helper to call diff_unmerge() so
that it can report unmerged paths to GIT_EXTERNAL_DIFF, instead
of consuming it on its own.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff-tree-helper.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
To be tested with the following:
GIT_INDEX_FILE=junk
export GIT_INDEX_FILE
read-tree $(cat .git/HEAD)
t1=$(write-tree)
date >trash ; update-cache --add trash
t2=$(write-tree)
read-tree -m $(cat .git/HEAD) $t1 $t2
update-cache --refresh
./show-diff | GIT_EXTERNAL_DIFF=echo ./diff-tree-helper
--- k/diff-tree-helper.c
+++ l/diff-tree-helper.c
@@ -56,7 +56,7 @@ static int parse_diff_tree_output(const
switch (*cp++) {
case 'U':
- fprintf(stderr, "warning: unmerged path %s\n", cp+1);
+ diff_unmerge(cp + 1);
return WARNED_OURSELVES;
case '+':
old->file_valid = 0;
* [PATCH 4/6] Add -p (patch) to diff-cache.
From: Junio C Hamano @ 2005-04-27 6:26 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This patch uses the reworked diff interface to generate patches
directly out of diff-cache when -p is specified.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff-cache.c | 37 ++++++++++++++++++++++++++++---------
1 files changed, 28 insertions(+), 9 deletions(-)
To be tested with the following:
GIT_INDEX_FILE=junk
export GIT_INDEX_FILE
rm -f junk
date >trash ; update-cache --add trash
t1=$(write-tree)
date >>trash;
GIT_EXTERNAL_DIFF=echo ./diff-cache -p $t1
GIT_EXTERNAL_DIFF=echo ./diff-cache -p --cached $t1
update-cache trash
GIT_EXTERNAL_DIFF=echo ./diff-cache -p --cached $t1
./diff-cache -p $(cat .git/HEAD) | filterdiff -i ?/diff-cache.c
--- k/diff-cache.c
+++ l/diff-cache.c
@@ -1,13 +1,20 @@
#include "cache.h"
+#include "diff.h"
static int cached_only = 0;
+static int generate_patch = 0;
static int line_termination = '\n';
/* A file entry went away or appeared */
static void show_file(const char *prefix, struct cache_entry *ce)
{
- printf("%s%o\t%s\t%s\t%s%c", prefix, ntohl(ce->ce_mode), "blob",
- sha1_to_hex(ce->sha1), ce->name, line_termination);
+ if (generate_patch)
+ diff_addremove(prefix[0], ntohl(ce->ce_mode),
+ ce->sha1, ce->name, NULL);
+ else
+ printf("%s%06o\tblob\t%s\t%s%c", prefix, ntohl(ce->ce_mode),
+ sha1_to_hex(ce->sha1), ce->name,
+ line_termination);
}
static int show_modified(struct cache_entry *old, struct cache_entry *new)
@@ -35,11 +42,15 @@ static int show_modified(struct cache_en
if (mode == oldmode && !memcmp(sha1, old->sha1, 20))
return 0;
- strcpy(old_sha1_hex, sha1_to_hex(old->sha1));
- printf("*%o->%o\t%s\t%s->%s\t%s%c", oldmode, mode,
- "blob",
- old_sha1_hex, sha1_to_hex(sha1),
- old->name, line_termination);
+ if (generate_patch)
+ diff_change(oldmode, mode,
+ old->sha1, sha1, old->name, NULL);
+ else {
+ strcpy(old_sha1_hex, sha1_to_hex(old->sha1));
+ printf("*%06o->%06o\tblob\t%s->%s\t%s%c", oldmode, mode,
+ old_sha1_hex, sha1_to_hex(sha1),
+ old->name, line_termination);
+ }
return 0;
}
@@ -67,7 +78,10 @@ static int diff_cache(struct cache_entry
}
/* Otherwise we fall through to the "unmerged" case */
case 3:
- printf("U %s%c", ce->name, line_termination);
+ if (generate_patch)
+ diff_unmerge(ce->name);
+ else
+ printf("U %s%c", ce->name, line_termination);
break;
default:
@@ -102,7 +116,8 @@ static void mark_merge_entries(void)
}
}
-static char *diff_cache_usage = "diff-cache [-r] [-z] [--cached] <tree sha1>";
+static char *diff_cache_usage =
+"diff-cache [-r] [-z] [-p] [--cached] <tree sha1>";
int main(int argc, char **argv)
{
@@ -119,6 +134,10 @@ int main(int argc, char **argv)
/* We accept the -r flag just to look like diff-tree */
continue;
}
+ if (!strcmp(arg, "-p")) {
+ generate_patch = 1;
+ continue;
+ }
if (!strcmp(arg, "-z")) {
line_termination = '\0';
continue;
* [PATCH 3/6] Add -p (patch) to diff-tree.
From: Junio C Hamano @ 2005-04-27 6:25 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This patch uses the reworked diff interface to generate patches
directly out of diff-tree when -p is specified.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff-tree.c | 35 ++++++++++++++++++++++++++---------
1 files changed, 26 insertions(+), 9 deletions(-)
To be tested with the following:
GIT_INDEX_FILE=junk
export GIT_INDEX_FILE
rm -f junk
date >trash ; update-cache --add trash
t1=$(write-tree)
date >>trash; update-cache trash
t2=$(write-tree)
GIT_EXTERNAL_DIFF=echo ./diff-tree -p $t1 $t2
--- k/diff-tree.c
+++ l/diff-tree.c
@@ -1,7 +1,9 @@
#include "cache.h"
+#include "diff.h"
static int recursive = 0;
static int line_termination = '\n';
+static int generate_patch = 0;
// What paths are we interested in?
static int nr_paths = 0;
@@ -79,10 +81,15 @@ static void show_file(const char *prefix
return;
}
- printf("%s%o\t%s\t%s\t%s%s%c", prefix, mode,
- S_ISDIR(mode) ? "tree" : "blob",
- sha1_to_hex(sha1), base, path,
- line_termination);
+ if (generate_patch) {
+ if (!S_ISDIR(mode))
+ diff_addremove(prefix[0], mode, sha1, base, path);
+ }
+ else
+ printf("%s%06o\t%s\t%s\t%s%s%c", prefix, mode,
+ S_ISDIR(mode) ? "tree" : "blob",
+ sha1_to_hex(sha1), base, path,
+ line_termination);
}
static int compare_tree_entry(void *tree1, unsigned long size1, void *tree2, unsigned long size2, const char *base)
@@ -128,11 +135,17 @@ static int compare_tree_entry(void *tree
return retval;
}
- strcpy(old_sha1_hex, sha1_to_hex(sha1));
- printf("*%o->%o\t%s\t%s->%s\t%s%s%c", mode1, mode2,
- S_ISDIR(mode1) ? "tree" : "blob",
- old_sha1_hex, sha1_to_hex(sha2), base, path1,
- line_termination);
+ if (generate_patch) {
+ if (!S_ISDIR(mode1))
+ diff_change(mode1, mode2, sha1, sha2, base, path1);
+ }
+ else {
+ strcpy(old_sha1_hex, sha1_to_hex(sha1));
+ printf("*%06o->%06o\t%s\t%s->%s\t%s%s%c", mode1, mode2,
+ S_ISDIR(mode1) ? "tree" : "blob",
+ old_sha1_hex, sha1_to_hex(sha2), base, path1,
+ line_termination);
+ }
return 0;
}
@@ -255,6 +268,10 @@ int main(int argc, char **argv)
recursive = 1;
continue;
}
+ if (!strcmp(arg, "-p")) {
+ generate_patch = 1;
+ continue;
+ }
if (!strcmp(arg, "-z")) {
line_termination = '\0';
continue;
* [PATCH 2/6] Reactivate show-diff patch generation
From: Junio C Hamano @ 2005-04-27 6:25 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This patch uses the reworked diff interface to generate patches
directly out of show-diff when -p is specified.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
show-diff.c | 56 ++++++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 42 insertions(+), 14 deletions(-)
To be tested with the following:
GIT_INDEX_FILE=junk
export GIT_INDEX_FILE
rm -f junk
date >trash ; update-cache --add trash
date >>trash;
GIT_EXTERNAL_DIFF=echo ./show-diff -p
update-cache --refresh
GIT_EXTERNAL_DIFF=echo ./show-diff -p
--- k/show-diff.c
+++ l/show-diff.c
@@ -4,10 +4,11 @@
* Copyright (C) Linus Torvalds, 2005
*/
#include "cache.h"
+#include "diff.h"
-static const char *show_diff_usage = "show-diff [-q] [-s] [-z] [paths...]";
+static const char *show_diff_usage = "show-diff [-q] [-s] [-z] [-p] [paths...]";
-static int recursive = 0;
+static int generate_patch = 0;
static int line_termination = '\n';
static int silent = 0;
static int silent_on_nonexisting_files = 0;
@@ -27,27 +28,57 @@ static int matches_pathspec(struct cache
return 0;
}
-static void show_file(const char *prefix, struct cache_entry *ce)
+static void show_unmerge(const char *path)
{
- printf("%s%o\t%s\t%s\t%s%c", prefix, ntohl(ce->ce_mode), "blob",
- sha1_to_hex(ce->sha1), ce->name, line_termination);
+ if (generate_patch)
+ diff_unmerge(path);
+ else
+ printf("U %s%c", path, line_termination);
+}
+
+static void show_file(int pfx, struct cache_entry *ce)
+{
+ if (generate_patch)
+ diff_addremove(pfx, ntohl(ce->ce_mode), ce->sha1,
+ ce->name, NULL);
+ else
+ printf("%c%06o\t%s\t%s\t%s%c",
+ pfx, ntohl(ce->ce_mode), "blob",
+ sha1_to_hex(ce->sha1), ce->name, line_termination);
+}
+
+static void show_modified(int oldmode, int mode,
+ const char *old_sha1, const char *sha1,
+ char *path)
+{
+ char old_sha1_hex[41];
+ strcpy(old_sha1_hex, sha1_to_hex(old_sha1));
+
+ if (generate_patch)
+ diff_change(oldmode, mode, old_sha1, sha1, path, NULL);
+ else
+ printf("*%06o->%06o\tblob\t%s->%s\t%s%c",
+ oldmode, mode, old_sha1_hex, sha1_to_hex(sha1), path,
+ line_termination);
}
int main(int argc, char **argv)
{
- static const char null_sha1_hex[] = "0000000000000000000000000000000000000000";
+ static const char null_sha1[20] = { 0, };
int entries = read_cache();
int i;
while (1 < argc && argv[1][0] == '-') {
if (!strcmp(argv[1], "-s"))
silent_on_nonexisting_files = silent = 1;
+ else if (!strcmp(argv[1], "-p"))
+ generate_patch = 1;
else if (!strcmp(argv[1], "-q"))
silent_on_nonexisting_files = 1;
else if (!strcmp(argv[1], "-z"))
line_termination = 0;
else if (!strcmp(argv[1], "-r"))
- recursive = 1; /* No-op */
+ ; /* no-op */
else
usage(show_diff_usage);
argv++; argc--;
@@ -72,8 +103,7 @@ int main(int argc, char **argv)
continue;
if (ce_stage(ce)) {
- printf("U %s%c", ce->name, line_termination);
-
+ show_unmerge(ce->name);
while (i < entries &&
!strcmp(ce->name, active_cache[i]->name))
i++;
@@ -88,7 +118,7 @@ int main(int argc, char **argv)
}
if (silent_on_nonexisting_files)
continue;
- show_file("-", ce);
+ show_file('-', ce);
continue;
}
changed = cache_match_stat(ce, &st);
@@ -98,10 +128,8 @@ int main(int argc, char **argv)
oldmode = ntohl(ce->ce_mode);
mode = S_IFREG | ce_permissions(st.st_mode);
- printf("*%o->%o\t%s\t%s->%s\t%s%c",
- oldmode, mode, "blob",
- sha1_to_hex(ce->sha1), null_sha1_hex,
- ce->name, line_termination);
+ show_modified(oldmode, mode, ce->sha1, null_sha1,
+ ce->name);
}
return 0;
}
* [PATCH 1/6] Reworked external diff interface.
From: Junio C Hamano @ 2005-04-27 6:24 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekcwkagr.fsf@assigned-by-dhcp.cox.net>
This patch introduces three public functions that diff-cache and
friends can use to call out to the GIT_EXTERNAL_DIFF program
when they wish to. A normal "add/remove/change" entry is turned
into a 7-parameter process invocation of the GIT_EXTERNAL_DIFF
program as before. In addition, the program can now be called with a
single parameter when diff-cache and friends want to report an
unmerged path.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff.c | 93 ++++++++++++++++++++++++++++++++++++++++-------------------------
diff.h | 19 +++++++++----
2 files changed, 72 insertions(+), 40 deletions(-)
To be tested with the following:
GIT_INDEX_FILE=junk
export GIT_INDEX_FILE
read-tree $(cat .git/HEAD)
t1=$(write-tree)
date >trash ; update-cache --add trash
t2=$(write-tree)
read-tree -m $(cat .git/HEAD) $t1 $t2
update-cache --refresh
./show-diff | GIT_EXTERNAL_DIFF=echo ./diff-tree-helper
--- k/diff.c
+++ l/diff.c
@@ -194,13 +194,15 @@ void run_external_diff(const char *name,
int pid, status;
static int atexit_asked = 0;
- prepare_temp_file(name, &temp[0], one);
- prepare_temp_file(name, &temp[1], two);
- if (! atexit_asked &&
- (temp[0].name == temp[0].tmp_path ||
- temp[1].name == temp[1].tmp_path)) {
- atexit_asked = 1;
- atexit(remove_tempfile);
+ if (one && two) {
+ prepare_temp_file(name, &temp[0], one);
+ prepare_temp_file(name, &temp[1], two);
+ if (! atexit_asked &&
+ (temp[0].name == temp[0].tmp_path ||
+ temp[1].name == temp[1].tmp_path)) {
+ atexit_asked = 1;
+ atexit(remove_tempfile);
+ }
}
fflush(NULL);
@@ -209,16 +211,23 @@ void run_external_diff(const char *name,
die("unable to fork");
if (!pid) {
const char *pgm = external_diff();
- if (pgm)
- execlp(pgm, pgm,
- name,
- temp[0].name, temp[0].hex, temp[0].mode,
- temp[1].name, temp[1].hex, temp[1].mode,
- NULL);
+ if (pgm) {
+ if (one && two)
+ execlp(pgm, pgm,
+ name,
+ temp[0].name, temp[0].hex, temp[0].mode,
+ temp[1].name, temp[1].hex, temp[1].mode,
+ NULL);
+ else
+ execlp(pgm, pgm, name, NULL);
+ }
/*
* otherwise we use the built-in one.
*/
- builtin_diff(name, temp);
+ if (one && two)
+ builtin_diff(name, temp);
+ else
+ printf("* Unmerged path %s\n", name);
exit(0);
}
if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
@@ -227,41 +236,55 @@ void run_external_diff(const char *name,
remove_tempfile();
}
-void show_diff_empty(const struct cache_entry *ce, int reverse)
+void diff_addremove(int addremove, unsigned mode,
+ const unsigned char *sha1,
+ const char *base, const char *path)
{
+ char concatpath[PATH_MAX];
struct diff_spec spec[2], *one, *two;
- memcpy(spec[0].u.sha1, ce->sha1, 20);
- spec[0].mode = ntohl(ce->ce_mode);
+ memcpy(spec[0].u.sha1, sha1, 20);
+ spec[0].mode = mode;
spec[0].sha1_valid = spec[0].file_valid = 1;
spec[1].file_valid = 0;
- if (reverse) {
+ if (addremove == '+') {
one = spec + 1; two = spec;
} else {
one = spec; two = one + 1;
}
-
- run_external_diff(ce->name, one, two);
+
+ if (path) {
+ strcpy(concatpath, base);
+ strcat(concatpath, "/");
+ strcat(concatpath, path);
+ }
+ run_external_diff(path ? concatpath : base, one, two);
}
-void show_differences(const struct cache_entry *ce, int reverse)
-{
- struct diff_spec spec[2], *one, *two;
-
- memcpy(spec[0].u.sha1, ce->sha1, 20);
- spec[0].mode = ntohl(ce->ce_mode);
+void diff_change(unsigned old_mode, unsigned new_mode,
+ const unsigned char *old_sha1,
+ const unsigned char *new_sha1,
+ const char *base, const char *path) {
+ char concatpath[PATH_MAX];
+ struct diff_spec spec[2];
+
+ memcpy(spec[0].u.sha1, old_sha1, 20);
+ spec[0].mode = old_mode;
+ memcpy(spec[1].u.sha1, new_sha1, 20);
+ spec[1].mode = new_mode;
spec[0].sha1_valid = spec[0].file_valid = 1;
+ spec[1].sha1_valid = spec[1].file_valid = 1;
- spec[1].u.name = ce->name; /* the name we stated */
- spec[1].sha1_valid = 0;
- spec[1].file_valid = 1;
-
- if (reverse) {
- one = spec + 1; two = spec;
- } else {
- one = spec; two = one + 1;
+ if (path) {
+ strcpy(concatpath, base);
+ strcat(concatpath, "/");
+ strcat(concatpath, path);
}
+ run_external_diff(path ? concatpath : base, &spec[0], &spec[1]);
+}
- run_external_diff(ce->name, one, two);
+void diff_unmerge(const char *path)
+{
+ run_external_diff(path, NULL, NULL);
}
--- k/diff.h
+++ l/diff.h
@@ -4,11 +4,20 @@
#ifndef DIFF_H
#define DIFF_H
-/* These two are for backward compatibility with show-diff;
- * new users should not use them.
- */
-extern void show_differences(const struct cache_entry *ce, int reverse);
-extern void show_diff_empty(const struct cache_entry *ce, int reverse);
+extern void diff_addremove(int addremove,
+ unsigned mode,
+ const unsigned char *sha1,
+ const char *base,
+ const char *path);
+
+extern void diff_change(unsigned mode1, unsigned mode2,
+ const unsigned char *sha1,
+ const unsigned char *sha2,
+ const char *base, const char *path);
+
+extern void diff_unmerge(const char *path);
+
+/* These are for diff-tree-helper */
struct diff_spec {
union {
* [PATCH 0/6] External diff interface for diff-cache and friends.
From: Junio C Hamano @ 2005-04-27 6:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
As discussed earlier, I am sending a series of patches to take
advantage of the simplified external diff interface you have
already merged. This series consists of the following:
[PATCH 1/6] Reworked external diff interface.
This patch introduces three public functions that diff-cache
and friends can use to call out to the GIT_EXTERNAL_DIFF
program when they wish to. A normal "add/remove/change"
entry is turned into a 7-parameter invocation of the
GIT_EXTERNAL_DIFF program as before. In addition, the
program can now be called with a single parameter when
diff-cache and friends want to report an unmerged path.
[PATCH 2/6] Reactivate show-diff patch generation
This patch uses the reworked diff interface to generate
patches directly out of show-diff when -p is specified.
[PATCH 3/6] Add -p (patch) to diff-tree.
This patch uses the reworked diff interface to generate
patches directly out of diff-tree when -p is specified.
[PATCH 4/6] Add -p (patch) to diff-cache.
This patch uses the reworked diff interface to generate
patches directly out of diff-cache when -p is specified.
[PATCH 5/6] Teach diff-tree-helper to handle unmerged paths.
This patch teaches diff-tree-helper to call diff_unmerge()
so that it can report unmerged paths to GIT_EXTERNAL_DIFF.
[PATCH 6/6] Alternative patch to diff-cache.c
This is a replacement of PATCH 4, in case you have already
applied the "non-cached still looks only at cache" fix I
sent you earlier. If you took it, PATCH 4 may not apply
cleanly, in which case this should be easier to work with.
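The calling convention described under PATCH 1/6 can be sketched from the side of the external program. This is only an illustration, not code from the series: the helper name describe_invocation() and the wording of its output are made up here. The argument order (name, old file, old hex, old mode, new file, new hex, new mode) follows the execlp() call in run_external_diff().

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper for a GIT_EXTERNAL_DIFF program: classify one
 * invocation by its argument count and describe it in `out`.
 * Returns 0 on success, -1 if the argument count is unexpected.
 */
int describe_invocation(int argc, char **argv, char *out, size_t outsz)
{
	if (argc == 2) {
		/* single parameter: an unmerged path is being reported */
		snprintf(out, outsz, "unmerged %s", argv[1]);
		return 0;
	}
	if (argc == 8) {
		/* seven parameters: a normal add/remove/change entry */
		snprintf(out, outsz,
			 "change %s: %s (%s, mode %s) -> %s (%s, mode %s)",
			 argv[1],
			 argv[2], argv[3], argv[4],
			 argv[5], argv[6], argv[7]);
		return 0;
	}
	return -1;
}
```

A real program would call this from main(argc, argv) and then either print the description or run its own diff on the two temporary files.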
* A shortcoming of the git repo format
From: H. Peter Anvin @ 2005-04-27 5:43 UTC (permalink / raw)
To: Git Mailing List
Most of git's files are starting to converge toward an RFC822-like
header with (tag, data) and a free-form section. This is a good thing.
However, there is one problem with this, and that is that without
knowing every possible tag, a program reading the git repository cannot
safely tell what is a link to another git object and what is not. When
I did my repository conversion tools, I simply assumed any string of 20
hexadecimal digits was a pointer, but this is probably a bad idea in the
long run.
Additionally, there is the question of the handling of strings that may
contain \n or even \0 (which may be necessary for some applications).
One solution to all of this would be to define a quoting standard for
strings, and simply require that all free-format strings (like the
author fields) or at least strings that match [0-9a-f]{20}, are always
quoted.
I propose the following:
- Any string containing control characters or \ must be quoted;
- \xXX produces control characters; other characters following \ are
verbatim.
Thus,
link 0123456789abcdef0123
... is a link to an object, whereas ...
string \0123456789abcdef0123
... is a string.
string1 This string begins with a space
string2 This string has an embedded newline ("\x0a")
... are both valid strings; the first contains a leading space and the
second an embedded newline.
I'll implement this and integrate it tomorrow.
-hpa
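The quoting rules proposed above could look roughly like the sketch below. It is an illustration under assumptions, not the implementation mentioned in the message: the names quote_field() and unquote_field() are hypothetical, "control character" is taken to mean bytes below 0x20 plus 0x7f, and a literal backslash is written as two backslashes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: quote `len` raw bytes into `out` as a NUL-terminated string.
 * Returns the quoted length, or (size_t)-1 if `out` is too small. */
size_t quote_field(const unsigned char *in, size_t len, char *out, size_t outsz)
{
	size_t i, o = 0;

	if (outsz == 0)
		return (size_t)-1;
	for (i = 0; i < len; i++) {
		unsigned char c = in[i];
		if (c < 0x20 || c == 0x7f) {
			/* control character: emit \xXX (4 chars + NUL) */
			if (o + 4 >= outsz)
				return (size_t)-1;
			snprintf(out + o, 5, "\\x%02x", c);
			o += 4;
		} else if (c == '\\') {
			/* backslash: the following character is verbatim */
			if (o + 2 >= outsz)
				return (size_t)-1;
			out[o++] = '\\';
			out[o++] = '\\';
		} else {
			if (o + 1 >= outsz)
				return (size_t)-1;
			out[o++] = (char)c;
		}
	}
	out[o] = '\0';
	return o;
}

static int hexval(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Hypothetical: undo quote_field(). Returns the raw length, or (size_t)-1
 * on malformed input or if `out` is too small. */
size_t unquote_field(const char *in, unsigned char *out, size_t outsz)
{
	size_t o = 0;

	while (*in) {
		unsigned char c = (unsigned char)*in++;
		if (c == '\\') {
			if (in[0] == 'x') {
				int hi = hexval((unsigned char)in[1]);
				int lo = hi < 0 ? -1 : hexval((unsigned char)in[2]);
				if (lo < 0)
					return (size_t)-1;
				c = (unsigned char)(hi * 16 + lo);
				in += 3;
			} else if (in[0]) {
				c = (unsigned char)*in++;	/* verbatim */
			} else {
				return (size_t)-1;	/* dangling backslash */
			}
		}
		if (o >= outsz)
			return (size_t)-1;
		out[o++] = c;
	}
	return o;
}
```

With these rules a field such as \0123456789abcdef0123 unquotes to a string that merely looks like an object name, which is the distinction the proposal is after.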
* I'm missing isofs.h
From: Andrew Morton @ 2005-04-27 4:43 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
In a current tree, using git-pasky-0.7:
bix:/usr/src/git26> cat .git/tags/v2.6.12-rc3
a2755a80f40e5794ddc20e00f781af9d6320fafb
bix:/usr/src/git26> git diff -r v2.6.12-rc3|grep isofs.h
+#include "isofs.h"
#include "zisofs.h"
+#include "isofs.h"
+#include "isofs.h"
+#include "isofs.h"
#include "zisofs.h"
+#include "isofs.h"
+#include "isofs.h"
+#include "isofs.h"
+#include "isofs.h"
That diff should have included the addition of the new isofs.h, but it
isn't there.
* [PATCH] diff-files: in the spirit of diff-cache and diff-tree
From: Nicolas Pitre @ 2005-04-27 4:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Here's a diff-files implementation to go along with diff-cache and
diff-tree. It is based on pieces taken from show-diff and show-files.
The difference from show-diff is that it can handle files which are not
(yet) in the cache. And since the show-diff arguments are a bit awkward,
I decided it would be better to leave it alone and create a new tool.
IMHO show-diff could simply be removed once all its users have been
switched over to diff-files.
Signed-off-by: Nicolas Pitre <nico@cam.org>
--- k/Makefile
+++ l/Makefile
@@ -18,7 +18,7 @@ PROG= update-cache show-diff init-db w
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
check-files ls-tree merge-base merge-cache unpack-file git-export \
diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
- diff-tree-helper
+ diff-tree-helper diff-files
all: $(PROG)
--- k/diff-files.c
+++ l/diff-files.c
@@ -0,0 +1,246 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+
+#include <dirent.h>
+#include "cache.h"
+
+static const char *diff_files_usage = "diff-files [-d] [-o] [-z] [paths...]";
+
+/* What paths are we interested in? */
+static int nr_paths = 0;
+static char **paths = NULL;
+static int *pathlens = NULL;
+
+/*
+ * see if name matches our specified paths.
+ * return value:
+ * -1 if no match
+ * 0 if partial match (name is a directory component)
+ * 1 = exact match
+ * 2 = name is within a specified directory path
+ */
+static int path_match(const char *name, int namelen)
+{
+ int i;
+ if (!nr_paths)
+ return 2;
+ for (i = 0; i < nr_paths; i++) {
+ int pathlen = pathlens[i];
+ if (pathlen == namelen &&
+ strncmp(paths[i], name, pathlen) == 0) {
+ return 1;
+ } else if (pathlen > namelen &&
+ strncmp(paths[i], name, namelen) == 0 &&
+ paths[i][namelen] == '/') {
+ return 0;
+ } else if (pathlen < namelen &&
+ strncmp(paths[i], name, pathlen) == 0 &&
+ name[pathlen] == '/') {
+ return 2;
+ }
+ }
+ return -1;
+}
+
+static const char **dir;
+static int nr_dir;
+static int dir_alloc;
+
+static void add_name(const char *pathname, int len)
+{
+ char *name;
+
+ if (cache_name_pos(pathname, len) >= 0)
+ return;
+
+ if (nr_dir == dir_alloc) {
+ dir_alloc = alloc_nr(dir_alloc);
+ dir = realloc(dir, dir_alloc*sizeof(char *));
+ }
+ name = malloc(len + 1);
+ memcpy(name, pathname, len + 1);
+ dir[nr_dir++] = name;
+}
+
+/*
+ * Read a directory tree. We currently ignore anything but
+ * directories and regular files. That's because git doesn't
+ * handle them at all yet. Maybe that will change some day.
+ *
+ * Also, we currently ignore all names starting with a dot.
+ * That likely will not change.
+ */
+static void read_directory(const char *path, const char *base, int baselen, int match)
+{
+ DIR *dir = opendir(path);
+
+ if (dir) {
+ struct dirent *de;
+ char fullname[MAXPATHLEN + 1];
+ memcpy(fullname, base, baselen);
+
+ while ((de = readdir(dir)) != NULL) {
+ int len;
+
+ if (de->d_name[0] == '.')
+ continue;
+ len = strlen(de->d_name);
+ memcpy(fullname + baselen, de->d_name, len+1);
+ if (match < 2)
+ match = path_match(fullname, baselen+len);
+ if (match < 0)
+ continue;
+
+ switch (de->d_type) {
+ struct stat st;
+ default:
+ continue;
+ case DT_UNKNOWN:
+ if (lstat(fullname, &st))
+ continue;
+ if (S_ISREG(st.st_mode))
+ break;
+ if (!S_ISDIR(st.st_mode))
+ continue;
+ /* fallthrough */
+ case DT_DIR:
+ memcpy(fullname + baselen + len, "/", 2);
+ read_directory(fullname, fullname,
+ baselen + len + 1,
+ match == 1 ? 2 : 0);
+ continue;
+ case DT_REG:
+ break;
+ }
+ if (match > 0)
+ add_name(fullname, baselen + len);
+ }
+ closedir(dir);
+ }
+}
+
+static int cmp_name(const void *p1, const void *p2)
+{
+ const char *n1 = *(const char **)p1;
+ const char *n2 = *(const char **)p2;
+ int l1 = strlen(n1), l2 = strlen(n2);
+
+ return cache_name_compare(n1, l1, n2, l2);
+}
+
+static int show_deleted = 0;
+static int show_others = 0;
+static int line_terminator = '\n';
+
+static const char null_sha1_hex[] = "0000000000000000000000000000000000000000";
+
+static void show_file(const char *prefix, unsigned int mode,
+ const char *sha1, const char *name)
+{
+ printf("%s%o\t%s\t%s\t%s%c", prefix, mode, "blob",
+ sha1, name, line_terminator);
+}
+
+int main(int argc, char **argv)
+{
+ int i, entries;
+
+ for (i = 1; i < argc; i++) {
+ char *arg = argv[i];
+
+ if (*arg != '-')
+ break;
+
+ if (!strcmp(arg, "-z")) {
+ line_terminator = 0;
+ continue;
+ }
+ if (!strcmp(arg, "-d")) {
+ show_deleted = 1;
+ continue;
+ }
+ if (!strcmp(arg, "-o")) {
+ show_others = 1;
+ continue;
+ }
+
+ usage(diff_files_usage);
+ }
+
+ if (i < argc) {
+ paths = &argv[i];
+ nr_paths = argc - i;
+ pathlens = malloc(nr_paths * sizeof(int));
+ for (i=0; i<nr_paths; i++) {
+ pathlens[i] = strlen(paths[i]);
+ if (paths[i][pathlens[i] - 1] == '/')
+ pathlens[i]--;
+ }
+ }
+
+ entries = read_cache();
+ if (entries < 0) {
+ perror("read_cache");
+ exit(1);
+ }
+
+ if (show_others) {
+ read_directory(".", "", 0, 0);
+ qsort(dir, nr_dir, sizeof(char *), cmp_name);
+ for (i = 0; i < nr_dir; i++) {
+ struct stat st;
+ unsigned int mode;
+ if (stat(dir[i], &st) < 0) {
+ perror(dir[i]);
+ } else {
+ mode = S_IFREG | ce_permissions(st.st_mode);
+ show_file("+", mode, null_sha1_hex, dir[i]);
+ }
+ }
+ }
+
+ for (i = 0; i < entries; i++) {
+ struct stat st;
+ unsigned int ce_mode, mode;
+ struct cache_entry *ce = active_cache[i];
+ int changed;
+
+ if (path_match(ce->name, ce_namelen(ce)) < 1)
+ continue;
+
+ if (ce_stage(ce)) {
+ printf("U %s%c", ce->name, line_terminator);
+ do {
+ i++;
+ } while (i < entries &&
+ !strcmp(ce->name, active_cache[i]->name));
+ continue;
+ }
+
+ ce_mode = ntohl(ce->ce_mode);
+ if (stat(ce->name, &st) < 0) {
+ if (errno != ENOENT) {
+ perror(ce->name);
+ } else if (show_deleted) {
+ show_file("-", ce_mode,
+ sha1_to_hex(ce->sha1), ce->name);
+ }
+ continue;
+ }
+
+ changed = cache_match_stat(ce, &st);
+ if (!changed)
+ continue;
+
+ mode = S_IFREG | ce_permissions(st.st_mode);
+ printf("*%o->%o\t%s\t%s->%s\t%s%c",
+ ce_mode, mode, "blob",
+ sha1_to_hex(ce->sha1), null_sha1_hex,
+ ce->name, line_terminator);
+ }
+
+ return 0;
+}
* Re: [PATCH] Set AUTHOR_DATE in git-tools
From: Greg KH @ 2005-04-27 3:52 UTC (permalink / raw)
To: David Woodhouse; +Cc: torvalds, git
In-Reply-To: <20050426184442.GA20536@kroah.com>
On Tue, Apr 26, 2005 at 11:44:42AM -0700, Greg KH wrote:
> On Thu, Apr 21, 2005 at 05:32:16PM +1000, David Woodhouse wrote:
> > Entirely untested.
>
> Doesn't work :(
I take it back, it works just fine...
My problem was that bk generates dates with a non-RFC-compliant timezone
string, and I was trying to apply patches exported from bk in plain
text format (building a udev git tree...). The patch below to
commit-tree.c fixes this, in case anyone runs into the same issue (I
wouldn't recommend applying it, as it's probably a one-off
issue...)
thanks,
greg k-h
--------------
Allow commit-tree to handle the bk date format.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
--- a/commit-tree.c 2005-04-25 22:08:49.000000000 -0700
+++ b/commit-tree.c 2005-04-26 20:46:37.000000000 -0700
@@ -204,6 +204,12 @@
else
return;
+ /* Handle messed up bk dates */
+ if (p[3] == ':') {
+ p[3] = p[4];
+ p[4] = p[5];
+ p[5] = p[6];
+ }
if (!isdigit(p[1]) || !isdigit(p[2]) || !isdigit(p[3]) || !isdigit(p[4]))
return;
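The same fix-up can be sketched as a standalone function. This is an illustration only: the name strip_tz_colon() is hypothetical, and the "+HH:MM" input form is an assumption read off the p[3] == ':' test in the patch above.

```c
#include <string.h>

/*
 * Hypothetical helper: rewrite a timezone of the form "+HH:MM" or
 * "-HH:MM" in place as "+HHMM" / "-HHMM".
 * Returns 1 if the string was changed, 0 if it was left alone.
 */
int strip_tz_colon(char *tz)
{
	size_t len = strlen(tz);

	if (len < 6 || (tz[0] != '+' && tz[0] != '-') || tz[3] != ':')
		return 0;
	/* shift the minutes and the trailing NUL one byte to the left */
	memmove(tz + 3, tz + 4, len - 3);
	return 1;
}
```

Unlike the three assignments in the patch, the memmove() also carries the terminating NUL along, so the result is a properly terminated string.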
* Re: git "tag" objects implemented - and a re-done commit
From: Matthias Urlichs @ 2005-04-27 3:36 UTC (permalink / raw)
To: git
In-Reply-To: <Pine.LNX.4.58.0504251339020.18901@ppc970.osdl.org>
Hi, Linus Torvalds wrote:
> And if two different developers tag exactly the same object with exactly
> the same tag-name and exactly the same signature, then they get the same
> tag object, and that's fine. They should.
... except that they can't. I mean, the signature is done by different
people at different times, so it can't very well be identical.
--
Matthias Urlichs
* Re: Revised PPC assembly implementation
From: Paul Mackerras @ 2005-04-27 3:39 UTC (permalink / raw)
To: linux; +Cc: davem, git
In-Reply-To: <20050427014712.13552.qmail@science.horizon.com>
linux@horizon.com writes:
> Here's a massively revised version, scheduled very close to optimally for
> the G4. (The main remaining limitation is the loading of the k value
> in %r5, which could be split up more.)
>
> My hope is that the G5 will do decently on it as well.
Nice... your new version takes 4.413 seconds on my G5 for 1000MB,
compared to 4.606 for your old version, i.e. it's about 4.4% faster.
Unfortunately it gives the wrong answer, though.
On my powerbook, which has a 1.5GHz G4 (7447A), the same test takes
4.68 seconds with my version, 4.72 seconds with your old version, but
only 3.90 seconds with your new version.
Care to check the code and find out why it's giving the wrong answer?
Regards,
Paul.
* Re: git add / update-cache --add fails.
From: Herbert Xu @ 2005-04-27 2:35 UTC (permalink / raw)
To: rhys; +Cc: git
In-Reply-To: <200504260726.04908.rhys@rhyshardwick.co.uk>
Rhys Hardwick <rhys@rhyshardwick.co.uk> wrote:
>
> rhys@metatron:~/repo/learning.repo$ strace update-cache --add w1d4p1.c
...
> open("w1d4p1.c", O_RDONLY) = -1 ENOENT (No such file or
> directory)
The file that you're trying to add doesn't exist.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Revised PPC assembly implementation
From: linux @ 2005-04-27 1:47 UTC (permalink / raw)
To: paulus; +Cc: davem, git, linux
In-Reply-To: <17005.38889.738457.359270@cargo.ozlabs.ibm.com>
Here's a massively revised version, scheduled very close to optimally for
the G4. (The main remaining limitation is the loading of the k value
in %r5, which could be split up more.)
My hope is that the G5 will do decently on it as well.
The G4 can in theory do 3 integer operations per cycle, but only
if everything is arranged just right. Every cycle, it tries to
dispatch the 3 instructions at the bottom of the GIQ. If any
of them stall, that issue slot is lost.
So although it's theoretically out-of-order, if you want it to
sustain 3 instructions per cycle, you have to treat it as in-order.
It required interleaving the STEPDx and UPDATEW macros in a few
complicated ways. I don't have access to a machine for testing,
so some poor schmuck^W^Wgenerous person is needed to find the bugs.
This should be *much* faster than the previous code on a G4, and I hope
it will do better on a G5 as well.
I'm curious if *reducing* the amount of fetch-ahead to 2 words
instead of 4 would help things or not.
Still to do: improve the comments. This level of hackery needs a
lot of commenting...
/*
* SHA-1 implementation for PowerPC.
*
* Copyright (C) 2005 Paul Mackerras <paulus@samba.org>
*/
/*
* We roll the registers for A, B, C, D, E around on each
* iteration; E on iteration t is D on iteration t+1, and so on.
* We use registers 6 - 10 for this. (Registers 27 - 31 hold
* the previous values.)
*/
#define RA(t) (((t)+4)%5+6)
#define RB(t) (((t)+3)%5+6)
#define RC(t) (((t)+2)%5+6)
#define RD(t) (((t)+1)%5+6)
#define RE(t) (((t)+0)%5+6)
/* We use registers 11 - 26 for the W values */
#define W(t) ((t)%16+11)
/* Register 5 is used for the constant k */
/*
* There are three F functions, used in four groups of 20:
* - 20 rounds of f0(b,c,d) = "bit wise b ? c : d" = (~b & d) + (b & c)
* - 20 rounds of f1(b,c,d) = b^c^d = (b^d)^c
* - 20 rounds of f2(b,c,d) = majority(b,c,d) = (b&d) + ((b^d)&c)
* - 20 more rounds of f1(b,c,d)
*
* These are all scheduled for near-optimal performance on a G4.
* The G4 is a 3-issue out-of-order machine with 3 ALUs, but it can only
* *consider* starting the oldest 3 instructions per cycle. So to get
* maximum performance out of it, you have to treat it as an in-order
* machine. Which means interleaving the computation round t with the
* computation of W[t+4].
*
* The first 16 rounds use W values loaded directly from memory, while the
* remaining 64 use values computed from those first 16. We preload
* 4 values before starting, so there are three kinds of rounds:
* - The first 12 (all f0) also load the W values from memory.
* - The next 64 compute W(i+4) in parallel. 8*f0, 20*f1, 20*f2, 16*f1.
* - The last 4 (all f1) do not do anything with W.
*
* Therefore, we have 5 different round functions:
* STEPD0_LOAD(t,s) - Perform round t and load W(s). s < 16
* STEPD0_UPDATE(t,s) - Perform round t and compute W(s). s >= 16.
* STEPD1_UPDATE(t,s)
* STEPD2_UPDATE(t,s)
* STEPD1(t) - Perform round t with no load or update.
*
* The G5 is more fully out-of-order, and can find the parallelism
* by itself. The big limit is that it has a 2-cycle ALU latency, so
* even though it's 2-way, the code has to be scheduled as if it's
* 4-way, which can be a limit. To help it, we try to schedule the
* read of RA(t) as late as possible so it doesn't stall waiting for
* the previous round's RE(t-1), and we try to rotate RB(t) as early
* as possible while reading RC(t) (= RB(t-1)) as late as possible.
*/
/* the initial loads. */
#define LOADW(s) \
lwz W(s),(s)*4(%r4)
/*
* This is actually 13 instructions, which is an awkward fit,
* and uses W(s) as a temporary before loading it.
*/
#define STEPD0_LOAD(t,s) \
add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); /* spare slot */ \
add RE(t),RE(t),%r0; and W(s),RC(t),RB(t); rotlwi %r0,RA(t),5; \
add RE(t),RE(t),W(s); add %r0,%r0,%r5; rotlwi RB(t),RB(t),30; \
add RE(t),RE(t),%r0; lwz W(s),(s)*4(%r4);
/*
* This can execute starting with 2 out of 3 possible moduli, so it
* does 2 rounds in 9 cycles, 4.5 cycles/round.
*/
#define STEPD0_UPDATE(t,s) \
add RE(t),RE(t),W(t); andc %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \
add RE(t),RE(t),%r0; and %r0,RC(t),RB(t); xor W(s),W(s),W((s)-8); \
add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \
add RE(t),RE(t),%r5; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1; \
add RE(t),RE(t),%r0;
/* Nicely optimal. Conveniently, also the most common. */
#define STEPD1_UPDATE(t,s) \
add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \
add RE(t),RE(t),%r5; xor %r0,%r0,RC(t); xor W(s),W(s),W((s)-8); \
add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; xor W(s),W(s),W((s)-14); \
add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30; rotlwi W(s),W(s),1;
/*
* The naked version, no UPDATE, for the last 4 rounds. 3 cycles per.
* We could use W(s) as a temp register, but we don't need it.
*/
#define STEPD1(t) \
/* spare slot */ add RE(t),RE(t),W(t); xor %r0,RD(t),RB(t); \
rotlwi RB(t),RB(t),30; add RE(t),RE(t),%r5; xor %r0,%r0,RC(t); \
add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; /* idle */ \
add RE(t),RE(t),%r0;
/* 5 cycles per */
#define STEPD2_UPDATE(t,s) \
add RE(t),RE(t),W(t); and %r0,RD(t),RB(t); xor W(s),W((s)-16),W((s)-3); \
add RE(t),RE(t),%r0; xor %r0,RD(t),RB(t); xor W(s),W(s),W((s)-8); \
add RE(t),RE(t),%r5; and %r0,%r0,RC(t); xor W(s),W(s),W((s)-14); \
add RE(t),RE(t),%r0; rotlwi %r0,RA(t),5; rotlwi W(s),W(s),1; \
add RE(t),RE(t),%r0; rotlwi RB(t),RB(t),30;
#define STEP0_LOAD4(t,s) \
STEPD0_LOAD(t,s); \
STEPD0_LOAD((t)+1,(s)+1); \
STEPD0_LOAD((t)+2,(s)+2); \
STEPD0_LOAD((t)+3,(s)+3);
#define STEPUP4(fn, t, s) \
STEP##fn##_UPDATE(t,s); \
STEP##fn##_UPDATE((t)+1,(s)+1); \
STEP##fn##_UPDATE((t)+2,(s)+2); \
STEP##fn##_UPDATE((t)+3,(s)+3);
#define STEPUP20(fn, t, s) \
STEPUP4(fn, t, s); \
STEPUP4(fn, (t)+4, (s)+4); \
STEPUP4(fn, (t)+8, (s)+8); \
STEPUP4(fn, (t)+12, (s)+12); \
STEPUP4(fn, (t)+16, (s)+16)
.globl sha1_core
sha1_core:
stwu %r1,-80(%r1)
stmw %r13,4(%r1)
/* Load up A - E */
lmw %r27,0(%r3)
mtctr %r5
1:
lis %r5,0x5a82 /* K0-19 */
mr RA(0),%r27
LOADW(0)
mr RB(0),%r28
LOADW(1)
mr RC(0),%r29
LOADW(2)
ori %r5,%r5,0x7999
mr RD(0),%r30
LOADW(3)
mr RE(0),%r31
STEP0_LOAD4(0, 4)
STEP0_LOAD4(4, 8)
STEP0_LOAD4(8, 12)
STEPUP4(D0, 12, 16)
STEPUP4(D0, 16, 20)
lis %r5,0x6ed9 /* K20-39 */
ori %r5,%r5,0xeba1
STEPUP20(D1, 20, 24)
lis %r5,0x8f1b /* K40-59 */
ori %r5,%r5,0xbcdc
STEPUP20(D2, 40, 44)
lis %r5,0xca62 /* K60-79 */
ori %r5,%r5,0xc1d6
STEPUP4(D1, 60, 64)
STEPUP4(D1, 64, 68)
STEPUP4(D1, 68, 72)
STEPUP4(D1, 72, 76)
STEPD1(76)
STEPD1(77)
STEPD1(78)
STEPD1(79)
/* Add results to original values */
add %r31,%r31,RE(0)
add %r30,%r30,RD(0)
add %r29,%r29,RC(0)
add %r28,%r28,RB(0)
add %r27,%r27,RA(0)
addi %r4,%r4,64
bdnz 1b
/* Save final hash, restore registers, and return */
stmw %r27,0(%r3)
lmw %r13,4(%r1)
addi %r1,%r1,80
blr
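For anyone checking the assembly, a plain C reference for one 64-byte block may be handy. This is a sketch, not part of the posted code: the name sha1_block_ref() is hypothetical, and it takes a single block rather than a block count. It uses the same f0/f1/f2 formulas as the comment above, the same four K constants that the code loads into %r5, and a 16-word ring for W in the spirit of the W(t) macro.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* The three F functions, written as in the assembly comment. The two
 * terms of f0 and f2 never share a set bit, so '+' acts like '|'. */
static uint32_t f0(uint32_t b, uint32_t c, uint32_t d) { return (~b & d) + (b & c); }
static uint32_t f1(uint32_t b, uint32_t c, uint32_t d) { return (b ^ d) ^ c; }
static uint32_t f2(uint32_t b, uint32_t c, uint32_t d) { return (b & d) + ((b ^ d) & c); }

/* Hypothetical reference: fold one 64-byte block `p` into state `h`. */
void sha1_block_ref(uint32_t h[5], const unsigned char *p)
{
	static const uint32_t k[4] = {
		0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
	};
	uint32_t w[16];
	uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
	int t;

	for (t = 0; t < 80; t++) {
		uint32_t f, tmp;

		if (t < 16)	/* first 16 words come straight from memory */
			w[t] = (uint32_t)p[4 * t] << 24 |
			       (uint32_t)p[4 * t + 1] << 16 |
			       (uint32_t)p[4 * t + 2] << 8 |
			       (uint32_t)p[4 * t + 3];
		else		/* W[t-16] lives in the slot being replaced */
			w[t % 16] = rotl32(w[(t - 16) % 16] ^ w[(t - 3) % 16] ^
					   w[(t - 8) % 16] ^ w[(t - 14) % 16], 1);

		if (t < 20)
			f = f0(b, c, d);
		else if (t < 40)
			f = f1(b, c, d);
		else if (t < 60)
			f = f2(b, c, d);
		else
			f = f1(b, c, d);

		tmp = rotl32(a, 5) + f + e + k[t / 20] + w[t % 16];
		e = d;
		d = c;
		c = rotl32(b, 30);
		b = a;
		a = tmp;
	}
	h[0] += a;
	h[1] += b;
	h[2] += c;
	h[3] += d;
	h[4] += e;
}
```

Running both this and sha1_core over the same block with the same starting state and comparing the five words afterwards should narrow a wrong answer down quickly; the padded one-block message "abc" with the standard initial state is a convenient test vector.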
* Re: [PATCH] make cg-export use tar-tree
From: Rene Scharfe @ 2005-04-27 1:16 UTC (permalink / raw)
To: Joshua T. Corbin; +Cc: git, Petr Baudis
In-Reply-To: <200504261928.44538.jcorbin@wunjo.org>
Joshua T. Corbin schrieb:
> Here it is (this time with real tabs instead of two spaces ;) ), requires
> Rene's tar-tree patch. Works quite speedily too I might add.
Maybe it's just Thunderbird, but I see single spaces instead of tabs
there.
> + tar=$(mktemp -t cg-export.tar.XXXXXX)
> + tar-tree $id "$base" > $tar
> + case $ext in
> + .tar.gz|.tgz)
> + gzip -c9 $tar > $dest
> + rm -f $tar
> + ;;
> + .tar.bz2)
> + bzip2 -c $tar > $dest
> + rm -f $tar
> + ;;
> + .tar)
> + mv $tar $dest
> + ;;
> + esac
You don't need to create a temporary file using tar-tree. The above can
be done like this:
case $ext in
.tar.gz|.tgz)
tar-tree $id "$base" | gzip -9
;;
.tar.bz2)
tar-tree $id "$base" | bzip2
;;
.tar)
tar-tree $id "$base"
;;
esac > $dest
This is both shorter and (a bit) faster. More easily readable, too,
IMO. Don't fear the pipe. ;-) And I don't think we need to avoid
the triplication of tar-tree calls.
Thanks,
Rene