Git development


* Re: Mercurial 0.3 vs git benchmarks
From: H. Peter Anvin @ 2005-04-27 18:54 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: Florian Weimer, Andrew Morton, Linus Torvalds, magnus.damm, mason,
	mike.taht, mpm, linux-kernel, git
In-Reply-To: <20050427151357.GH1087@cip.informatik.uni-erlangen.de>

Thomas Glanzmann wrote:
> 
> For tar I have no idea why it should slow down the operation, but maybe
> you can enlighten us.
> 

Directory hashing slows down operations that do linear sweeps through 
the filesystem reading every single file, simply because without 
dir_index, there is likely to be a correlation between inode order and 
directory order, whereas with dir_index, readdir() returns entries in 
hash order.

	-hpa
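
A common workaround for the effect described above is to collect the
directory entries first and sort them by inode number before opening the
files, so that the reads roughly follow on-disk order again.  What follows
is only a minimal sketch of that idea; struct ent, by_inode and
list_by_inode are hypothetical names, not code taken from tar or git.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: inode number plus a copy of the name. */
struct ent {
	ino_t ino;
	char name[256];
};

static int by_inode(const void *a, const void *b)
{
	const struct ent *x = a, *y = b;
	return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Read all entries of 'path' (skipping "." and "..") and sort them by
 * inode number.  Returns the entry count, or -1 on error.  The caller
 * frees *out.
 */
static int list_by_inode(const char *path, struct ent **out)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	struct ent *v = NULL;
	int n = 0, cap = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		size_t len = strlen(de->d_name);

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (n == cap) {
			struct ent *tmp;
			cap = cap ? cap * 2 : 64;
			tmp = realloc(v, (size_t)cap * sizeof(*v));
			if (!tmp) {
				free(v);
				closedir(dir);
				return -1;
			}
			v = tmp;
		}
		if (len >= sizeof(v[n].name))
			len = sizeof(v[n].name) - 1;
		v[n].ino = de->d_ino;
		memcpy(v[n].name, de->d_name, len);
		v[n].name[len] = '\0';
		n++;
	}
	closedir(dir);
	if (n > 1)
		qsort(v, (size_t)n, sizeof(*v), by_inode);
	*out = v;
	return n;
}
```

A caller would walk the returned array in order and open each name in
turn.  Whether this helps depends on the filesystem; it is an assumption
that inode order tracks disk layout.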


* Re: A shortcoming of the git repo format
From: H. Peter Anvin @ 2005-04-27 18:47 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20050427183239.GE19011@redhat.com>

Dave Jones wrote:
> 
> That actually broke one of my first git scripts when one of the
> changelog texts started a line with 'tree '.  I hacked around it
> by making my script only grep in the 'head -n4' lines, but this
> seems somewhat fragile having to make assumptions that the field
> I want to see is in the first 4 lines.
> 

You have the delimiter for that; there is an empty line between the 
header and the free-form body, much as in RFC 822.

	-hpa
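
A script or tool can use that empty line directly instead of guessing a
line count.  The sketch below looks up a header field and stops at the
first empty line, so a log message starting with 'tree ' is never
matched.  It assumes the commit text as printed by "cat-file commit";
commit_header is a hypothetical helper name.

```c
#include <string.h>

/*
 * Find header 'key' (for example "tree") in a commit buffer and copy
 * its value into 'out'.  Only lines before the first empty line are
 * considered.  Returns 1 if found, 0 otherwise.
 */
static int commit_header(const char *buf, const char *key,
			 char *out, size_t outlen)
{
	size_t klen = strlen(key);

	if (!outlen)
		return 0;
	while (*buf && *buf != '\n') {	/* an empty line ends the header */
		const char *eol = strchr(buf, '\n');
		size_t len = eol ? (size_t)(eol - buf) : strlen(buf);

		if (len > klen && !strncmp(buf, key, klen) && buf[klen] == ' ') {
			size_t vlen = len - klen - 1;

			if (vlen >= outlen)
				vlen = outlen - 1;	/* truncate to fit */
			memcpy(out, buf + klen + 1, vlen);
			out[vlen] = '\0';
			return 1;
		}
		if (!eol)
			break;
		buf = eol + 1;
	}
	return 0;
}
```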


* Re: A shortcoming of the git repo format
From: Dave Jones @ 2005-04-27 18:32 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <426FD3EE.5000404@zytor.com>

On Wed, Apr 27, 2005 at 11:03:26AM -0700, H. Peter Anvin wrote:
 > Linus Torvalds wrote:
 > >
 > >On Tue, 26 Apr 2005, H. Peter Anvin wrote:
 > >
 > >>One solution to all of this would be to define a quoting standard for 
 > >>strings, and simply require that all free-format strings (like the 
 > >>author fields) or at least strings that match [0-9a-f]{20}, are always 
 > >>quoted.
 > >
 > >
 > >git uses more of the ".newsrc" format, in that it just knows which 
 > >characters are legal or not.
 > >
 > >To find the email address, look for the first '<'. To find the date, look 
 > >for the first '>'. Those characters are not allowed in the name or the 
 > >email, so they act as well-defined delimiters.
 > >
 > 
 > That's true for email addresses, but the point was to distinguish links 
 > to other git objects from any other kind of text.  Currently there is no 
 > such delimiter for that.

That actually broke one of my first git scripts when one of the
changelog texts started a line with 'tree '.  I hacked around it
by making my script only grep in the 'head -n4' lines, but this
seems somewhat fragile having to make assumptions that the field
I want to see is in the first 4 lines.

		Dave


* Re: Finding file revisions
From: Chris Mason @ 2005-04-27 18:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504271027460.18901@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1722 bytes --]

On Wednesday 27 April 2005 13:34, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Chris Mason wrote:
> > Is there a faster way?
>
> Yes. Tell "diff-tree" what your desired files are, and it will cut down
> the amount of work by a _lot_ (because then diff-tree doesn't need to
> recurse into subdirectories that don't matter).

Thanks.  I originally called diff-tree without the file list so that I could 
do the regexp matching, but this is probably one of those features that will 
never get used.

My test case here is a tree with 400 commits; giving diff-tree the file list 
brings us down from 16s to 9s on a cold cache.  Hot cache is about 1.5 
seconds on both.

>
> > This will scale pretty badly as the tree grows, but
> > I usually only want to search back a few months in the history.  So, it
> > might make sense to limit the results by date or commit/tag.
>
> With more history, "rev-list" should do basically the right thing: it will
> be constant-time for _recent_ commits, and it is linear time in how far
> back you want to go. Which seems quite reasonable.
>
> And diff-tree is obviously constant-time (and very fast at that,
> especially if you limit it to just a few files, since then it won't even
> bother with any other subdirectories).

Usually the question I will want to ask is "how did foo.c change since tag X", 
which usually won't go back more than a few months.   This should be 
reasonable, and I'd rather not slow down common operations adding extra 
indexing for the uncommon file-changes run.

So, new prog attached.  New usage:

file-changes [-c commit_id] [-s commit_id] file ...

-c is the commit where you want to start searching
-s is the commit where you want to stop searching

-chris

[-- Attachment #2: file-changes --]
[-- Type: application/x-perl, Size: 2027 bytes --]


* Re: Cogito Tutorial If It Helps
From: Alan Chandler @ 2005-04-27 18:22 UTC (permalink / raw)
  To: git
In-Reply-To: <1114548747.3083.1.camel@kryten>

On Tuesday 26 April 2005 21:52, James Purser wrote:
> I reworked the previous tutorial to take in the changes in the scripts.
> Will make this a series of tutorials to cover all aspects. Any
> suggestions or hints or spelling corrections would be most welcome.
>
> http://ksit.dynalias.com/articles.php?s_id=46&art_id=41

Although I have been reading this mailing list since almost the beginning, I 
have not had a chance to download and try anything.  Using this message as an 
incentive to start, I started to follow this.

However I have run into problems.  

Let me try and explain.

The first part of the tutorial of loading the tarball and building things is 
fine (should be, it's a well trodden mental model) - and actually for me I did 
not have libcurl3-dev installed the first time - but because I already had 
the mental model in my mind on this stage it was easy to fit.

I then issued the cg-clone command to get a fresh copy of cogito.  This is 
where I think it would be useful to take time-out from the tutorial and 
explain what I have here.  For me at least, if I don't have a mental model of 
what is happening, I am totally confused.

I "think" I understand the git repository with the various content addressable 
objects.  Reading the README file describes that quite well.  I assume that 
is what is stored in the .git subdirectory (although I have yet to find any 
text that formally says that).

Where I am confused is the relationship between what is in the .git 
subdirectory and the project tree of cogito that sits around it.  Obviously I 
understand that it's the latest version of the project as represented by the 
objects in the repository, but what I don't really understand (and neither 
your tutorial nor all the explanations of each of the commands in the README 
really explain it either) is how the various commands adjust the 
relationship.

For instance cg-branch-add seems to add a branch to the repository from a url 
(I assume it downloads any "blobs" etc that are not already in my local 
repository and creates a tag that identifies the head of a tree object), but 
I don't understand how I am supposed to see that particular branch as expanded 
code.  (I suspect it might be cg-seek, but I am not really sure - and if it 
is how do you find out what branch this expanded code is now pointed to?).  
But what do cg-update and cg-pull do in terms of the uncompressed code 
sitting in the surrounding directory round the repository, particularly when 
you perform them on a branch that is not the one that the code refers to.  

The reason I raise all this, is when I follow through on your tutorial and get 
to the cg-diff stage I get this

xargs: cg-Xdiffdo: No such file or directory

And I have absolutely no idea what's wrong or where to start looking.

-- 
Alan Chandler
http://www.chandlerfamily.org.uk


* Re: A shortcoming of the git repo format
From: H. Peter Anvin @ 2005-04-27 18:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504270820370.18901@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> On Tue, 26 Apr 2005, H. Peter Anvin wrote:
> 
>>One solution to all of this would be to define a quoting standard for 
>>strings, and simply require that all free-format strings (like the 
>>author fields) or at least strings that match [0-9a-f]{20}, are always 
>>quoted.
> 
> 
> git uses more of the ".newsrc" format, in that it just knows which 
> characters are legal or not.
> 
> To find the email address, look for the first '<'. To find the date, look 
> for the first '>'. Those characters are not allowed in the name or the 
> email, so they act as well-defined delimiters.
> 

That's true for email addresses, but the point was to distinguish links 
to other git objects from any other kind of text.  Currently there is no 
such delimiter for that.  Another solution than the one I posted would 
be to define such a delimiter, for example '<' + 20 hex character + '>' 
(which would be distinguished from email addresses by the lack of an @ 
sign.)  That would be a repo change, though.

Given no prior constraints, I would probably argue for a format which 
makes the data type known as a matter of syntax, using "..." quoted 
strings for *ALL* arbitrary strings, a different syntax for numbers and 
links, and leaving the door open for new data types like lists in the 
future.

	-hpa
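
The delimiter idea can be tried out with a few lines of C.  This is a
sketch only: is_object_link is a hypothetical name and nothing like it is
in the repository format today.  The message above speaks of 20 hex
characters; a SHA-1 name is 20 bytes, which prints as 40 hex digits, so
the length is left as a parameter here.

```c
#include <stddef.h>

/*
 * Return 1 if 's' starts with '<', exactly 'hexlen' lower-case hex
 * digits, and '>'.  An email address such as <user@host> fails the
 * test because '@' is not a hex digit.  Anything following the
 * closing '>' is ignored.
 */
static int is_object_link(const char *s, size_t hexlen)
{
	size_t i;

	if (s[0] != '<')
		return 0;
	for (i = 1; i <= hexlen; i++) {
		char c = s[i];

		/* a terminating NUL also lands here, so we never overrun */
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return s[hexlen + 1] == '>';
}
```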


* Re: I'm missing isofs.h
From: Steven Cole @ 2005-04-27 17:40 UTC (permalink / raw)
  To: Steven Cole; +Cc: Andrew Morton, Petr Baudis, git
In-Reply-To: <426FB03B.9090509@mesatop.com>

Steven Cole wrote:
> Andrew Morton wrote:
> 
>> In a current tree, using git-pasky-0.7:
>>
>> bix:/usr/src/git26> cat .git/tags/v2.6.12-rc3 
>> a2755a80f40e5794ddc20e00f781af9d6320fafb
>> bix:/usr/src/git26> git diff -r v2.6.12-rc3|grep isofs.h
>> +#include "isofs.h"
>>  #include "zisofs.h"
>> +#include "isofs.h"
>> +#include "isofs.h"
>> +#include "isofs.h"
>>  #include "zisofs.h"
>> +#include "isofs.h"
>> +#include "isofs.h"
>> +#include "isofs.h"
>> +#include "isofs.h"
>>
>>
>> That diff should have included the addition of the new isofs.h, but it
>> isn't there.
>>
> 
> I'm seeing unexplained behaviour using the above technique, and I'm
> also seeing fs/isofs/isofs.h as missing, along with seven other changes.
> 

Jan Harkes has found the problem to be a missing ':' at the end of the tag.

Steven


* Re: git add / update-cache --add fails.
From: Ed L Cashin @ 2005-04-27 17:38 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050427173059.GE22956@pasky.ji.cz>

Petr Baudis <pasky@ucw.cz> writes:

...
>> +		fprintf(stderr, "update-cache Error: %s\n", strerror(errno));
...
> FWIW, I have this in my tree for some time already. :-)

OK, nice.

...
>
>> By the way, I created that patch with "git diff" in my git-pasky
>> working directory.  Strangely, I had to redirect standard error to the
>> same place as standard output in order to get the filename in the diff
>> output.  I didn't check why the filename is on standard error,
>> though.
>
> Interesting. Anyway, you are apparently using some quite antique
> git-pasky version.

It wasn't on purpose!  :)

Thinking back, my "git pull" probably didn't merge correctly because
it was too old.  I'll just blow away the old one and start over.

-- 
  Ed L Cashin <ecashin@coraid.com>



* Re: [PATCH 7/6] Leftover bits.
From: Linus Torvalds @ 2005-04-27 17:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4qdsiqgl.fsf@assigned-by-dhcp.cox.net>

On Wed, 27 Apr 2005, Junio C Hamano wrote:
> 
> It also adds code to unlink temporary files used to call the
> external diff command upon SIGINT.

Actually, you should probably do SIGPIPE too. I suspect that's the much
more common case (somebody does a "diff-tree | less", and then quits).

		Linus
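
Handling both signals with one cleanup routine could look roughly like
the sketch below.  It is not the code from the patch under discussion;
temp_path, remove_temp_files, cleanup_on_signal and
install_cleanup_handlers are names made up for illustration, and the
two-slot table is an assumption.

```c
#include <signal.h>
#include <unistd.h>

#define MAX_TEMP 2
static char temp_path[MAX_TEMP][64];	/* empty string: slot unused */

/* Unlink every registered temporary file. */
static void remove_temp_files(void)
{
	int i;

	for (i = 0; i < MAX_TEMP; i++) {
		if (temp_path[i][0]) {
			unlink(temp_path[i]);
			temp_path[i][0] = '\0';
		}
	}
}

/*
 * Clean up, then restore the default action and re-raise, so the
 * process still dies from the signal and the parent sees that.
 */
static void cleanup_on_signal(int signo)
{
	remove_temp_files();
	signal(signo, SIG_DFL);
	raise(signo);
}

static void install_cleanup_handlers(void)
{
	signal(SIGINT, cleanup_on_signal);
	signal(SIGPIPE, cleanup_on_signal);	/* e.g. the pager quit */
}
```

Only unlink(), signal() and raise() are called from the handler, all of
which POSIX lists as async-signal-safe.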


* Re: Finding file revisions
From: Linus Torvalds @ 2005-04-27 17:34 UTC (permalink / raw)
  To: Chris Mason; +Cc: git
In-Reply-To: <200504271251.00635.mason@suse.com>

On Wed, 27 Apr 2005, Chris Mason wrote:
> 
> I haven't seen a tool yet to find which changeset modified a given file, so 
> I whipped up something.  The basic idea is to:
> 
> for each changeset in rev-list
> 	for each file in diff-tree -r parent changeset
> 		match against desired files
> 
> Is there a faster way? 

Yes. Tell "diff-tree" what your desired files are, and it will cut down 
the amount of work by a _lot_ (because then diff-tree doesn't need to 
recurse into subdirectories that don't matter).

So you should just do

	for each changeset in rev-list
	do 
		diff-tree -r parent changeset <file-list>
	...

instead. 

> This will scale pretty badly as the tree grows, but 
> I usually only want to search back a few months in the history.  So, it 
> might make sense to limit the results by date or commit/tag.

With more history, "rev-list" should do basically the right thing: it will
be constant-time for _recent_ commits, and it is linear time in how far
back you want to go. Which seems quite reasonable.

And diff-tree is obviously constant-time (and very fast at that, 
especially if you limit it to just a few files, since then it won't even 
bother with any other subdirectories).

		Linus
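
The saving comes from pruning: with a file list, whole subtrees can be
skipped without ever reading their tree objects.  A rough sketch of such
a test follows, assuming plain string-prefix semantics; worth_descending
is a hypothetical helper and not the function diff-tree actually uses.

```c
#include <string.h>

/*
 * Decide whether directory 'base' (written with a trailing slash,
 * e.g. "fs/ext2/") needs to be descended into, given the wanted paths.
 * Returns 1 if some wanted path lies under 'base', or if 'base' itself
 * lies under a wanted prefix; 0 if the subtree can be skipped.
 */
static int worth_descending(const char *base, const char **paths, int nr)
{
	size_t baselen = strlen(base);
	int i;

	if (!nr)
		return 1;	/* no path limit: everything matters */
	for (i = 0; i < nr; i++) {
		size_t plen = strlen(paths[i]);
		/* compare over the shorter of the two strings */
		size_t n = plen < baselen ? plen : baselen;

		if (!strncmp(base, paths[i], n))
			return 1;
	}
	return 0;
}
```

With "fs/ext2/inode.c" as the only wanted path, "fs/" and "fs/ext2/" are
descended into while "drivers/" and "fs/ext3/" are skipped.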


* Re: git add / update-cache --add fails.
From: Petr Baudis @ 2005-04-27 17:30 UTC (permalink / raw)
  To: Ed L Cashin; +Cc: git
In-Reply-To: <87ll74go7o.fsf@coraid.com>

Dear diary, on Wed, Apr 27, 2005 at 06:48:43PM CEST, I got a letter
where Ed L Cashin <ecashin@coraid.com> told me that...
> Herbert Xu <herbert@gondor.apana.org.au> writes:
> 
> > Rhys Hardwick <rhys@rhyshardwick.co.uk> wrote:
> >> 
> >> rhys@metatron:~/repo/learning.repo$ strace update-cache --add w1d4p1.c
> > ...
> >> open("w1d4p1.c", O_RDONLY)              = -1 ENOENT (No such file or 
> >> directory)
> >
> > The file that you're trying to add doesn't exist.
> 
> Maybe the user should be informed as soon as update-cache knows that?
> 

> update-cache.c: 11388582a830a6161d1c769aa8616bed6f593b8a
> --- a/update-cache.c
> +++ b/update-cache.c
> @@ -98,6 +98,7 @@ static int add_file_to_cache(char *path)
>  
>  	fd = open(path, O_RDONLY);
>  	if (fd < 0) {
> +		fprintf(stderr, "update-cache Error: %s\n", strerror(errno));
>  		if (errno == ENOENT) {
>  			if (allow_remove)
>  				return remove_file_from_cache(path);

FWIW, I have this in my tree for some time already. :-)

> By the way, I created that patch with "git diff" in my git-pasky
> working directory.  Strangely, I had to redirect standard error to the
> same place as standard output in order to get the filename in the diff
> output.  I didn't check why the filename is on standard error,
> though.

Interesting. Anyway, you are apparently using some quite antique
git-pasky version.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: git add / update-cache --add fails.
From: Ed L Cashin @ 2005-04-27 16:48 UTC (permalink / raw)
  To: git
In-Reply-To: <E1DQcOc-00054l-00@gondolin.me.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 396 bytes --]

Herbert Xu <herbert@gondor.apana.org.au> writes:

> Rhys Hardwick <rhys@rhyshardwick.co.uk> wrote:
>> 
>> rhys@metatron:~/repo/learning.repo$ strace update-cache --add w1d4p1.c
> ...
>> open("w1d4p1.c", O_RDONLY)              = -1 ENOENT (No such file or 
>> directory)
>
> The file that you're trying to add doesn't exist.

Maybe the user should be informed as soon as update-cache knows that?

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 360 bytes --]

update-cache.c: 11388582a830a6161d1c769aa8616bed6f593b8a
--- a/update-cache.c
+++ b/update-cache.c
@@ -98,6 +98,7 @@ static int add_file_to_cache(char *path)

 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
+		fprintf(stderr, "update-cache Error: %s\n", strerror(errno));
 		if (errno == ENOENT) {
 			if (allow_remove)
 				return remove_file_from_cache(path);

[-- Attachment #3: Type: text/plain, Size: 319 bytes --]

By the way, I created that patch with "git diff" in my git-pasky
working directory.  Strangely, I had to redirect standard error to the
same place as standard output in order to get the filename in the diff
output.  I didn't check why the filename is on standard error,
though.

-- 
  Ed L Cashin <ecashin@coraid.com>


* Finding file revisions
From: Chris Mason @ 2005-04-27 16:50 UTC (permalink / raw)
  To: git

Hello everyone,

I haven't seen a tool yet to find which changeset modified a given file, so 
I whipped up something.  The basic idea is to:

for each changeset in rev-list
	for each file in diff-tree -r parent changeset
		match against desired files

Is there a faster way?  This will scale pretty badly as the tree grows, but 
I usually only want to search back a few months in the history.  So, it 
might make sense to limit the results by date or commit/tag.

Usage:
file-changes [-c commit id] file1 ...

The file names can be perl regular expressions, and it will match any file 
starting with the expression listed.  So "file-changes fs/ext" will show 
everything in ext2 and ext3.

Example output:

diff-tree -r 56022b4d00cae3ff816d3ff05d9f8a80e1517c60 9bd104d712d710d53c35166e40bd5fe24caf893e
8a796b48e757e56b50802c28abf28e0199c45ad9->2db368df614de4799be2d1baffb6563dbe1b8926 fs/ext2/inode.c
dbc8fd9bab639b84b8cc94fdbbf850b1e4bf1b2b->a4cd819734ba2eea9d5d21039deca62057f72d44 fs/ext3/inode.c
cat-file commit 9bd104d712d710d53c35166e40bd5fe24caf893e
    tree cd4e40eae003e29c0d3be2aa769c3b572ab1b488
    parent 56022b4d00cae3ff816d3ff05d9f8a80e1517c60
    author mason <mason@coffee> 1114617717 -0400
    committer mason <mason@coffee> 1114617717 -0400

    comments go here

This is meant for cut n' paste.  If you find a changeset comment you like, 
run the diff-tree -r command on the first line to see a diff of the 
changeset (maybe I should add | diff-tree-helper here?)

-chris


#!/usr/bin/perl

use strict;

my $last;
my $ret;
my $i;
my @wanted = ();
my $matched;
my $argc = scalar(@ARGV);
my $commit;

sub print_usage() {
    print STDERR "usage: file-changes [-c commit] file_list\n";
    exit(1);
}

if ($argc < 1) {
    print_usage();
}

for ($i = 0 ; $i < $argc ; $i++)  {
    if ($ARGV[$i] eq "-c") {
    	if ($i == $argc - 1) {
	    print_usage();
	}
	$commit = $ARGV[++$i];
    } else {
	push @wanted, $ARGV[$i];
    }
}

if (!defined($commit)) {
    $commit = `commit-id`;
    if ($?) {
    	print STDERR "commit-id failed, try using -c to specify a commit\n";
	exit(1);
    }
    chomp $commit;
}

$last = $commit;

open(RL, "rev-list $commit|") || die "rev-list failed";
while(<RL>) {
    chomp;
    my $cur = $_;
    $matched = 0;
    if ($cur eq $last) {
        next;
    }
    # rev-list gives us the commits from newest to oldest
    open(DT, "diff-tree -r $cur $last|") || die "diff-tree failed";
    while(<DT>) {
        chomp;
	my @words = split;
	my $file = $words[3];
	# if the filename has whitespace, suck it in
	if (scalar(@words) > 4) {
	    if (m/$file(.*)/) {
	        $file .= $1;
	    }
	}
	foreach my $m (@wanted) {
	    if ($file =~ m/^$m/) {
		if (!$matched) {
		    print "diff-tree -r $cur $last\n";
		}
		print "$words[2] $file\n";
		$matched = 1;
	    }
	}
    }
    close(DT);
    if ($?) {
	$ret = $? >> 8;
	die "diff-tree failed with $ret";
    }
    if ($matched) {
	print "cat-file commit $last\n";
	open(COMMIT, "cat-file commit $last|") || die "cat-file $last failed";
	while(<COMMIT>) {
	    print "    $_";
	}
	close(COMMIT);
	if ($?) {
	    $ret = $? >> 8;
	    die "cat-file failed with $ret";
	}
	print "\n";
    }
    $last = $cur;
}

close(RL);
if ($?) {
    $ret = $? >> 8;
    die "rev-list failed with $ret";
}


* Re: I'm missing isofs.h
From: Jan Harkes @ 2005-04-27 16:45 UTC (permalink / raw)
  To: git, Petr Baudis, Andrew Morton
In-Reply-To: <20050427164351.GA7070@delft.aura.cs.cmu.edu>

On Wed, Apr 27, 2005 at 12:43:51PM -0400, Jan Harkes wrote:
> In any case, when I use
>     cg-diff -r a2755a80f40e5794ddc20e00f781af9d6320fafb: | grep isofs.h
> 
> the missing file does show up,
>     ...
>     Index: fs/isofs/isofs.h
>     +++ fd1621a8c03331bd78abfe52c8c385977d0a9729/fs/isofs/isofs.h (mode:100644 sha1:9ce7b51fb6141ea6b82d85687d490c74755591fb)
>     ...
> 
> so either I'm missing some subtle command line error (missing ':' after
> the tag-id?)

Looks like that actually is the problem, when I run cg-diff -r, but
leave out the ':' the final output does not include the added isofs.h
file.

Jan


* Re: I'm missing isofs.h
From: Jan Harkes @ 2005-04-27 16:43 UTC (permalink / raw)
  To: git; +Cc: Petr Baudis, Andrew Morton
In-Reply-To: <20050427135840.GE3014@pasky.ji.cz>

On Wed, Apr 27, 2005 at 03:58:41PM +0200, Petr Baudis wrote:
> Dear diary, on Wed, Apr 27, 2005 at 02:58:44PM CEST, I got a letter
> where Jan Harkes <jaharkes@cs.cmu.edu> told me that...
> > On Tue, Apr 26, 2005 at 09:43:38PM -0700, Andrew Morton wrote:
> > > In a current tree, using git-pasky-0.7:
> > 
> > It looks like git-pasky-0.7 doesn't include the following commit, but
> > there are also several other diff and merge related fixes that were
> > added since then.
> 
> Why do you think it doesn't include it? I can see the fix in the code.

I looked through the output of cg-log, which I thought had at least some
ordering, and that commit showed up as more recent than the pasky-0.7
entry. It looks like the same change is also part of pasky-0.7, but with
a different commit-id. Sorry about the confusion.

In any case, when I use
    cg-diff -r a2755a80f40e5794ddc20e00f781af9d6320fafb: | grep isofs.h

the missing file does show up,
    ...
    Index: fs/isofs/isofs.h
    +++ fd1621a8c03331bd78abfe52c8c385977d0a9729/fs/isofs/isofs.h (mode:100644 sha1:9ce7b51fb6141ea6b82d85687d490c74755591fb)
    ...

so either I'm missing some subtle command line error (missing ':' after
the tag-id?) or the problem was fixed by some other change. So I looked
through the logs to see if there was anything obvious and the commit I
mentioned looked promising.

Jan


* PATCH: Allow tree-id to return the ID of a tree object
From: Philip Pokorny @ 2005-04-27 16:20 UTC (permalink / raw)
  To: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1089 bytes --]

While playing with cg-ls, I tried:

% cg-ls
... snip ...
100644  blob    bc607fd55f6ce4e56ce87766369b5d4d55ec79af        object.h
100755  blob    f35877a6aa5b68d2fb4a388dcfa9b3e64262604e        parent-id
040000  tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
100644  blob    d922305ee0f5583bdfcb629f6d4061e11e0fa859        read-cache.c
100644  blob    1ad7ffc555b635fe57fa7834b12d71ff576be065        read-tree.c
... snip ...
% cg-ls bfb75011c32589b282dd9c86621dadb0f0bb3866       <-- the ppc tree ID
Invalid id: bfb75011c32589b282dd9c86621dadb0f0bb3866
usage: cat-file [-t | tagname] <sha1>
usage: cat-file [-t | tagname] <sha1>
Invalid id:


Shouldn't cg-ls give a listing of a sub-tree?  The cg-help says it takes
a TREE-ID?

The problem seems to be that tree-id really only accepts a commit-id and
returns the TREE-ID of that commit.

So I modified commit-id, tree-id and parent-id to make them more similar
in coding style, force "short-id" names to be at least 4 lower case
letters, and have tree-id accept short, unambiguous ID's and bare SHA1-ID's.

Patch attached.
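
The short-id rule described above (at least 4 characters, and the prefix
must match exactly one object) can also be shown in C.  This is an
illustration over an in-memory list, not a translation of the attached
shell patch; resolve_short_id and its return codes are assumptions.

```c
#include <string.h>

/*
 * Resolve 'prefix' against 'nr' full ids.
 * Returns  1 and sets *match if exactly one id starts with the prefix,
 *          0 if the prefix is shorter than 4 characters or nothing matches,
 *         -1 if the prefix is ambiguous.
 */
static int resolve_short_id(const char *prefix, const char **ids, int nr,
			    const char **match)
{
	size_t len = strlen(prefix);
	int i, found = 0;

	if (len < 4)
		return 0;
	for (i = 0; i < nr; i++) {
		if (strncmp(ids[i], prefix, len))
			continue;
		if (found)
			return -1;	/* second hit: ambiguous */
		*match = ids[i];
		found = 1;
	}
	return found;
}
```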




[-- Attachment #2: cogito-0.8-idparse.patch --]
[-- Type: text/plain, Size: 3014 bytes --]

Index: commit-id
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/commit-id  (mode:100755 sha1:4efcb6bdfdb2b2c5744f5d4d47d92beb7777ed59)
+++ uncommitted/commit-id  (mode:100775)
@@ -9,22 +9,30 @@
 SHA1ONLY="^$SHA1$"
 
 id=$1
+
 if [ ! "$id" ] || [ "$id" = "this" ] || [ "$id" = "HEAD" ]; then
 	id=$(cat .git/HEAD)
-fi
 
-if (echo $id | egrep -vq "$SHA1ONLY") && [ -r ".git/refs/tags/$id" ]; then
+elif [ -r ".git/refs/tags/$id" ]; then
 	id=$(cat ".git/refs/tags/$id")
-fi
 
-if (echo $id | egrep -vq "$SHA1ONLY") && [ -r ".git/refs/heads/$id" ]; then
+elif [ -r ".git/refs/heads/$id" ]; then
 	id=$(cat ".git/refs/heads/$id")
-fi
 
-idpref=$(echo "$id" | cut -c -2)
-idpost=$(echo "$id" | cut -c 3-)
-if [ $(find ".git/objects/$idpref" -name "$idpost*" 2>/dev/null | wc -l) -eq 1 ]; then
-	id=$idpref$(basename $(echo .git/objects/$idpref/$idpost*))
+# Short id's must be lower case and at least 4 digits.
+elif [[ "$id" == [0-9a-z][0-9a-z][0-9a-z][0-9a-z]* ]]; then
+	idpost=${id#??}
+	idpref=${id%$idpost}
+
+	# Assign array elements to matching names
+	idmatch=($(echo .git/objects/$idpref/$idpost*))
+
+	if [ ${#idmatch[*]} -eq 1 ] && [ -r "$idmatch" ]; then
+		id=$idpref${idmatch#.git/objects/$idpref/}
+	elif [ ${#idmatch[*]} -gt 1 ]; then
+		echo "Ambiguous id: $id" >&2
+		exit 1
+	fi
 fi
 
 if echo $id | egrep -vq "$SHA1ONLY"; then
Index: tree-id
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/tree-id  (mode:100755 sha1:1495ff78af71b57e21653512932bcda88fe05454)
+++ uncommitted/tree-id  (mode:100775)
@@ -7,8 +7,35 @@
 
 SHA1="[A-Za-z0-9]{40}"
 TREE="^tree $SHA1$"
+SHA1ONLY="^$SHA1$"
 
-id=$(cat-file commit $(commit-id "$1") | egrep "$TREE" | cut -d ' ' -f 2)
+id=$1
+
+# Is it a commit?
+commit=$(commit-id $id 2>/dev/null)
+if [ "$commit" ]; then
+	id=$(cat-file commit "$commit" | egrep "$TREE" | cut -d ' ' -f 2)
+
+# Short id's must be lower case and at least 4 digits.
+elif [[ "$id" == [0-9a-z][0-9a-z][0-9a-z][0-9a-z]* ]]; then
+	idpost=${id#??}
+	idpref=${id%$idpost}
+
+	# Assign array elements to matching names
+	idmatch=($(echo .git/objects/$idpref/$idpost*))
+
+	if [ ${#idmatch[*]} -eq 1 ] && [ -r "$idmatch" ]; then
+		id=$idpref${idmatch#.git/objects/$idpref/}
+	elif [ ${#idmatch[*]} -gt 1 ]; then
+		echo "Ambiguous id: $id" >&2
+		exit 1
+	fi
+fi
+
+if echo $id | egrep -vq "$SHA1ONLY"; then
+	echo "Invalid id: $id" >&2
+	exit 1
+fi
 
 if [ "$(cat-file -t "$id")" != "tree" ]; then
 	echo "Invalid id: $id" >&2
Index: parent-id
===================================================================
--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/parent-id  (mode:100755 sha1:f35877a6aa5b68d2fb4a388dcfa9b3e64262604e)
+++ uncommitted/parent-id  (mode:100775)
@@ -5,7 +5,8 @@
 #
 # Takes ID of the current commit, defaults to HEAD.
 
-PARENT="^parent [A-Za-z0-9]{40}$"
+SHA1="[A-Za-z0-9]{40}"
+PARENT="^parent $SHA1$"
 
 id=$(commit-id $1) || exit 1
 



* Re: Re: Darcs-git pulling from the Linux repo: a Linux VM question
From: Linus Torvalds @ 2005-04-27 16:16 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Git Mailing List, darcs-devel
In-Reply-To: <7iu0lskyfb.fsf@lanthane.pps.jussieu.fr>

On Wed, 27 Apr 2005, Juliusz Chroboczek wrote:
> 
> Here we're speaking about the initial import.  Committed on 17 April
> 2005 by Linus Torvalds, with the comment ``Let it rip''.  220 MB of
> changed files in a single commit.  2 minutes real time just to read
> all the files, never mind doing anything useful with them.

I think you may well want to consider the initial commit special. In many 
ways it is - it has no parents etc, so even apart from the fact that the 
initial commit obviously tends to be a lot bigger than any other commit, 
it actually fundamentally is _technically_ different too.

> To put it mildly, Darcs is not optimised for that sort of usage.

It shouldn't be. Make the initial one a special case, and import things 
file-by-file for that one special case.

Afterwards, you should be able to handle other commits as "diffs", and
then it's entirely reasonable to have the difference all in memory. If
somebody really does end up having a 220MB diff, and darcs sucks at it,
then at that point I don't think it's darcs' problem any more, it's the
project that you're trying to track that is doing something wrong..

So if you _just_ consider the initial git commit special (and it's easy to 
notice by just looking at the lack of parents), then you may not need to 
change darcs in the other cases.

And almost all SCM's consider the initial state a special case anyway. The 
fact that GIT doesn't is just a result of the strange way of representing 
data, which doesn't care. I don't think you should emulate git in that 
respect.

		Linus
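
Spotting the special case is cheap, since a root commit simply has no
"parent" lines in its header.  A minimal sketch, assuming the commit text
as printed by "cat-file commit"; count_parents is an illustrative name
and not an existing git or darcs function.

```c
#include <string.h>

/*
 * Count the "parent" lines in a commit header.  Scanning stops at the
 * first empty line, so the log message is never examined.  A result
 * of 0 means this is an initial (root) commit.
 */
static int count_parents(const char *buf)
{
	int n = 0;

	while (*buf && *buf != '\n') {
		const char *eol = strchr(buf, '\n');

		if (!strncmp(buf, "parent ", 7))
			n++;
		if (!eol)
			break;
		buf = eol + 1;
	}
	return n;
}
```

An importer could then take the file-by-file path when count_parents()
returns 0 and the diff-based path otherwise.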


* Re: Revised PPC assembly implementation
From: linux @ 2005-04-27 16:01 UTC (permalink / raw)
  To: paulus; +Cc: davem, git, linux
In-Reply-To: <17007.2390.258823.189255@cargo.ozlabs.ibm.com>

> On my powerbook, which has a 1.5GHz G4 (7447A), the same test takes
> 4.68 seconds with my version, 4.72 seconds with your old version, but
> only 3.90 seconds with your new version.

20%; now we're getting somewhere!  Thanks for running the tests.

> Care to check the code and find out why it's giving the wrong answer?

You *could* be nice to me and breakpoint it every 20 rounds and tell me
which group is delivering the wrong answer..

But I'll look...

Hey!  It's not the tricky code at all; it's the STEPUP20 macro.
The third line should be +8, not +4.
Fix appended, but you can just edit line 127.

I can add one more tweak (scheduling the load of k better), and the
comments, then I think I'm done.

Would you mind playing with the number of words of fetchahead and see if
a value less than 4 is any faster?  It'll probably be a pretty minimal
change, but it doesn't affect the code size any.

I suppose we should also test it in a more realistic setting,
hashing *different* data a lot.  A dcbt in the loop might help.
(Does any PPC since the G3 have a cache line less than 64 bytes?
I know the G5 is 64 L1 and 128 L2...)

BTW, what's the best way to refer to PPC processors?  MPC74xx and PPC970FX?
Or Apple's names?  Or something else?
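
For anyone checking the assembly against a reference, the add-based
forms of the F functions named in the header comment of the file below
can be compared with the textbook forms in C.  This is only a
cross-check sketch; the helper names (f0, f1, f2, ch, maj, rotl,
w_update) are chosen here and do not come from the assembly file.

```c
#include <stdint.h>

/* Add-based forms: the two terms never share a set bit, so '+' acts as '|'. */
static uint32_t f0(uint32_t b, uint32_t c, uint32_t d) { return (~b & d) + (b & c); }
static uint32_t f1(uint32_t b, uint32_t c, uint32_t d) { return (b ^ d) ^ c; }
static uint32_t f2(uint32_t b, uint32_t c, uint32_t d) { return (b & d) + ((b ^ d) & c); }

/* Textbook forms: bitwise "b ? c : d" and majority. */
static uint32_t ch(uint32_t b, uint32_t c, uint32_t d) { return (b & c) | (~b & d); }
static uint32_t maj(uint32_t b, uint32_t c, uint32_t d) { return (b & c) | (b & d) | (c & d); }

static uint32_t rotl(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/*
 * Message schedule for s >= 16, kept in a 16-word circular buffer just
 * as the 16 W registers are reused: W(s) overwrites W(s-16).
 */
static void w_update(uint32_t w[16], int s)
{
	w[s % 16] = rotl(w[(s - 16) % 16] ^ w[(s - 3) % 16] ^
			 w[(s - 8) % 16] ^ w[(s - 14) % 16], 1);
}
```

The inputs 0xF0F0F0F0, 0xCCCCCCCC and 0xAAAAAAAA cover all eight bit
combinations of (b, c, d), so comparing f0 with ch and f2 with maj on
them checks the identities bit by bit.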



/*
 * SHA-1 implementation for PowerPC.
 *
 * Copyright (C) 2005 Paul Mackerras <paulus@samba.org>
 */

/*
 * We roll the registers for A, B, C, D, E around on each
 * iteration; E on iteration t is D on iteration t+1, and so on.
 * We use registers 6 - 10 for this.  (Registers 27 - 31 hold
 * the previous values.)
 */
#define RA(t)	(((t)+4)%5+6)
#define RB(t)	(((t)+3)%5+6)
#define RC(t)	(((t)+2)%5+6)
#define RD(t)	(((t)+1)%5+6)
#define RE(t)	(((t)+0)%5+6)

/* We use registers 11 - 26 for the W values */
#define W(t)	((t)%16+11)

/* Register 5 is used for the constant k */

/*
 * There are three F functions, used in four groups of 20:
 * - 20 rounds of f0(b,c,d) = "bit wise b ? c : d" =  (~b & d) + (b & c)
 * - 20 rounds of f1(b,c,d) = b^c^d = (b^d)^c
 * - 20 rounds of f2(b,c,d) = majority(b,c,d) = (b&d) + ((b^d)&c)
 * - 20 more rounds of f1(b,c,d)
 *
 * These are all scheduled for near-optimal performance on a G4.
 * The G4 is a 3-issue out-of-order machine with 3 ALUs, but it can only
 * *consider* starting the oldest 3 instructions per cycle.  So to get
 * maximum performance out of it, you have to treat it as an in-order
 * machine.  Which means interleaving the computation round t with the
 * computation of W[t+4].
 *
 * The first 16 rounds use W values loaded directly from memory, while the
 * remaining 64 use values computed from those first 16.  We preload
 * 4 values before starting, so there are three kinds of rounds:
 * - The first 12 (all f0) also load the W values from memory.
 * - The next 64 compute W(i+4) in parallel. 8*f0, 20*f1, 20*f2, 16*f1.
 * - The last 4 (all f1) do not do anything with W.
 *
 * Therefore, we have 5 different round functions:
 * STEPD0_LOAD(t,s) - Perform round t and load W(s).  s < 16
 * STEPD0_UPDATE(t,s) - Perform round t and compute W(s).  s >= 16.
 * STEPD1_UPDATE(t,s)
 * STEPD2_UPDATE(t,s)
 * STEPD1(t) - Perform round t with no load or update.
 * 
 * The G5 is more fully out-of-order, and can find the parallelism
 * by itself.  The big limit is that it has a 2-cycle ALU latency, so
 * even though it's 2-way, the code has to be scheduled as if it's
 * 4-way, which can be a limit.  To help it, we try to schedule the
 * read of RA(t) as late as possible so it doesn't stall waiting for
 * the previous round's RE(t-1), and we try to rotate RB(t) as early
 * as possible while reading RC(t) (= RB(t-1)) as late as possible.
 */


/* the initial loads. */
#define LOADW(s) \
	lwz	W(s),(s)*4(%r4)

/*
 * This is actually 13 instructions, which is an awkward fit,
 * and uses W(s) as a temporary before loading it.
 */
#define STEPD0_LOAD(t,s) \
add RE(t),RE(t),W(t); andc   %r0,RD(t),RB(t);  /* spare slot */        \
add RE(t),RE(t),%r0;  and    W(s),RC(t),RB(t); rotlwi %r0,RA(t),5;     \
add RE(t),RE(t),W(s); add    %r0,%r0,%r5;      rotlwi RB(t),RB(t),30;  \
add RE(t),RE(t),%r0;  lwz    W(s),(s)*4(%r4);

/*
 * This can execute starting with 2 out of 3 possible moduli, so it
 * does 2 rounds in 9 cycles, 4.5 cycles/round.
 */
#define STEPD0_UPDATE(t,s) \
add RE(t),RE(t),W(t); andc   %r0,RD(t),RB(t); xor    W(s),W((s)-16),W((s)-3); \
add RE(t),RE(t),%r0;  and    %r0,RC(t),RB(t); xor    W(s),W(s),W((s)-8);      \
add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;     xor    W(s),W(s),W((s)-14);     \
add RE(t),RE(t),%r5;  rotlwi RB(t),RB(t),30;  rotlwi W(s),W(s),1;             \
add RE(t),RE(t),%r0;

/* Nicely optimal.  Conveniently, also the most common. */
#define STEPD1_UPDATE(t,s) \
add RE(t),RE(t),W(t); xor    %r0,RD(t),RB(t); xor    W(s),W((s)-16),W((s)-3); \
add RE(t),RE(t),%r5;  xor    %r0,%r0,RC(t);   xor    W(s),W(s),W((s)-8);      \
add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;     xor    W(s),W(s),W((s)-14);     \
add RE(t),RE(t),%r0;  rotlwi RB(t),RB(t),30;  rotlwi W(s),W(s),1;

/*
 * The naked version, no UPDATE, for the last 4 rounds.  3 cycles per.
 * We could use W(s) as a temp register, but we don't need it.
 */
#define STEPD1(t) \
/* spare slot */        add   RE(t),RE(t),W(t); xor    %r0,RD(t),RB(t); \
rotlwi RB(t),RB(t),30;  add   RE(t),RE(t),%r5;  xor    %r0,%r0,RC(t);   \
add    RE(t),RE(t),%r0; rotlwi %r0,RA(t),5;     /* idle */              \
add    RE(t),RE(t),%r0;

/* 5 cycles per */
#define STEPD2_UPDATE(t,s) \
add RE(t),RE(t),W(t); and    %r0,RD(t),RB(t); xor    W(s),W((s)-16),W((s)-3); \
add RE(t),RE(t),%r0;  xor    %r0,RD(t),RB(t); xor    W(s),W(s),W((s)-8);      \
add RE(t),RE(t),%r5;  and    %r0,%r0,RC(t);   xor    W(s),W(s),W((s)-14);     \
add RE(t),RE(t),%r0;  rotlwi %r0,RA(t),5;     rotlwi W(s),W(s),1;             \
add RE(t),RE(t),%r0;  rotlwi RB(t),RB(t),30;

#define STEP0_LOAD4(t,s)		\
	STEPD0_LOAD(t,s);		\
	STEPD0_LOAD((t)+1,(s)+1);	\
	STEPD0_LOAD((t)+2,(s)+2);	\
	STEPD0_LOAD((t)+3,(s)+3);

#define STEPUP4(fn, t, s)		\
	STEP##fn##_UPDATE(t,s);		\
	STEP##fn##_UPDATE((t)+1,(s)+1);	\
	STEP##fn##_UPDATE((t)+2,(s)+2);	\
	STEP##fn##_UPDATE((t)+3,(s)+3);

#define STEPUP20(fn, t, s)		\
	STEPUP4(fn, t, s);		\
	STEPUP4(fn, (t)+4, (s)+4);	\
	STEPUP4(fn, (t)+8, (s)+8);	\
	STEPUP4(fn, (t)+12, (s)+12);	\
	STEPUP4(fn, (t)+16, (s)+16)

	.globl	sha1_core
sha1_core:
	stwu	%r1,-80(%r1)
	stmw	%r13,4(%r1)

	/* Load up A - E */
	lmw	%r27,0(%r3)

	mtctr	%r5

1:
	lis	%r5,0x5a82	/* K0-19 */
	mr	RA(0),%r27
	LOADW(0)
	mr	RB(0),%r28
	LOADW(1)
	mr	RC(0),%r29
	LOADW(2)
	ori	%r5,%r5,0x7999
	mr	RD(0),%r30
	LOADW(3)
	mr	RE(0),%r31

	STEP0_LOAD4(0, 4)
	STEP0_LOAD4(4, 8)
	STEP0_LOAD4(8, 12)
	STEPUP4(D0, 12, 16)
	STEPUP4(D0, 16, 20)

	lis	%r5,0x6ed9	/* K20-39 */
	ori	%r5,%r5,0xeba1
	STEPUP20(D1, 20, 24)

	lis	%r5,0x8f1b	/* K40-59 */
	ori	%r5,%r5,0xbcdc
	STEPUP20(D2, 40, 44)

	lis	%r5,0xca62	/* K60-79 */
	ori	%r5,%r5,0xc1d6
	STEPUP4(D1, 60, 64)
	STEPUP4(D1, 64, 68)
	STEPUP4(D1, 68, 72)
	STEPUP4(D1, 72, 76)
	STEPD1(76)
	STEPD1(77)
	STEPD1(78)
	STEPD1(79)

	/* Add results to original values */
	add	%r31,%r31,RE(0)
	add	%r30,%r30,RD(0)
	add	%r29,%r29,RC(0)
	add	%r28,%r28,RB(0)
	add	%r27,%r27,RA(0)

	addi	%r4,%r4,64
	bdnz	1b

	/* Save final hash, restore registers, and return */
	stmw	%r27,0(%r3)
	lmw	%r13,4(%r1)
	addi	%r1,%r1,80
	blr
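
The register rolling and the 16-entry W buffer described in the comments above can be checked against a plain reference. What follows is only a sketch in Python, not part of the posted assembly: the names role(), GROUPS, sha1_block() and matches_hashlib() are made up for illustration, W is computed just before it is used rather than four rounds ahead, and no attempt is made to model the instruction scheduling.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit value left by n bits (what rotlwi does)."""
    return ((x << n) | (x >> (32 - n))) & MASK

# The three F functions, written the same way as in the assembly comment.
def f0(b, c, d): return (~b & d) + (b & c)        # b ? c : d
def f1(b, c, d): return (b ^ d) ^ c               # parity
def f2(b, c, d): return (b & d) + ((b ^ d) & c)   # majority

# Four groups of 20 rounds: (function, constant k).
GROUPS = [(f0, 0x5A827999), (f1, 0x6ED9EBA1), (f2, 0x8F1BBCDC), (f1, 0xCA62C1D6)]

def role(t, offset):
    """Index into the 5 rolling registers; mirrors RA(t)..RE(t) minus the +6."""
    return (t + offset) % 5

def sha1_block(state, block):
    """Process one 64-byte block; state is [A, B, C, D, E]."""
    w = list(struct.unpack(">16I", block))        # circular buffer, W(t) = t % 16
    reg = [0] * 5
    for offset, value in zip((4, 3, 2, 1, 0), state):
        reg[role(0, offset)] = value
    for t in range(80):
        if t >= 16:
            # Slot t % 16 still holds W[t-16] when it is read here.
            w[t % 16] = rotl(w[(t - 16) % 16] ^ w[(t - 3) % 16]
                             ^ w[(t - 8) % 16] ^ w[(t - 14) % 16], 1)
        f, k = GROUPS[t // 20]
        a, b, c, d, e = (role(t, o) for o in (4, 3, 2, 1, 0))
        # E accumulates everything and becomes A of round t+1; nothing is moved.
        reg[e] = (reg[e] + w[t % 16] + f(reg[b], reg[c], reg[d]) + k
                  + rotl(reg[a], 5)) & MASK
        reg[b] = rotl(reg[b], 30)                 # becomes C of round t+1
    # 80 % 5 == 0, so the roles are back where they started.
    return [(s + reg[role(0, o)]) & MASK for s, o in zip(state, (4, 3, 2, 1, 0))]

def sha1(data):
    """Pad the message and run sha1_block over it; returns the 20-byte digest."""
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = (data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
              + struct.pack(">Q", 8 * len(data)))
    for off in range(0, len(padded), 64):
        state = sha1_block(state, padded[off:off + 64])
    return struct.pack(">5I", *state)

def matches_hashlib(data):
    """Compare this sketch with the standard library implementation."""
    return sha1(data) == hashlib.sha1(data).digest()
```

Because the roles roll instead of the values, each round writes only two registers (E and B), which is the property the STEPD* macros rely on.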


* Re: Re: Darcs-git pulling from the Linux repo: a Linux VM question
From: Juliusz Chroboczek @ 2005-04-27 15:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, darcs-devel
In-Reply-To: <Pine.LNX.4.58.0504270823480.18901@ppc970.osdl.org>

>> For now, does anyone know how I can tune the Linux VM to get a 720
>> MB process to run reliably in 640 MB of main memory?

> I really think you're screwed.

Thanks, that's what I needed to know.

> You _really_ shouldn't read in files that you don't absolutely need.

Ahem... you don't expect me to embark on hacking Git without at least
understanding that, do you?

> That's really the biggest point of git: using the sha1 for naming the
> objects is really all about "describe the contents using 20 bytes instead
> of by reading the contents".

Here we're speaking about the initial import.  Committed on 17 April
2005 by Linus Torvalds, with the comment ``Let it rip''.  220 MB of
changed files in a single commit.  2 minutes real time just to read
all the files, never mind doing anything useful with them.

To put it mildly, Darcs is not optimised for that sort of usage.

> Sorry.  You really need to fix darcs.

That's exactly why we're so interested in your repository.

                                        Juliusz


* Re: git "tag" objects implemented - and a re-done commit
From: Linus Torvalds @ 2005-04-27 15:37 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: git
In-Reply-To: <pan.2005.04.27.03.36.51.65390@smurf.noris.de>

On Wed, 27 Apr 2005, Matthias Urlichs wrote:
>
> Hi, Linus Torvalds wrote:
> 
> > And if two different developers tag exactly the same object with exactly 
> > the same tag-name and exactly the same signature, then they get the same 
> > tag object, and that's fine. They should.
> 
> ... except that they can't. I mean, the signature is done by different
> people at different times, so it can't well be identical.

You'd quite possibly use some shared secret key for some work. For 
example, say you're a company, and any "release person" can sign the 
work.

Also, since you can tag things without signing anything at all, it's even 
more trivial to get the same tag that way.

Signing tags really makes sense when you want somebody _else_ to trust it. 
But unsigned tags are perfectly practical from a _private_ perspective: 
let's say that you just want to remember certain events but you don't need 
to tell anybody else about them - what you'd do is to just create your own 
local tag, and there's no real reason to sign it, since you'll never tell 
anybody else about it anyway.

(One such thing could be to create a tag every time you compile and
install a new kernel: your "tag" is just a way to remember what your
installed kernel was built against, and is meaningless to anybody else.  
In fact, you might well decide to just remove such tags periodically).

		Linus
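
The "same content gives the same tag object" point follows directly from naming objects by a hash of their bytes. A minimal sketch in Python, assuming a made-up serialization (the field layout and the helper names object_name() and make_tag() are hypothetical and are not git's actual tag format):

```python
import hashlib

def object_name(kind, payload):
    """Name an object by the SHA-1 of its bytes (hypothetical layout)."""
    return hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()

def make_tag(target, tag_name, signature=""):
    """Build a tag body from its fields and return the resulting object name."""
    body = "object " + target + "\n" + "tag " + tag_name + "\n\n" + signature
    return object_name("tag", body.encode())

# Two people tagging the same object with the same name and no signature
# end up with the same tag object name.
rc3 = "a2755a80f40e5794ddc20e00f781af9d6320fafb"
alice = make_tag(rc3, "v2.6.12-rc3")
bob = make_tag(rc3, "v2.6.12-rc3")
```

Any difference in the bytes, such as a signature made by a different key or at a different time, gives a different name, which is the case Matthias describes.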


* Re: I'm missing isofs.h
From: Steven Cole @ 2005-04-27 15:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Petr Baudis, git
In-Reply-To: <20050426214338.32e9ac27.akpm@osdl.org>

Andrew Morton wrote:
> In a current tree, using git-pasky-0.7:
> 
> bix:/usr/src/git26> cat .git/tags/v2.6.12-rc3 
> a2755a80f40e5794ddc20e00f781af9d6320fafb
> bix:/usr/src/git26> git diff -r v2.6.12-rc3|grep isofs.h
> +#include "isofs.h"
>  #include "zisofs.h"
> +#include "isofs.h"
> +#include "isofs.h"
> +#include "isofs.h"
>  #include "zisofs.h"
> +#include "isofs.h"
> +#include "isofs.h"
> +#include "isofs.h"
> +#include "isofs.h"
> 
> 
> That diff should have included the addition of the new isofs.h, but it
> isn't there.
> 

I'm seeing unexplained behaviour using the above technique, and I'm
also seeing fs/isofs/isofs.h as missing, along with seven other changes.

I'm using the latest cogito release:
[steven@spc0 COGITO]$ cg-version
cogito-0.8 (3e0fb979cc7541506ec660ab66b83d8120da6d57)

I updated my linux-2.6 repo with cg-update origin, and then created
a current linux-2.6 tree using cg-export.  I diffed that exported
tree with 2.6.12-rc3 and saved the result as "a.diff".

I created another diff using Andrew's technique using cg-diff and saved
that to "b.diff".

I had expected a.diff and b.diff to be the same, but they are
not, and the AWOL file fs/isofs/isofs.h is among the missing using
Andrew's technique.

Here are the details.

Steven

[steven@spc0 linux-2.6]$ cat .git/HEAD
e8108c98dd6d65613fa0ec9d2300f89c48d554bf

[steven@spc0 linux-2.6]$ fsck-cache --tags
tagged commit a2755a80f40e5794ddc20e00f781af9d6320fafb (v2.6.12-rc3) in 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a
tagged commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (v2.6.12-rc2) in 9e734775f7c22d2f89943ad6c745571f1930105f
expect dangling commits - potential heads - due to lack of head information
dangling commit e8108c98dd6d65613fa0ec9d2300f89c48d554bf

[steven@spc0 linux-2.6]$ cg-export ../linux-2.6-current
[steven@spc0 linux-2.6]$ cg-diff -r v2.6.12-rc3 >../b.diff
[steven@spc0 linux-2.6]$ cd ..

[steven@spc0 COGITO]$ diff -urN linux-2.6.12-rc3 linux-2.6-current >a.diff
#note that linux-2.6.12-rc3 was created by patch from kernel.org.

[steven@spc0 COGITO]$ diffstat a.diff >a.diffstat
[steven@spc0 COGITO]$ diffstat b.diff >b.diffstat
[steven@spc0 COGITO]$ tail -n 1 a.diffstat
  199 files changed, 3083 insertions(+), 1601 deletions(-)
[steven@spc0 COGITO]$ tail -n 1 b.diffstat
  191 files changed, 2539 insertions(+), 1540 deletions(-)
[steven@spc0 COGITO]$ diff -u a.diffstat b.diffstat
--- a.diffstat  2005-04-27 09:07:04.000000000 -0600
+++ b.diffstat  2005-04-27 09:07:14.000000000 -0600
@@ -101,7 +101,6 @@
   drivers/usb/net/zd1201.c                     |   20
   drivers/usb/serial/Kconfig                   |    9
   drivers/usb/serial/Makefile                  |    1
- drivers/usb/serial/hp4x.c                    |   85 +++
   drivers/usb/storage/unusual_devs.h           |   22 -
   drivers/video/imsttfb.c                      |    4
   drivers/video/logo/Kconfig                   |    2
@@ -113,7 +112,6 @@
   fs/isofs/dir.c                               |   13
   fs/isofs/export.c                            |    6
   fs/isofs/inode.c                             |   19
- fs/isofs/isofs.h                             |  190 ++++++++
   fs/isofs/joliet.c                            |    6
   fs/isofs/namei.c                             |   13
   fs/isofs/rock.c                              |    8
@@ -136,15 +134,10 @@
   include/asm-sparc64/pgtable.h                |    5
   include/asm-sparc64/spinlock.h               |   48 +-
   include/linux/iso_fs.h                       |  147 ------
- include/linux/iso_fs_i.h                     |   27 -
- include/linux/iso_fs_sb.h                    |   34 -
   include/linux/netfilter_ipv4.h               |    3
   include/linux/pci_ids.h                      |    1
- include/linux/tc_act/tc_defact.h             |   21
- include/net/act_generic.h                    |  142 ++++++
   include/net/ax25.h                           |   10
   include/net/ipv6.h                           |    2
- include/net/tc_act/tc_defact.h               |   13
   include/net/tcp.h                            |   11
   kernel/panic.c                               |    4
   mm/mempolicy.c                               |    2
@@ -190,11 +183,10 @@
   net/sched/Kconfig                            |   10
   net/sched/Makefile                           |   11
   net/sched/cls_fw.c                           |   31 +
- net/sched/simple.c                           |   93 ++++
   net/unix/af_unix.c                           |    1
   net/xfrm/xfrm_state.c                        |    5
   scripts/mod/file2alias.c                     |  111 ++++-
   security/selinux/hooks.c                     |    3
   sound/oss/msnd_pinnacle.c                    |    2
   sound/ppc/Kconfig                            |    2
- 199 files changed, 3083 insertions(+), 1601 deletions(-)
+ 191 files changed, 2539 insertions(+), 1540 deletions(-)
[steven@spc0 COGITO]$ cg-version
cogito-0.8 (3e0fb979cc7541506ec660ab66b83d8120da6d57)
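
The a.diffstat versus b.diffstat comparison above can also be reduced to a list of paths that appear in one file but not the other. A small sketch in Python, assuming the usual "path | count" diffstat layout; diffstat_paths() and missing_from() are illustrative names, and the sample text is a few lines copied from the output above:

```python
def diffstat_paths(text):
    """Collect the file paths named in diffstat output."""
    paths = set()
    for line in text.splitlines():
        if "|" not in line:
            continue  # skips the "N files changed" summary and blank lines
        paths.add(line.split("|", 1)[0].strip())
    return paths

def missing_from(a_stat, b_stat):
    """Paths present in a_stat but absent from b_stat, sorted."""
    return sorted(diffstat_paths(a_stat) - diffstat_paths(b_stat))

A_STAT = """\
 fs/isofs/inode.c                             |   19
 fs/isofs/isofs.h                             |  190 ++++++++
 fs/isofs/joliet.c                            |    6
 199 files changed, 3083 insertions(+), 1601 deletions(-)
"""
B_STAT = """\
 fs/isofs/inode.c                             |   19
 fs/isofs/joliet.c                            |    6
 191 files changed, 2539 insertions(+), 1540 deletions(-)
"""
```

With the full files, missing_from(open("a.diffstat").read(), open("b.diffstat").read()) would list the eight paths that the cg-diff output lacks.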



* Re: Darcs-git pulling from the Linux repo: a Linux VM question
From: Linus Torvalds @ 2005-04-27 15:31 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Git Mailing List, darcs-devel
In-Reply-To: <7i7jionz5q.fsf@lanthane.pps.jussieu.fr>

On Wed, 27 Apr 2005, Juliusz Chroboczek wrote:
> 
> So yes, in the longer term we need to fix Darcs.  For now, does anyone
> know how I can tune the Linux VM to get a 720 MB process to run
> reliably in 640 MB of main memory?

I really think you're screwed. The only way you have even a _chance_ of
getting it to work well is if you have very nice access patterns to
that 720MB, but my guess is that that simply isn't the case. You probably
read most of it in once (and write out changes once, but I hope you at
least notice the case of "nothing changed" so that probably is the smaller
of your problems), and the fact is, you're going to have absolutely
_horrible_ access patterns, since you'll end up not just with a 720MB
process that doesn't have much locality, you'll end up with another 720MB
that you needed to have in the page cache for the IO.

The only way I can see to fix it short-term is to try to use "mmap()"  
instead of "read()" to read the file data, and then try to avoid touching
the mapping unless you _have_ to. In other words: if you actually need to
_compare_ the data (which obviously reads from the mapping), you're
screwed.

Using mmap() will at least mean that the system can re-use the page cache 
pages, though, so it should improve memory pressure a bit.
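
As a rough illustration of the mmap() suggestion (a sketch in Python, not darcs code; map_readonly() and same_content() are hypothetical helpers): the file is mapped read-only so its pages are shared with the page cache, and the mapping is only touched when the sizes alone cannot settle the comparison.

```python
import mmap
import os

def map_readonly(path):
    """Map a file read-only; pages are faulted in only when first touched."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return None  # an empty file cannot be mapped
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def same_content(path_a, path_b, chunk=1 << 20):
    """Compare two files, reading data only if the sizes match."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # decided from metadata; no data pages touched
    a = map_readonly(path_a)
    b = map_readonly(path_b)
    if a is None or b is None:
        return a is b  # equal sizes, so both files are empty
    try:
        # This is the expensive part: it reads every page of both mappings.
        return all(a[i:i + chunk] == b[i:i + chunk]
                   for i in range(0, len(a), chunk))
    finally:
        a.close()
        b.close()
```

The comparison loop is exactly the case Linus warns about: once the data has to be read, mapping it saves a copy but not the I/O.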

> So what was it you said about self-tuning VM systems?

The kernel tries to tune itself in the sense that it automatically 
allocates the memory to user processes vs caching (page cache, directory 
caching etc) and tunes itself quite well that way.

But there's no way to tune for crappy access patterns and working sets
bigger than the amount of RAM. Sorry. You really need to fix darcs.

You _really_ shouldn't read in files that you don't absolutely need.  
That's really the biggest point of git: using the sha1 for naming the
objects is really all about "describe the contents using 20 bytes instead
of by reading the contents". Because reading the content _will_ be
expensive. Even if you have 2GB of memory and you can keep it all cached,
it will be horribly expensive.

		Linus
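
To make the "20 bytes instead of the contents" point concrete, here is a sketch in Python under simplifying assumptions: each tree is represented as a plain dict from path to 20-byte SHA-1, which is not git's on-disk format, and content_name() and changed_paths() are names invented for this example.

```python
import hashlib

def content_name(data):
    """20-byte name for a piece of content."""
    return hashlib.sha1(data).digest()

def changed_paths(old_index, new_index):
    """Paths added, removed or modified between two path -> name mappings.

    Only the 20-byte names are compared; no file content is read here.
    """
    paths = set(old_index) | set(new_index)
    return sorted(p for p in paths if old_index.get(p) != new_index.get(p))

old = {
    "fs/isofs/dir.c": content_name(b"old dir.c"),
    "kernel/panic.c": content_name(b"panic.c"),
}
new = {
    "fs/isofs/dir.c": content_name(b"new dir.c"),
    "fs/isofs/isofs.h": content_name(b"isofs.h"),
    "kernel/panic.c": content_name(b"panic.c"),
}
```

The cost of changed_paths() depends on the number of paths, not on the 220 MB of file data behind them; the unchanged kernel/panic.c is never opened.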


* Re: A shortcoming of the git repo format
From: Linus Torvalds @ 2005-04-27 15:22 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List
In-Reply-To: <426F2671.1080105@zytor.com>

On Tue, 26 Apr 2005, H. Peter Anvin wrote:
> 
> One solution to all of this would be to define a quoting standard for 
> strings, and simply require that all free-format strings (like the 
> author fields) or at least strings that match [0-9a-f]{20}, are always 
> quoted.

git uses more of the ".newsrc" format, in that it just knows which 
characters are legal or not.

To find the email address, look for the first '<'. To find the date, look 
for the first '>'. Those characters are not allowed in the name or the 
email, so they act as well-defined delimiters.

		Linus
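
The delimiter rule above translates into a few lines of parsing code. This is a sketch in Python, assuming a header line of the form "field Name <email> date"; the sample line and the split_ident() helper are made up for illustration and are not taken from a real commit.

```python
def split_ident(line):
    """Split 'field Name <email> date' using the first '<' and first '>'."""
    field, rest = line.split(" ", 1)
    lt = rest.index("<")          # first '<' starts the email address
    gt = rest.index(">")          # first '>' ends it; the date follows
    name = rest[:lt].strip()
    email = rest[lt + 1:gt]
    date = rest[gt + 1:].strip()
    return field, name, email, date

# Hypothetical sample line, not copied from the repository.
sample = "author A. U. Thor <author@example.com> Wed Apr 27 15:22:00 2005"
field, name, email, date = split_ident(sample)
```

No quoting is needed because '<' and '>' cannot occur in the name or the email, so the first occurrence of each is always the delimiter.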


* Re: Mercurial 0.3 vs git benchmarks
From: Thomas Glanzmann @ 2005-04-27 15:13 UTC (permalink / raw)
  To: Florian Weimer
  Cc: H. Peter Anvin, Andrew Morton, Linus Torvalds, magnus.damm, mason,
	mike.taht, mpm, linux-kernel, git
In-Reply-To: <871x8wb6w4.fsf@deneb.enyo.de>

Hello,

> Directory hashing has a negative impact on some applications (notably
> tar and unpatched mutt on large Maildir folders).  For git, it's a win
> because hashing destroys locality anyway.

This is inaccurate. Actually, turning on directory hashing speeds up big
maildirs a lot (tested with mutt-1.5.4 and higher with a maildir
containing 30 thousand messages). But in the mutt case you also have the
header cache[1], which speeds things up a lot, with or without hashed
directories. See also ME's comment[2] on this.

For tar I have no idea why it should slow down the operation, but maybe
you can enlighten us.

	Thomas

[1] http://wwwcip.informatik.uni-erlangen.de/~sithglan/mutt/
	- wait till TLR has released mutt-1.5.10
	- use mutt CVS HEAD
	- use mutt-1.5.9 + http://wwwcip.informatik.uni-erlangen.de/~sithglan/mutt/mutt-cvs-header-cache.29
	- and put the following in your .muttrc:
	set header_cache=/tmp/login-hcache
	set maildir_header_cache_verify=no

[2] http://www.advogato.org/person/scandal/


* Re: Mercurial 0.3 vs git benchmarks
From: Florian Weimer @ 2005-04-27 15:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Linus Torvalds, magnus.damm, mason, mike.taht, mpm,
	linux-kernel, git
In-Reply-To: <426ED20B.9070706@zytor.com>

* H. Peter Anvin:

> While you're doing this anyway, you might want to make sure you enable 
> -O +dir_index and run fsck -D.

Directory hashing has a negative impact on some applications (notably
tar and unpatched mutt on large Maildir folders).  For git, it's a win
because hashing destroys locality anyway.

