Git development

* Re: Mercurial 0.3 vs git benchmarks
From: H. Peter Anvin @ 2005-04-27 18:54 UTC
  To: Thomas Glanzmann
  Cc: Florian Weimer, Andrew Morton, Linus Torvalds, magnus.damm, mason,
	mike.taht, mpm, linux-kernel, git
In-Reply-To: <20050427151357.GH1087@cip.informatik.uni-erlangen.de>

Thomas Glanzmann wrote:
> 
> For tar I have no idea why it should slow down the operation, but maybe
> you can enlighten us.
> 

Directory hashing slows down operations that do linear sweeps through 
the filesystem reading every single file, simply because without 
dir_index, there is likely to be a correlation between inode order and 
directory order, whereas with dir_index, readdir() returns entries in 
hash order.

	-hpa

* Re: Mercurial 0.3 vs git benchmarks
From: Thomas Glanzmann @ 2005-04-27 19:01 UTC
  To: H. Peter Anvin
  Cc: Florian Weimer, Andrew Morton, Linus Torvalds, magnus.damm, mason,
	mike.taht, mpm, linux-kernel, git
In-Reply-To: <426FDFCD.6000309@zytor.com>

Hello,

> Directory hashing slows down operations that do linear sweeps through 
> the filesystem reading every single file, simply because without 
> dir_index, there is likely to be a correlation between inode order and 
> directory order, whereas with dir_index, readdir() returns entries in 
> hash order.

thank you for the awareness training. Then mutt should be slower, too.
Maybe I should repeat those tests.

	Thomas

* Re: A shortcoming of the git repo format
From: Linus Torvalds @ 2005-04-27 19:11 UTC
  To: H. Peter Anvin; +Cc: Git Mailing List
In-Reply-To: <426FD3EE.5000404@zytor.com>

On Wed, 27 Apr 2005, H. Peter Anvin wrote:
> 
> That's true for email addresses, but the point was to distinguish links 
> to other git objects from any other kind of text.

No, that's definitely _not_ the point.

I repeat: git does not do any free-form parsing AT ALL. The links are in 
well-defined places, and you do not ever search for them. And that's 
really very very important.

> Currently there is no such delimiter for that.

There absolutely is.

For a "commit", the format is

 - first line is exactly 46 bytes: five bytes of "tree ", 40 bytes of hex 
   sha1, and one byte of "\n".

   NOTHING ELSE. No extra spaces at the end, no extra spaces at the 
   beginning or the middle. It's ASCII, but it's not free-format ASCII.

 - the next <n> (where 'n' can be 0 or more) lines are _exactly_ 48 bytes
   each:  seven bytes of "parent ", 40 bytes of hex sha1, and one byte of 
   "\n".

   NOTHING ELSE.

 - the next lines are "author " and "committer ". They have well-defined 
   delimiters for their fields, and no sha1's. The fields cannot contain 
   '<', '>' or newlines, since those are the field/line delimiters.

There is no free-format text _anywhere_ that git parses. No room for 
guesses, no room for mistakes, no room for anything half-way questionable.

And fsck actually enforces this. We do _not_ just use "gets()" to read one 
line at a time. We literally verify that the lines are 46/48 bytes long, 
and have the delimiters in the expected places.
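
As a rough illustration of that kind of check (a sketch only, not the actual fsck code): the name check_commit_header() and its return convention are made up for this example. Accepting upper-case hex digits through isxdigit() is an assumption, since nothing above says how case is handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the 40 bytes at p are all hex digits, 0 otherwise. */
static int is_hex40(const char *p)
{
	int i;

	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char) p[i]))
			return 0;
	return 1;
}

/*
 * Check the fixed-format head of a commit buffer.
 * Returns the number of "parent" lines, or -1 if the layout is wrong.
 * (Hypothetical helper, for illustration only.)
 */
int check_commit_header(const char *buf, size_t len)
{
	int parents = 0;

	/* "tree " (5) + hex sha1 (40) + "\n" (1) = exactly 46 bytes */
	if (len < 46 || memcmp(buf, "tree ", 5) != 0 ||
	    !is_hex40(buf + 5) || buf[45] != '\n')
		return -1;
	buf += 46;
	len -= 46;

	/* zero or more: "parent " (7) + hex sha1 (40) + "\n" (1) = 48 bytes */
	while (len >= 48 && memcmp(buf, "parent ", 7) == 0) {
		if (!is_hex40(buf + 7) || buf[47] != '\n')
			return -1;
		buf += 48;
		len -= 48;
		parents++;
	}

	/* whatever follows the parents must begin with "author " */
	if (len < 7 || memcmp(buf, "author ", 7) != 0)
		return -1;
	return parents;
}
```

A caller would treat -1 as "refuse to touch this object". The fields after the "author " prefix are not checked in this sketch.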

Same goes for "tree" and "tag" objects. They all have fixed-format stuff. 
A "tree" entry is always

	"%o <space> %s" \0 [ 20 bytes of sha1 ]

with "%o" being "mode", and "%s" being "path". We don't guess. 
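
To make the layout concrete, here is a minimal sketch of reading one such entry from a buffer. The struct and function names are invented for the example, and the sketch does not try to reject non-canonical modes (leading zeros, say), which a real checker would presumably also have to do.

```c
#include <stddef.h>
#include <string.h>

/* One decoded tree entry; path and sha1 point into the caller's buffer. */
struct tree_entry {
	unsigned int mode;
	const char *path;
	const unsigned char *sha1;	/* 20 raw bytes, not hex */
};

/*
 * Decode the entry at buf: octal mode, one space, path, NUL, 20 bytes.
 * Returns the number of bytes consumed, or 0 if the entry is malformed.
 * (Hypothetical helper, for illustration only.)
 */
size_t parse_tree_entry(const unsigned char *buf, size_t len,
			struct tree_entry *out)
{
	size_t i = 0;
	unsigned int mode = 0;
	const unsigned char *nul;

	/* "%o": one or more octal digits */
	while (i < len && buf[i] >= '0' && buf[i] <= '7')
		mode = (mode << 3) | (unsigned int) (buf[i++] - '0');
	if (i == 0 || i >= len || buf[i] != ' ')
		return 0;
	i++;

	/* "%s": a non-empty path terminated by a NUL byte */
	nul = memchr(buf + i, 0, len - i);
	if (!nul || nul == buf + i)
		return 0;

	/* exactly 20 bytes of binary sha1 must follow the NUL */
	if ((size_t) (buf + len - (nul + 1)) < 20)
		return 0;

	out->mode = mode;
	out->path = (const char *) (buf + i);
	out->sha1 = nul + 1;
	return (size_t) (nul + 1 - buf) + 20;
}
```

Walking a whole tree object is then a loop that advances by the returned length until the buffer is used up, failing on the first 0.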

And this really is _important_. Exactly because we name things by the SHA1
hash of the contents, we MUST NOT have flexible formats. Having a format
which allows non-canonical representations (extra spaces etc) would mean
that two trees that were identical would depend on how you happened to
format them.

So there's really two issues:
 - we don't guess or parse contents. We have strict rules, and that makes 
   git more reliable. There are no gray areas. There's "right" and there 
   is "wrong", and the right one works, and the wrong one gets flagged as 
   being wrong and the tools refuse to touch it.
 - there is only _one_ right way to do things, and that means that the
   content is well-defined, and thus the SHA1 of the content is 
   well-defined.

For example, another rule is that a "tree" object is always sorted by 
the bytes in the filename (not by entry, btw: a directory called "foo" 
will sort as "foo/", even though the _entry_ only shows "foo"). That rule 
not only makes a lot of operations faster, but again, it means that there 
is only _one_ way to represent a tree validly.

IOW, you _cannot_ represent a tree any other way (and I've been too lazy
to check this in fsck, but it's always been my plan), and that is exactly 
why we can just compare the hashes of the results - because there is no 
random component of "layout" in the contents.
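
The sorting rule can be sketched as a comparison function. This is only an illustration under the assumption that a directory name compares as if it had a trailing '/', as described above; tree_name_cmp() and its is_dir flags are names made up for the example.

```c
#include <string.h>

/*
 * Compare two tree entry names byte by byte, treating a directory
 * as if its name ended in '/'.  Returns <0, 0 or >0 like strcmp().
 * (Hypothetical helper, for illustration only.)
 */
int tree_name_cmp(const char *a, int a_is_dir, const char *b, int b_is_dir)
{
	size_t la = strlen(a), lb = strlen(b);
	size_t n = la < lb ? la : lb;
	unsigned char ca, cb;
	int c = memcmp(a, b, n);

	if (c)
		return c;

	/* The common prefix is equal: compare the next (possibly implied) byte */
	ca = la > n ? (unsigned char) a[n] : (a_is_dir ? '/' : 0);
	cb = lb > n ? (unsigned char) b[n] : (b_is_dir ? '/' : 0);
	return (int) ca - (int) cb;
}
```

With this ordering a directory "foo" sorts after a file "foo.c" ('/' is 0x2f, '.' is 0x2e), whereas a plain file "foo" would sort before it.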

This really is important. It means that if you get to the same two tree
contents in totally unrelated ways (you unpack a tar-file and encode it in
git, or you have 5 years of git history and check it out), the "tree" will
match _exactly_. There's no history. There's no "optional" stuff. Since
the contents of the trees are the same, the SHA1 of the two trees will be
the same. Exactly because git refuses to touch any free-format stuff.

		Linus

* Re: A shortcoming of the git repo format
From: Linus Torvalds @ 2005-04-27 19:15 UTC
  To: Dave Jones; +Cc: H. Peter Anvin, Git Mailing List
In-Reply-To: <20050427183239.GE19011@redhat.com>

On Wed, 27 Apr 2005, Dave Jones wrote:
> 
> That actually broke one of my first git scripts when one of the
> changelog texts started a line with 'tree '.  I hacked around it
> by making my script only grep in the 'head -n4' lines, but this
> seems somewhat fragile having to make assumptions that the field
> I want to see is in the first 4 lines.

It's not an assumption.

IT'S THE LAW.

The speed of light is not "an assumption". It is.

The tree is in the first line of a commit. You don't even need to parse 
it, you do

	tree=$(cat-file commit $head | sed 's/tree //;q')

and that's it. No parsing.

Git doesn't guess. Git knows.

		Linus

* Re: Cogito Tutorial If It Helps
From: Petr Baudis @ 2005-04-27 19:32 UTC
  To: Alan Chandler; +Cc: git
In-Reply-To: <200504271922.07765.alan@chandlerfamily.org.uk>

Dear diary, on Wed, Apr 27, 2005 at 08:22:07PM CEST, I got a letter
where Alan Chandler <alan@chandlerfamily.org.uk> told me that...
> Where I am confused is the relationship between what is in the .git 
> subdirectory and the project tree of cogito that sits around it.  Obviously I 
> understand that it's the latest version of the project as represented by the 
> objects in the repository, but what I don't really understand (and neither 
> your tutorial nor all the explanations of each of the commands in the README 
> really explain it either) is how the various commands adjust the 
> relationship.
> 
> For instance cg-branch-add seems to add a branch to the repository from a url 
> (I assume it downloads any "blobs" etc that are not already in my local 
> repository and creates a tag that identifies the head of a tree object), but 
> I don't understand how I am supposed to see that particular branch as expanded 
> code.  (I suspect it might be cg-seek, but I am not really sure - and if it 
> is how do you find out what branch this expanded code is now pointed to?).  
> But what do cg-update and cg-pull do in terms of the uncompressed code 
> sitting in the surrounding directory round the repository, particularly when 
> you perform them on a branch that is not the one that the code refers to.  

Those commands affect your working tree:

	cg-cancel
		Cancels out any modifications in the working tree w.r.t.
		the last commit
	cg-merge
		Merges changes done in another branch to your current
		branch
	cg-patch
		Applies a patch, with regard to special git-specific
		info generated by cg-diff
	cg-rm
		Removes the file from your working tree if it's still
		around
	cg-seek
		Changes your working tree to match some other commit in
		the database
	cg-update
		Potentially brings in changes from a remote branch, and
		updates your working tree to the latest commit + those
		changes

Those commands affect the objects database:

	cg-commit
	cg-pull
		cg-pull just gets the data from remote objects database
		to the local objects database; it is the "first part"
		of what cg-update does
	cg-update

These affect both:

	cg-merge
		Not directly, but it can call cg-commit automatically.
	cg-update

> The reason I raise all this, is when I follow through on your tutorial and get 
> to the cg-diff stage I get this
> 
> xargs: cg-Xdiffdo: No such file or directory
> 
> And I have absolutely no idea what's wrong or where to start looking.

You didn't do make install and you don't have the cogito tree in your $PATH.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* Re: A shortcoming of the git repo format
From: Petr Baudis @ 2005-04-27 19:39 UTC
  To: Dave Jones; +Cc: H. Peter Anvin, Linus Torvalds, Git Mailing List
In-Reply-To: <20050427183239.GE19011@redhat.com>

Dear diary, on Wed, Apr 27, 2005 at 08:32:40PM CEST, I got a letter
where Dave Jones <davej@redhat.com> told me that...
> That actually broke one of my first git scripts when one of the
> changelog texts started a line with 'tree '.  I hacked around it
> by making my script only grep in the 'head -n4' lines, but this
> seems somewhat fragile having to make assumptions that the field
> I want to see is in the first 4 lines.

The tree field is now always on the first line, but in general the header
part is variable-sized; you have multiple parent lines in case of
merges.

Just stop reading at the first empty line.
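
A minimal sketch of that, assuming the header is separated from the log message by one empty line (that is, the first "\n\n" in the buffer); commit_header_len() is a name made up for the example.

```c
#include <stddef.h>

/*
 * Return the length of the commit header, including the "\n" that ends
 * its last line, or 0 if the buffer contains no empty line at all.
 * (Hypothetical helper, for illustration only.)
 */
size_t commit_header_len(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i++)
		if (buf[i] == '\n' && buf[i + 1] == '\n')
			return i + 1;
	return 0;
}
```

A script that only looks at the first commit_header_len() bytes will never see a "tree " that happens to start a line of the changelog text.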

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* The git repo format
From: Brian O'Mahoney @ 2005-04-27 19:47 UTC
  Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504271154470.18901@ppc970.osdl.org>

In understanding how to work with 'git' I had a number of initial
difficulties which are mostly covered by the e-mail from Linus below.

Most of these are already covered in the README:

for objects, ie blob, commit, tag, tree: inflate, then
<type>\s<size>\0<data>

where <data> is in the form, described by Linus below

When you look at them closely, all the formats are simple,
unambiguous, and very easy to parse.

The index is also easy to parse, but there is one detail:
after the 3-int header, the records are padded to a multiple
of 8 bytes. The details are in cache.h.
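
For the object side, a sketch of splitting an already-inflated buffer into <type>, <size> and <data> might look like the following. The struct and function names are invented here, and reading <size> as decimal ASCII digits is my assumption rather than something stated above. Overflow of the size is not guarded against.

```c
#include <stddef.h>
#include <string.h>

/* Result of splitting "<type> <size>\0<data>". */
struct object_header {
	char type[8];		/* e.g. "blob", "commit", "tag", "tree" */
	unsigned long size;	/* declared length of <data> */
	size_t data_offset;	/* where <data> starts in the buffer */
};

/*
 * Parse the header of an inflated object.  Returns 0 on success,
 * -1 if the layout is wrong or the declared size does not match.
 * (Hypothetical helper, for illustration only.)
 */
int parse_object_header(const char *buf, size_t len, struct object_header *h)
{
	const char *nul, *sp, *p;
	unsigned long size = 0;
	size_t tlen;

	nul = memchr(buf, 0, len);
	if (!nul)
		return -1;
	sp = memchr(buf, ' ', (size_t) (nul - buf));
	if (!sp)
		return -1;

	tlen = (size_t) (sp - buf);
	if (tlen == 0 || tlen >= sizeof(h->type) || sp + 1 == nul)
		return -1;

	/* assumed: <size> is written as decimal digits only */
	for (p = sp + 1; p < nul; p++) {
		if (*p < '0' || *p > '9')
			return -1;
		size = size * 10 + (unsigned long) (*p - '0');
	}

	memcpy(h->type, buf, tlen);
	h->type[tlen] = '\0';
	h->size = size;
	h->data_offset = (size_t) (nul - buf) + 1;

	/* the declared size must match what is actually in the buffer */
	return (len - h->data_offset == size) ? 0 : -1;
}
```

The index padding is not sketched here, since the record layout itself lives in cache.h and is not reproduced in this mail.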

Maybe the README needs to reinforce this.

Brian

> I repeat: git does not do any free-form parsing AT ALL.
========================================================

 The links are in well-defined places, and you do not ever search for them.

And that's really very very important.


> For a "commit", the format is
> 
>  - first line is exactly 46 bytes: five bytes of "tree ", 40 bytes of hex 
>    sha1, and one byte of "\n".
> 
>    NOTHING ELSE. No extra spaces at the end, no extra spaces at the 
>    beginning or the middle. It's ASCII, but it's not free-format ASCII.
> 
>  - the next <n> (where 'n' can be 0 or more) lines are _exactly_ 48 bytes
>    each:  seven bytes of "parent ", 40 bytes of hex sha1, and one byte of 
>    "\n".
> 
>    NOTHING ELSE.
> 
>  - the next lines are "author " and "committer ". They have well-defined 
>    delimiters for their fields, and no sha1's. The fields cannot contain 
>    '<', '>' or newlines, since those are the field/line delimiters.
> 
> There is no free-format text _anywhere_ that git parses. No room for 
> guesses, no room for mistakes, no room for anything half-way questionable.
> 
> And fsck actually enforces this. We do _not_ just use "gets()" to read one 
> line at a time. We literally verify that the lines are 46/48 bytes long, 
> and have the delimiters in the expected places.
> 
> Same goes for "tree" and "tag" objects. They all have fixed-format stuff. 
> A "tree" entry is always
> 
> 	"%o <space> %s" \0 [ 20 bytes of sha1 ]
> 
> with "%o" being "mode", and "%s" being "path". We don't guess. 
> 
> And this really is _important_. Exactly because we name things by the SHA1
> hash of the contents, we MUST NOT have flexible formats. Having a format
> which allows non-canonical representations (extra spaces etc) would mean
> that two trees that were identical would depend on how you happened to
> format them.
> 
> So there's really two issues:
>  - we don't guess or parse contents. We have strict rules, and that makes 
>    git more reliable. There are no gray areas. There's "right" and there 
>    is "wrong", and the right one works, and the wrong one gets flagged as 
>    being wrong and the tools refuse to touch it.
>  - there is only _one_ right way to do things, and that means that the
>    content is well-defined, and thus the SHA1 of the content is 
>    well-defined.
> 
> For example, another rule is that a "tree" object is always sorted by 
> the bytes in the filename (not by entry, btw: a directory called "foo" 
> will sort as "foo/", even though the _entry_ only shows "foo"). That rule 
> not only makes a lot of operations faster, but again, it means that there 
> is only _one_ way to represent a tree validly.
> 
> IOW, you _cannot_ represent a tree any other way (and I've been too lazy
> to check this in fsck, but it's always been my plan), and that is exactly 
> why we can just compare the hashes of the results - because there is no 
> random component of "layout" in the contents.
> 
> This really is important. It means that if you get to the same two tree
> contents in totally unrelated ways (you unpack a tar-file and encode it in
> git, or you have 5 years of git history and check it out), the "tree" will
> match _exactly_. There's no history. There's no "optional" stuff. Since
> the contents of the trees are the same, the SHA1 of the two trees will be
> the same. Exactly because git refuses to touch any free-format stuff.


* Re: Mercurial 0.3 vs git benchmarks
From: Theodore Ts'o @ 2005-04-27 19:55 UTC
  To: Andrew Morton
  Cc: Linus Torvalds, magnus.damm, mason, mike.taht, mpm, linux-kernel,
	git
In-Reply-To: <20050426155609.06e3ddcf.akpm@osdl.org>

On Tue, Apr 26, 2005 at 03:56:09PM -0700, Andrew Morton wrote:
> - umount the fs
> - tune2fs -O ^has_journal /dev/whatever
> - fsck -fy                              (to clean up the now-orphaned journal inode)

Using moderately recent versions of e2fsprogs, tune2fs will clean up
the journal inode, so there's no reason to do an fsck.  (Harmless, but
it shouldn't be necessary and it takes time).

> - tune2fs -j -J size=nblocks    (normally 4k blocks)

The argument to "-J size" is in megabytes, not in blocks.

						- Ted

* Re: Mercurial 0.3 vs git benchmarks
From: Theodore Ts'o @ 2005-04-27 19:57 UTC
  To: H. Peter Anvin, Florian Weimer, Andrew Morton, Linus Torvalds,
	magnus.damm, mason, mike.taht, mpm, linux-kernel, git
In-Reply-To: <20050427190144.GA28848@cip.informatik.uni-erlangen.de>

On Wed, Apr 27, 2005 at 09:01:44PM +0200, Thomas Glanzmann wrote:
> Hello,
> 
> > Directory hashing slows down operations that do linear sweeps through 
> > the filesystem reading every single file, simply because without 
> > dir_index, there is likely to be a correlation between inode order and 
> > directory order, whereas with dir_index, readdir() returns entries in 
> > hash order.
> 
> thank you for the awareness training. Then mutt should be slower, too.
> Maybe I should repeat those tests.

If you are using the mutt in Debian unstable, it has the patch applied
which qsorts based on inode number returned from readdir(), which is
why you may not have been seeing the problem.

Or you can LD_PRELOAD the attached quick hack....

						- Ted

/*
 * readdir accelerator
 *
 * (C) Copyright 2003, 2004 by Theodore Ts'o.
 *
 * Compile using the command:
 *
 * gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 * 
 */

#define ALLOC_STEPSIZE	100
#define MAX_DIRSIZE	0

#define DEBUG

#ifdef DEBUG
#define DEBUG_DIR(x)	{if (do_debug) { x; }}
#else
#define DEBUG_DIR(x)
#endif

#define _GNU_SOURCE
#define __USE_LARGEFILE64

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <errno.h>
#include <dlfcn.h>

struct dirent_s {
	unsigned long long d_ino;
	long long d_off;
	unsigned short int d_reclen;
	unsigned char d_type;
	char *d_name;
};

struct dir_s {
	DIR	*dir;
	int	num;
	int	max;
	struct dirent_s *dp;
	int	pos;
	int	fd;
	struct dirent ret_dir;
	struct dirent64 ret_dir64;
};

static int (*real_closedir)(DIR *dir) = 0;
static DIR *(*real_opendir)(const char *name) = 0;
static struct dirent *(*real_readdir)(DIR *dir) = 0;
static struct dirent64 *(*real_readdir64)(DIR *dir) = 0;
static off_t (*real_telldir)(DIR *dir) = 0;
static void (*real_seekdir)(DIR *dir, off_t offset) = 0;
static int (*real_dirfd)(DIR *dir) = 0;
static unsigned long max_dirsize = MAX_DIRSIZE;
static int num_open = 0;
#ifdef DEBUG
static int do_debug = 0;
#endif

static void setup_ptr()
{
	char *cp;

	real_opendir = dlsym(RTLD_NEXT, "opendir");
	real_closedir = dlsym(RTLD_NEXT, "closedir");
	real_readdir = dlsym(RTLD_NEXT, "readdir");
	real_readdir64 = dlsym(RTLD_NEXT, "readdir64");
	real_telldir = dlsym(RTLD_NEXT, "telldir");
	real_seekdir = dlsym(RTLD_NEXT, "seekdir");
	real_dirfd = dlsym(RTLD_NEXT, "dirfd");
	if ((cp = getenv("SPD_READDIR_MAX_SIZE")) != NULL) {
		max_dirsize = atol(cp);
	}
#ifdef DEBUG
	if (getenv("SPD_READDIR_DEBUG"))
		do_debug++;
#endif
}

static void free_cached_dir(struct dir_s *dirstruct)
{
	int i;

	if (!dirstruct->dp)
		return;

	for (i=0; i < dirstruct->num; i++) {
		free(dirstruct->dp[i].d_name);
	}
	free(dirstruct->dp);
	dirstruct->dp = 0;
}	

static int ino_cmp(const void *a, const void *b)
{
	const struct dirent_s *ds_a = (const struct dirent_s *) a;
	const struct dirent_s *ds_b = (const struct dirent_s *) b;
	ino_t i_a, i_b;
	
	i_a = ds_a->d_ino;
	i_b = ds_b->d_ino;

	if (ds_a->d_name[0] == '.') {
		if (ds_a->d_name[1] == 0)
			i_a = 0;
		else if ((ds_a->d_name[1] == '.') && (ds_a->d_name[2] == 0))
			i_a = 1;
	}
	if (ds_b->d_name[0] == '.') {
		if (ds_b->d_name[1] == 0)
			i_b = 0;
		else if ((ds_b->d_name[1] == '.') && (ds_b->d_name[2] == 0))
			i_b = 1;
	}

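	/* Compare explicitly: the difference of two ino_t values may not fit in an int */
	if (i_a < i_b)
		return -1;
	return (i_a > i_b);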
}


DIR *opendir(const char *name)
{
	DIR *dir;
	struct dir_s	*dirstruct;
	struct dirent_s *ds, *dnew;
	struct dirent64 *d;
	struct stat st;

	if (!real_opendir)
		setup_ptr();

	DEBUG_DIR(printf("Opendir(%s) (%d open)\n", name, num_open++));
	dir = (*real_opendir)(name);
	if (!dir)
		return NULL;

	dirstruct = malloc(sizeof(struct dir_s));
	if (!dirstruct) {
		(*real_closedir)(dir);
		errno = ENOMEM;
		return NULL;
	}
	dirstruct->num = 0;
	dirstruct->max = 0;
	dirstruct->dp = 0;
	dirstruct->pos = 0;
	dirstruct->dir = 0;
	dirstruct->fd = -1;	/* no cached fd yet; keeps closedir() from closing garbage */

	if (max_dirsize && (stat(name, &st) == 0) && 
	    (st.st_size > max_dirsize)) {
		DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
				 st.st_size));
		dirstruct->dir = dir;
		return (DIR *) dirstruct;
	}

	while ((d = (*real_readdir64)(dir)) != NULL) {
		if (dirstruct->num >= dirstruct->max) {
			dirstruct->max += ALLOC_STEPSIZE;
			DEBUG_DIR(printf("Reallocating to size %d\n", 
					 dirstruct->max));
			dnew = realloc(dirstruct->dp, 
				       dirstruct->max * sizeof(struct dirent_s));
			if (!dnew)
				goto nomem;
			dirstruct->dp = dnew;
		}
		ds = &dirstruct->dp[dirstruct->num++];
		ds->d_ino = d->d_ino;
		ds->d_off = d->d_off;
		ds->d_reclen = d->d_reclen;
		ds->d_type = d->d_type;
		if ((ds->d_name = malloc(strlen(d->d_name)+1)) == NULL) {
			dirstruct->num--;
			goto nomem;
		}
		strcpy(ds->d_name, d->d_name);
		DEBUG_DIR(printf("readdir: %lu %s\n", 
				 (unsigned long) d->d_ino, d->d_name));
	}
	dirstruct->fd = dup((*real_dirfd)(dir));
	(*real_closedir)(dir);
	qsort(dirstruct->dp, dirstruct->num, sizeof(struct dirent_s), ino_cmp);
	return ((DIR *) dirstruct);
nomem:
	DEBUG_DIR(printf("No memory, backing off to direct readdir\n"));
	free_cached_dir(dirstruct);
	rewinddir(dir);		/* entries were already consumed; restart the real stream */
	dirstruct->dir = dir;
	return ((DIR *) dirstruct);
}

int closedir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	DEBUG_DIR(printf("Closedir (%d open)\n", --num_open));
	if (dirstruct->dir)
		(*real_closedir)(dirstruct->dir);

	if (dirstruct->fd >= 0)
		close(dirstruct->fd);
	free_cached_dir(dirstruct);
	free(dirstruct);
	return 0;
}

struct dirent *readdir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->dir)
		return (*real_readdir)(dirstruct->dir);

	if (dirstruct->pos >= dirstruct->num)
		return NULL;

	ds = &dirstruct->dp[dirstruct->pos++];
	dirstruct->ret_dir.d_ino = ds->d_ino;
	dirstruct->ret_dir.d_off = ds->d_off;
	dirstruct->ret_dir.d_reclen = ds->d_reclen;
	dirstruct->ret_dir.d_type = ds->d_type;
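	strncpy(dirstruct->ret_dir.d_name, ds->d_name,
		sizeof(dirstruct->ret_dir.d_name) - 1);
	/* strncpy() does not terminate the string if the name fills the buffer */
	dirstruct->ret_dir.d_name[sizeof(dirstruct->ret_dir.d_name) - 1] = 0;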

	return (&dirstruct->ret_dir);
}

struct dirent64 *readdir64(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->dir)
		return (*real_readdir64)(dirstruct->dir);

	if (dirstruct->pos >= dirstruct->num)
		return NULL;

	ds = &dirstruct->dp[dirstruct->pos++];
	dirstruct->ret_dir64.d_ino = ds->d_ino;
	dirstruct->ret_dir64.d_off = ds->d_off;
	dirstruct->ret_dir64.d_reclen = ds->d_reclen;
	dirstruct->ret_dir64.d_type = ds->d_type;
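	strncpy(dirstruct->ret_dir64.d_name, ds->d_name,
		sizeof(dirstruct->ret_dir64.d_name) - 1);
	/* strncpy() does not terminate the string if the name fills the buffer */
	dirstruct->ret_dir64.d_name[sizeof(dirstruct->ret_dir64.d_name) - 1] = 0;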

	return (&dirstruct->ret_dir64);
}

off_t telldir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	if (dirstruct->dir)
		return (*real_telldir)(dirstruct->dir);

	return ((off_t) dirstruct->pos);
}

void seekdir(DIR *dir, off_t offset)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	if (dirstruct->dir) {
		(*real_seekdir)(dirstruct->dir, offset);
		return;
	}

	dirstruct->pos = offset;
}

int dirfd(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	if (dirstruct->dir)
		return (*real_dirfd)(dirstruct->dir);

	return (dirstruct->fd);
}

* Re: Mercurial 0.3 vs git benchmarks
From: Thomas Glanzmann @ 2005-04-27 20:06 UTC
  To: linux-kernel, git
  Cc: Theodore Ts'o, H. Peter Anvin, Florian Weimer, Andrew Morton,
	Linus Torvalds, magnus.damm, mason, mike.taht, mpm
In-Reply-To: <20050427195753.GB7793@thunk.org>

Hello,

> Or you can LD_PRELOAD the attached quick hack....

nice one! I have to keep that around. :-)

Thanks,
	Thomas

* [PATCH] cg-export use the new tar-tree
From: Joshua T. Corbin @ 2005-04-27 20:14 UTC
  To: git; +Cc: Petr Baudis, Rene Scharfe

If at first you don't succeed, try to fix it....failing that, dump kmail and cat | mail ;)

Okay, so this time around, there are REAL-LIVE tabs in this one...

And this time, there are no temporary files...

Signed-off-by: Joshua T. Corbin <jcorbin@wunjo.org>

--- 6ad600e20c89323c1d3049f75b8ca9b0a2d72167/cg-export  (mode:100755 sha1:d39eb8e723c8cb74c96b64d510f49d1bfcd7d5f8)
+++ 345e15e9173ca1d419a2ff2583696ff4166e5df3/cg-export  (mode:100755 sha1:ff9aa02ff3426e20b09901a291d568bc2ce2b72a)
@@ -8,15 +8,35 @@
 
 . cg-Xlib
 
-destdir=$1
+dest=$1
 id=$(tree-id $2)
 
-([ "$destdir" ] && [ "$id" ]) || die "usage: cg-export DESTDIR [TREE_ID]"
+([ "$dest" ] && [ "$id" ]) || die "usage: cg-export DESTDIR [TREE_ID]"
 
-[ -e "$destdir" ] && die "$destdir already exists."
+[ -e "$dest" ] && die "$dest already exists."
 
-mkdir -p $destdir || die "cannot create $destdir"
-export GIT_INDEX_FILE="$destdir/.git-index"
-read-tree $id
-checkout-cache "--prefix=$destdir/" -a
-rm $GIT_INDEX_FILE
+case $dest in
+	*.tar|*.tar.gz|*.tar.bz2|*.tgz)
+		base=${dest%.tar*};
+		base=${base%.tgz}
+		ext=${dest#$base}
+		case $ext in
+		.tar.gz|.tgz)
+			tar-tree $id "$base" | gzip -c9 $tar > $dest
+			;;
+		.tar.bz2)
+			tar-tree $id "$base" | bzip2 -c $tar > $dest
+			;;
+		.tar)
+			tar-tree $id "$base" > $dest
+			;;
+		esac
+		;;
+	*)
+		mkdir -p $dest || die "cannot create $dest"
+		export GIT_INDEX_FILE="$dest/.git-index"
+		read-tree $id
+		checkout-cache "--prefix=$dest/" -a
+		rm $GIT_INDEX_FILE
+	;;
+esac

* The criss-cross merge case
From: Bram Cohen @ 2005-04-27 20:25 UTC
  To: git

Here's an example of where simple three-way merge can't do the right
thing. Each letter represents a snapshot of the history, and time goes
downwards. The numbers after some letters refer to which line number was
modified at that time.

A
|\
| \
|  \
|   \
|    \
|     \
|      \
B8      C3
|\     /|
| \   / |
|  \ /  |
|   X   |
|  / \  |
| /   \ |
|/     \|
D8      E3
 \      |
  \     |
   \    |
    \   |
     \  |
      \ |
       \|
        ?

In this case the ? should have a clean merge with the D version of line 8
(because it was made with the B version already in the history) and the E
version of line 3 (because it was made with the C version already in the
history).

The problem is that there's no single ancestor for the three-way merge
which does the right thing. If one picks B, then there will be an
unnecessary merge conflict at line 3, because D will have the C version
and E will have the E version but B will have neither. Likewise if one
picks C, there will be an unnecessary conflict at line 8 because D will
have the D version and E will have the B version but C will have neither.
Picking A will cause unnecessary conflicts on *both* lines.
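
The three choices of ancestor can be played through with a toy model. This is a sketch under simplifying assumptions: each snapshot is reduced to a label saying whose version of line 3 and of line 8 it carries, and merge3() is the textbook per-line three-way rule, not the code of any particular tool. All names are made up for the example.

```c
/* One snapshot: whose version of line 3 and of line 8 it carries. */
struct snap {
	char line3;
	char line8;
};

#define CONFLICT '!'

/* Textbook three-way rule for a single line. */
static char merge3(char base, char ours, char theirs)
{
	if (ours == theirs)
		return ours;		/* both sides agree */
	if (ours == base)
		return theirs;		/* only their side changed it */
	if (theirs == base)
		return ours;		/* only our side changed it */
	return CONFLICT;		/* both changed it, differently */
}

/* Merge two snapshots against one chosen common ancestor. */
struct snap merge_snap(struct snap base, struct snap ours, struct snap theirs)
{
	struct snap m;

	m.line3 = merge3(base.line3, ours.line3, theirs.line3);
	m.line8 = merge3(base.line8, ours.line8, theirs.line8);
	return m;
}

/* The snapshots from the diagram. */
static const struct snap SNAP_A = { 'A', 'A' };
static const struct snap SNAP_B = { 'A', 'B' };	/* B changed line 8 */
static const struct snap SNAP_C = { 'C', 'A' };	/* C changed line 3 */
static const struct snap SNAP_D = { 'C', 'D' };	/* has C's line 3, changed line 8 */
static const struct snap SNAP_E = { 'E', 'B' };	/* has B's line 8, changed line 3 */
```

Merging D and E with B as the ancestor conflicts on line 3 and takes D's line 8; with C it takes E's line 3 and conflicts on line 8; with A it conflicts on both.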

The problem can actually be much worse than a simple unnecessary conflict,
because if the later updates were strict undos of the earlier updates,
then picking either B or C will merge something *wrong*. Using A as the
ancestor will keep that from happening, but it also maximizes unnecessary
conflicts.

Note that the above criss-cross case only involves two branches, using the
methodology of each one modifying their own section and pulling in old
versions of the other one from time to time. Cogito's interface encourages
exactly this work flow, which is not a bad thing from a work flow
perspective, but does make it hit this case regularly.

The way Git handles this currently is very bad, because it forces the
common ancestor to be from the same snapshot across all files, so this
problem will happen if the modifications are made even in different files,
not just different lines within the same file. That could be improved
greatly by finding an LCA for each file individually, which is what
Monotone does. Darcs, Codeville, and all the Arch descendants have better
merge algorithms which don't have to pick a single common ancestor.

-Bram

* Re: Mercurial 0.3 vs git benchmarks
From: H. Peter Anvin @ 2005-04-27 20:35 UTC
  To: Thomas Glanzmann
  Cc: Florian Weimer, Andrew Morton, Linus Torvalds, magnus.damm, mason,
	mike.taht, mpm, linux-kernel, git
In-Reply-To: <20050427190144.GA28848@cip.informatik.uni-erlangen.de>

Thomas Glanzmann wrote:
> Hello,
> 
> 
>>Directory hashing slows down operations that do linear sweeps through 
>>the filesystem reading every single file, simply because without 
>>dir_index, there is likely to be a correlation between inode order and 
>>directory order, whereas with dir_index, readdir() returns entries in 
>>hash order.
> 
> 
> thank you for the awareness training. Then mutt should be slower, too.
> Maybe I should repeat those tests.
> 

Only if you read every single file in each directory every time.  I 
thought mutt did header indexing and thus didn't need to do that.

	-hpa

* Re: Mercurial 0.3 vs git benchmarks
From: Thomas Glanzmann @ 2005-04-27 20:39 UTC
  To: linux-kernel, git
In-Reply-To: <426FF799.4000501@zytor.com>

Hello,

> Only if you read every single file in each directory every time.  I 
> thought mutt did header indexing and thus didn't need to do that.

it does, but it is a very recent development (coming with the next
release). Prior to this you need a patch, which Debian has applied for
some time, and you have to configure it. Otherwise *all* Maildir files were
opened and parsed when a folder is entered.

	Thomas

* Re: A shortcoming of the git repo format
From: H. Peter Anvin @ 2005-04-27 20:40 UTC
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504271154470.18901@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> No, that's definitely _not_ the point.
> 
> I repeat: git does not do any free-form parsing AT ALL. The links are in 
> well-defined places, and you do not ever search for them. And that's 
> really very very important.
> 

I know that.  However, is that going to be true for all versions of the 
repository format over all time?  If so, the repository format is brittle.

 > > Currently there is no  such delimiter for that.
 >
 > There absolutely is.
 >
 > For a "commit", the format is...

My point was that with a syntactic delimiter, one can write a tool that 
doesn't necessarily know everything about every tag, including future 
tags which may not have been invented when the tool was written.

One can simply say "we don't do that"; finding an unknown tag is always 
a fatal error.  That means the format is more brittle, but brittle does 
mean it breaks as opposed to getting deformed in some potentially 
undesirable way.

	-hpa

* Re: Mercurial 0.3 vs git benchmarks
From: Florian Weimer @ 2005-04-27 20:47 UTC
  To: H. Peter Anvin
  Cc: Thomas Glanzmann, Andrew Morton, Linus Torvalds, magnus.damm,
	mason, mike.taht, mpm, linux-kernel, git
In-Reply-To: <426FF799.4000501@zytor.com>

* H. Peter Anvin:

> Only if you read every single file in each directory every time.  I 
> thought mutt did header indexing and thus didn't need to do that.

There was a patch for Mutt which implemented header indexing, but it
was buggy and had to be removed (from Debian).  After that, directory
sorting (actually, it's a merge sort 8-) practically became mandatory
on ext3 with directory hashing.

I think that in the meantime, the sorting has been integrated into upstream
CVS (I don't know if it's been released as a developer snapshot,
though).  The header indexing patch may have been revived for Debian;
I think it was fixed recently.

* Re: A shortcoming of the git repo format
From: Tom Lord @ 2005-04-27 20:49 UTC
  To: hpa; +Cc: git
In-Reply-To: <426FF8C4.8080809@zytor.com>

   From: "H. Peter Anvin" <hpa@zytor.com>

   Linus Torvalds wrote:
   > 
   > No, that's definitely _not_ the point.
   > 
   > I repeat: git does not do any free-form parsing AT ALL. The links are in 
   > well-defined places, and you do not ever search for them. And that's 
   > really very very important.
   > 

   I know that.  However, is that going to be true for all versions of the 
   repository format over all time?  If so, the repository format is brittle.

I think one has to understand Linus' posts as coming from the
"head-down, steaming ahead for *MY* project cause you all suck"
perspective and impose corresponding filters on his declarations of
"LAW".  At least that's the only way *I* can make sense of his latest
contributions.

If you get git, just do the right thing -- Linus be damned.

-t

* Re: A shortcoming of the git repo format
From: Linus Torvalds @ 2005-04-27 20:56 UTC
  To: H. Peter Anvin; +Cc: Git Mailing List
In-Reply-To: <426FF8C4.8080809@zytor.com>

On Wed, 27 Apr 2005, H. Peter Anvin wrote:
> 
> I know that.  However, is that going to be true for all versions of the 
> repository format over all time?  If so, the repository format is brittle.

I agree, it's brittle by design, exactly because I think it's very 
important not to allow any variations.

HOWEVER, that's where "convert-cache" comes in. Any one particular format 
may be brittle, but if we accept that, and just say "we can upgrade by 
converting the cache", then we should be ok. IOW, we can change from one 
brittle format with 160-bit SHA1 names to _another_ brittle format with 
256-bit SHA1 (or other) names.

> My point was that with a syntactic delimiter, one can write a tool that 
> doesn't necessarily know everything about every tag, including future 
> tags which may not have been invented when the tool was written.

Now, I kind of agree with that, but not on a "object level".

But exactly because the object level is "brittle by design", and because I 
think the way to fix that is convert-cache (which may do _big_ changes to the 
format), I really don't think that the objects should ever be looked at 
except with very precise tools.
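
To make "very precise" concrete, here is a hedged sketch (not git's own
parser) of reading the "tree" line at the start of a commit object. It
assumes the line is exactly "tree ", 40 lowercase hex digits and a
newline, and it never searches: the field is either at the expected
offset or the buffer is rejected. The names hexval and parse_tree_line
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode one lowercase hex digit, or return -1 if it is not one. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Strictly parse "tree <40 hex digits>\n" at the very start of buf.
 * Returns the number of bytes consumed (46) on success, -1 otherwise.
 */
int parse_tree_line(const char *buf, size_t len, unsigned char sha1[20])
{
	size_t i;

	assert(buf && sha1);
	if (len < 46 || memcmp(buf, "tree ", 5) != 0 || buf[45] != '\n')
		return -1;
	for (i = 0; i < 20; i++) {
		int hi = hexval(buf[5 + 2 * i]);
		int lo = hexval(buf[6 + 2 * i]);

		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)(hi << 4 | lo);
	}
	return 46;
}
```

Anything that deviates, even by a leading space, is an error rather than
something to skip over. That is the brittleness being defended here.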

But when it comes to "higher-level information", I agree with you 100%.

For example, this _is_ actually why I wanted pasky to change the format of 
"git log" (now cg-log). Exactly so that the output of that isn't brittle, 
it now prepends spaces to the free-form part.
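
A rough sketch of that indentation idea (not the cg-log code; the
four-space width and the name indent_message are assumptions): every line
of the free-form text gets a fixed prefix, so a message line that happens
to look like a header can never start in column 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy msg into out, prefixing every line (empty ones included) with
 * four spaces.  Returns 0 on success, -1 if out is too small.
 */
int indent_message(const char *msg, char *out, size_t outsz)
{
	static const char pad[] = "    ";
	size_t o = 0;
	int at_line_start = 1;

	assert(msg && out);
	for (; *msg; msg++) {
		if (at_line_start) {
			/* need room for the padding and at least a NUL */
			if (o + 4 >= outsz)
				return -1;
			memcpy(out + o, pad, 4);
			o += 4;
		}
		if (o + 1 >= outsz)
			return -1;
		out[o++] = *msg;
		at_line_start = (*msg == '\n');
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return 0;
}
```

A reader of the log output can then treat any line starting in column 0
as structured and any indented line as free-form text, without knowing
every tag in advance.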

		Linus

^ permalink raw reply

* Re: Mercurial 0.3 vs git benchmarks
From: Florian Weimer @ 2005-04-27 20:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Linus Torvalds, magnus.damm, mason, mike.taht, mpm,
	linux-kernel, git
In-Reply-To: <20050427190144.GA28848@cip.informatik.uni-erlangen.de>

* Thomas Glanzmann:

>> Directory hashing slows down operations that do linear sweeps through 
>> the filesystem reading every single file, simply because without 
>> dir_index, there is likely to be a correlation between inode order and 
>> directory order, whereas with dir_index, readdir() returns entries in 
>> hash order.
>
> thank you for the awareness training. Than mutt should be slower, too.
> Maybe I should repeat that tests.

Benchmarks are actually a bit tricky because as far as I can tell,
once you hash the directories, they are tainted even if you mount your
file system with ext2.

^ permalink raw reply

* Re: A shortcoming of the git repo format
From: Gerhard Schrenk @ 2005-04-27 20:58 UTC (permalink / raw)
  To: git
In-Reply-To: <426F2671.1080105@zytor.com>

* H. Peter Anvin <hpa@zytor.com> [2005-04-27 07:43]:
> Most of git's files are starting to converge toward an RFC822-like 
> header with (tag, data) and a free-form section.  This is a good
> thing.

I really hate RFC822-like data structures. Why? Lazy straightforward
people (who have written too many mails) tend to break the relational
data model and don't realize what they lose. Usually they introduce
non-atomic tags like

Tag: value1, value2

and game over. You have just broken the first normal form (1NF). In the 
end the relational normalization process is just not to break the
functional dependencies of your data. It's worth it.
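
For illustration, the atomic alternative is one line per value with the
tag repeated; git commit objects already do this for parents, where a
merge carries one "parent" line per parent. A minimal sketch, with
format_tag_lines as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one "tag value\n" line per value into out, so that every line
 * carries exactly one atomic value.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * out is too small.
 */
int format_tag_lines(char *out, size_t outsz, const char *tag,
		     const char *const *values, size_t n)
{
	size_t used = 0, i;

	assert(out && outsz > 0 && tag && (values || n == 0));
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		int w = snprintf(out + used, outsz - used, "%s %s\n",
				 tag, values[i]);

		if (w < 0 || (size_t)w >= outsz - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

With values "aaaa" and "bbbb" and the tag "parent", this yields two
separate "parent" lines rather than a single comma-separated one, so a
line-oriented tool (grep, sort, join) sees one value per record.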

I'm reacting like Pavlov's dog and really don't know what I'm talking
about (namely git). But please don't make the same error and just
associate relational = sql = crap. The shell's operator stream paradigm
fits the relational model very well. It's certainly closer to the
relational algebra than sql...

Take care
Gerhard

^ permalink raw reply

* Re: A shortcoming of the git repo format
From: H. Peter Anvin @ 2005-04-27 20:59 UTC (permalink / raw)
  To: Tom Lord; +Cc: git
In-Reply-To: <200504272049.NAA14598@emf.net>

Tom Lord wrote:
> 
> I think one has to understand Linus' posts as coming from the
> "head-down, steaming ahead for *MY* project cause you all suck"
> perspective and impose corresponding filters on his declarations of
> "LAW".  At least that's the only way *I* can make sense of his latest
> contributions.
> 
> If you get git, just do the right thing -- Linus be damned.
> 

It's fair for Linus to want to make things behave a certain way in a 
project.  There are design decisions which have tradeoffs both ways -- 
robust (but subject to partial information issues) versus brittle (but 
safe.)

That's part of why I prefer to ask first.

	-hpa

^ permalink raw reply

* Re: Mercurial 0.3 vs git benchmarks
From: H. Peter Anvin @ 2005-04-27 21:04 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Morton, Linus Torvalds, magnus.damm, mason, mike.taht, mpm,
	linux-kernel, git
In-Reply-To: <874qds5489.fsf@deneb.enyo.de>

Florian Weimer wrote:
> 
> Benchmarks are actually a bit tricky because as far as I can tell,
> once you hash the directories, they are tainted even if you mount your
> file system with ext2.

That's what fsck -D is for.

	-hpa

^ permalink raw reply

* Re: Mercurial 0.3 vs git benchmarks
From: Florian Weimer @ 2005-04-27 21:06 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Linus Torvalds, magnus.damm, mason, mike.taht, mpm,
	linux-kernel, git
In-Reply-To: <426FFE58.4050901@zytor.com>

* H. Peter Anvin:

> Florian Weimer wrote:
>> Benchmarks are actually a bit tricky because as far as I can tell,
>> once you hash the directories, they are tainted even if you mount your
>> file system with ext2.
>
> That's what fsck -D is for.

Ah, cool, I didn't know that it works the other way, too.  Thanks.

^ permalink raw reply

* Re: Mercurial 0.3 vs git benchmarks
From: Bill Davidsen @ 2005-04-27 21:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Magnus Damm, mason, torvalds, mike.taht, mpm,
	linux-kernel, git
In-Reply-To: <20050427063439.GA22014@elte.hu>

Ingo Molnar wrote:
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> 
>>Magnus Damm <magnus.damm@gmail.com> wrote:
>>
>>>My primitive guess is that it was because
>>> the ext3 journal became full.
>>
>>The default ext3 journal size is inappropriately small, btw.  Normally 
>>you should manually make it 128M or so, rather than 32M.  Unless you 
>>have a small amount of memory and/or a large number of filesystems, in 
>>which case there might be problems with pinned memory.
>>
>>Mounting as ext2 is a useful technique for determining whether the fs 
>>is getting in the way.
> 
> 
> on ext3, when juggling patches and trees, the biggest performance boost 
> for me comes from adding noatime,nodiratime to the mount options in 
> /etc/fstab:
> 
>  LABEL=/ / ext3 noatime,nodiratime,defaults 1 1

I said much the same in another post, but noatime is not always what I 
really want. How about a "nojournalatime" option, so the atime would be 
updated at open and close, but not journaled at any other time? This 
would reduce journal traffic but still allow an admin to tell if anyone 
ever uses a file. The info would be lost in a crash, but otherwise 
preserved just as it is for ext2. Might even be useful for ext2, not to 
write the atime, just track it in core.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me

^ permalink raw reply

* [PATCH] add a diff-files command
From: Nicolas Pitre @ 2005-04-27 21:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git


In the same spirit as diff-tree and diff-cache, here is a diff-files 
command that processes differences between the index cache and the 
working directory content.  It produces lists of files that are either 
changed (-c), deleted (-d) or outside (-o) from the current cache, or a 
combination of those, or all of them (-a).

The -p option can also be used to generate a patch describing the 
changes directly.

It also has the ability to accept exclude file patterns with -x and even 
a file containing a list of patterns to exclude with -X.  This is 
especially useful to use the famous dontdiff file when looking for 
uncommitted files in a compiled kernel tree.
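
The exclusion test can be sketched on its own as follows. This mirrors
the basename matching done in path_match() in the patch below, pulled out
into a standalone helper whose name, is_excluded, is hypothetical:
patterns are matched with fnmatch() against the last path component only.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the last component of `name` matches any of the `n`
 * shell-style patterns, 0 otherwise.
 */
int is_excluded(const char *name, const char *const *patterns, size_t n)
{
	const char *base;
	size_t i;

	assert(name && (patterns || n == 0));
	base = strrchr(name, '/');
	base = base ? base + 1 : name;
	for (i = 0; i < n; i++)
		if (fnmatch(patterns[i], base, 0) == 0)
			return 1;
	return 0;
}
```

Because only the basename is tested, a pattern such as "*.o" excludes
kernel/sched.o but not a source file inside a directory whose name
happens to end in ".o".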

Signed-off-by: Nicolas Pitre <nico@cam.org>

--- k/Makefile
+++ l/Makefile
@@ -18,7 +18,7 @@ PROG=   update-cache show-diff init-db w
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
 	check-files ls-tree merge-base merge-cache unpack-file git-export \
 	diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
-	diff-tree-helper
+	diff-tree-helper diff-files
 
 all: $(PROG)
 
--- k/diff-files.c
+++ l/diff-files.c
@@ -0,0 +1,362 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+
+#include <dirent.h>
+#include <fnmatch.h>
+#include "cache.h"
+#include "diff.h"
+
+static const char *diff_files_usage = "diff-files [-a] [-c] [-d] [-o] [-p | -z]"
+				      " [-x <pattern>] [-X <file>] [paths...]";
+
+/* What paths are we interested in? */
+static int nr_paths = 0;
+static char **paths = NULL;
+static int *pathlens = NULL;
+
+static int nr_excludes;
+static const char **excludes;
+static int excludes_alloc;
+
+static void add_exclude(const char *string)
+{
+	if (nr_excludes == excludes_alloc) {
+		excludes_alloc = alloc_nr(excludes_alloc);
+		excludes = realloc(excludes, excludes_alloc*sizeof(char *));
+	}
+	excludes[nr_excludes++] = string;
+}
+
+static void add_excludes_from_file(const char *fname)
+{
+	int fd, i;
+	long size;
+	char *buf, *entry;
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0)
+		goto err;
+	size = lseek(fd, 0, SEEK_END);
+	if (size < 0)
+		goto err;
+	lseek(fd, 0, SEEK_SET);
+	if (size == 0) {
+		close(fd);
+		return;
+	}
+	buf = malloc(size);
+	if (!buf) {
+		errno = ENOMEM;
+		goto err;
+	}
+	if (read(fd, buf, size) != size)
+		goto err;
+	close(fd);
+
+	entry = buf;
+	for (i = 0; i < size; i++) {
+		if (buf[i] == '\n') {
+			if (entry != buf + i) {
+				buf[i] = 0;
+				add_exclude(entry);
+			}
+			entry = buf + i + 1;
+		}
+	}
+	return;
+
+err:	perror(fname);
+	exit(1);
+}
+
+/*
+ * See if name matches our specified paths and is not excluded.
+ * return value:
+ *	-1 if no match
+ *	0 if partial match (name is a directory component)
+ *	1 = exact match
+ *	2 = name is under a specified directory path with no excludes
+ */
+static int path_match(const char *name, int namelen)
+{
+	int i, ret;
+
+	/* fast case: no path list and no exclude list */
+	if (!nr_paths && !nr_excludes)
+		return 2;
+
+	ret = (nr_paths) ? -1 : 1;
+	for (i = 0; i < nr_paths; i++) {
+		int pathlen = pathlens[i];
+		if (pathlen == namelen &&
+		    strncmp(paths[i], name, pathlen) == 0) {
+			ret = 1;
+			break;
+		} else if (pathlen > namelen && 
+			   strncmp(paths[i], name, namelen) == 0 &&
+			   paths[i][namelen] == '/') {
+			ret = 0;
+			break;
+		} else if (pathlen < namelen &&
+			   strncmp(paths[i], name, pathlen) == 0 &&
+			   name[pathlen] == '/') {
+			ret = (nr_excludes) ? 1 : 2;
+			break;
+		}
+	}
+
+	if (ret >= 0 && nr_excludes) {
+		const char *basename = strrchr(name, '/');
+		basename = (basename) ? basename+1 : name;
+		for (i = 0; i < nr_excludes; i++) {
+			if (fnmatch(excludes[i], basename, 0) == 0) {
+				ret = -1;
+				break;
+			}
+		}
+	}
+
+	return ret;
+}
+
+static const char **others;
+static int nr_others;
+static int others_alloc;
+
+static void add_name(const char *pathname, int len)
+{
+	char *name;
+
+	if (cache_name_pos(pathname, len) >= 0)
+		return;
+
+	if (nr_others == others_alloc) {
+		others_alloc = alloc_nr(others_alloc);
+		others = realloc(others, others_alloc*sizeof(char *));
+	}
+	name = malloc(len + 1);
+	memcpy(name, pathname, len + 1);
+	others[nr_others++] = name;
+}
+
+/*
+ * Read a directory tree. We currently ignore anything but
+ * directories and regular files. That's because git doesn't
+ * handle them at all yet. Maybe that will change some day.
+ *
+ * Also, we currently ignore all names starting with a dot.
+ * That likely will not change.
+ */
+static void read_directory(const char *path, const char *base, int baselen, int match)
+{
+	DIR *dir = opendir(path);
+
+	if (dir) {
+		struct dirent *de;
+		char fullname[MAXPATHLEN + 1];
+		memcpy(fullname, base, baselen);
+
+		while ((de = readdir(dir)) != NULL) {
+			int len;
+
+			if (de->d_name[0] == '.')
+				continue;
+			len = strlen(de->d_name);
+			memcpy(fullname + baselen, de->d_name, len+1);
+			if (match < 2)
+				match = path_match(fullname, baselen+len);
+			if (match < 0)
+				continue;
+
+			switch (de->d_type) {
+			struct stat st;
+			default:
+				continue;
+			case DT_UNKNOWN:
+				if (lstat(fullname, &st))
+					continue;
+				if (S_ISREG(st.st_mode))
+					break;
+				if (!S_ISDIR(st.st_mode))
+					continue;
+				/* fallthrough */
+			case DT_DIR:
+				memcpy(fullname + baselen + len, "/", 2);
+				read_directory(fullname, fullname,
+					       baselen + len + 1, match);
+				continue;
+			case DT_REG:
+				break;
+			}
+			if (match > 0)
+				add_name(fullname, baselen + len);
+		}
+		closedir(dir);
+	}
+}
+
+static int cmp_name(const void *p1, const void *p2)
+{
+	const char *n1 = *(const char **)p1;
+	const char *n2 = *(const char **)p2;
+	int l1 = strlen(n1), l2 = strlen(n2);
+
+	return cache_name_compare(n1, l1, n2, l2);
+}
+
+static int show_changed = 0;
+static int show_deleted = 0;
+static int show_others = 0;
+static int generate_patch = 0;
+static int line_terminator = '\n';
+
+static const char null_sha1[20];
+static const char null_sha1_hex[] = "0000000000000000000000000000000000000000";
+
+static void show_file(int prefix, unsigned int mode,
+		      const char *sha1, const char *name)
+{
+	if (generate_patch)
+		diff_addremove(prefix, mode, sha1, name, NULL);
+	else
+		printf("%c%o\t%s\t%s\t%s%c", prefix, mode, "blob",
+		       sha1_to_hex(sha1), name, line_terminator);
+}
+
+int main(int argc, char **argv)
+{
+	int i, entries;
+
+	for (i = 1; i < argc; i++) {
+		char *arg = argv[i];
+
+		if (*arg != '-')
+			break;
+
+		if (!strcmp(arg, "-z")) {
+			line_terminator = 0;
+			continue;
+		}
+		if (!strcmp(arg, "-a")) {
+			show_changed = show_deleted = show_others = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-c")) {
+			show_changed = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-d")) {
+			show_deleted = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-o")) {
+			show_others = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-p")) {
+			generate_patch = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-x") && i+1 < argc) {
+			arg = argv[++i];
+			add_exclude(arg);
+			continue;
+		}
+		if (!strcmp(arg, "-X") && i+1 < argc) {
+			arg = argv[++i];
+			add_excludes_from_file(arg);
+			continue;
+		}
+		if (!strcmp(arg, "--")) {
+			i++;
+			break;
+		}
+
+		usage(diff_files_usage);
+	}
+
+	/* default to -c if none of -c, -d nor -o have been specified */
+	if (!show_changed && !show_deleted && !show_others)
+		show_changed = 1;
+
+	if (i < argc) {
+		paths = &argv[i];
+		nr_paths = argc - i;
+		pathlens = malloc(nr_paths * sizeof(int));
+		for (i=0; i<nr_paths; i++) {
+			pathlens[i] = strlen(paths[i]);
+			if (pathlens[i] > 0 && paths[i][pathlens[i] - 1] == '/')
+				pathlens[i]--;
+		}
+	}
+
+	entries = read_cache();
+	if (entries < 0) {
+		perror("read_cache");
+		exit(1);
+	}
+
+	if (show_others) {
+		read_directory(".", "", 0, 0);
+		qsort(others, nr_others, sizeof(char *), cmp_name);
+		for (i = 0; i < nr_others; i++) {
+			struct stat st;
+			unsigned int mode;
+			if (stat(others[i], &st) < 0) {
+				perror(others[i]);
+			} else {
+				mode = S_IFREG | ce_permissions(st.st_mode);
+				show_file('+', mode, null_sha1, others[i]);
+			}
+		}
+	}
+
+	for (i = 0; i < entries; i++) {
+		struct stat st;
+		unsigned int ce_mode, mode;
+		struct cache_entry *ce = active_cache[i];
+
+		if (path_match(ce->name, ce_namelen(ce)) < 1)
+			continue;
+
+		if (show_changed && ce_stage(ce)) {
+			if (generate_patch)
+				diff_unmerge(ce->name);
+			else
+				printf("U %s%c", ce->name, line_terminator);
+			do {
+				i++;
+			} while (i < entries &&
+				 !strcmp(ce->name, active_cache[i]->name));
+			continue;
+		}
+
+		ce_mode = ntohl(ce->ce_mode);
+		if (stat(ce->name, &st) < 0) {
+			if (errno != ENOENT) {
+				perror(ce->name);
+			} else if (show_deleted) {
+				show_file('-', ce_mode, ce->sha1, ce->name);
+			}
+			continue;
+		}
+
+		if (!show_changed || !cache_match_stat(ce, &st))
+			continue;
+
+		mode = S_IFREG | ce_permissions(st.st_mode);
+		if (generate_patch)
+			diff_change(ce_mode, mode, ce->sha1,
+				    null_sha1, ce->name, NULL);
+		else
+			printf("*%o->%o\t%s\t%s->%s\t%s%c",
+			       ce_mode, mode, "blob",
+			       sha1_to_hex(ce->sha1), null_sha1_hex,
+			       ce->name, line_terminator);
+	}
+
+	return 0;
+}

^ permalink raw reply
