Git development


* Re: git add / update-cache --add fails.
From: Petr Baudis @ 2005-04-28  1:05 UTC
  To: Paul Jackson; +Cc: ecashin, git
In-Reply-To: <20050427180143.0447ceaa.pj@sgi.com>

Dear diary, on Thu, Apr 28, 2005 at 03:01:43AM CEST, I got a letter
where Paul Jackson <pj@sgi.com> told me that...
> Petr wrote:
> >  	fd = open(path, O_RDONLY);
> >  	if (fd < 0) {
> > +		fprintf(stderr, "update-cache Error: %s\n", strerror(errno));
> 
> It's usually a good idea to indicate which system call you were
> attempting in such error messages, and if handy, the key argument.
> Just the errno might not mean much:
> 
> > +		fprintf(stderr, "update-cache open(%s) failed: %s\n", path, strerror(errno));

Sorry for being unclear, I meant that I did an analogous change in my
tree before; it is actually a little different:

	if (errno == ENOENT) {
		if (allow_remove)
			return remove_file_from_cache(path);
	}
	return error("open(\"%s\"): %s", path, strerror(errno));
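
A minimal sketch of the reporting style discussed above: the message names
the failing call and its key argument next to the errno text. The helpers
describe_failure() and open_or_report() are hypothetical names used only
for illustration; they are not part of update-cache.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format `<call>("<arg>"): <reason>` into buf. */
static int describe_failure(char *buf, size_t len, const char *call,
			    const char *arg, int err)
{
	return snprintf(buf, len, "%s(\"%s\"): %s", call, arg, strerror(err));
}

/*
 * Hypothetical helper: open path read-only. On failure, print which
 * call failed on which argument, preserve errno, and return -1.
 */
static int open_or_report(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		char msg[512];
		int saved = errno;	/* fprintf may clobber errno */

		describe_failure(msg, sizeof(msg), "open", path, saved);
		fprintf(stderr, "error: %s\n", msg);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Saving errno before printing matters because the stdio call can itself
change it before the caller gets to inspect the value.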

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: git pull on ia64 linux tree
From: Linus Torvalds @ 2005-04-28  1:08 UTC
  To: Petr Baudis; +Cc: Luck, Tony, git
In-Reply-To: <20050428003342.GW22956@pasky.ji.cz>

On Thu, 28 Apr 2005, Petr Baudis wrote:
>
> > (Which is not really nice, because it means that some files get updated 
> > and others don't, depending on how they were merged, but whatever..)
> 
> We always do checkout-cache -f -a after we do merge-cache, so it should
> end up in a consistent state.

I agree that for the common case it doesn't really matter, since we'll 
always update the working directory regardless.

It was more of a conceptual complaint. We do everything else purely in the
index, so it's a bit confusing that in that intermediate stage _some_
files end up being up-to-date, and others end up not.

		Linus


* Re: git pull on ia64 linux tree
From: Edgar Toernig @ 2005-04-28  1:13 UTC
  To: Linus Torvalds; +Cc: Luck, Tony, git
In-Reply-To: <Pine.LNX.4.58.0504271525520.18901@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> One problem with this is that "grep" always thinks lines end in '\n', and 
> what we'd really want (from a scriptability angle) is
> 
> 	diff-tree -z -r $orig $final | grep -0 '^-'

Don't you think it's much easier to reject filenames with control
chars?  Forget all this -z/-0 stuff.  It only complicates life and
supporting filenames with embedded newlines is useless in practice.

--- x/update-cache.c Thu Apr 21 19:58:47 2005
+++ y/update-cache.c Thu Apr 28 02:55:27 2005
@@ -227,7 +227,8 @@
  * are hidden, for chist sake.
  *
  * Also, we don't want double slashes or slashes at the
- * end that can make pathnames ambiguous.
+ * end that can make pathnames ambiguous nor any control
+ * chars.
  */
 static int verify_path(char *path)
 {
@@ -237,6 +238,8 @@
 	for (;;) {
 		if (!c)
 			return 1;
+		if ((unsigned char)c < 32)
+			return 0;
 		if (c == '/') {
 inside:
 		c = *path++;
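
The check added by the patch above can be sketched as a standalone
function. The name path_has_no_control_chars() is hypothetical; the
threshold of 32 is taken from the patch, so DEL (127) is not rejected
here either.

```c
#include <stddef.h>

/*
 * Hypothetical standalone version of the control-character check:
 * returns 1 if no byte of path is below 32, 0 otherwise.
 */
static int path_has_no_control_chars(const char *path)
{
	/*
	 * Walk the string as unsigned bytes so that values >= 128
	 * (e.g. UTF-8 sequences) are not seen as negative and rejected.
	 */
	const unsigned char *p = (const unsigned char *)path;

	for (; *p; p++)
		if (*p < 32)
			return 0;
	return 1;
}
```

With this in place a name containing a newline or tab is refused up
front, so line-oriented tools never see it.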


Ciao, ET.


* Re: The criss-cross merge case
From: Daniel Barkalow @ 2005-04-28  1:16 UTC
  To: Tupshin Harper; +Cc: Bram Cohen, git
In-Reply-To: <42703194.80409@tupshin.com>

On Wed, 27 Apr 2005, Tupshin Harper wrote:

> Can you clarify what you mean by darcs' underlying diff not being that
> great? It seems to function pretty much identically to gnu diff. In what
> way would you want the underlying diff to be improved?

GNU diff uses an algorithm which is tuned to handle finding the shortest
diff among a large set of similar-length alternatives while comparing
files which have a lot of repeated lines. The author of the paper it cites
is really thinking about diffing DNA sequences or similar things. It also
can't detect content moves, which are a common thing to have, and which
will be important in the long run, when we're trying to track
modifications to content which also moved from place to place.

	-Daniel
*This .sig left intentionally blank*


* void return diff-cache.c
From: Dan Weber @ 2005-04-28  1:22 UTC
  To: Git Mailing List

[-- Attachment #1: Type: TEXT/PLAIN, Size: 117 bytes --]

While compiling cogito, I got a compiler warning about returning -1 in a 
void function.  Attached is the patch.

Dan

[-- Attachment #2: Type: TEXT/plain, Size: 882 bytes --]

A void function cannot return -1; use a plain return there instead.

---
commit b34f64adc18ae04fe299257871a102307683d36b
tree 29f7b7d146b689f7a724e94f8daddceab219942d
parent 1e02ed14dc046dc47aafe93ef36587c99670a498
author Dan Weber <dan@mirrorlynx.com> 1114651105 -0400
committer Dan Weber <dan@mirrorlynx.com> 1114651105 -0400

Index: diff-cache.c
===================================================================
--- 3144235bc3b64961e133bc8e3a6b9c756923907a/diff-cache.c  (mode:100644 sha1:30804d2775d4a65a7892ae2bf0b35715812e939b)
+++ 29f7b7d146b689f7a724e94f8daddceab219942d/diff-cache.c  (mode:100644 sha1:e365f713fa28526f8a883f03a0173755987eadea)
@@ -45,7 +45,7 @@
 
 	/* New file in the index: it might actually be different in the working copy */
 	if (get_stat_data(new, &sha1, &mode) < 0)
-		return -1;
+		return;
 
 	show_file("+", new, sha1, mode);
 }


* Re: git add / update-cache --add fails.
From: Paul Jackson @ 2005-04-28  1:20 UTC
  To: Petr Baudis; +Cc: ecashin, git
In-Reply-To: <20050428010523.GB3422@pasky.ji.cz>

> I meant that I did an analogous change in my

ah - fine - sorry for the noise

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: git pull on ia64 linux tree
From: Petr Baudis @ 2005-04-28  1:24 UTC
  To: Linus Torvalds; +Cc: Luck, Tony, git
In-Reply-To: <Pine.LNX.4.58.0504271805550.18901@ppc970.osdl.org>

Dear diary, on Thu, Apr 28, 2005 at 03:08:29AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Thu, 28 Apr 2005, Petr Baudis wrote:
> >
> > > (Which is not really nice, because it means that some files get updated 
> > > and others don't, depending on how they were merged, but whatever..)
> > 
> > We always do checkout-cache -f -a after we do merge-cache, so it should
> > end up in a consistent state.
> 
> I agree that for the common case it doesn't really matter, since we'll 
> always update the working directory regardless.
> 
> It was more of a conceptual complaint. We do everything else purely in the
> index, so it's a bit confusing that in that intermediate stage _some_
> files end up being up-to-date, and others end up not.

This actually came all the way from git-merge-one-file-script.

I don't think the intermediate stage matters at all, actually; from the
user's point of view it is nearly instantaneous, and the tree keeps
changing during the merge anyway, when you are trying to resolve
non-exact merges by the merge utility. From the user's point of view,
the act of merging is atomic and you always end up with something
consistent, unless cg-merge is killed. But in that case it's all messed
up anyway and you'd better just cg-cancel and try again.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Git fork removal?
From: Daniel Barkalow @ 2005-04-28  1:31 UTC
  To: Petr Baudis; +Cc: git

I saw that "fork" was removed when going to the cg- scripts, and the
replacements don't do the symlinked trees thing. I found the symlinked
trees thing vital to my workflow, so I'm going to want to reintroduce
them, or something similar. Is there some reason you went to hardlinked
object files instead of symlinked directories?

	-Daniel
*This .sig left intentionally blank*


* Re: A shortcoming of the git repo format
From: Paul Jackson @ 2005-04-28  1:34 UTC
  To: Linus Torvalds; +Cc: lord, hpa, git
In-Reply-To: <Pine.LNX.4.58.0504271722260.18901@ppc970.osdl.org>

Dang ... don't apologize too much ... it's fun watching Linus be a
cranky git.

This is turning into something neat, something different and special,
and no way we'd have gotten here using the usual ways or means.

And we're all pretty damn confident that you won't be playing SCM
dictator for long - tools are obviously not your first love.

Every China Shop needs a good Bull now and then.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* kernel.org now has gitweb installed
From: H. Peter Anvin @ 2005-04-28  1:38 UTC
  To: Git Mailing List

http://www.kernel.org/git/

	-hpa


* Re: A shortcoming of the git repo format
From: Daniel Barkalow @ 2005-04-28  1:51 UTC
  To: H. Peter Anvin; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <427026AB.4070809@zytor.com>

On Wed, 27 Apr 2005, H. Peter Anvin wrote:

> There are a fair number of tools one may want that deal with reachability.

Do you agree that installing a new libgit.so when you want to apply such a
tool to a new tag is sufficient? If the library is shared, and everything
for parsing the objects (to the point of getting struct object filled
out) is in the library, and you want to have some tool able to validate or
use any new tag that you want reachability-only tools to process, not
having a standard header proto-format for future tags isn't a problem,
since you'll get upgrades to the parser portion of all of your tools
together.

	-Daniel
*This .sig left intentionally blank*


* Re: Cogito nit: cg-update should default to "origin".
From: Dan Holmsand @ 2005-04-28  1:52 UTC
  To: git
In-Reply-To: <20050428005337.GA3422@pasky.ji.cz>

Petr Baudis wrote:
> Actually, I wasn't too happy with the current update-to-HEAD special
> case. Sure, it's similar to SVN, but SVN's concepts are totally
> different here, and this special case wart (which does really do
> something entirely different than normal cg-update) is one of the
> Cogito-related shadows in my mind. What about moving this special case
> to something like
> 
> 	cg-restore
> 
> and changing the defaulting of update and pull back to 'origin'? I think
> people do this cg-update without arguments so seldom that changing this
> now shouldn't hurt much, right?

How about making the restore thing a special case of cg-cancel instead? 
"Restore deleted files", and "restore deleted and modified files and 
unseek" are similar enough that people will know where to look. Something 
like "cg-cancel -C" (for careful), that only restores deleted files 
would do it, I think.

/dan


* Re: A shortcoming of the git repo format
From: H. Peter Anvin @ 2005-04-28  1:56 UTC
  To: Daniel Barkalow; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <Pine.LNX.4.21.0504272143260.30848-100000@iabervon.org>

Daniel Barkalow wrote:
> On Wed, 27 Apr 2005, H. Peter Anvin wrote:
>  
>>There are a fair number of tools one may want that deal with reachability.
>  
> Do you agree that installing a new libgit.so when you want to apply such a
> tool to a new tag is sufficient? If the library is shared, and everything
> for parsing the objects (to the point of getting struct object filled
> out) is in the library, and you want to have some tool able to validate or
> use any new tag that you want reachability-only tools to process, not
> having a standard header proto-format for future tags isn't a problem,
> since you'll get upgrades to the parser portion of all of your tools
> together.
> 

Only if language bindings are created for this library.

	-hpa


* Re: I'm missing isofs.h
From: Junio C Hamano @ 2005-04-28  2:02 UTC
  To: Petr Baudis; +Cc: Linus Torvalds, Andrew Morton, git
In-Reply-To: <20050428003246.GV22956@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> Dear diary, on Thu, Apr 28, 2005 at 02:19:07AM CEST, I got a letter
PB> where Linus Torvalds <torvalds@osdl.org> told me that...
>> And together with Junio's stuff from today, you can literally just do
>> 
>> diff-cache -p $tree
>> 
>> and you're done - it diffs any release "$tree" against the current state.

PB> Actually, I can't; the patch generator is not on par with mine yet.

That's what GIT_EXTERNAL_DIFF is there for.
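
A rough sketch of the idea, assuming only that GIT_EXTERNAL_DIFF names a
program to run instead of the built-in patch generator. How the variable
is actually interpreted, and which arguments the program receives, is not
stated in this thread; pick_diff_command() is a hypothetical helper.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return the command named by GIT_EXTERNAL_DIFF,
 * or the given fallback if the variable is unset or empty.
 */
static const char *pick_diff_command(const char *fallback)
{
	const char *cmd = getenv("GIT_EXTERNAL_DIFF");

	return (cmd && *cmd) ? cmd : fallback;
}
```

Treating an empty value the same as an unset one is a design choice of
this sketch, not something the message above specifies.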




* Re: The criss-cross merge case
From: Benedikt Schmidt @ 2005-04-28  2:15 UTC
  To: git
In-Reply-To: <Pine.LNX.4.21.0504272051390.30848-100000@iabervon.org>

Daniel Barkalow <barkalow@iabervon.org> writes:

> On Wed, 27 Apr 2005, Tupshin Harper wrote:
>
>> Can you clarify what you mean by darcs' underlying diff not being that
>> great? It seems to function pretty much identically to gnu diff. In what
>> way would you want the underlying diff to be improved?
>
> GNU diff uses an algorithm which is tuned to handle finding the shortest
> diff among a large set of similar-length alternatives while comparing
> files which have a lot of repeated lines. The author of the paper it cites
> is really thinking about diffing DNA sequences or similar things.

AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
to an earlier paper by the same author titled
"A File Comparison Program" - Miller, Myers - 1985.

Can you be more specific why the algorithm is a bad choice (performance,
quality of diff output)?

> It also can't detect content moves, which are a common thing to have, and
> which will be important in the long run, when we're trying to track
> modifications to content which also moved from place to place.

Ok, darcs doesn't handle block moves, so there is no need for an algorithm that
supports them (yet). Is there any free SCM that has support for block moves at
the moment? It seems like clearcase detects them, but I don't know where it
takes advantage of it.

Benedikt

[1] http://citeseer.ist.psu.edu/myers86ond.html


* [PATCH] add a diff-files command (revised)
From: Nicolas Pitre @ 2005-04-28  2:06 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, git

In the same spirit as diff-tree and diff-cache, here is a diff-files 
command that processes differences between the index cache and the 
working directory content.  It produces lists of files that are 
changed, deleted, and/or unknown with regard to the current cache 
content. The -p option can also be used to output the differences in 
patch form.

It also has the ability to accept exclude patterns for files and the 
ability to read those exclude patterns from a file.
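
As a rough illustration of reading patterns from a file, here is a sketch
that splits an in-memory buffer into one pattern per line. It assumes the
file contents have already been read into a writable, NUL-terminated
buffer; split_lines() is a hypothetical name and is not the function in
the patch below.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split a writable, NUL-terminated buffer into
 * non-empty lines in place. Stores up to max line pointers in out and
 * returns how many were stored. A final line without a trailing
 * newline is kept as well.
 */
static int split_lines(char *buf, const char **out, int max)
{
	int n = 0;
	char *entry = buf;
	char *p;

	for (p = buf; ; p++) {
		if (*p == '\n' || *p == '\0') {
			int at_end = (*p == '\0');

			*p = '\0';		/* terminate this line */
			if (p != entry && n < max)
				out[n++] = entry;	/* skip empty lines */
			if (at_end)
				break;
			entry = p + 1;
		}
	}
	return n;
}
```

Each returned pointer refers into the original buffer, so the buffer has
to stay allocated for as long as the patterns are in use.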

Typical usage looks like:

    diff-files --others --exclude=\*.o arch/arm/ include/asm-arm/

which lists all files the git cache doesn't know about in arch/arm/ and 
include/asm-arm/ but ignoring any object files.  Or:

    diff-files --all -p --exclude-from=dontdiff.list

which produces a patch of all changes currently in the work tree while 
excluding all files matching any of the patterns listed in 
dontdiff.list (useful when one doesn't want to run 'make distclean').
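
The exclude matching described above can be sketched as follows, assuming
(as the patch below does) that each pattern is compared with fnmatch()
against the last path component only. is_excluded() is a hypothetical
name used for illustration.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the last path component of name
 * matches any of the n shell-style patterns, 0 otherwise.
 */
static int is_excluded(const char *name, const char **patterns, int n)
{
	const char *base = strrchr(name, '/');
	int i;

	base = base ? base + 1 : name;	/* strip leading directories */
	for (i = 0; i < n; i++)
		if (fnmatch(patterns[i], base, 0) == 0)
			return 1;
	return 0;
}
```

Because only the basename is matched, a pattern such as *.o excludes
object files at any depth but never excludes a whole directory by one of
its parent names.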

(revised after comments from Junio C Hamano)
Signed-off-by: Nicolas Pitre <nico@cam.org>

--- k/Makefile
+++ l/Makefile
@@ -18,7 +18,7 @@ PROG=   update-cache show-diff init-db w
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
 	check-files ls-tree merge-base merge-cache unpack-file git-export \
 	diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
-	diff-tree-helper
+	diff-tree-helper diff-files
 
 all: $(PROG)
 
--- k/diff-files.c
+++ l/diff-files.c
@@ -0,0 +1,347 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+
+#include <dirent.h>
+#include <fnmatch.h>
+#include "cache.h"
+#include "diff.h"
+
+static const char *diff_files_usage =
+	"diff-files [--all] [--changed] [--deleted] [--others] [-p | -z] "
+	"[--exclude=<pattern>] [--exclude-from=<file>] [paths...]";
+
+/* What paths are we interested in? */
+static int nr_paths = 0;
+static char **paths = NULL;
+static int *pathlens = NULL;
+
+static int nr_excludes;
+static const char **excludes;
+static int excludes_alloc;
+
+static void add_exclude(const char *string)
+{
+	if (nr_excludes == excludes_alloc) {
+		excludes_alloc = alloc_nr(excludes_alloc);
+		excludes = realloc(excludes, excludes_alloc*sizeof(char *));
+	}
+	excludes[nr_excludes++] = string;
+}
+
+static void add_excludes_from_file(const char *fname)
+{
+	int fd, i;
+	long size;
+	char *buf, *entry;
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0)
+		goto err;
+	size = lseek(fd, 0, SEEK_END);
+	if (size < 0)
+		goto err;
+	lseek(fd, 0, SEEK_SET);
+	if (size == 0) {
+		close(fd);
+		return;
+	}
+	buf = malloc(size);
+	if (!buf) {
+		errno = ENOMEM;
+		goto err;
+	}
+	if (read(fd, buf, size) != size)
+		goto err;
+	close(fd);
+
+	entry = buf;
+	for (i = 0; i < size; i++) {
+		if (buf[i] == '\n') {
+			if (entry != buf + i) {
+				buf[i] = 0;
+				add_exclude(entry);
+			}
+			entry = buf + i + 1;
+		}
+	}
+	return;
+
+err:	perror(fname);
+	exit(1);
+}
+
+/*
+ * See if name matches our specified paths and is not excluded.
+ * return value:
+ *	-1 if no match
+ *	0 if partial match (name is a directory component)
+ *	1 = exact match
+ *	2 = name is under a specified directory path with no excludes
+ */
+static int path_match(const char *name, int namelen)
+{
+	int i, ret;
+
+	/* fast case: no path list and no exclude list */
+	if (!nr_paths && !nr_excludes)
+		return 2;
+
+	ret = (nr_paths) ? -1 : 1;
+	for (i = 0; i < nr_paths; i++) {
+		int pathlen = pathlens[i];
+		if (pathlen == namelen &&
+		    strncmp(paths[i], name, pathlen) == 0) {
+			ret = 1;
+			break;
+		} else if (pathlen > namelen && 
+			   strncmp(paths[i], name, namelen) == 0 &&
+			   paths[i][namelen] == '/') {
+			ret = 0;
+			break;
+		} else if (pathlen < namelen &&
+			   strncmp(paths[i], name, pathlen) == 0 &&
+			   name[pathlen] == '/') {
+			ret = (nr_excludes) ? 1 : 2;
+			break;
+		}
+	}
+
+	if (ret >= 0 && nr_excludes) {
+		const char *basename = strrchr(name, '/');
+		basename = (basename) ? basename+1 : name;
+		for (i = 0; i < nr_excludes; i++) {
+			if (fnmatch(excludes[i], basename, 0) == 0) {
+				ret = -1;
+				break;
+			}
+		}
+	}
+
+	return ret;
+}
+
+static const char **others;
+static int nr_others;
+static int others_alloc;
+
+static void add_name(const char *pathname, int len)
+{
+	char *name;
+
+	if (cache_name_pos(pathname, len) >= 0)
+		return;
+
+	if (nr_others == others_alloc) {
+		others_alloc = alloc_nr(others_alloc);
+		others = realloc(others, others_alloc*sizeof(char *));
+	}
+	name = malloc(len + 1);
+	memcpy(name, pathname, len + 1);
+	others[nr_others++] = name;
+}
+
+/*
+ * Read a directory tree. We currently ignore anything but
+ * directories and regular files. That's because git doesn't
+ * handle them at all yet. Maybe that will change some day.
+ *
+ * Also, we currently ignore all names starting with a dot.
+ * That likely will not change.
+ */
+static void read_directory(const char *path, const char *base, int baselen, int match)
+{
+	DIR *dir = opendir(path);
+
+	if (dir) {
+		struct dirent *de;
+		char fullname[MAXPATHLEN + 1];
+		memcpy(fullname, base, baselen);
+
+		while ((de = readdir(dir)) != NULL) {
+			int len;
+
+			if (de->d_name[0] == '.')
+				continue;
+			len = strlen(de->d_name);
+			memcpy(fullname + baselen, de->d_name, len+1);
+			if (match < 2)
+				match = path_match(fullname, baselen+len);
+			if (match < 0)
+				continue;
+
+			switch (de->d_type) {
+			struct stat st;
+			default:
+				continue;
+			case DT_UNKNOWN:
+				if (lstat(fullname, &st))
+					continue;
+				if (S_ISREG(st.st_mode))
+					break;
+				if (!S_ISDIR(st.st_mode))
+					continue;
+				/* fallthrough */
+			case DT_DIR:
+				memcpy(fullname + baselen + len, "/", 2);
+				read_directory(fullname, fullname,
+					       baselen + len + 1, match);
+				continue;
+			case DT_REG:
+				break;
+			}
+			if (match > 0)
+				add_name(fullname, baselen + len);
+		}
+		closedir(dir);
+	}
+}
+
+static int cmp_name(const void *p1, const void *p2)
+{
+	const char *n1 = *(const char **)p1;
+	const char *n2 = *(const char **)p2;
+	int l1 = strlen(n1), l2 = strlen(n2);
+
+	return cache_name_compare(n1, l1, n2, l2);
+}
+
+static int show_changed = 0;
+static int show_deleted = 0;
+static int show_others = 0;
+static int generate_patch = 0;
+static int line_terminator = '\n';
+
+static const char null_sha1[20];
+static const char null_sha1_hex[] = "0000000000000000000000000000000000000000";
+
+static void show_file(int prefix, unsigned int mode,
+		      const char *sha1, const char *name)
+{
+	if (generate_patch)
+		diff_addremove(prefix, mode, sha1, name, NULL);
+	else
+		printf("%c%o\t%s\t%s\t%s%c", prefix, mode, "blob",
+		       sha1_to_hex(sha1), name, line_terminator);
+}
+
+int main(int argc, char **argv)
+{
+	int i, entries;
+
+	for (i = 1; i < argc; i++) {
+		char *arg = argv[i];
+
+		if (*arg != '-') {
+			break;
+		} else if (!strcmp(arg, "-z")) {
+			line_terminator = 0;
+		} else if (!strcmp(arg, "-a") || !strcmp(arg, "--all")) {
+			show_changed = show_deleted = show_others = 1;
+		} else if (!strcmp(arg, "-c") || !strcmp(arg, "--changed")) {
+			show_changed = 1;
+		} else if (!strcmp(arg, "-d") || !strcmp(arg, "--deleted")) {
+			show_deleted = 1;
+		} else if (!strcmp(arg, "-o") || !strcmp(arg, "--others")) {
+			show_others = 1;
+		} else if (!strcmp(arg, "-p")) {
+			generate_patch = 1;
+		} else if (!strcmp(arg, "-x") && i+1 < argc) {
+			add_exclude(argv[++i]);
+		} else if (!strncmp(arg, "--exclude=", 10)) {
+			add_exclude(arg+10);
+		} else if (!strcmp(arg, "-X") && i+1 < argc) {
+			add_excludes_from_file(argv[++i]);
+		} else if (!strncmp(arg, "--exclude-from=", 15)) {
+			add_excludes_from_file(arg+15);
+		} else if (!strcmp(arg, "--")) {
+			i++;
+			break;
+		} else
+			usage(diff_files_usage);
+	}
+
+	/* default to -c if none of -c, -d nor -o have been specified */
+	if (!show_changed && !show_deleted && !show_others)
+		show_changed = 1;
+
+	if (i < argc) {
+		paths = &argv[i];
+		nr_paths = argc - i;
+		pathlens = malloc(nr_paths * sizeof(int));
+		for (i=0; i<nr_paths; i++) {
+			pathlens[i] = strlen(paths[i]);
+			if (paths[i][pathlens[i] - 1] == '/')
+				pathlens[i]--;
+		}
+	}
+
+	entries = read_cache();
+	if (entries < 0) {
+		perror("read_cache");
+		exit(1);
+	}
+
+	if (show_others) {
+		read_directory(".", "", 0, 0);
+		qsort(others, nr_others, sizeof(char *), cmp_name);
+		for (i = 0; i < nr_others; i++) {
+			struct stat st;
+			unsigned int mode;
+			if (stat(others[i], &st) < 0) {
+				perror(others[i]);
+			} else {
+				mode = S_IFREG | ce_permissions(st.st_mode);
+				show_file('+', mode, null_sha1, others[i]);
+			}
+		}
+	}
+
+	for (i = 0; i < entries; i++) {
+		struct stat st;
+		unsigned int ce_mode, mode;
+		struct cache_entry *ce = active_cache[i];
+
+		if (path_match(ce->name, ce_namelen(ce)) < 1)
+			continue;
+
+		if (show_changed && ce_stage(ce)) {
+			if (generate_patch)
+				diff_unmerge(ce->name);
+			else
+				printf("U %s%c", ce->name, line_terminator);
+			do {
+				i++;
+			} while (i < entries &&
+				 !strcmp(ce->name, active_cache[i]->name));
+			continue;
+		}
+
+		ce_mode = ntohl(ce->ce_mode);
+		if (stat(ce->name, &st) < 0) {
+			if (errno != ENOENT) {
+				perror(ce->name);
+			} else if (show_deleted) {
+				show_file('-', ce_mode, ce->sha1, ce->name);
+			}
+			continue;
+		}
+
+		if (!show_changed || !cache_match_stat(ce, &st))
+			continue;
+
+		mode = S_IFREG | ce_permissions(st.st_mode);
+		if (generate_patch)
+			diff_change(ce_mode, mode, ce->sha1,
+				    null_sha1, ce->name, NULL);
+		else
+			printf("*%o->%o\t%s\t%s->%s\t%s%c",
+			       ce_mode, mode, "blob",
+			       sha1_to_hex(ce->sha1), null_sha1_hex,
+			       ce->name, line_terminator);
+	}
+
+	return 0;
+}
--- k/update-cache.c
+++ l/update-cache.c
@@ -0,0 +1,373 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+#include <signal.h>
+#include "cache.h"
+
+/*
+ * Default to not allowing changes to the list of files. The
+ * tool doesn't actually care, but this makes it harder to add
+ * files to the revision control by mistake by doing something
+ * like "update-cache *" and suddenly having all the object
+ * files be revision controlled.
+ */
+static int allow_add = 0, allow_remove = 0, not_new = 0;
+
+/* Three functions to allow overloaded pointer return; see linux/err.h */
+static inline void *ERR_PTR(long error)
+{
+	return (void *) error;
+}
+
+static inline long PTR_ERR(const void *ptr)
+{
+	return (long) ptr;
+}
+
+static inline long IS_ERR(const void *ptr)
+{
+	return (unsigned long)ptr > (unsigned long)-1000L;
+}
+
+static int index_fd(unsigned char *sha1, int fd, struct stat *st)
+{
+	z_stream stream;
+	unsigned long size = st->st_size;
+	int max_out_bytes = size + 200;
+	void *out = xmalloc(max_out_bytes);
+	void *metadata = xmalloc(200);
+	int metadata_size;
+	void *in;
+	SHA_CTX c;
+
+	in = "";
+	if (size)
+		in = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (!out || (int)(long)in == -1)
+		return -1;
+
+	metadata_size = 1+sprintf(metadata, "blob %lu", size);
+
+	SHA1_Init(&c);
+	SHA1_Update(&c, metadata, metadata_size);
+	SHA1_Update(&c, in, size);
+	SHA1_Final(sha1, &c);
+
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+
+	/*
+	 * ASCII size + nul byte
+	 */	
+	stream.next_in = metadata;
+	stream.avail_in = metadata_size;
+	stream.next_out = out;
+	stream.avail_out = max_out_bytes;
+	while (deflate(&stream, 0) == Z_OK)
+		/* nothing */;
+
+	/*
+	 * File content
+	 */
+	stream.next_in = in;
+	stream.avail_in = size;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		/*nothing */;
+
+	deflateEnd(&stream);
+	
+	return write_sha1_buffer(sha1, out, stream.total_out);
+}
+
+/*
+ * This only updates the "non-critical" parts of the directory
+ * cache, ie the parts that aren't tracked by GIT, and only used
+ * to validate the cache.
+ */
+static void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
+{
+	ce->ce_ctime.sec = htonl(st->st_ctime);
+	ce->ce_mtime.sec = htonl(st->st_mtime);
+#ifdef NSEC
+	ce->ce_ctime.nsec = htonl(st->st_ctim.tv_nsec);
+	ce->ce_mtime.nsec = htonl(st->st_mtim.tv_nsec);
+#endif
+	ce->ce_dev = htonl(st->st_dev);
+	ce->ce_ino = htonl(st->st_ino);
+	ce->ce_uid = htonl(st->st_uid);
+	ce->ce_gid = htonl(st->st_gid);
+	ce->ce_size = htonl(st->st_size);
+}
+
+static int add_file_to_cache(char *path)
+{
+	int size, namelen;
+	struct cache_entry *ce;
+	struct stat st;
+	int fd;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			if (allow_remove)
+				return remove_file_from_cache(path);
+		}
+		return -1;
+	}
+	if (fstat(fd, &st) < 0) {
+		close(fd);
+		return -1;
+	}
+	namelen = strlen(path);
+	size = cache_entry_size(namelen);
+	ce = xmalloc(size);
+	memset(ce, 0, size);
+	memcpy(ce->name, path, namelen);
+	fill_stat_cache_info(ce, &st);
+	ce->ce_mode = create_ce_mode(st.st_mode);
+	ce->ce_flags = htons(namelen);
+
+	if (index_fd(ce->sha1, fd, &st) < 0)
+		return -1;
+
+	return add_cache_entry(ce, allow_add);
+}
+
+static int match_data(int fd, void *buffer, unsigned long size)
+{
+	while (size) {
+		char compare[1024];
+		int ret = read(fd, compare, sizeof(compare));
+
+		if (ret <= 0 || ret > size || memcmp(buffer, compare, ret))
+			return -1;
+		size -= ret;
+		buffer += ret;
+	}
+	return 0;
+}
+
+static int compare_data(struct cache_entry *ce, unsigned long expected_size)
+{
+	int match = -1;
+	int fd = open(ce->name, O_RDONLY);
+
+	if (fd >= 0) {
+		void *buffer;
+		unsigned long size;
+		char type[10];
+
+		buffer = read_sha1_file(ce->sha1, type, &size);
+		if (buffer) {
+			if (size == expected_size && !strcmp(type, "blob"))
+				match = match_data(fd, buffer, size);
+			free(buffer);
+		}
+		close(fd);
+	}
+	return match;
+}
+
+/*
+ * "refresh" does not calculate a new sha1 file or bring the
+ * cache up-to-date for mode/content changes. But what it
+ * _does_ do is to "re-match" the stat information of a file
+ * with the cache, so that you can refresh the cache for a
+ * file that hasn't been changed but where the stat entry is
+ * out of date.
+ *
+ * For example, you'd want to do this after doing a "read-tree",
+ * to link up the stat cache details with the proper files.
+ */
+static struct cache_entry *refresh_entry(struct cache_entry *ce)
+{
+	struct stat st;
+	struct cache_entry *updated;
+	int changed, size;
+
+	if (stat(ce->name, &st) < 0)
+		return ERR_PTR(-errno);
+
+	changed = cache_match_stat(ce, &st);
+	if (!changed)
+		return ce;
+
+	/*
+	 * If the mode has changed, there's no point in trying
+	 * to refresh the entry - it's not going to match
+	 */
+	if (changed & MODE_CHANGED)
+		return ERR_PTR(-EINVAL);
+
+	if (compare_data(ce, st.st_size))
+		return ERR_PTR(-EINVAL);
+
+	size = ce_size(ce);
+	updated = xmalloc(size);
+	memcpy(updated, ce, size);
+	fill_stat_cache_info(updated, &st);
+	return updated;
+}
+
+static void refresh_cache(void)
+{
+	int i;
+
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce, *new;
+		ce = active_cache[i];
+		if (ce_stage(ce)) {
+			printf("%s: needs merge\n", ce->name);
+			while ((i < active_nr) &&
+			       ! strcmp(active_cache[i]->name, ce->name))
+				i++;
+			i--;
+			continue;
+		}
+
+		new = refresh_entry(ce);
+		if (IS_ERR(new)) {
+			if (!(not_new && PTR_ERR(new) == -ENOENT))
+				printf("%s: needs update\n", ce->name);
+			continue;
+		}
+		active_cache[i] = new;
+	}
+}
+
+/*
+ * We fundamentally don't like some paths: we don't want
+ * dot or dot-dot anywhere, and in fact, we don't even want
+ * any other dot-files (.git or anything else). They
+ * are hidden, for chist sake.
+ *
+ * Also, we don't want double slashes or slashes at the
+ * end that can make pathnames ambiguous.
+ */
+static int verify_path(char *path)
+{
+	char c;
+
+	goto inside;
+	for (;;) {
+		if (!c)
+			return 1;
+		if (c == '/') {
+inside:
+			c = *path++;
+			if (c != '/' && c != '.' && c != '\0')
+				continue;
+			return 0;
+		}
+		c = *path++;
+	}
+}
+
+static int add_cacheinfo(char *arg1, char *arg2, char *arg3)
+{
+	int size, len;
+	unsigned int mode;
+	unsigned char sha1[20];
+	struct cache_entry *ce;
+
+	if (sscanf(arg1, "%o", &mode) != 1)
+		return -1;
+	if (get_sha1_hex(arg2, sha1))
+		return -1;
+	if (!verify_path(arg3))
+		return -1;
+
+	len = strlen(arg3);
+	size = cache_entry_size(len);
+	ce = xmalloc(size);
+	memset(ce, 0, size);
+
+	memcpy(ce->sha1, sha1, 20);
+	memcpy(ce->name, arg3, len);
+	ce->ce_flags = htons(len);
+	ce->ce_mode = create_ce_mode(mode);
+	return add_cache_entry(ce, allow_add);
+}
+
+static const char *lockfile_name = NULL;
+
+static void remove_lock_file(void)
+{
+	if (lockfile_name)
+		unlink(lockfile_name);
+}
+
+static void remove_lock_file_on_signal(int signo)
+{
+	remove_lock_file();
+}
+
+int main(int argc, char **argv)
+{
+	int i, newfd, entries;
+	int allow_options = 1;
+	static char lockfile[MAXPATHLEN+1];
+	const char *indexfile = get_index_file();
+
+	snprintf(lockfile, sizeof(lockfile), "%s.lock", indexfile);
+
+	newfd = open(lockfile, O_RDWR | O_CREAT | O_EXCL, 0600);
+	if (newfd < 0)
+		die("unable to create new cachefile");
+
+	signal(SIGINT, remove_lock_file_on_signal);
+	atexit(remove_lock_file);
+	lockfile_name = lockfile;
+
+	entries = read_cache();
+	if (entries < 0)
+		die("cache corrupted");
+
+	for (i = 1 ; i < argc; i++) {
+		char *path = argv[i];
+
+		if (allow_options && *path == '-') {
+			if (!strcmp(path, "--")) {
+				allow_options = 0;
+				continue;
+			}
+			if (!strcmp(path, "--add")) {
+				allow_add = 1;
+				continue;
+			}
+			if (!strcmp(path, "--remove")) {
+				allow_remove = 1;
+				continue;
+			}
+			if (!strcmp(path, "--refresh")) {
+				refresh_cache();
+				continue;
+			}
+			if (!strcmp(path, "--cacheinfo")) {
+				if (i+3 >= argc || add_cacheinfo(argv[i+1], argv[i+2], argv[i+3]))
+					die("update-cache: --cacheinfo <mode> <sha1> <path>");
+				i += 3;
+				continue;
+			}
+			if (!strcmp(path, "--ignore-missing")) {
+				not_new = 1;
+				continue;
+			}
+			die("unknown option %s", path);
+		}
+		if (!verify_path(path)) {
+			fprintf(stderr, "Ignoring path %s\n", argv[i]);
+			continue;
+		}
+		if (add_file_to_cache(path))
+			die("Unable to add %s to database", path);
+	}
+	if (write_cache(newfd, active_cache, active_nr) || rename(lockfile, indexfile))
+		die("Unable to write new cachefile");
+
+	lockfile_name = NULL;
+	return 0;
+}
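
The verify_path() check in the patch above jumps into the middle of its
loop with a goto, which makes the accepted set of paths hard to read off.
As a rough illustration only, here is a goto-free sketch that, as far as
I can tell, accepts and rejects the same paths: every path component must
start with something other than '/', '.' or the terminating NUL.  The
name verify_path_sketch and the at_component_start flag are made up for
this example and are not part of the patch.

```c
/*
 * Hypothetical, goto-free restatement of the verify_path() rule:
 * return 1 if every component of the path starts with a character
 * other than '/', '.' or NUL, and 0 otherwise.
 */
static int verify_path_sketch(const char *path)
{
	int at_component_start = 1;

	for (;; path++) {
		char c = *path;

		if (at_component_start) {
			/* empty component, dot-file, or trailing slash */
			if (c == '/' || c == '.' || c == '\0')
				return 0;
			at_component_start = 0;
			continue;
		}
		if (c == '\0')
			return 1;	/* reached the end inside a valid component */
		if (c == '/')
			at_component_start = 1;
	}
}
```

Under this sketch "Makefile" and "arch/arm/entry.S" pass, while "",
".git/HEAD", "a//b", "a/" and "/etc/passwd" are rejected.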

^ permalink raw reply

* Re: Git fork removal?
From: Petr Baudis @ 2005-04-28  2:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git
In-Reply-To: <Pine.LNX.4.21.0504272127400.30848-100000@iabervon.org>

Dear diary, on Thu, Apr 28, 2005 at 03:31:18AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> I saw that "fork" was removed when going to the cg- scripts, and the
> replacements don't do the symlinked trees thing. I found the symlinked
> trees thing vital to my workflow, so I'm going to want to reintroduce
> them, or something similar. Is there some reason you went to hardlinked
> object files instead of symlinked directories?

The user. ;-)

Apparently, too many people were confused by the local/remote branches
distinctions, and even I ceased to like it gradually (BTW, Cogito still
supports working with them - it just does not offer any interface for
manipulating them). The current scheme is much simpler and, I
believe, clearer.

Also, the forked repositories were not truly independent - people
actually got burnt by forking and then removing the original repository.

If this breaks your workflow, could you please describe it? Perhaps we
could find good semantics to support both.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: A shortcoming of the git repo format
From: Tom Lord @ 2005-04-28  2:14 UTC (permalink / raw)
  To: torvalds; +Cc: hpa, git
In-Reply-To: <Pine.LNX.4.58.0504271722260.18901@ppc970.osdl.org>

   > I think a lot of people understand it intellectually, but I really do 
   > think that we're lacking the kind of "institutionalized" knowledge
   > where people understand things at a much more visceral level.

I know that Arch and its progeny, as they stand, don't seduce you
but you should be made aware that the Arch community is one where
good SCM sense that you would agree with (although you might not
recognize it at once) is well on the path to being institutionalized.
It's gratifying/amazing/inspiring to see a bunch of folk catch up 
on the topic.

One thing there's still a shortage of in my world is folks steeped
in both perspectives: "unix" /and/ SCM.  Thus, I get folks who have
pretty decent SCM ideas in the abstract -- plus utterly terrible 
ideas about how to make them real.

There is a higher-level bug I think you'll eventually viscerally 
feel yourself, related to:

   > I think a lot of people understand it intellectually, but I really do 
   > think that we're lacking the kind of "institutionalized" knowledge
   > where people understand things at a much more visceral level.

Once you get to the BK or Arch level of SCM, beyond that there are
many possible paths.  Many of those are false paths -- imaginary
(unrealizable) ideals about how things like merging can work and
be good.   Some people seem to get stuck on those paths.

   > With git, this isn't the case. The _only_ reason I started git in the 
   > first place is that I knew better than pretty much anybody else what my
   > needs were, and I was forced to act on them because nothing out there 
   > really solved the problem for me.

That's debatable but neither here nor there.  Supposing that Arch
were /perfect/ for your needs today (which I don't claim) -- `git'
would still have been the better route to take (though my reasons
probably aren't the same as yours).

   > I'm not actually all that interested in SCM's.

In a certain way: same here, oddly enough.  Go figure.

   > Quite the reverse: such a person "knows" a lot of things, but I'm pretty
   > damn sure that such a person has _never_ actually worked on a system that
   > works the way the kernel development does

I've been avoiding the topic of how kernel development works ever since
I realized that, with each additional detail you reveal, I have little
but yellow and red cards to raise.   Doesn't seem productive to have that
fight when the option of simply improving the situation is open.

   > And I really _am_ sorry. I don't actually _like_ being nasty about these 
   > things.

It's healthy enough that you are, for your sanity and others.  Just 
be tolerant of others pointing that out.

   > The good news? I actually think my needs are very basic.

So it would seem.  This is partly because the process you advertise
yourself as doing is, sorry, garbage.  It's understandable why it
happens to work for now, but it's garbage nonetheless.  Not your fault --
you haven't been afforded the degrees of freedom to do better, afaict.

   > But for now, the _only_ point of git is as a kernel maintenance tool. 

Math is math.  You don't get to say what it means.

-t

^ permalink raw reply

* Re: [PATCH] add a diff-files command (revised)
From: Petr Baudis @ 2005-04-28  2:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.62.0504272141560.14033@localhost.localdomain>

Dear diary, on Thu, Apr 28, 2005 at 04:06:29AM CEST, I got a letter
where Nicolas Pitre <nico@cam.org> told me that...
> In the same spirit as diff-tree and diff-cache, here is a diff-files 
> command that processes differences between the index cache and the 
> working directory content.  It produces lists of files that are
> changed, deleted and/or unknown with regard to the current cache
> content. The -p option can also be used to generate a patch describing
> the differences.

Except some usage enhancement, how does this differ from show-diff?

Also, for some reason you have update-cache.c in your patch too.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: The criss-cross merge case
From: Daniel Barkalow @ 2005-04-28  2:19 UTC (permalink / raw)
  To: Benedikt Schmidt; +Cc: git
In-Reply-To: <87d5sf7il2.fsf@rzstud4.rz.uni-karlsruhe.de>

On Thu, 28 Apr 2005, Benedikt Schmidt wrote:

> AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
> to an earlier paper by the same author titled
> "A File Comparison Program" - Miller, Myers - 1985.

GNU diff is reportedly based on a better algorithm than traditional diff,
but there are better algorithms still, developed since, at least according
to a brief literature search on Google Scholar. (bdiff and vdelta, for
example, which can identify block moves as well.)

> Can you be more specific why the algorithm is a bad choice (performance,
> quality of diff output)?

I suspect that the speed is suboptimal (for the cases under which it is
actually used). The quality of the output is about ideal for a format
that lacks a representation for block moves, but I'm hoping to have a
diff/merge set that handles block moves effectively, even if it can't
report them in diff
format. I'm also hoping for an annotate function that could use block
moves.

> Ok, darcs doesn't handle block moves, so there is no need for an algorithm that
> supports them (yet). Is there any free SCM that has support for block moves at
> the moment? It seems like clearcase detects them, but I don't know where it
> takes advantage of it.

I would think that darcs would be able to do neat things in its merger if
it knew about block moves. Obviously, it only makes sense to add support
for identifying them and using them at the same time.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply

* [PATCH] diff-tree-helper: do not report unmerged path outside specification.
From: Junio C Hamano @ 2005-04-28  2:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

My bad.  diff-tree-helper reports all unmerged paths even when
the command line specifies to filter the paths.  This patch
fixes it.  Also, the reverse-diff option was left out during the last
round; this patch restores it as well.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff-tree-helper.c |  110 ++++++++++++++++++++++++-----------------------------
1 files changed, 50 insertions(+), 60 deletions(-)

# - [PATCH] diff-tree -p implies diff-tree -p -r
# + 04/27 19:18 Fix diff-tree-helper for unmerged path with path specification
--- k/diff-tree-helper.c
+++ l/diff-tree-helper.c
@@ -5,7 +5,7 @@
 #include "strbuf.h"
 #include "diff.h"
 
-static int matches_pathspec(const char *name, char **spec, int cnt)
+static int matches_pathspec(const char *name, const char **spec, int cnt)
 {
 	int i;
 	int namelen = strlen(name);
@@ -44,70 +44,69 @@ static int parse_oneside_change(const ch
 	return 0;
 }
 
-#define PLEASE_WARN -1
-#define WARNED_OURSELVES -2
- 
-static int parse_diff_tree_output(const char *buf,
-				  struct diff_spec *old,
-				  struct diff_spec *new,
-				  char *path) {
+static int parse_diff_tree_output(const char *buf, const char **spec, int cnt)
+{
+	struct diff_spec old, new;
+	char path[PATH_MAX];
 	const char *cp = buf;
 	int ch;
 
 	switch (*cp++) {
 	case 'U':
-		diff_unmerge(cp + 1);
-		return WARNED_OURSELVES;
+		if (!cnt || matches_pathspec(cp + 1, spec, cnt))
+			diff_unmerge(cp + 1);
+		return 0;
 	case '+':
-		old->file_valid = 0;
-		return parse_oneside_change(cp, new, path);
+		old.file_valid = 0;
+		parse_oneside_change(cp, &new, path);
+		break;
 	case '-':
-		new->file_valid = 0;
-		return parse_oneside_change(cp, old, path);
+		new.file_valid = 0;
+		parse_oneside_change(cp, &old, path);
+		break;
 	case '*':
+		old.file_valid = old.sha1_valid =
+			new.file_valid = new.sha1_valid = 1;
+		old.mode = new.mode = 0;
+		while ((ch = *cp) && ('0' <= ch && ch <= '7')) {
+			old.mode = (old.mode << 3) | (ch - '0');
+			cp++;
+		}
+		if (strncmp(cp, "->", 2))
+			return -1;
+		cp += 2;
+		while ((ch = *cp) && ('0' <= ch && ch <= '7')) {
+			new.mode = (new.mode << 3) | (ch - '0');
+			cp++;
+		}
+		if (strncmp(cp, "\tblob\t", 6))
+			return -1;
+		cp += 6;
+		if (get_sha1_hex(cp, old.u.sha1))
+			return -1;
+		cp += 40;
+		if (strncmp(cp, "->", 2))
+			return -1;
+		cp += 2;
+		if (get_sha1_hex(cp, new.u.sha1))
+			return -1;
+		cp += 40;
+		if (*cp++ != '\t')
+			return -1;
+		strcpy(path, cp);
 		break;
 	default:
-		return PLEASE_WARN;
-	}
-	
-	/* This is for '*' entries */
-	old->file_valid = old->sha1_valid = 1;
-	new->file_valid = new->sha1_valid = 1;
-
-	old->mode = new->mode = 0;
-	while ((ch = *cp) && ('0' <= ch && ch <= '7')) {
-		old->mode = (old->mode << 3) | (ch - '0');
-		cp++;
-	}
-	if (strncmp(cp, "->", 2))
-		return PLEASE_WARN;
-	cp += 2;
-	while ((ch = *cp) && ('0' <= ch && ch <= '7')) {
-		new->mode = (new->mode << 3) | (ch - '0');
-		cp++;
+		return -1;
 	}
-	if (strncmp(cp, "\tblob\t", 6))
-		return PLEASE_WARN;
-	cp += 6;
-	if (get_sha1_hex(cp, old->u.sha1))
-		return PLEASE_WARN;
-	cp += 40;
-	if (strncmp(cp, "->", 2))
-		return PLEASE_WARN;
-	cp += 2;
-	if (get_sha1_hex(cp, new->u.sha1))
-		return PLEASE_WARN;
-	cp += 40;
-	if (*cp++ != '\t')
-		return PLEASE_WARN;
-	strcpy(path, cp);
+	if (!cnt || matches_pathspec(path, spec, cnt))
+		run_external_diff(path, &old, &new);
 	return 0;
 }
 
 static const char *diff_tree_helper_usage =
 "diff-tree-helper [-R] [-z] paths...";
 
-int main(int ac, char **av) {
+int main(int ac, const char **av) {
 	struct strbuf sb;
 	int reverse_diff = 0;
 	int line_termination = '\n';
@@ -127,21 +126,12 @@ int main(int ac, char **av) {
 
 	while (1) {
 		int status;
-		struct diff_spec old, new;
-		char path[PATH_MAX];
 		read_line(&sb, stdin, line_termination);
 		if (sb.eof)
 			break;
-		status = parse_diff_tree_output(sb.buf, &old, &new, path);
-		if (status) {
-			if (status == PLEASE_WARN)
-				fprintf(stderr, "cannot parse %s\n", sb.buf);
-			continue;
-		}
-		if (1 < ac && !matches_pathspec(path, av+1, ac-1))
-			continue;
-
-		run_external_diff(path, &old, &new);
+		status = parse_diff_tree_output(sb.buf, av+1, ac-1);
+		if (status)
+			fprintf(stderr, "cannot parse %s\n", sb.buf);
 	}
 	return 0;
 }
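
The '*' case in the patch above parses lines of the form
"<old-mode>-><new-mode>\tblob\t<old-sha1>-><new-sha1>\t<path>".  As a
small illustration of just the mode part, here is a standalone sketch;
the helper names parse_octal_mode and parse_mode_pair are invented for
this example and do not exist in diff-tree-helper.c.

```c
#include <string.h>

/* Hypothetical helper: read a run of octal digits and advance *cpp past it. */
static unsigned int parse_octal_mode(const char **cpp)
{
	const char *cp = *cpp;
	unsigned int mode = 0;

	while ('0' <= *cp && *cp <= '7') {
		mode = (mode << 3) | (unsigned int)(*cp - '0');
		cp++;
	}
	*cpp = cp;
	return mode;
}

/*
 * Hypothetical helper: parse "<old>-><new>" at the start of buf.
 * Returns 0 on success and leaves *rest pointing just past the new
 * mode; returns -1 if the "->" separator is missing.
 */
static int parse_mode_pair(const char *buf, unsigned int *old_mode,
			   unsigned int *new_mode, const char **rest)
{
	const char *cp = buf;

	*old_mode = parse_octal_mode(&cp);
	if (strncmp(cp, "->", 2))
		return -1;
	cp += 2;
	*new_mode = parse_octal_mode(&cp);
	*rest = cp;
	return 0;
}
```

Given "100644->100755\tblob\t...", this yields old mode 0100644 and new
mode 0100755, with the rest pointer left on the tab before "blob".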


^ permalink raw reply

* Re: [PATCH] add a diff-files command (revised)
From: Nicolas Pitre @ 2005-04-28  2:39 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, Junio C Hamano, git
In-Reply-To: <20050428021547.GB8612@pasky.ji.cz>

On Thu, 28 Apr 2005, Petr Baudis wrote:

> Dear diary, on Thu, Apr 28, 2005 at 04:06:29AM CEST, I got a letter
> where Nicolas Pitre <nico@cam.org> told me that...
> > In the same spirit as diff-tree and diff-cache, here is a diff-files 
> > command that processes differences between the index cache and the 
> > working directory content.  It produces lists of files that are
> > changed, deleted and/or unknown with regard to the current cache
> > content. The -p option can also be used to generate a patch describing
> > the differences.
> 
> Except some usage enhancement, how does this differ from show-diff?

This is intended to supersede show-diff.  But since its argument list is
different, I thought creating a new command would be nicer while
show-diff usage (which has already accumulated cruft with now-unused
switches) is phased out.  Also, the name "diff-files" is more in line with
the other diff commands.

> Also, for some reason you have update-cache.c in your patch too.

Huh!?


Nicolas

^ permalink raw reply

* [PATCH] add a diff-files command (revised and cleaned up)
From: Nicolas Pitre @ 2005-04-28  2:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git


[ sorry for the resend
  still experiencing glitches with cogito
  please ignore previous patch ]

In the same spirit as diff-tree and diff-cache, here is a diff-files 
command that processes differences between the index cache and the 
working directory content.  It produces lists of files that are
changed, deleted and/or unknown with regard to the current cache
content. The -p option can also be used to generate a patch describing
the differences.

It also has the ability to accept exclude patterns for files and the 
ability to read those exclude patterns from a file.

Typical usage looks like:

    diff-files --others --exclude=\*.o arch/arm/ include/asm-arm/

which lists all files the git cache doesn't know about in arch/arm/ and 
include/asm-arm/ while ignoring any object files.  Or:

    diff-files --all -p --exclude-from=dontdiff.list

which produces a patch of all changes currently in the work tree while 
excluding all files matching any of the patterns listed in 
dontdiff.list (useful when one doesn't want to run 'make distclean').
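
The exclusion rule can be sketched roughly as follows.  This assumes, as
path_match() in the patch does, that each pattern is matched with
fnmatch(3) against the last path component only, so "*.o" excludes
"arch/arm/kernel/entry.o" but not a file living under a directory that
happens to be named "foo.o".  The function name is_excluded is made up
for this example.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the basename of name matches any of
 * the nr shell-style patterns, 0 otherwise.
 */
static int is_excluded(const char *name, const char **patterns, int nr)
{
	const char *base = strrchr(name, '/');
	int i;

	/* match against the last component, not the whole path */
	base = base ? base + 1 : name;
	for (i = 0; i < nr; i++)
		if (fnmatch(patterns[i], base, 0) == 0)
			return 1;
	return 0;
}
```

Matching only the basename keeps the patterns in a dontdiff-style list
short, at the cost of not being able to exclude by directory.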

(revised after comments from Junio C Hamano)
Signed-off-by: Nicolas Pitre <nico@cam.org>

--- k/Makefile
+++ l/Makefile
@@ -18,7 +18,7 @@ PROG=   update-cache show-diff init-db w
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
 	check-files ls-tree merge-base merge-cache unpack-file git-export \
 	diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
-	diff-tree-helper
+	diff-tree-helper diff-files
 
 all: $(PROG)
 
--- k/diff-files.c
+++ l/diff-files.c
@@ -0,0 +1,347 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+
+#include <dirent.h>
+#include <fnmatch.h>
+#include "cache.h"
+#include "diff.h"
+
+static const char *diff_files_usage =
+	"diff-files [--all] [--changed] [--deleted] [--others] [-p | -z] "
+	"[--exclude=<pattern>] [--exclude-from=<file>] [paths...]";
+
+/* What paths are we interested in? */
+static int nr_paths = 0;
+static char **paths = NULL;
+static int *pathlens = NULL;
+
+static int nr_excludes;
+static const char **excludes;
+static int excludes_alloc;
+
+static void add_exclude(const char *string)
+{
+	if (nr_excludes == excludes_alloc) {
+		excludes_alloc = alloc_nr(excludes_alloc);
+		excludes = realloc(excludes, excludes_alloc*sizeof(char *));
+	}
+	excludes[nr_excludes++] = string;
+}
+
+static void add_excludes_from_file(const char *fname)
+{
+	int fd, i;
+	long size;
+	char *buf, *entry;
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0)
+		goto err;
+	size = lseek(fd, 0, SEEK_END);
+	if (size < 0)
+		goto err;
+	lseek(fd, 0, SEEK_SET);
+	if (size == 0) {
+		close(fd);
+		return;
+	}
+	buf = malloc(size);
+	if (!buf) {
+		errno = ENOMEM;
+		goto err;
+	}
+	if (read(fd, buf, size) != size)
+		goto err;
+	close(fd);
+
+	entry = buf;
+	for (i = 0; i < size; i++) {
+		if (buf[i] == '\n') {
+			if (entry != buf + i) {
+				buf[i] = 0;
+				add_exclude(entry);
+			}
+			entry = buf + i + 1;
+		}
+	}
+	return;
+
+err:	perror(fname);
+	exit(1);
+}
+
+/*
+ * See if name matches our specified paths and is not excluded.
+ * return value:
+ *	-1 if no match
+ *	0 if partial match (name is a directory component)
+ *	1 = exact match
+ *	2 = name is under a specified directory path with no excludes
+ */
+static int path_match(const char *name, int namelen)
+{
+	int i, ret;
+
+	/* fast case: no path list and no exclude list */
+	if (!nr_paths && !nr_excludes)
+		return 2;
+
+	ret = (nr_paths) ? -1 : 1;
+	for (i = 0; i < nr_paths; i++) {
+		int pathlen = pathlens[i];
+		if (pathlen == namelen &&
+		    strncmp(paths[i], name, pathlen) == 0) {
+			ret = 1;
+			break;
+		} else if (pathlen > namelen && 
+			   strncmp(paths[i], name, namelen) == 0 &&
+			   paths[i][namelen] == '/') {
+			ret = 0;
+			break;
+		} else if (pathlen < namelen &&
+			   strncmp(paths[i], name, pathlen) == 0 &&
+			   name[pathlen] == '/') {
+			ret = (nr_excludes) ? 1 : 2;
+			break;
+		}
+	}
+
+	if (ret >= 0 && nr_excludes) {
+		const char *basename = strrchr(name, '/');
+		basename = (basename) ? basename+1 : name;
+		for (i = 0; i < nr_excludes; i++) {
+			if (fnmatch(excludes[i], basename, 0) == 0) {
+				ret = -1;
+				break;
+			}
+		}
+	}
+
+	return ret;
+}
+
+static const char **others;
+static int nr_others;
+static int others_alloc;
+
+static void add_name(const char *pathname, int len)
+{
+	char *name;
+
+	if (cache_name_pos(pathname, len) >= 0)
+		return;
+
+	if (nr_others == others_alloc) {
+		others_alloc = alloc_nr(others_alloc);
+		others = realloc(others, others_alloc*sizeof(char *));
+	}
+	name = malloc(len + 1);
+	memcpy(name, pathname, len + 1);
+	others[nr_others++] = name;
+}
+
+/*
+ * Read a directory tree. We currently ignore anything but
+ * directories and regular files. That's because git doesn't
+ * handle them at all yet. Maybe that will change some day.
+ *
+ * Also, we currently ignore all names starting with a dot.
+ * That likely will not change.
+ */
+static void read_directory(const char *path, const char *base, int baselen, int match)
+{
+	DIR *dir = opendir(path);
+
+	if (dir) {
+		struct dirent *de;
+		char fullname[MAXPATHLEN + 1];
+		memcpy(fullname, base, baselen);
+
+		while ((de = readdir(dir)) != NULL) {
+			int len;
+
+			if (de->d_name[0] == '.')
+				continue;
+			len = strlen(de->d_name);
+			memcpy(fullname + baselen, de->d_name, len+1);
+			if (match < 2)
+				match = path_match(fullname, baselen+len);
+			if (match < 0)
+				continue;
+
+			switch (de->d_type) {
+			struct stat st;
+			default:
+				continue;
+			case DT_UNKNOWN:
+				if (lstat(fullname, &st))
+					continue;
+				if (S_ISREG(st.st_mode))
+					break;
+				if (!S_ISDIR(st.st_mode))
+					continue;
+				/* fallthrough */
+			case DT_DIR:
+				memcpy(fullname + baselen + len, "/", 2);
+				read_directory(fullname, fullname,
+					       baselen + len + 1, match);
+				continue;
+			case DT_REG:
+				break;
+			}
+			if (match > 0)
+				add_name(fullname, baselen + len);
+		}
+		closedir(dir);
+	}
+}
+
+static int cmp_name(const void *p1, const void *p2)
+{
+	const char *n1 = *(const char **)p1;
+	const char *n2 = *(const char **)p2;
+	int l1 = strlen(n1), l2 = strlen(n2);
+
+	return cache_name_compare(n1, l1, n2, l2);
+}
+
+static int show_changed = 0;
+static int show_deleted = 0;
+static int show_others = 0;
+static int generate_patch = 0;
+static int line_terminator = '\n';
+
+static const char null_sha1[20];
+static const char null_sha1_hex[] = "0000000000000000000000000000000000000000";
+
+static void show_file(int prefix, unsigned int mode,
+		      const char *sha1, const char *name)
+{
+	if (generate_patch)
+		diff_addremove(prefix, mode, sha1, name, NULL);
+	else
+		printf("%c%o\t%s\t%s\t%s%c", prefix, mode, "blob",
+		       sha1_to_hex(sha1), name, line_terminator);
+}
+
+int main(int argc, char **argv)
+{
+	int i, entries;
+
+	for (i = 1; i < argc; i++) {
+		char *arg = argv[i];
+
+		if (*arg != '-') {
+			break;
+		} else if (!strcmp(arg, "-z")) {
+			line_terminator = 0;
+		} else if (!strcmp(arg, "-a") || !strcmp(arg, "--all")) {
+			show_changed = show_deleted = show_others = 1;
+		} else if (!strcmp(arg, "-c") || !strcmp(arg, "--changed")) {
+			show_changed = 1;
+		} else if (!strcmp(arg, "-d") || !strcmp(arg, "--deleted")) {
+			show_deleted = 1;
+		} else if (!strcmp(arg, "-o") || !strcmp(arg, "--others")) {
+			show_others = 1;
+		} else if (!strcmp(arg, "-p")) {
+			generate_patch = 1;
+		} else if (!strcmp(arg, "-x") && i+1 < argc) {
+			add_exclude(argv[++i]);
+		} else if (!strncmp(arg, "--exclude=", 10)) {
+			add_exclude(arg+10);
+		} else if (!strcmp(arg, "-X") && i+1 < argc) {
+			add_excludes_from_file(argv[++i]);
+		} else if (!strncmp(arg, "--exclude-from=", 15)) {
+			add_excludes_from_file(arg+15);
+		} else if (!strcmp(arg, "--")) {
+			i++;
+			break;
+		} else
+			usage(diff_files_usage);
+	}
+
+	/* default to -c if none of -c, -d nor -o have been specified */
+	if (!show_changed && !show_deleted && !show_others)
+		show_changed = 1;
+
+	if (i < argc) {
+		paths = &argv[i];
+		nr_paths = argc - i;
+		pathlens = malloc(nr_paths * sizeof(int));
+		for (i=0; i<nr_paths; i++) {
+			pathlens[i] = strlen(paths[i]);
+			if (pathlens[i] > 0 && paths[i][pathlens[i] - 1] == '/')
+				pathlens[i]--;
+		}
+	}
+
+	entries = read_cache();
+	if (entries < 0) {
+		perror("read_cache");
+		exit(1);
+	}
+
+	if (show_others) {
+		read_directory(".", "", 0, 0);
+		qsort(others, nr_others, sizeof(char *), cmp_name);
+		for (i = 0; i < nr_others; i++) {
+			struct stat st;
+			unsigned int mode;
+			if (stat(others[i], &st) < 0) {
+				perror(others[i]);
+			} else {
+				mode = S_IFREG | ce_permissions(st.st_mode);
+				show_file('+', mode, null_sha1, others[i]);
+			}
+		}
+	}
+
+	for (i = 0; i < entries; i++) {
+		struct stat st;
+		unsigned int ce_mode, mode;
+		struct cache_entry *ce = active_cache[i];
+
+		if (path_match(ce->name, ce_namelen(ce)) < 1)
+			continue;
+
+		if (show_changed && ce_stage(ce)) {
+			if (generate_patch)
+				diff_unmerge(ce->name);
+			else
+				printf("U %s%c", ce->name, line_terminator);
+			do {
+				i++;
+			} while (i < entries &&
+				 !strcmp(ce->name, active_cache[i]->name));
+			continue;
+		}
+
+		ce_mode = ntohl(ce->ce_mode);
+		if (stat(ce->name, &st) < 0) {
+			if (errno != ENOENT) {
+				perror(ce->name);
+			} else if (show_deleted) {
+				show_file('-', ce_mode, ce->sha1, ce->name);
+			}
+			continue;
+		}
+
+		if (!show_changed || !cache_match_stat(ce, &st))
+			continue;
+
+		mode = S_IFREG | ce_permissions(st.st_mode);
+		if (generate_patch)
+			diff_change(ce_mode, mode, ce->sha1,
+				    null_sha1, ce->name, NULL);
+		else
+			printf("*%o->%o\t%s\t%s->%s\t%s%c",
+			       ce_mode, mode, "blob",
+			       sha1_to_hex(ce->sha1), null_sha1_hex,
+			       ce->name, line_terminator);
+	}
+
+	return 0;
+}
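
The return values documented for path_match() in the patch above can be
illustrated for a single requested path with the sketch below.  It
leaves out the exclude handling and the loop over several paths, and
the name classify_path is invented for this example.

```c
#include <string.h>

/*
 * Hypothetical single-path version of the prefix test:
 *   -1  name is unrelated to path
 *    0  name is a leading directory of path (keep descending)
 *    1  name is exactly path
 *    2  name lies underneath path
 */
static int classify_path(const char *path, const char *name)
{
	size_t pathlen = strlen(path);
	size_t namelen = strlen(name);

	if (pathlen == namelen && strncmp(path, name, pathlen) == 0)
		return 1;
	if (pathlen > namelen && strncmp(path, name, namelen) == 0 &&
	    path[namelen] == '/')
		return 0;
	if (pathlen < namelen && strncmp(path, name, pathlen) == 0 &&
	    name[pathlen] == '/')
		return 2;
	return -1;
}
```

With path "arch/arm", the name "arch" gives 0, "arch/arm" gives 1,
"arch/arm/kernel/entry.S" gives 2, and "arch/armv" gives -1 because the
character after the common prefix is not a slash.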

^ permalink raw reply

* Re: Git fork removal?
From: Daniel Barkalow @ 2005-04-28  2:47 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050428021237.GA8612@pasky.ji.cz>

On Thu, 28 Apr 2005, Petr Baudis wrote:

> Dear diary, on Thu, Apr 28, 2005 at 03:31:18AM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > I saw that "fork" was removed when going to the cg- scripts, and the
> > replacements don't do the symlinked trees thing. I found the symlinked
> > trees thing vital to my workflow, so I'm going to want to reintroduce
> > them, or something similar. Is there some reason you went to hardlinked
> > object files instead of symlinked directories?
> 
> The user. ;-)
> 
> Apparently, too many people were confused by the local/remote branches
> distinctions, and even I ceased to like it gradually (BTW, Cogito still
> supports working with them - it just does not offer any interface for
> manipulating them). The current scheme is much simpler and, I
> believe, clearer.

I don't really like having local branches and remote repositories be
treated the same. But I like each of them being available concepts
separately.

> Also, the forked repositories were not truly independent - people
> actually got burnt by forking and then removing the original repository.

Ah, okay. I'm actually personally using an original repository called
"REPOSITORY" without a head or anything, with symlinks to that, so I don't
worry about accidentally killing the real thing from some branch. I've
really got a storage area per project, plus a set of links and tracking
stuff for each fork. I also back up the REPOSITORY on occasion,
particularly if I'm about to do something potentially destructive to the
database (like git-prune-script or convert-cache). I structure my
filesystem like:

  /working
    ... other projects
    /git
      /REPOSITORY (with only .git, non-symlink version)
      /linus
      /pasky
      /barkalow
      /cog-barkalow
      /diff
      ...

> If this breaks your workflow, could you please describe it? Perhaps we
> could find good semantics to support both.

The part that I'm worried about is the way I turn a mass of debugging and
little local commits into a clean patch series. I've got a working fork
"barkalow", which is the result of a bunch of stuff and a dozen
commits. It is derived from "linus". I want to split up the changes and
make a series of commits, each of which will be a patch to submit.

1) I fork "linus" into "for-linus". I go into "for-linus".

2) I do "git diff this:barkalow > patch". This gives me the complete set
   of changes I want to submit.

3) I cut down the diff to a single logical change by removing all of the
   other hunks.

4) I do "git apply < patch". I do "git commit". I describe the logical
   change.

5) I go back to step 2, unless I'm done.

6) For each of the commits between "linus" and "for-linus", I do 
   "git patch <commit>", and send out the result.

The thing that I think requires the symlinks is step 2, which requires
that there be somewhere I can run git and have it able to see a pair of
unrelated local heads and the relevant trees.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply

* Re: A shortcoming of the git repo format
From: Ryan Anderson @ 2005-04-28  3:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tom Lord, hpa, git
In-Reply-To: <Pine.LNX.4.58.0504271722260.18901@ppc970.osdl.org>

On Wed, Apr 27, 2005 at 05:57:07PM -0700, Linus Torvalds wrote:
> On Wed, 27 Apr 2005, Tom Lord wrote:
> 
> I'm not actually all that interested in SCM's. I'd have been much happier
> if I never had to start doing git in the first place. But circumstances
> not only forced me to do my own, it also so happens that I don't believe
> that there are many people around that have ever really _seen_ what my
> kind of development requirements are.

Oddly, I was trying to answer "Why distributed?" in a discussion the
"Joel On Software" forum.

The particular thread I posted on, well, was kind of stupid, but in case
anyone is curious: http://discuss.joelonsoftware.com/default.asp?joel.3.115346.51

What I said might help give an overview of how Linux development works,
from my point of view.  I only occasionally poke at interesting things
on the periphery on whims, but I poke at the SCMy aspects of it, so
maybe it's relevant. 

 Here's an overview of how the distributed world of Linux works:

 1. Linus has his personal tree.  He pushes it out on a regular basis to
 rsync.kernel.org (well, kinda - that's where it ends up at).

 2. "Trusted lieutenants" have their own trees.  Some keep these on
 *.kernel.org, some don't.

 3. Lots of other people have personal trees.  These can be pretty much
 anywhere.

 These trees are in a variety of formats today, some are in "git", some
 are still in BitKeeper, some are from a tarball, some are tarball +
 patches, some are git + patches.

 There are a variety of merging methods:

 a.  Provide a publicly accessible repository (formerly BK, now "git")
 that Linus, or a maintainer (i.e., a "trusted lieutenant"), can pull
 from.  In the email where this location is given, the patch is usually
 included, at least in a summary format.

 b.  Provide a series of emails, each with a description followed,
 inline, by a patch.

 These merging methods can be done directly with Linus, or with anyone
 else who is interested.  (Generally, merging with Linus is for arch and
 subsystem maintainers, or random small things that are either obviously
 correct, useful, or just don't fit elsewhere.)

 So, that's the merge process, for the most part.

 Now, most patches these days are going through Andrew Morton - even if
 he's not actually submitting them personally, he's probably putting
 them into his tree for testing purposes.  (Networking changes go direct
 to Linus, but Andrew keeps an up to date version of them in his -mm
 series of kernels.)

 If code isn't accepted, well, one of a couple things happens:
 1. The patch is silently ignored.  (This is less of a problem these
 days.)

 2. The patch is commented on and someone says, "No".  (Generally, this
 happens a few times for "new" code, as people try to get the concept to
 fit into the kernel in the cleanest way.  There are a lot of style nits
 at this point, but also discussions of "Is this the right way to do
 this?" and "Do we need a more general method to do this instead of this
 hack?")

 Verifying that testing has occurred is less important than you might
 think.  This is basically because one of three things is usually true
 of a small patch: it comes with a description of the bug it fixes and
 an expert in that area will ACK it; it touches an area that few people
 use, so the submitter is probably the best qualified person to provide
 a patch and will only hurt themselves if they haven't tested it; or,
 via the history of your submissions to the kernel, you are known to not
 submit bad code, so there's an expectation of quality.

 Furthermore, an incredible amount of testing occurs in the major public
 trees (Linus/-mm) between releases, so most truly major bugs are
 spotted fairly quickly, and if the problem is systemic in a change,
 that change can be reverted until the code improves.

 On the topic of checking into private branches - it's not so much a
 matter of "the parent never sees the changes" as "the parent doesn't
 see them right now".

 FWIW, at my place of employment, we switched from CVS to BitKeeper last
 summer, and it is significantly more pleasant to work with, in all
 respects.

 Currently our entire development staff is working from home.  This
 still works well, as we can all check in locally, and submit changes to
 the master repository when changes are ready.  Between having a partner
 company in Japan working on our code, and our development staff working
 from home offices, we would have a horrific time getting any
 centralized SCM product to perform well.  With purely local
 repositories, local branching, and submissions via email or ssh, the
 process still works well and is *fast*.  CVS over slow network links is
 certainly not *fast*, and I'd be very surprised if Perforce is
 significantly better in that regard.

 I'll just say this, in closing - working with a decentralized SCM tool
 changes the way you work.  There is a Linux kernel developer I am
 aware of who keeps 27 or so separate branches on his machine, so he
 can keep all the logically unrelated changes separate from each other.
 He builds kernels off an additional branch that merges all the others
 together, and submits changes to Linus via 2 or 3 "rollup" trees he
 maintains.

 You just don't work like that in a centralized SCM, because branching
 isn't painless in the same way.

-- 

Ryan Anderson
  sometimes Pug Majere
